
sanitize_data

Function

sanitized_data: Annotations = sanitize_data(
    data: Annotations | Sequence[th.AnnoItem],
    deduplicate: "add" | "drop" = "drop",
)

Perform a full sanitization of the annotation data.

The sanitization will ensure that:

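  1. The timestamp of the sanitized data is set to the time when this method is called.
  2. All items in data are dictionaries typed by AnnoItem.
  3. Any item whose comment is None will be sanitized into an item without a comment.
  4. The position of each annotation mark will always have a positive width and height. (A negative width or height means that the box extends from its starting location in the reverse direction; such marks are converted into the equivalent box with positive values.)
  5. Annotation items will be deduplicated so that every ID is unique. How duplicated IDs are handled is controlled by the deduplicate argument.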
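Rules 1, 3 and 4 above can be illustrated with a plain-Python sketch (type validation and deduplication are left out). This is not the library's implementation: the function names normalize_mark and sanitize_items_sketch are hypothetical, and the item shape ({"id", "mark": {"x", "y", "width", "height", "type"}, "comment"}), the {"timestamp", "data"} layout of Annotations, and the millisecond timestamp unit are assumptions made for illustration.

```python
import copy
import time
from typing import Any, Dict, List, Sequence, Union

# Assumed (hypothetical) shapes: an item is
# {"id": str, "mark": {"x", "y", "width", "height", "type"}, "comment": str | None}
# and the full data is {"timestamp": int, "data": [item, ...]}.


def normalize_mark(mark: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a mark whose width and height are non-negative.

    A negative width means the box extends to the left of x, so the
    left edge is x + width; the same reasoning applies to height and y.
    """
    mark = dict(mark)
    if mark["width"] < 0:
        mark["x"] += mark["width"]
        mark["width"] = -mark["width"]
    if mark["height"] < 0:
        mark["y"] += mark["height"]
        mark["height"] = -mark["height"]
    return mark


def sanitize_items_sketch(
    data: Union[Dict[str, Any], Sequence[Dict[str, Any]]],
) -> Dict[str, Any]:
    """Hypothetical sketch of rules 1, 3 and 4; not the library's code."""
    # Accept either the full annotation dict or a bare sequence of items.
    raw = data["data"] if isinstance(data, dict) else data
    items: List[Dict[str, Any]] = []
    for item in copy.deepcopy(list(raw)):  # work on a copy; input is untouched
        if item.get("comment") is None:
            item.pop("comment", None)  # rule 3: drop a None comment
        item["mark"] = normalize_mark(item["mark"])  # rule 4
        items.append(item)
    # Rule 1: stamp the result with the current time (milliseconds assumed).
    return {"timestamp": int(time.time() * 1000), "data": items}
```

For example, an item whose mark is {"x": 10, "y": 10, "width": -4, "height": 6} comes out with x = 6 and width = 4, and an item whose comment is None loses its comment key, while the input dictionary keeps its original values.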

This method can be used when the annotations need to be saved to a file. Because it may take some time to run, it may not be suitable for sanitizing annotations in real time.

Aliases

This function can be accessed through either of the following names:

import dash_picture_annotation as dpa


dpa.sanitize_data
dpa.utilities.sanitize_data

Arguments

Requires

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| data | Annotations or [AnnoItem] | Yes | The annotation data to be sanitized. This method does not modify the input data. |
| deduplicate | "add" or "drop" | No (default: "drop") | The method used to deduplicate the annotation IDs. "add" preserves items with duplicated IDs by appending a postfix to their IDs. "drop" keeps only the first item found with a given ID and drops all later items with the same ID. |
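The two deduplicate modes described in the table can be sketched as follows. This is an illustration, not the library's implementation: deduplicate_sketch is a hypothetical name, each item is assumed to be a dict with an "id" key, and the postfix format (a hyphen followed by six random hex characters from uuid4) is an arbitrary choice; the postfix actually used by sanitize_data may look different.

```python
import uuid
from typing import Any, Dict, List, Sequence


def deduplicate_sketch(
    items: Sequence[Dict[str, Any]], mode: str = "drop"
) -> List[Dict[str, Any]]:
    """Hypothetical sketch of the "add" and "drop" modes (not the library code)."""
    if mode not in ("add", "drop"):
        raise ValueError(f"Unknown deduplicate mode: {mode!r}")
    seen = set()
    result: List[Dict[str, Any]] = []
    for item in items:
        item_id = item["id"]
        if item_id in seen:
            if mode == "drop":
                continue  # keep only the first item with this ID
            # "add": append a random postfix until the ID is unique.
            new_id = item_id
            while new_id in seen:
                new_id = f"{item_id}-{uuid.uuid4().hex[:6]}"
            item = {**item, "id": new_id}  # new dict, so the input is untouched
            item_id = new_id
        seen.add(item_id)
        result.append(item)
    return result
```

With items whose IDs are ["a", "b", "a"], "drop" returns only the first two items, while "add" returns all three, with the third ID renamed to "a-" followed by six hex characters.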

Returns

| Argument | Type | Description |
| --- | --- | --- |
| sanitized_data | Annotations | The sanitized copy of the data. |

Examples

Sanitize a collection of data (drop mode)

The following code sanitizes the data and drops all items with repeated IDs. For each repeated ID, only the first occurring item is preserved.

sanitize_data_drop.py
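import json
import dash_picture_annotation as dpa


# Load the raw annotation data.
with open("./data-input.json", "r", encoding="utf-8") as fobj:
    data = json.load(fobj)

# Sanitize with the default "drop" mode and save the result.
# ensure_ascii=False writes non-ASCII text as-is, so the file is opened as UTF-8.
with open("./sanitized-data.json", "w", encoding="utf-8") as fobj:
    json.dump(dpa.sanitize_data(data), fobj, indent=2, ensure_ascii=False)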

Sanitize a collection of data (add mode)

The following code sanitizes the data and preserves all items. For any item with a repeated ID, a random postfix is added to its ID to make it unique.

sanitize_data_add.py
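import json
import dash_picture_annotation as dpa


# Load the raw annotation data.
with open("./data-input.json", "r", encoding="utf-8") as fobj:
    data = json.load(fobj)

# Sanitize in "add" mode (duplicated IDs get a postfix) and save the result.
# ensure_ascii=False writes non-ASCII text as-is, so the file is opened as UTF-8.
with open("./sanitized-data.json", "w", encoding="utf-8") as fobj:
    json.dump(dpa.sanitize_data(data, deduplicate="add"), fobj, indent=2, ensure_ascii=False)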