sparseml.pytorch.datasets.detection package

Submodules

sparseml.pytorch.datasets.detection.coco module

class sparseml.pytorch.datasets.detection.coco.CocoDetectionDataset(root: str = '~/.cache/nm_datasets/coco-detection', train: bool = False, rand_trans: bool = False, download: bool = True, year: str = '2017', image_size: int = 300, preprocessing_type: Optional[str] = None, default_boxes: Optional[sparseml.pytorch.utils.ssd_helpers.DefaultBoxes] = None)[source]

Bases: object

Wrapper for the COCO Detection dataset that applies standard transforms for input to detection models. Returns the processed image along with a tuple of its bounding boxes in ltrb (left, top, right, bottom) format and the label for each box.

If a DefaultBoxes object is provided, the boxes and labels are encoded with that object into a tensor of offsets to the default boxes and the labels for those boxes; the dataset then returns a three-item tuple of the encoded boxes, the encoded labels, and their original values.
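For clarity, "ltrb format" means each box is given by its (left, top, right, bottom) corner coordinates, whereas COCO's native annotations store boxes as [x, y, width, height]. A minimal sketch of that conversion, using plain Python lists rather than the torch.Tensors the dataset actually returns:

```python
def xywh_to_ltrb(box):
    """Convert a COCO-style [x, y, width, height] box to
    [left, top, right, bottom] corner coordinates."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

# A 50x20 box whose top-left corner is at (10, 30):
print(xywh_to_ltrb([10, 30, 50, 20]))  # [10, 30, 60, 50]
```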

Parameters
  • root – the root folder to look for the dataset in; if not found, the dataset will be downloaded here when download=True

  • train – True if this is for the training split, False for the validation split

  • rand_trans – True to apply RandomCrop and RandomHorizontalFlip to the data, False otherwise

  • download – True to download the dataset, False otherwise

  • year – the dataset year; supports 2014, 2015, and 2017

  • image_size – the size of the image to output from the dataset

  • preprocessing_type – type of standard pre-processing to perform. Options are 'yolo', 'ssd', or None. None defaults to image normalization alone with no extra processing of bounding boxes

  • default_boxes – DefaultBoxes object used to encode bounding boxes and labels for model loss computation for SSD models. Only used when preprocessing_type='ssd'. The default object represents the default boxes used in the standard SSD-300 implementation

property default_boxes

Returns

The DefaultBoxes object used to encode this dataset's bounding boxes

sparseml.pytorch.datasets.detection.coco.coco_2017_yolo(root: str = '~/.cache/nm_datasets/coco-detection', train: bool = False, rand_trans: bool = False, download: bool = True, year: str = '2017', image_size: int = 640, preprocessing_type: str = 'yolo')[source]

Wrapper for the COCO detection dataset with Dataset Registry values properly created for a YOLO model trained on 80 classes.

Parameters
  • root – the root folder to look for the dataset in; if not found, the dataset will be downloaded here when download=True

  • train – True if this is for the training split, False for the validation split

  • rand_trans – True to apply RandomCrop and RandomHorizontalFlip to the data, False otherwise

  • download – True to download the dataset, False otherwise

  • year – the dataset year; the only valid option is 2017, which is the default

  • image_size – the size of the image to output from the dataset

  • preprocessing_type – type of standard pre-processing to perform; the only valid option is 'yolo', which is the default

sparseml.pytorch.datasets.detection.helpers module

Helper classes and functions for PyTorch detection data loaders

class sparseml.pytorch.datasets.detection.helpers.AnnotatedImageTransforms(transforms: List)[source]

Bases: object

Class for chaining transforms that take two parameters (an image and its annotations, for object detection).

Parameters

transforms – list of transformations that each take an image and its annotations as parameters

property transforms

Returns

A list of the transforms performed by this object
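Conceptually, the class threads both the image and its annotations through each transform in order. A hypothetical sketch of that chaining behavior (not the actual sparseml implementation; toy values stand in for real images and annotations):

```python
class AnnotatedImageTransformsSketch:
    """Chain transforms that each take (image, annotations) and
    return a new (image, annotations) pair."""

    def __init__(self, transforms):
        self._transforms = transforms

    def __call__(self, image, annotations):
        # Each transform receives the outputs of the previous one
        for transform in self._transforms:
            image, annotations = transform(image, annotations)
        return image, annotations

# Two toy transforms operating on placeholder values:
double_image = lambda img, ann: (img * 2, ann)
tag_annotations = lambda img, ann: (img, ann + ["flipped"])
chain = AnnotatedImageTransformsSketch([double_image, tag_annotations])
print(chain(3, []))  # (6, ['flipped'])
```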

sparseml.pytorch.datasets.detection.helpers.bounding_box_and_labels_to_yolo_fmt(annotations)[source]
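bounding_box_and_labels_to_yolo_fmt carries no docstring; YOLO targets conventionally use (x_center, y_center, width, height) coordinates normalized to [0, 1]. As an illustrative sketch of that conversion from normalized ltrb boxes (plain Python lists in place of the torch.Tensors the real helper operates on):

```python
def ltrb_to_yolo(box):
    """Convert a normalized [left, top, right, bottom] box into the
    conventional YOLO [x_center, y_center, width, height] layout."""
    left, top, right, bottom = box
    return [
        (left + right) / 2,   # x_center
        (top + bottom) / 2,   # y_center
        right - left,         # width
        bottom - top,         # height
    ]

print(ltrb_to_yolo([0.25, 0.5, 0.75, 1.0]))  # [0.5, 0.75, 0.5, 0.5]
```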
sparseml.pytorch.datasets.detection.helpers.random_horizontal_flip_image_and_annotations(image: PIL.Image.Image, annotations: Tuple[torch.Tensor, torch.Tensor], p: float = 0.5) → Tuple[PIL.Image.Image, Tuple[torch.Tensor, torch.Tensor]][source]

Performs a horizontal flip on the given image and its bounding boxes with probability p.

Parameters
  • image – the image to randomly flip

  • annotations – a tuple of bounding boxes and their labels for this image

  • p – the probability to flip with. Default is 0.5

Returns

A tuple of the randomly flipped image and annotations
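The box arithmetic behind the flip is simple: with coordinates normalized to [0, 1], mirroring the image maps a box's left edge to 1 - right and its right edge to 1 - left, leaving the vertical coordinates untouched. A sketch of just that math on plain lists (the real function also flips the PIL image itself):

```python
import random

def flip_boxes_horizontal(boxes):
    """Mirror normalized [left, top, right, bottom] boxes across
    the vertical center line of the image."""
    return [[1.0 - r, t, 1.0 - l, b] for l, t, r, b in boxes]

def random_horizontal_flip(boxes, p=0.5):
    """Flip the boxes with probability p, as the dataset transform does."""
    if random.random() < p:
        return flip_boxes_horizontal(boxes)
    return boxes

print(flip_boxes_horizontal([[0.25, 0.0, 0.5, 1.0]]))  # [[0.5, 0.0, 0.75, 1.0]]
```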

sparseml.pytorch.datasets.detection.helpers.ssd_collate_fn(batch: List[Any]) → Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor, List[Tuple[torch.Tensor, torch.Tensor]]]][source]

Collate function to be used for creating a DataLoader with values transformed by encode_annotation_bounding_boxes.

Parameters

batch – a batch of data points transformed by encode_annotation_bounding_boxes

Returns

the batch stacked as tensors for all values except for the original annotations
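The stacking pattern is the standard one for SSD-style loaders: per-sample values that share a shape are stacked into batch tensors, while the variable-length original annotations stay a plain list. A hypothetical sketch of that pattern using lists in place of torch.stack:

```python
def collate_sketch(batch):
    """Each batch element is (image, (enc_boxes, enc_labels, original)).
    Group images, encoded boxes, and encoded labels across the batch;
    keep the variable-length originals as a list."""
    images, targets = zip(*batch)
    enc_boxes, enc_labels, originals = zip(*targets)
    return list(images), (list(enc_boxes), list(enc_labels), list(originals))

# Toy batch of two samples with placeholder values:
batch = [
    ("img0", ("boxes0", "labels0", "orig0")),
    ("img1", ("boxes1", "labels1", "orig1")),
]
images, (boxes, labels, originals) = collate_sketch(batch)
print(images)  # ['img0', 'img1']
```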

sparseml.pytorch.datasets.detection.helpers.ssd_random_crop_image_and_annotations(image: PIL.Image.Image, annotations: Tuple[torch.Tensor, torch.Tensor]) → Tuple[PIL.Image.Image, Tuple[torch.Tensor, torch.Tensor]][source]

Wraps sparseml.pytorch.utils.ssd_random_crop to work in the AnnotatedImageTransforms pipeline.

Parameters
  • image – the image to crop

  • annotations – a tuple of bounding boxes and their labels for this image

Returns

A tuple of the cropped image and annotations

sparseml.pytorch.datasets.detection.helpers.yolo_collate_fn(batch: List[Any]) → Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor, List[Tuple[torch.Tensor, torch.Tensor]]]][source]

Collate function to be used for creating a DataLoader with values for Yolo model input.

Parameters

batch – a batch of data points and annotations transformed by bounding_box_and_labels_to_yolo_fmt

Returns

the batch stacked as tensors for all values except for the original annotations
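Since each image can contain a different number of boxes, YOLO loaders commonly concatenate all boxes into one flat target array, prefixing each row with the index of its source image so the loss can attribute boxes to images. This sketch shows that common pattern; it is an assumption, not necessarily the exact layout sparseml produces:

```python
def yolo_collate_sketch(batch):
    """Concatenate every sample's boxes into one flat list, prefixing each
    row with the index of the image it belongs to. A common YOLO
    data-loader pattern, not necessarily sparseml's exact layout."""
    images, all_rows = [], []
    for sample_idx, (image, boxes) in enumerate(batch):
        images.append(image)
        for box in boxes:
            all_rows.append([sample_idx] + box)
    return images, all_rows

# Toy batch: one box in the first image, two in the second
images, rows = yolo_collate_sketch([
    ("img0", [[0.5, 0.5, 0.2, 0.2]]),
    ("img1", [[0.1, 0.1, 0.3, 0.3], [0.7, 0.7, 0.1, 0.1]]),
])
print([row[0] for row in rows])  # [0, 1, 1]
```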

sparseml.pytorch.datasets.detection.voc module

VOC dataset implementations for the object detection field in computer vision. More information on the dataset can be found on the PASCAL VOC project page.

class sparseml.pytorch.datasets.detection.voc.VOCDetectionDataset(root: str = '~/.cache/nm_datasets/voc-detection', train: bool = True, rand_trans: bool = False, download: bool = True, year: str = '2012', image_size: int = 300, preprocessing_type: Optional[str] = None, default_boxes: Optional[sparseml.pytorch.utils.ssd_helpers.DefaultBoxes] = None)[source]

Bases: torchvision.datasets.voc.VOCDetection

Wrapper for the VOC Detection dataset that applies standard transforms for input to detection models. Returns the processed image along with a tuple of its bounding boxes in ltrb (left, top, right, bottom) format and the label for each box.

If a DefaultBoxes object is provided, the boxes and labels are encoded with that object into a tensor of offsets to the default boxes and the labels for those boxes; the dataset then returns a three-item tuple of the encoded boxes, the encoded labels, and their original values.

Parameters
  • root – the root folder to look for the dataset in; if not found, the dataset will be downloaded here when download=True

  • train – True if this is for the training split, False for the validation split

  • rand_trans – True to apply RandomCrop and RandomHorizontalFlip to the data, False otherwise

  • download – True to download the dataset, False otherwise; the base implementation does not support setting this to False even if the dataset is already downloaded

  • year – the dataset year; supports years 2007 to 2012

  • image_size – the size of the image to output from the dataset

  • preprocessing_type – type of standard pre-processing to perform. Options are 'yolo', 'ssd', or None. None defaults to image normalization alone with no extra processing of bounding boxes

  • default_boxes – DefaultBoxes object used to encode bounding boxes and labels for model loss computation for SSD models. Only used when preprocessing_type='ssd'. The default object represents the default boxes used in the standard SSD-300 implementation

property default_boxes

Returns

The DefaultBoxes object used to encode this dataset's bounding boxes

class sparseml.pytorch.datasets.detection.voc.VOCSegmentationDataset(root: str = '~/.cache/nm_datasets/voc-segmentation', train: bool = True, rand_trans: bool = False, download: bool = True, year: str = '2012', image_size: int = 300)[source]

Bases: torchvision.datasets.voc.VOCSegmentation

Wrapper for the VOC Segmentation dataset to apply standard transforms.

Parameters
  • root – the root folder to look for the dataset in; if not found, the dataset will be downloaded here when download=True

  • train – True if this is for the training split, False for the validation split

  • rand_trans – True to apply RandomCrop and RandomHorizontalFlip to the data, False otherwise

  • download – True to download the dataset, False otherwise

  • year – the dataset year; supports years 2007 to 2012

  • image_size – the size of the image to output from the dataset

Module contents

Datasets related to the object detection field in computer vision