sparseml.tensorflow_v1.datasets package

Submodules

sparseml.tensorflow_v1.datasets.dataset module

General dataset implementations for TensorFlow

class sparseml.tensorflow_v1.datasets.dataset.Dataset[source]

Bases: object

Generic dataset implementation for TensorFlow. Expected to work with the tf.data APIs

build(batch_size: int, repeat_count: Optional[int] = None, shuffle_buffer_size: Optional[int] = None, prefetch_buffer_size: Optional[int] = None, num_parallel_calls: Optional[int] = None) → tensorflow.python.data.ops.dataset_ops.DatasetV1[source]

Create the dataset in the current graph using tf.data APIs

Parameters
  • batch_size – the batch size to create the dataset for

  • repeat_count – the number of times to repeat the dataset; if unset or None, repeats indefinitely

  • shuffle_buffer_size – None if not shuffling, otherwise the size of the buffer to use for shuffling data

  • prefetch_buffer_size – None if not prefetching, otherwise the size of the buffer to use for prefetching

  • num_parallel_calls – the number of parallel calls to run the processor function with

Returns

a tf.data.Dataset instance
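
A minimal usage sketch: build the tf.data dataset in the current graph and consume it with a one-shot iterator. The ImagenetteDataset subclass and the hyperparameter values here are assumptions for illustration, not part of sparseml.

    import tensorflow.compat.v1 as tf

    dataset = ImagenetteDataset()  # hypothetical Dataset subclass
    tf_dataset = dataset.build(
        batch_size=32,
        repeat_count=1,            # a single pass over the data
        shuffle_buffer_size=1000,
        prefetch_buffer_size=64,
        num_parallel_calls=4,
    )
    batch = tf.data.make_one_shot_iterator(tf_dataset).get_next()

    with tf.Session() as sess:
        while True:
            try:
                sess.run(batch)    # consume batches until the dataset is exhausted
            except tf.errors.OutOfRangeError:
                break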

build_input_fn(batch_size: int, repeat_count: Optional[int] = None, shuffle_buffer_size: Optional[int] = None, prefetch_buffer_size: Optional[int] = None, num_parallel_calls: Optional[int] = None) → Callable[Tuple[Dict[str, tensorflow.python.framework.ops.Tensor], Dict[str, tensorflow.python.framework.ops.Tensor]]][source]

Create an input_fn to be used with Estimators. Invoking the input_fn creates the dataset in the current graph and returns a tuple containing (a dictionary of feature tensors, a dictionary of label tensors).

Parameters
  • batch_size – the batch size to create the dataset for

  • repeat_count – the number of times to repeat the dataset; if unset or None, repeats indefinitely

  • shuffle_buffer_size – None if not shuffling, otherwise the size of the buffer to use for shuffling data

  • prefetch_buffer_size – None if not prefetching, otherwise the size of the buffer to use for prefetching

  • num_parallel_calls – the number of parallel calls to run the processor function with

Returns

a callable representing the input_fn for an Estimator
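
A minimal sketch of wiring build_input_fn into a tf.estimator training loop; the ImagenetteDataset subclass and the model_fn are assumptions for illustration, not part of sparseml.

    import tensorflow.compat.v1 as tf

    dataset = ImagenetteDataset()  # hypothetical Dataset subclass
    train_input_fn = dataset.build_input_fn(
        batch_size=32,
        repeat_count=None,         # None repeats indefinitely for training
        shuffle_buffer_size=1000,
        num_parallel_calls=4,
    )

    # model_fn is an assumed user-defined Estimator model function
    estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir="/tmp/model")
    estimator.train(input_fn=train_input_fn, max_steps=10000)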

abstract creator() → tensorflow.python.data.ops.dataset_ops.DatasetV1[source]

Implemented by subclasses to create a tf.data dataset for the given implementation.

Returns

a created tf.data dataset

abstract format_iterator_batch(iter_batch: Tuple[tensorflow.python.framework.ops.Tensor, …]) → Tuple[Dict[str, tensorflow.python.framework.ops.Tensor], Dict[str, tensorflow.python.framework.ops.Tensor]][source]

Implemented by subclasses to parse the output from make_one_shot_iterator into features and labels dicts to be used with Estimators

Parameters

iter_batch – the batch ref returned from the iterator

Returns

a tuple containing (a dictionary of feature tensors, a dictionary of label tensors)

abstract name_scope() → str[source]

Implemented by subclasses to get a name scope for building the dataset in the graph

Returns

the name scope the dataset should be built under in the graph

abstract processor(*args, **kwargs)[source]

Implemented by subclasses to parallelize and map processing functions for loading the dataset's data into memory.

Parameters
  • args – generic inputs for processing

  • kwargs – generic inputs for processing

Returns

the processed tensors
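
To make the abstract methods concrete, here is a minimal sketch of a Dataset subclass that reads "feature,label" rows from a CSV file; the class name, file format, and parsing logic are assumptions for illustration, not part of sparseml.

    import tensorflow.compat.v1 as tf
    from sparseml.tensorflow_v1.datasets.dataset import Dataset

    class CSVLinesDataset(Dataset):
        def __init__(self, csv_path: str):
            self._csv_path = csv_path

        def creator(self) -> tf.data.Dataset:
            # create the raw tf.data dataset of text lines
            return tf.data.TextLineDataset([self._csv_path])

        def processor(self, line: tf.Tensor):
            # parse a "feature,label" line into a float feature and int label
            feature, label = tf.io.decode_csv(line, record_defaults=[[0.0], [0]])
            return feature, label

        def format_iterator_batch(self, iter_batch):
            # split the iterator output into Estimator-style feature/label dicts
            features, labels = iter_batch
            return {"feature": features}, {"label": labels}

        def name_scope(self) -> str:
            return "csv_lines_dataset"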

sparseml.tensorflow_v1.datasets.dataset.create_split_iterators_handle(split_datasets: Iterable) → Tuple[Any, Any, List][source]

Create an iterator handle for easily switching between datasets while training.

Parameters

split_datasets – the datasets to create the splits and handle for

Returns

a tuple containing the handle that should be set with a feed dict, the iterator used to get the next batch, and a list of the iterators created from the split_datasets
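
A minimal sketch of switching between train and validation splits through the returned handle; train_dataset and val_dataset are assumed, already-constructed Dataset subclasses, and whether the split iterators need explicit initialization depends on how they are created.

    import tensorflow.compat.v1 as tf
    from sparseml.tensorflow_v1.datasets.dataset import create_split_iterators_handle

    # train_dataset and val_dataset are assumed Dataset instances
    train_data = train_dataset.build(batch_size=32, shuffle_buffer_size=1000)
    val_data = val_dataset.build(batch_size=32)

    handle, iterator, split_iterators = create_split_iterators_handle(
        [train_data, val_data]
    )
    batch = iterator.get_next()

    with tf.Session() as sess:
        # resolve the string handles used to select the active split
        train_handle, val_handle = sess.run(
            [split.string_handle() for split in split_iterators]
        )
        # if the split iterators are initializable (an assumption), run:
        # sess.run([split.initializer for split in split_iterators])
        train_batch = sess.run(batch, feed_dict={handle: train_handle})
        val_batch = sess.run(batch, feed_dict={handle: val_handle})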

sparseml.tensorflow_v1.datasets.helpers module

General utilities for dataset implementations for TensorFlow

sparseml.tensorflow_v1.datasets.helpers.center_square_crop(padding: int = 0, name: str = 'center_square_crop')[source]

Take a square crop centered in the image

Parameters
  • padding – additional padding to crop away from all sides of the image

  • name – name for the scope to put the ops under

Returns

the callable function for the square crop op; takes in the image and outputs the cropped image

sparseml.tensorflow_v1.datasets.helpers.random_scaling_crop(scale_range: Tuple[int, int] = (0.08, 1.0), ratio_range: Tuple[int, int] = (0.75, 1.3333333333333333), name: str = 'random_scaling_crop')[source]

Random crop implementation that also randomly scales the crop and varies its aspect ratio.

Parameters
  • scale_range – the (min, max) of the crop scales to take from the original image

  • ratio_range – the (min, max) of the aspect ratios to take from the original image

  • name – name for the scope to put the ops under

Returns

the callable function for the random scaling crop op; takes in the image and outputs the randomly cropped image

sparseml.tensorflow_v1.datasets.helpers.resize(image_size: Tuple[int, int], name: str = 'resize')[source]

Resize an image tensor to the desired size

Parameters
  • image_size – a tuple containing the (height, width) to resize to

  • name – name for the scope to put the ops under

Returns

the callable function for the resize op; takes in the image and outputs the resized image
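
A minimal sketch combining the three helpers in a tf.data image pipeline; the file paths and the JPEG decoding step are assumptions for illustration.

    import tensorflow.compat.v1 as tf
    from sparseml.tensorflow_v1.datasets.helpers import (
        center_square_crop,
        random_scaling_crop,
        resize,
    )

    def _load_image(path: tf.Tensor) -> tf.Tensor:
        # read and decode a JPEG file into an HWC uint8 tensor
        return tf.image.decode_jpeg(tf.read_file(path), channels=3)

    train_crop = random_scaling_crop()   # random crop with scale/aspect jitter
    eval_crop = center_square_crop()     # deterministic center crop
    to_size = resize((224, 224))         # resize crops to the network input size

    files = tf.data.Dataset.from_tensor_slices(["/data/a.jpg", "/data/b.jpg"])
    train_images = files.map(_load_image).map(train_crop).map(to_size)
    eval_images = files.map(_load_image).map(eval_crop).map(to_size)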

sparseml.tensorflow_v1.datasets.registry module

Code related to the TensorFlow dataset registry for easily creating datasets.

class sparseml.tensorflow_v1.datasets.registry.DatasetRegistry[source]

Bases: object

Registry class for creating datasets

static attributes(key: str) → Dict[str, Any][source]
Parameters

key – the dataset key (name) to retrieve attributes for

Returns

the specified attributes for the dataset

static create(key: str, *args, **kwargs)[source]

Create a new dataset for the given key

Parameters

key – the dataset key (name) to create

Returns

the instantiated dataset

static register(key: Union[str, List[str]], attributes: Dict[str, Any])[source]

Register a dataset with the registry. Should be used as a decorator.

Parameters
  • key – the dataset key (name) to register under

  • attributes – the specified attributes for the dataset

Returns

the decorator
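
A minimal sketch of registering a dataset class and creating it through the registry; the "my_dataset" key, the attributes dict, and the MyDataset class are hypothetical.

    from sparseml.tensorflow_v1.datasets.dataset import Dataset
    from sparseml.tensorflow_v1.datasets.registry import DatasetRegistry

    @DatasetRegistry.register("my_dataset", attributes={"num_classes": 10})
    class MyDataset(Dataset):
        ...  # implement creator, processor, format_iterator_batch, name_scope

    dataset = DatasetRegistry.create("my_dataset")
    attributes = DatasetRegistry.attributes("my_dataset")  # {"num_classes": 10}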

Module contents

Code for creating and loading datasets in TensorFlow