sparseml.tensorflow_v1.datasets package

Submodules

sparseml.tensorflow_v1.datasets.dataset module

General dataset implementations for TensorFlow

class sparseml.tensorflow_v1.datasets.dataset. Dataset [source]

Bases: object

Generic dataset implementation for TensorFlow. Expected to work with the tf.data APIs.

build ( batch_size : int , repeat_count : Optional [ int ] = None , shuffle_buffer_size : Optional [ int ] = None , prefetch_buffer_size : Optional [ int ] = None , num_parallel_calls : Optional [ int ] = None ) tensorflow.python.data.ops.dataset_ops.DatasetV1 [source]

Create the dataset in the current graph using tf.data APIs

Parameters
  • batch_size – the batch size to create the dataset for

  • repeat_count – the number of times to repeat the dataset; if unset or None, the dataset repeats indefinitely

  • shuffle_buffer_size – None if not shuffling, otherwise the size of the buffer to use for shuffling data

  • prefetch_buffer_size – None if not prefetching, otherwise the size of the buffer to use for prefetching

  • num_parallel_calls – the number of parallel calls to run the processor function with

Returns

a tf.data.Dataset instance
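As a rough illustration of the parameter semantics above, the following plain-Python sketch mimics the repeat, shuffle, and batch stages without TensorFlow. The function build_batches and its seed argument are hypothetical, a full shuffle stands in for the buffered shuffle of tf.data, and the stage order (shuffle, then repeat, then batch) is an assumption rather than confirmed library behavior.

```python
import itertools
import random
from typing import Iterator, List, Optional, Sequence


def build_batches(
    records: Sequence[int],
    batch_size: int,
    repeat_count: Optional[int] = None,
    shuffle_buffer_size: Optional[int] = None,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Hypothetical stand-in for Dataset.build using plain Python iterables."""
    items = list(records)
    if not items:
        return  # nothing to batch; also avoids looping forever on empty input
    rng = random.Random(seed)

    def one_pass() -> List[int]:
        epoch = list(items)
        if shuffle_buffer_size is not None:
            # Assumption: a full shuffle stands in for tf.data's buffered shuffle.
            rng.shuffle(epoch)
        return epoch

    # None means "repeat indefinitely", mirroring the documented default.
    passes = itertools.count() if repeat_count is None else range(repeat_count)
    stream = (item for _ in passes for item in one_pass())

    while True:
        batch = list(itertools.islice(stream, batch_size))
        if not batch:
            return
        yield batch


# Two passes over three records, batched in pairs; batches may straddle passes.
finite = list(build_batches([1, 2, 3], batch_size=2, repeat_count=2))

# With repeat_count left as None the stream never ends, so take a fixed number.
first_four = list(itertools.islice(build_batches([1, 2, 3], batch_size=2), 4))
```

Because repeating happens before batching in this sketch, the final batch of one pass can be completed with records from the next pass.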

build_input_fn ( batch_size : int , repeat_count : Optional [ int ] = None , shuffle_buffer_size : Optional [ int ] = None , prefetch_buffer_size : Optional [ int ] = None , num_parallel_calls : Optional [ int ] = None ) Callable [ Tuple [ Dict [ str , tensorflow.python.framework.ops.Tensor ] , Dict [ str , tensorflow.python.framework.ops.Tensor ] ] ] [source]

Create an input_fn to be used with Estimators. Invocation of the input_fn will create the dataset in the current graph as well as return a tuple containing (a dictionary of feature tensors, a dictionary of label tensors).

Parameters
  • batch_size – the batch size to create the dataset for

  • repeat_count – the number of times to repeat the dataset; if unset or None, the dataset repeats indefinitely

  • shuffle_buffer_size – None if not shuffling, otherwise the size of the buffer to use for shuffling data

  • prefetch_buffer_size – None if not prefetching, otherwise the size of the buffer to use for prefetching

  • num_parallel_calls – the number of parallel calls to run the processor function with

Returns

a callable representing the input_fn for an Estimator

abstract creator ( ) tensorflow.python.data.ops.dataset_ops.DatasetV1 [source]

Implemented by subclasses to create the tf.data dataset for the given implementation.

Returns

a created tf.data dataset

abstract format_iterator_batch ( iter_batch : Tuple [ tensorflow.python.framework.ops.Tensor , ] ) Tuple [ Dict [ str , tensorflow.python.framework.ops.Tensor ] , Dict [ str , tensorflow.python.framework.ops.Tensor ] ] [source]

Implemented by subclasses to parse the output from make_one_shot_iterator into features and labels dicts for use with Estimators.

Parameters

iter_batch – the batch ref returned from the iterator

Returns

a tuple containing (a dictionary of feature tensors, a dictionary of label tensors)

abstract name_scope ( ) str [source]

Implemented by subclasses to get a name scope for building the dataset in the graph.

Returns

the name scope the dataset should be built under in the graph

abstract processor ( * args , ** kwargs ) [source]

Implemented by subclasses to parallelize and map processing functions for loading the data of the dataset into memory.

Parameters
  • args – generic inputs for processing

  • kwargs – generic inputs for processing

Returns

the processed tensors

sparseml.tensorflow_v1.datasets.dataset. create_split_iterators_handle ( split_datasets : Iterable ) Tuple [ Any , Any , List ] [source]

Create an iterators handle for switching between datasets easily while training.

Parameters

split_datasets – the datasets to create the splits and handle for

Returns

a tuple containing the handle that should be set with a feed dict, the iterator used to get the next batch, and a list of the iterators created from the split_datasets
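The idea of a switchable handle can be pictured with a plain-Python analogy, shown below. It does not use TensorFlow: an integer index plays the role of the handle that would be set through a feed dict, and create_split_switch and get_next are hypothetical names. In TensorFlow 1.x this pattern is commonly built on tf.data.Iterator.from_string_handle; whether this function does so internally is an assumption.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple


def create_split_switch(
    split_datasets: Iterable[Iterable[Any]],
) -> Tuple[List[int], Callable[[int], Any], List[Iterator[Any]]]:
    """Hypothetical analogy: one 'get next' entry point shared by several splits."""
    # One iterator per split, kept in the same order as the input datasets.
    iterators = [iter(dataset) for dataset in split_datasets]
    # Each split is addressed by its index; this stands in for a fed handle value.
    handles = list(range(len(iterators)))

    def get_next(handle: int) -> Any:
        # The caller chooses the split at call time, much like feeding a handle.
        return next(iterators[handle])

    return handles, get_next, iterators


handles, get_next, iterators = create_split_switch(
    [["train_0", "train_1"], ["val_0"]]
)
```

Calling get_next(handles[0]) draws from the first split and get_next(handles[1]) from the second, and each split keeps its own position between calls.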

sparseml.tensorflow_v1.datasets.helpers module

General utilities for dataset implementations for TensorFlow

sparseml.tensorflow_v1.datasets.helpers. center_square_crop ( padding : int = 0 , name : str = 'center_square_crop' ) [source]

Take a square crop centered in the image.

Parameters
  • padding – additional padding to apply to all sides of the image to crop away

  • name – name for the scope to put the ops under

Returns

the callable for the square crop op; it takes in the image and outputs the cropped image
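The crop geometry can be sketched in plain Python as follows. This is only an illustration on nested lists, not the library's TensorFlow implementation: the helper names are hypothetical, and the arithmetic (shrink the shorter side by the padding on each edge, then center the box) is an assumption about how the padding is applied.

```python
from typing import List, Tuple


def center_square_box(height: int, width: int, padding: int = 0) -> Tuple[int, int, int]:
    """Return (top, left, side) of a centered square, shrunk by padding per edge."""
    side = min(height, width) - 2 * padding
    if side <= 0:
        raise ValueError("padding leaves no pixels to crop")
    top = (height - side) // 2
    left = (width - side) // 2
    return top, left, side


def center_square_crop_rows(image: List[List[int]], padding: int = 0) -> List[List[int]]:
    """Apply the centered square box to an image stored as a list of rows."""
    top, left, side = center_square_box(len(image), len(image[0]), padding)
    return [row[left:left + side] for row in image[top:top + side]]


# A 4 x 6 toy image whose pixel value encodes its position as row * 10 + column.
toy_image = [[r * 10 + c for c in range(6)] for r in range(4)]
cropped = center_square_crop_rows(toy_image, padding=1)
```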

sparseml.tensorflow_v1.datasets.helpers. random_scaling_crop ( scale_range : Tuple [ int , int ] = (0.08, 1.0) , ratio_range : Tuple [ int , int ] = (0.75, 1.3333333333333333) , name : str = 'random_scaling_crop' ) [source]

Random crop implementation which also randomly scales the crop taken as well as the aspect ratio of the crop.

Parameters
  • scale_range – the (min, max) of the crop scales to take from the original image

  • ratio_range – the (min, max) of the aspect ratios to take from the original image

  • name – name for the scope to put the ops under

Returns

the callable for the random scaling crop op; it takes in the image and outputs the randomly cropped image
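One common way to realize this kind of crop is to sample a target area and an aspect ratio, then derive the crop height and width from them. The sketch below shows that sampling step in plain Python. The function sample_crop_box is hypothetical, and the uniform sampling, the width/height convention for the ratio, and the clamping to the image bounds are all assumptions rather than the library's confirmed behavior.

```python
import math
import random
from typing import Optional, Tuple


def sample_crop_box(
    height: int,
    width: int,
    scale_range: Tuple[float, float] = (0.08, 1.0),
    ratio_range: Tuple[float, float] = (3.0 / 4.0, 4.0 / 3.0),
    rng: Optional[random.Random] = None,
) -> Tuple[int, int, int, int]:
    """Return (top, left, crop_height, crop_width) for a randomly scaled crop."""
    if rng is None:
        rng = random.Random()
    # Fraction of the original area to keep, and the crop's width / height ratio.
    scale = rng.uniform(*scale_range)
    ratio = rng.uniform(*ratio_range)
    area = scale * height * width
    # Solve crop_w * crop_h = area and crop_w / crop_h = ratio.
    crop_w = int(round(math.sqrt(area * ratio)))
    crop_h = int(round(math.sqrt(area / ratio)))
    # Clamp so the box always fits inside the image.
    crop_w = max(1, min(width, crop_w))
    crop_h = max(1, min(height, crop_h))
    # Pick a random position for the box.
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return top, left, crop_h, crop_w


box = sample_crop_box(224, 320, rng=random.Random(0))
```

The resulting box would typically be cropped out of the image and then resized to a fixed training size.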

sparseml.tensorflow_v1.datasets.helpers. resize ( image_size : Tuple [ int , int ] , name : str = 'resize' ) [source]

Resize an image tensor to the desired size

Parameters
  • image_size – a tuple containing the height, width to resize to

  • name – name for the scope to put the ops under

Returns

the callable for the resize op; it takes in the image and outputs the resized image

sparseml.tensorflow_v1.datasets.registry module

Code related to the TensorFlow dataset registry for easily creating datasets.

class sparseml.tensorflow_v1.datasets.registry. DatasetRegistry [source]

Bases: object

Registry class for creating datasets

static attributes ( key : str ) Dict [ str , Any ] [source]

Parameters

key – the dataset key (name) to look up attributes for

Returns

the specified attributes for the dataset

static create ( key : str , * args , ** kwargs ) [source]

Create a new dataset for the given key

Parameters

key – the dataset key (name) to create

Returns

the instantiated dataset

static register ( key : Union [ str , List [ str ] ] , attributes : Dict [ str , Any ] ) [source]

Register a dataset with the registry. Should be used as a decorator

Parameters
  • key – the dataset key (name), or list of keys, to register the dataset under

  • attributes – the specified attributes for the dataset

Returns

the decorator
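The decorator-based registration described above can be sketched with a small stand-alone registry. SketchRegistry and ToyDataset below are hypothetical stand-ins written for illustration; they mirror the documented register, create, and attributes signatures but are not the library's implementation, and the error handling is an assumption.

```python
from typing import Any, Callable, Dict, List, Union


class SketchRegistry:
    """Hypothetical stand-in for a key-based dataset registry."""

    _constructors: Dict[str, Callable[..., Any]] = {}
    _attributes: Dict[str, Dict[str, Any]] = {}

    @staticmethod
    def register(key: Union[str, List[str]], attributes: Dict[str, Any]):
        keys = [key] if isinstance(key, str) else list(key)

        def decorator(cls):
            for name in keys:
                if name in SketchRegistry._constructors:
                    raise ValueError(f"key {name} is already registered")
                SketchRegistry._constructors[name] = cls
                SketchRegistry._attributes[name] = attributes
            return cls  # hand the class back unchanged so it stays usable directly

        return decorator

    @staticmethod
    def create(key: str, *args, **kwargs) -> Any:
        if key not in SketchRegistry._constructors:
            raise ValueError(f"key {key} is not registered")
        return SketchRegistry._constructors[key](*args, **kwargs)

    @staticmethod
    def attributes(key: str) -> Dict[str, Any]:
        if key not in SketchRegistry._attributes:
            raise ValueError(f"key {key} is not registered")
        return SketchRegistry._attributes[key]


# Register one class under two keys, then create it by name.
@SketchRegistry.register(key=["toy", "toy_v1"], attributes={"num_classes": 10})
class ToyDataset:
    def __init__(self, train: bool = True):
        self.train = train


dataset = SketchRegistry.create("toy", train=False)
```

Registering by key keeps calling code independent of concrete dataset classes, since a dataset can be selected by a string taken from a config or command line.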

Module contents

Code for creating and loading datasets in TensorFlow