deepsparse package
Subpackages
Submodules
deepsparse.benchmark module
deepsparse.cpu module
Functionality for detecting the details of the currently available CPU.
class deepsparse.cpu.architecture(*args, **kwargs) [source]
Bases: dict
A class containing all the architecture details for the current CPU.
Members include (but are not limited to):
- vendor - a string name of the vendor
- isa - a string containing avx2, avx512, or unknown
- vnni - a boolean indicating VNNI support
- num_sockets - integer number of physical sockets
- available_sockets - integer number of sockets available for use
- cores_per_socket - integer number of physical cores per socket
- available_cores_per_socket - integer number of available cores per socket
- threads_per_core - integer number of physical threads per core
- available_threads_per_core - integer number of available threads per core
- L1_instruction_cache_size - L1 instruction cache size in bytes
- L1_data_cache_size - L1 data cache size in bytes
- L2_cache_size - L2 cache size in bytes
- L3_cache_size - L3 cache size in bytes
property num_available_physical_cores
Returns: the total number of cores available on the current machine
property num_physical_cores
Returns: the total number of cores on the current machine
property num_threads
Returns: the total number of hyperthreads on the current machine
override_isa(value: str) [source]
Set the isa to the desired value.
Parameters: value – the value to update the isa to
property threads_per_socket
Returns: the number of hyperthreads available per socket on the current machine
deepsparse.cpu.cpu_architecture() → deepsparse.cpu.architecture [source]
Detect the CPU details on Linux systems. If any other OS is used, an exception will be raised.
Specifically, it detects:
- the number of physical cores available per socket on the system
- the vector instruction set available (avx2, avx512)
- whether vnni is available
The NM_ARCH environment variable can be used to override the instruction set detection.
Returns: an instance of the architecture class
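For example, the returned object can be inspected directly; a minimal sketch (printed values are illustrative, and since architecture subclasses dict, members are also available by key):
import pprint
from deepsparse.cpu import cpu_architecture

arch = cpu_architecture()
print(arch["vendor"], arch["isa"], arch["vnni"])   # dict-style access to members
print(arch.num_physical_cores, arch.num_threads)   # property access
pprint.pprint(dict(arch))                          # full detected details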
deepsparse.cpu.cpu_avx2_compatible() → bool [source]
Returns: True if the current cpu has the AVX2 or AVX512 instruction sets, used for running neural networks performantly (if AVX2 only, then less performant compared to strictly AVX512)
deepsparse.cpu.cpu_avx512_compatible() → bool [source]
Returns: True if the current cpu has the AVX512 instruction set, used for running neural networks performantly
deepsparse.cpu.cpu_details() → Tuple[int, str, bool] [source]
Detect the CPU details on Linux systems. If any other OS is used, an exception will be raised.
Specifically, it detects:
- the number of physical cores available on the system
- the vector instruction set available (avx2, avx512)
- whether vnni is available
The NM_ARCH environment variable can be used to override the avx instruction set detection.
Returns: a tuple containing the detected cpu information (number of physical cores available, avx instruction set, vnni support)
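The tuple unpacks in the documented order; for example:
from deepsparse.cpu import cpu_details

num_cores, avx_type, vnni = cpu_details()
print(f"{num_cores} physical cores, isa: {avx_type}, vnni: {vnni}")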
deepsparse.cpu.cpu_neon_compatible() → bool [source]
Returns: True if the current cpu has the NEON instruction set, used for running neural networks performantly
deepsparse.cpu.cpu_quantization_compatible() → bool [source]
Returns: True if the current cpu has the AVX2, AVX512, NEON, or SVE instruction sets, used for running quantized neural networks performantly (AVX2 < AVX512 < VNNI)
deepsparse.cpu.cpu_sve_compatible() → bool [source]
Returns: True if the current cpu has the SVE instruction set, used for running neural networks performantly
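Taken together, these predicates can gate which model variant to load at startup; a small sketch (the file names are hypothetical):
from deepsparse.cpu import cpu_avx2_compatible, cpu_quantization_compatible

if cpu_quantization_compatible():
    model_path = "model-int8.onnx"   # hypothetical quantized variant
elif cpu_avx2_compatible():          # also True when AVX512 is present
    model_path = "model-fp32.onnx"   # hypothetical dense fp32 variant
else:
    raise RuntimeError("no supported instruction set detected")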
deepsparse.engine module
Code related to interfacing with a Neural Network in the DeepSparse Engine using Python.
class deepsparse.engine.Context(num_cores: Optional[int] = None, num_streams: Optional[int] = None) [source]
Bases: object
Contexts can be used to run multiple instances of the MultiModelEngine with the same scheduler. This allows one scheduler to manage the resources of the system effectively, keeping engines that are running different models from fighting over system resources.
Parameters:
- num_cores – The number of physical cores to run the model on. If more cores are requested than are available on a single socket, the engine will try to distribute them evenly across as few sockets as possible.
- num_streams – The max number of requests the model can handle concurrently.

property num_cores
property num_streams
property scheduler
property value
class deepsparse.engine.Engine(model: Union[str, sparsezoo.objects.model.Model, sparsezoo.objects.file.File], batch_size: int, num_cores: Optional[int] = None, num_streams: Optional[int] = None, scheduler: Optional[deepsparse.engine.Scheduler] = None, input_shapes: Optional[List[List[int]]] = None) [source]
Bases: object
Create a new DeepSparse Engine that compiles the given onnx file for GPU-class performance on commodity CPUs.
Note 1: Engines are compiled for a specific batch size and a specific number of CPU cores.
Example:
# create an engine for batch size 1 on all available cores
engine = Engine("path/to/onnx", batch_size=1, num_cores=None)
Parameters:
- model – Either a path to the model's onnx file, a SparseZoo model stub prefixed by 'zoo:', a SparseZoo Model object, or a SparseZoo ONNX File object that defines the neural network
- batch_size – The batch size of the inputs to be used with the engine
- num_cores – The number of physical cores to run the model on. If more cores are requested than are available on a single socket, the engine will try to distribute them evenly across as few sockets as possible.
- num_streams – The max number of requests the model can handle concurrently.
- scheduler – The kind of scheduler to execute with. Pass None for the default.
- input_shapes – The list of shapes to set the inputs to. Pass None to use the model as-is.
property batch_size
Returns: The batch size of the inputs to be used with the model
benchmark(inp: List[numpy.ndarray], num_iterations: int = 20, num_warmup_iterations: int = 5, include_inputs: bool = False, include_outputs: bool = False, show_progress: bool = False) → deepsparse.benchmark.results.BenchmarkResults [source]
A convenience function for quickly benchmarking the instantiated model on a given input in the DeepSparse Engine. After executing, will return the summary statistics for benchmarking.
Parameters:
- inp – The list of inputs to pass to the engine for benchmarking. The expected order is the inputs order as defined in the ONNX graph.
- num_iterations – The number of iterations to run benchmarking for. Default is 20
- num_warmup_iterations – The number of iterations to warm up the engine before benchmarking. These executions will not be counted in the benchmark results that are returned. Useful and recommended to bring the system to a steady state. Default is 5
- include_inputs – If True, inputs from forward passes during benchmarking will be added to the results. Default is False
- include_outputs – If True, outputs from forward passes during benchmarking will be added to the results. Default is False
- show_progress – If True, will display a progress bar. Default is False
Returns: the results of benchmarking
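A minimal benchmarking sketch (the model path and input shape are illustrative):
import numpy
from deepsparse.engine import Engine

engine = Engine("path/to/model.onnx", batch_size=1)
inp = [numpy.random.rand(1, 3, 224, 224).astype(numpy.float32)]
results = engine.benchmark(inp, num_iterations=20, num_warmup_iterations=5)
print(results)  # summary statistics over the measured iterations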
benchmark_loader(loader: Iterable[List[numpy.ndarray]], num_iterations: int = 20, num_warmup_iterations: int = 5, include_inputs: bool = False, include_outputs: bool = False, show_progress: bool = False) → deepsparse.benchmark.results.BenchmarkResults [source]
A convenience function for quickly benchmarking the instantiated model on a given DataLoader in the DeepSparse Engine. After executing, will return the summary statistics for benchmarking.
Parameters:
- loader – An iterator of inputs to pass to the engine for benchmarking. The expected order of each input is as defined in the ONNX graph.
- num_iterations – The number of iterations to run benchmarking for. Default is 20
- num_warmup_iterations – The number of iterations to warm up the engine before benchmarking. These executions will not be counted in the benchmark results that are returned. Useful and recommended to bring the system to a steady state. Default is 5
- include_inputs – If True, inputs from forward passes during benchmarking will be added to the results. Default is False
- include_outputs – If True, outputs from forward passes during benchmarking will be added to the results. Default is False
- show_progress – If True, will display a progress bar. Default is False
Returns: the results of benchmarking
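Any iterable of input lists can serve as the loader; a sketch using a generator of random batches (shapes are illustrative, and the generator supplies enough batches to cover warmup plus measured iterations):
import numpy
from deepsparse.engine import Engine

engine = Engine("path/to/model.onnx", batch_size=1)
loader = ([numpy.random.rand(1, 3, 224, 224).astype(numpy.float32)]
          for _ in range(25))  # 5 warmup + 20 measured iterations
results = engine.benchmark_loader(loader, num_iterations=20, num_warmup_iterations=5)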
property cpu_avx_type
Returns: The detected cpu avx type that neural magic is running with. One of {avx2, avx512}. AVX instructions give significant execution speedup, with avx512 > avx2.
property cpu_vnni
Returns: True if vnni support was detected on the cpu, False otherwise. VNNI gives performance benefits for quantized networks.
mapped_run(inp: List[numpy.ndarray], val_inp: bool = True) → Dict[str, numpy.ndarray] [source]
Run given inputs through the model for inference. Returns the result as a dictionary of numpy arrays corresponding to the output names of the model as defined in the ONNX graph.
Note 1: this function can add a performance hit in certain cases. If using, please validate that you do not incur a performance hit by comparing with the regular run function.
See @run for more details on specific setup for the inputs.
Example:
engine = Engine("path/to/onnx", batch_size=1)
inp = [numpy.random.rand(1, 3, 224, 224).astype(numpy.float32)]
out = engine.mapped_run(inp)
assert isinstance(out, Dict)
Parameters:
- inp – The list of inputs to pass to the engine for inference. The expected order is the inputs order as defined in the ONNX graph.
- val_inp – Validate the input to the model to ensure numpy array inputs are set up correctly for the DeepSparse Engine
Returns: The dictionary of outputs from the model after executing over the inputs
property model_path
Returns: The local path to the model file the current instance was compiled from
property num_cores
Returns: The number of physical cores the current instance is running on
property num_streams
Returns: The max count of streams the current instance can handle concurrently
run(inp: List[numpy.ndarray], val_inp: bool = True) → List[numpy.ndarray] [source]
Run given inputs through the model for inference. Returns the result as a list of numpy arrays corresponding to the outputs of the model as defined in the ONNX graph.
Note 1: the input dimensions must match what is defined in the ONNX graph. To avoid extra time in memory shuffles, the best use case is to format both the onnx graph and the input into channels-first format; ex: [batch, height, width, channels] => [batch, channels, height, width]
Note 2: the input type for the numpy arrays must match what is defined in the ONNX graph. Generally float32 is most common, but int8 and int16 are used for certain layer and input types such as with quantized models.
Note 3: the numpy arrays must be contiguous in memory; use numpy.ascontiguousarray(array) to fix if not.
Example:
engine = Engine("path/to/onnx", batch_size=1, num_cores=None)
inp = [numpy.random.rand(1, 3, 224, 224).astype(numpy.float32)]
out = engine.run(inp)
assert isinstance(out, List)
Parameters:
- inp – The list of inputs to pass to the engine for inference. The expected order is the inputs order as defined in the ONNX graph.
- val_inp – Validate the input to the model to ensure numpy array inputs are set up correctly for the DeepSparse Engine
Returns: The list of outputs from the model after executing over the inputs
property scheduler
Returns: The kind of scheduler to execute with
timed_run(inp: List[numpy.ndarray], val_inp: bool = False) → Tuple[List[numpy.ndarray], float] [source]
Convenience method for timing a model inference run. Returns the result as a tuple containing (the outputs from @run, time taken).
See @run for more details.
Example:
engine = Engine("path/to/onnx", batch_size=1, num_cores=None)
inp = [numpy.random.rand(1, 3, 224, 224).astype(numpy.float32)]
out, time = engine.timed_run(inp)
assert isinstance(out, List)
assert isinstance(time, float)
Parameters:
- inp – The list of inputs to pass to the engine for inference. The expected order is the inputs order as defined in the ONNX graph.
- val_inp – Validate the input to the model to ensure numpy array inputs are set up correctly for the DeepSparse Engine
Returns: a tuple of the list of outputs from the model after executing over the inputs and the time taken
class deepsparse.engine.MultiModelEngine(model: Union[str, sparsezoo.objects.model.Model, sparsezoo.objects.file.File], batch_size: int, context: deepsparse.engine.Context, input_shapes: Optional[List[List[int]]] = None) [source]
Bases: deepsparse.engine.Engine
The MultiModelEngine, together with the Context class, can be used to run multiple models on the same computer at once. The interface and behavior are both very similar to the Engine class. The main difference is that instead of taking in a scheduler and a number of cores as arguments to the constructor, the MultiModelEngine takes in a Context. The Context contains a shared scheduler along with other runtime information that will be used across instances of the MultiModelEngine to provide optimal performance when running multiple models concurrently.
Parameters:
- model – Either a path to the model's onnx file, a SparseZoo model stub prefixed by 'zoo:', a SparseZoo Model object, or a SparseZoo ONNX File object that defines the neural network
- batch_size – The batch size of the inputs to be used with the engine
- context – See above. This object should be constructed with the desired number of cores and passed into each instance of the MultiModelEngine.
- input_shapes – The list of shapes to set the inputs to. Pass None to use the model as-is.
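A sketch of sharing one Context across two engines so a single scheduler manages both (the model paths are illustrative):
from deepsparse.engine import Context, MultiModelEngine

context = Context(num_cores=None, num_streams=2)  # None -> all available cores
engine_a = MultiModelEngine("path/to/model_a.onnx", batch_size=1, context=context)
engine_b = MultiModelEngine("path/to/model_b.onnx", batch_size=1, context=context)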
class deepsparse.engine.Scheduler(value) [source]
Bases: enum.Enum
Scheduler kinds to determine Engine execution strategy. For most synchronous cases, the default single_stream is recommended. For running a model server or parallel inferences, try multi_stream for maximum utilization of hardware.
- default: maps to single_stream
- single_stream: requests from separate threads execute serially
- multi_stream: requests from separate threads execute in parallel
- elastic: requests from separate threads are distributed across NUMA nodes

default = 'single_stream'
elastic = 'elastic'
multi_stream = 'multi_stream'
single_stream = 'single_stream'
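For instance, a server-style deployment might select the multi_stream scheduler when compiling (a sketch; the model path is illustrative):
from deepsparse.engine import Scheduler, compile_model

engine = compile_model("path/to/model.onnx", batch_size=1,
                       scheduler=Scheduler.multi_stream)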
deepsparse.engine.analyze_model(model: Union[str, sparsezoo.objects.model.Model, sparsezoo.objects.file.File], inp: List[numpy.ndarray], batch_size: int = 1, num_cores: Optional[int] = None, num_iterations: int = 20, num_warmup_iterations: int = 5, optimization_level: int = 1, imposed_as: Optional[float] = None, imposed_ks: Optional[float] = None, scheduler: Optional[deepsparse.engine.Scheduler] = None, input_shapes: Optional[List[List[int]]] = None) → dict [source]
Function to analyze a model's performance in the DeepSparse Engine. The model must be defined in an ONNX graph and stored in a local file. Gives defaults of batch_size == 1 and num_cores == None (will use all physical cores available on a single socket).
Note 1: Analysis is currently only supported on a single socket.
Parameters:
- model – Either a path to the model's onnx file, a SparseZoo model stub prefixed by 'zoo:', a SparseZoo Model object, or a SparseZoo ONNX File object that defines the neural network graph definition to analyze
- inp – The list of inputs to pass to the engine for analyzing inference. The expected order is the inputs order as defined in the ONNX graph.
- batch_size – The batch size of the inputs to be used with the model
- num_cores – The number of physical cores to run the model on. Pass None or 0 to run on the max number of cores for the current machine; default None
- num_iterations – The number of times to repeat execution of the model while analyzing, default is 20
- num_warmup_iterations – The number of times to repeat execution of the model before analyzing, default is 5
- optimization_level – The amount of graph optimizations to perform. The current choices are either 0 (minimal) or 1 (all), default is 1
- imposed_as – Imposed activation sparsity, defaults to None. Will force the activation sparsity from all ReLU layers in the graph to match this desired sparsity level (percentage of 0's in the tensor). Beneficial for seeing how AS affects the performance of the model.
- imposed_ks – Imposed kernel sparsity, defaults to None. Will force all prunable layers in the graph to have weights with this desired sparsity level (percentage of 0's in the tensor). Beneficial for seeing how pruning affects the performance of the model.
- scheduler – The kind of scheduler to execute with. Pass None for the default.
Returns: the analysis structure containing the performance details of each layer
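A sketch of a per-layer analysis run (path and input shape are illustrative; the structure of the returned dict is engine-defined, so it is simply printed here):
import numpy
from deepsparse.engine import analyze_model

inp = [numpy.random.rand(1, 3, 224, 224).astype(numpy.float32)]
analysis = analyze_model("path/to/model.onnx", inp, batch_size=1)
print(analysis)  # per-layer performance details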
deepsparse.engine.benchmark_model(model: Union[str, sparsezoo.objects.model.Model, sparsezoo.objects.file.File], inp: List[numpy.ndarray], batch_size: int = 1, num_cores: Optional[int] = None, num_streams: Optional[int] = None, num_iterations: int = 20, num_warmup_iterations: int = 5, include_inputs: bool = False, include_outputs: bool = False, show_progress: bool = False, scheduler: Optional[deepsparse.engine.Scheduler] = None, input_shapes: Optional[List[List[int]]] = None) → deepsparse.benchmark.results.BenchmarkResults [source]
Convenience function to benchmark a model in the DeepSparse Engine from an ONNX file for inference. Gives defaults of batch_size == 1 and num_cores == None (will use all physical cores available on a single socket).
Note 1: Benchmarking is currently only supported on a single socket.
Parameters:
- model – Either a path to the model's onnx file, a SparseZoo model stub prefixed by 'zoo:', a SparseZoo Model object, or a SparseZoo ONNX File object that defines the neural network
- inp – The list of inputs to pass to the engine for benchmarking. The expected order is the inputs order as defined in the ONNX graph.
- batch_size – The batch size of the inputs to be used with the model
- num_cores – The number of physical cores to run the model on. Pass None or 0 to run on the max number of cores for the current machine; default None
- num_streams – The max number of requests the model can handle concurrently. None or 0 implies a scheduler-defined default value; default None
- num_iterations – The number of iterations to run benchmarking for. Default is 20
- num_warmup_iterations – The number of iterations to warm up the engine before benchmarking. These executions will not be counted in the benchmark results that are returned. Useful and recommended to bring the system to a steady state. Default is 5
- include_inputs – If True, inputs from forward passes during benchmarking will be added to the results. Default is False
- include_outputs – If True, outputs from forward passes during benchmarking will be added to the results. Default is False
- show_progress – If True, will display a progress bar. Default is False
- scheduler – The kind of scheduler to execute with. Pass None for the default.
Returns: the results of benchmarking
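Unlike Engine.benchmark, this compiles and benchmarks in a single call; a sketch (path and shape are illustrative):
import numpy
from deepsparse.engine import benchmark_model

inp = [numpy.random.rand(1, 3, 224, 224).astype(numpy.float32)]
results = benchmark_model("path/to/model.onnx", inp, batch_size=1, num_iterations=20)
print(results)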
deepsparse.engine.compile_model(model: Union[str, sparsezoo.objects.model.Model, sparsezoo.objects.file.File], batch_size: int = 1, num_cores: Optional[int] = None, num_streams: Optional[int] = None, scheduler: Optional[deepsparse.engine.Scheduler] = None, input_shapes: Optional[List[List[int]]] = None) → deepsparse.engine.Engine [source]
Convenience function to compile a model in the DeepSparse Engine from an ONNX file for inference. Gives defaults of batch_size == 1 and num_cores == None (will use all physical cores available on the system).
Parameters:
- model – Either a path to the model's onnx file, a SparseZoo model stub prefixed by 'zoo:', a SparseZoo Model object, or a SparseZoo ONNX File object that defines the neural network
- batch_size – The batch size of the inputs to be used with the model
- num_cores – The number of physical cores to run the model on. Pass None or 0 to run on the max number of cores for the current machine; default None
- num_streams – The max number of requests the model can handle concurrently. None or 0 implies a scheduler-defined default value; default None
- scheduler – The kind of scheduler to execute with. Pass None for the default.
- input_shapes – The list of shapes to set the inputs to. Pass None to use the model as-is.
Returns: The created Engine after compiling the model
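A minimal end-to-end sketch of compiling and running inference (path and input shape are illustrative):
import numpy
from deepsparse.engine import compile_model

engine = compile_model("path/to/model.onnx", batch_size=1)
inp = [numpy.random.rand(1, 3, 224, 224).astype(numpy.float32)]
out = engine.run(inp)  # list of output arrays, ordered as in the ONNX graph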
deepsparse.generated-version module
deepsparse.generated_version module
deepsparse.version module
Functionality for storing and setting the version info for DeepSparse. If a file named 'generated_version.py' exists, read version info from there; otherwise fall back to defaults.
Module contents
The DeepSparse package, used to achieve GPU-class performance for neural networks on commodity CPUs.