Logging Guidance for Diagnostics and Debugging

This page explains the diagnostics and debugging features available in the DeepSparse Engine.

Unlike traditional software, the machine learning community has few debugging utilities available. Performance issues can stem from many sources: deployment pipeline design, model weights, model architecture, or an unoptimized model. Reviewing a log file is often your first line of defense in pinpointing these issues and optimizing your inference.

The DeepSparse Engine ships with diagnostic logging so you can capture real-time monitoring information at model runtime and self-diagnose issues. If you are seeking technical support, we recommend capturing log information first, as described below. You can decide what to share, whether certain parts of the log or the entire content.

Note: Our logs may reveal your inference network’s macro-architecture, including a general list of operators (such as convolution and pooling) and connections between them. Weights, trained parameters, or dataset parameters will not be captured. Consult Neural Magic’s various legal policies at https://neuralmagic.com/legal/ which include our privacy statement and software agreements. Your use of the software serves as your consent to these practices.

Performance Tuning

Before enabling logs, an initial decision point in troubleshooting performance issues is whether to prevent threads from migrating from their cores. The default behavior is to disable thread binding (or pinning), allowing your OS to manage the allocation of threads to cores. This carries a performance hit if the DeepSparse Engine is the main process running on your machine. If you want to enable thread binding for the possible performance benefit, set:

NM_BIND_THREADS_TO_CORES=1

Note 1: If the DeepSparse Engine is not the only major process running on your machine, binding threads may hurt performance of the other major process(es) by monopolizing system resources.

Note 2: If you use OpenMP or TBB (Thread Building Blocks) in your application, then enabling thread binding may result in severe performance degradation due to conflicts between the Neural Magic thread pool and the OpenMP/TBB thread pools.
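
For example, assuming the same run_model.py entry point used later in this guide, you could enable thread binding for a single run or for the whole shell session:

# Enable thread binding for this run only:
NM_BIND_THREADS_TO_CORES=1 python run_model.py

# Or enable it for every subsequent command in this shell:
export NM_BIND_THREADS_TO_CORES=1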

Enabling Logs and Controlling the Amount of Logs Produced by the DeepSparse Engine

Logs are controlled by setting the NM_LOGGING_LEVEL environment variable.

Specify one of the following verbosity levels in your shell, listed in increasing order of verbosity: fatal, error, warn, and diagnose. All logs are written to stderr, and diagnose is a common default for capturing full diagnostic output:

NM_LOGGING_LEVEL=diagnose
export NM_LOGGING_LEVEL

Alternatively, you can set the logging level for a single command:

NM_LOGGING_LEVEL=diagnose <some command>

To enable diagnostic logs on a per-run basis, set the level manually before each script execution. For example, if you normally run:

python run_model.py

Then, to enable diagnostic logs, run:

NM_LOGGING_LEVEL=diagnose python run_model.py

To enable logging for your entire shell instance, execute within your shell:

export NM_LOGGING_LEVEL=diagnose

By default, logs will print out to the stderr of your process. If you would like to output to a file, add 2> <name_of_log>.txt to the end of your command.
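
For example, to run with diagnostic logging and capture the output in a file (logs.txt is an arbitrary name used for illustration here and in the snippets below):

# The engine logs to stderr, so 2> sends the log to logs.txt; stdout is unaffected.
NM_LOGGING_LEVEL=diagnose python run_model.py 2> logs.txt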

Parsing an Example Log

If you want to see an example log with NM_LOGGING_LEVEL=diagnose, a truncated sample output is provided at the end of this guide. It shows a super_resolution network, of which Neural Magic supports running 70%.

Different portions of the log are explained below.

Viewing the Whole Graph

Once a model is in our system, it is parsed to determine what operations it contains. Each operation becomes a node with a unique number, and its operation type is displayed:

Printing GraphViewer torch-jit-export:
Node 0: Conv
Node 1: Relu
Node 2: Conv
Node 3: Relu
Node 4: Conv
Node 5: Relu
Node 6: Conv
Node 7: Reshape
Node 8: Transpose
Node 9: Reshape

Finding Supported Nodes for Our Optimized Engine

After the whole graph is loaded, each node is analyzed to determine whether it is supported by our optimized runtime engine. Unsupported operators are flagged in the log; look for Unsupported [type of node]. For example, this is an unsupported Reshape node that produces a 6D tensor:

[nm_ort 7f4fbbd3f740 >DIAGNOSE< unsupported /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/ops.cc:60] Unsupported Reshape , const shape greater than 5D
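
With the log captured in a file (the hypothetical logs.txt from above), a quick way to surface every unsupported node is:

# Show each unsupported node, then count how many were reported.
grep "Unsupported" logs.txt
grep -c "Unsupported" logs.txt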

Compiling Each Subgraph

Once all the nodes supported by the optimized engine are located, the graph is split into maximal supported subgraphs and each one is compiled. To find the start of each subgraph compilation, look for == Beginning new subgraph ==. First, the nodes in the subgraph are displayed:

Printing subgraph:
Node 0: Conv
Node 1: Relu
Node 2: Conv
Node 3: Relu
Node 4: Conv
Node 5: Relu
Node 6: Conv

Simplifications are then performed on the graph to get it into an ideal state for complex optimizations, and the result is logged:

[nm_ort 7f4fbbd3f740 >DIAGNOSE< supported_subgraphs /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:706] == Translating subgraph NM_Subgraph_1 to NM intake graph.
[nm_ort 7f4fbbd3f740 >DIAGNOSE< supported_subgraphs /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:715] ( L1 graph
  ( values:
    (10 float [ 1, 64, 224, 224 ])
    (11 float [ 1, 64, 224, 224 ])
    (12 float [ 1, 64, 224, 224 ])
    (13 float [ 1, 32, 224, 224 ])
    (14 float [ 1, 32, 224, 224 ])
    (15 float [ 1, 9, 224, 224 ])
    (9 float [ 1, 64, 224, 224 ])
    (conv1.bias float [ 64 ])
    (conv1.weight float [ 64, 1, 5, 5 ])
    (conv2.bias float [ 64 ])
    (conv2.weight float [ 64, 64, 3, 3 ])
    (conv3.bias float [ 32 ])
    (conv3.weight float [ 32, 64, 3, 3 ])
    (conv4.bias float [ 9 ])
    (conv4.weight float [ 9, 32, 3, 3 ])
    (input float [ 1, 1, 224, 224 ])
  )
  ( operations:
    (Constant conv1.bias (constant float [ 64 ]))
    (Constant conv1.weight (constant float [ 64, 1, 5, 5 ]))
    (Constant conv2.bias (constant float [ 64 ]))
    (Constant conv2.weight (constant float [ 64, 64, 3, 3 ]))
    (Constant conv3.bias (constant float [ 32 ]))
    (Constant conv3.weight (constant float [ 32, 64, 3, 3 ]))
    (Constant conv4.bias (constant float [ 9 ]))
    (Constant conv4.weight (constant float [ 9, 32, 3, 3 ]))
    (Input input (io 0))
    (Conv input -> 9 (conv kernel = [ 64, 1, 5, 5 ] bias = [ 64 ] padding = {{2, 2}, {2, 2}} strides = {1, 1}))
    (Elementwise 9 -> 10 (calc Relu))
    (Conv 10 -> 11 (conv kernel = [ 64, 64, 3, 3 ] bias = [ 64 ] padding = {{1, 1}, {1, 1}} strides = {1, 1}))
    (Elementwise 11 -> 12 (calc Relu))
    (Conv 12 -> 13 (conv kernel = [ 32, 64, 3, 3 ] bias = [ 32 ] padding = {{1, 1}, {1, 1}} strides = {1, 1}))
    (Elementwise 13 -> 14 (calc Relu))
    (Conv 14 -> 15 (conv kernel = [ 9, 32, 3, 3 ] bias = [ 9 ] padding = {{1, 1}, {1, 1}} strides = {1, 1}))
    (Output 15 (io 0))
  )
)

Determining the Number of Cores and Batch Size

This log detail describes the batch size and number of cores that Neural Magic is optimizing against. Look for == Compiling NM_Subgraph as in this example:

[nm_ort 7f4fbbd3f740 >DIAGNOSE< supported_subgraphs /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:723] == Compiling NM_Subgraph_1 with batch size 1 using 18 cores.
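
To confirm the engine compiled against the batch size and core count you intended, you can filter a captured log for these lines (again assuming the hypothetical logs.txt):

# Show the batch size and core count used for each compiled subgraph.
grep "== Compiling NM_Subgraph" logs.txt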

Obtaining Subgraph Statistics

Locating == NM Execution Provider supports shows how many subgraphs we compiled and what percentage of the network we managed to support running:

[nm_ort 7f4fbbd3f740 >DIAGNOSE< operator() /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/nm_execution_provider.cc:122] Created 1 compiled subgraphs.
[nm_ort 7f4fbbd3f740 >DIAGNOSE< validate_minimum_supported_fraction /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/utility/graph_util.cc:321] == NM Execution Provider supports 70% of the network
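
Both statistics can be pulled from a captured log in one pass (assuming logs.txt as before):

# Report the number of compiled subgraphs and the supported fraction of the network.
grep -E "compiled subgraphs|NM Execution Provider supports" logs.txt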

Viewing Runtime Execution Times

For each subgraph that Neural Magic optimizes, the execution time of each run is reported on the ORT NM EP compute_func line, as follows:

[nm_ort 7f4fbbd3f740 >DIAGNOSE< operator() /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/nm_execution_provider.cc:265] ORT NM EP compute_func: 6.478 ms
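
Over a multi-run benchmark these lines accumulate, one per run, so it can be handy to average them. A minimal sketch, again assuming a captured logs.txt; the time in milliseconds is the second-to-last whitespace-separated field on each compute_func line:

# Average the per-run execution times reported by the engine.
grep "compute_func" logs.txt | awk '{ sum += $(NF-1); n++ } END { if (n) print sum / n, "ms average over", n, "runs" }'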

Full Example Log, Verbose Level = diagnose

The following is an example log with NM_LOGGING_LEVEL=diagnose from running a super_resolution network, of which we support running 70%. Different portions of the log are explained in Parsing an Example Log.

onnx_filename : test-models/cv-resolution/super_resolution/none-bsd300-onnx-repo/model.onnx
[ INFO neuralmagic.py: 112 - neuralmagic_create() ] Construct network from ONNX = test-models/cv-resolution/super_resolution/none-bsd300-onnx-repo/model.onnx
NeuralMagic WAND version: 1.0.0.96ce2f6cb23b8ab377012ed9ef38d3da3b9f5313 (optimized) (system=avx512, binary=avx512)
[nm_ort 7f4fbbd3f740 >DIAGNOSE< operator() /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/nm_execution_provider.cc:104] == NMExecutionProvider::GetCapability ==
Printing GraphViewer torch-jit-export:
Node 0: Conv
Node 1: Relu
Node 2: Conv
Node 3: Relu
Node 4: Conv
Node 5: Relu
Node 6: Conv
Node 7: Reshape
Node 8: Transpose
Node 9: Reshape

[nm_ort 7f4fbbd3f740 >DIAGNOSE< unsupported /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/ops.cc:60] Unsupported Reshape , const shape greater than 5D
[nm_ort 7f4fbbd3f740 >DIAGNOSE< construct_subgraphs /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:595] == Constructing subgraphs from graph info
[nm_ort 7f4fbbd3f740 >WARN< construct_subgraphs /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:604] Cannot support patterns, defaulting to non-pattern-matched subgraphs
[nm_ort 7f4fbbd3f740 >DIAGNOSE< supported_subgraphs /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:644] == Beginning new subgraph ==
[nm_ort 7f4fbbd3f740 >DIAGNOSE< supported_subgraphs /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:667] Runtime inputs for subgraph:
[nm_ort 7f4fbbd3f740 >DIAGNOSE< supported_subgraphs /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:679] input (required)
[nm_ort 7f4fbbd3f740 >DIAGNOSE< supported_subgraphs /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:684] Printing subgraph:
Node 0: Conv
Node 1: Relu
Node 2: Conv
Node 3: Relu
Node 4: Conv
Node 5: Relu
Node 6: Conv

[nm_ort 7f4fbbd3f740 >DIAGNOSE< supported_subgraphs /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:706] == Translating subgraph NM_Subgraph_1 to NM intake graph.
[nm_ort 7f4fbbd3f740 >DIAGNOSE< supported_subgraphs /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:715] ( L1 graph
  ( values:
    (10 float [ 1, 64, 224, 224 ])
    (11 float [ 1, 64, 224, 224 ])
    (12 float [ 1, 64, 224, 224 ])
    (13 float [ 1, 32, 224, 224 ])
    (14 float [ 1, 32, 224, 224 ])
    (15 float [ 1, 9, 224, 224 ])
    (9 float [ 1, 64, 224, 224 ])
    (conv1.bias float [ 64 ])
    (conv1.weight float [ 64, 1, 5, 5 ])
    (conv2.bias float [ 64 ])
    (conv2.weight float [ 64, 64, 3, 3 ])
    (conv3.bias float [ 32 ])
    (conv3.weight float [ 32, 64, 3, 3 ])
    (conv4.bias float [ 9 ])
    (conv4.weight float [ 9, 32, 3, 3 ])
    (input float [ 1, 1, 224, 224 ])
  )
  ( operations:
    (Constant conv1.bias (constant float [ 64 ]))
    (Constant conv1.weight (constant float [ 64, 1, 5, 5 ]))
    (Constant conv2.bias (constant float [ 64 ]))
    (Constant conv2.weight (constant float [ 64, 64, 3, 3 ]))
    (Constant conv3.bias (constant float [ 32 ]))
    (Constant conv3.weight (constant float [ 32, 64, 3, 3 ]))
    (Constant conv4.bias (constant float [ 9 ]))
    (Constant conv4.weight (constant float [ 9, 32, 3, 3 ]))
    (Input input (io 0))
    (Conv input -> 9 (conv kernel = [ 64, 1, 5, 5 ] bias = [ 64 ] padding = {{2, 2}, {2, 2}} strides = {1, 1}))
    (Elementwise 9 -> 10 (calc Relu))
    (Conv 10 -> 11 (conv kernel = [ 64, 64, 3, 3 ] bias = [ 64 ] padding = {{1, 1}, {1, 1}} strides = {1, 1}))
    (Elementwise 11 -> 12 (calc Relu))
    (Conv 12 -> 13 (conv kernel = [ 32, 64, 3, 3 ] bias = [ 32 ] padding = {{1, 1}, {1, 1}} strides = {1, 1}))
    (Elementwise 13 -> 14 (calc Relu))
    (Conv 14 -> 15 (conv kernel = [ 9, 32, 3, 3 ] bias = [ 9 ] padding = {{1, 1}, {1, 1}} strides = {1, 1}))
    (Output 15 (io 0))
  )
)

[nm_ort 7f4fbbd3f740 >DIAGNOSE< supported_subgraphs /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:723] == Compiling NM_Subgraph_1 with batch size 1 using 18 cores.
[7f4fbbd3f740 >DIAGNOSE< allocate_buffers_pass ./src/include/wand/engine/units/planner.hpp:49] compiler: total buffer size = 25690112/33918976, ratio = 0.757396
[nm_ort 7f4fbbd3f740 >DIAGNOSE< supported_subgraphs /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:644] == Beginning new subgraph ==
[nm_ort 7f4fbbd3f740 >WARN< supported_subgraphs /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/supported/subgraphs.cc:652] Filtered subgraph was empty, ignoring subgraph.
[nm_ort 7f4fbbd3f740 >DIAGNOSE< operator() /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/nm_execution_provider.cc:122] Created 1 compiled subgraphs.
[nm_ort 7f4fbbd3f740 >DIAGNOSE< validate_minimum_supported_fraction /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/utility/graph_util.cc:321] == NM Execution Provider supports 70% of the network
[nm_ort 7f4fbbd3f740 >DIAGNOSE< operator() /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/nm_execution_provider.cc:129] == End NMExecutionProvider::GetCapability ==
[nm_ort 7f4fbbd3f740 >DIAGNOSE< operator() /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/nm_execution_provider.cc:140] == NMExecutionProvider::Compile ==
[nm_ort 7f4fbbd3f740 >DIAGNOSE< operator() /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/nm_execution_provider.cc:157] Graph #0: 1 inputs and 1 outputs
[nm_ort 7f4fbbd3f740 >DIAGNOSE< operator() /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/nm_execution_provider.cc:276] == End NMExecutionProvider::Compile ==
Generating 1 random inputs
 -- 1 random input of shape = [1, 1, 224, 224]
[ INFO execute.py: 242 - nm_exec_test_iters() ] Starting tests
[ INFO neuralmagic.py: 121 - neuralmagic_execute() ] Executing TEST_1
[ INFO neuralmagic.py: 124 - neuralmagic_execute() ] [1] input_data.shape = (1, 1, 224, 224)
[ INFO neuralmagic.py: 126 - neuralmagic_execute() ] -- START
[nm_ort 7f4fbbd3f740 >DIAGNOSE< operator() /home/jdoe/code/nyann/src/onnxruntime_neuralmagic/nm_execution_provider.cc:265] ORT NM EP compute_func: 6.478 ms
[ INFO neuralmagic.py: 130 - neuralmagic_execute() ] -- FINISH
[ INFO neuralmagic.py: 132 - neuralmagic_execute() ] [output] output_data.shape = (1, 1, 672, 672)