Tools¶
This page describes the various SDK tools and features for Linux/Android and Windows developers. For the integration flow for different developers, please refer to the Overview page for further information.
[Table: SDK tool availability by category and developer platform. Columns: Linux/Android (Ubuntu, WSL x86, Device) and Windows (WSL x86, Windows x86_64, Windows on Snapdragon). A YES entry indicates the tool is supported on that platform; entries marked with asterisks (*, **, ***, ****) are qualified by the notes below.]
Note
Library extension naming: Windows developers should replace all ‘.so’ files with the analogous ‘.dll’ files in the following sections. Please refer to Platform Differences for more details.
For more detailed information on converters please refer to Converters.
[*] libQnnGpuProfilingReader.dll is not supported on Windows platform for qnn-profile-viewer.
[**] Requires the Python scripts and the executables from the Windows x86_64 binary folder (bin\x86_64-windows-msvc).
[***] The Accuracy Debugger on Windows x86 systems is currently tested only for the CPU runtime.
[****] The Accuracy Evaluator on Windows for Snapdragon has been tested and verified for both CPU and HTP runtimes.
PyTorch models and preprocessing/postprocessing stages that depend upon the torch library are currently not supported in the Windows version of the Accuracy Evaluator.
TFLite conversion using qairt-converter is not supported for Windows x86_64 and Windows on Snapdragon due to a TVM library dependency.
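The library-naming note above (swap ‘.so’ for ‘.dll’ on Windows) can be captured by a small helper. This is an illustrative sketch, not part of the SDK:

```python
def analogous_library(name, windows=False):
    """Return the platform-appropriate library file name.

    Per the note above, Windows developers replace a '.so' suffix with
    '.dll'; the base name (including any 'lib' prefix) is kept as-is,
    e.g. 'libQnnGpuProfilingReader.so' -> 'libQnnGpuProfilingReader.dll'.
    """
    if windows and name.endswith(".so"):
        return name[: -len(".so")] + ".dll"
    return name
```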
Model Conversion¶
qnn-tensorflow-converter¶
The qnn-tensorflow-converter tool converts a model from the TensorFlow framework to a CPP file representing the model as a series of QNN API calls. Additionally, a binary file containing static weights of the model is produced.
usage: qnn-tensorflow-converter -d INPUT_NAME INPUT_DIM --out_node OUT_NAMES
[--input_type INPUT_NAME INPUT_TYPE]
[--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding ...]
[--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
[--show_unconsumed_nodes] [--saved_model_tag SAVED_MODEL_TAG]
[--saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY]
[--quantization_overrides QUANTIZATION_OVERRIDES]
[--keep_quant_nodes] [--disable_batchnorm_folding]
[--expand_lstm_op_structure]
[--keep_disconnected_nodes] [--input_list INPUT_LIST]
[--param_quantizer PARAM_QUANTIZER] [--act_quantizer ACT_QUANTIZER]
[--algorithms ALGORITHMS [ALGORITHMS ...]]
[--bias_bitwidth BIAS_BITWIDTH] [--bias_bw BIAS_BW]
[--act_bitwidth ACT_BITWIDTH] [--act_bw ACT_BW]
[--weights_bitwidth WEIGHTS_BITWIDTH] [--weight_bw WEIGHT_BW]
[--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_encodings]
[--use_per_channel_quantization] [--use_per_row_quantization]
[--enable_per_row_quantized_bias]
[--float_fallback] [--use_native_input_files] [--use_native_dtype]
[--use_native_output_files] [--disable_relu_squashing]
[--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
--input_network INPUT_NETWORK [--debug [DEBUG]]
[-o OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
[--float_bitwidth FLOAT_BITWIDTH] [--float_bw FLOAT_BW]
[--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
[--exclude_named_tensors] [--op_package_lib OP_PACKAGE_LIB]
[--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
[-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
[-h] [--arch_checker]
Script to convert TF model into QNN
required arguments:
-d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
The names and dimensions of the network input layers specified in the format
[input_name comma-separated-dimensions], for example:
'data' 1,224,224,3
Note that the quotes should always be included in order to
handle special characters, spaces, etc.
For multiple inputs specify multiple --input_dim on the command line like:
--input_dim 'data1' 1,224,224,3 --input_dim 'data2' 1,50,100,3
--out_node OUT_NAMES, --out_name OUT_NAMES
Names of the graph's output nodes. Multiple output nodes should be
provided separately like:
--out_node out_1 --out_node out_2
--input_network INPUT_NETWORK, -i INPUT_NETWORK
Path to the source framework model.
optional arguments:
--input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
Type of data expected by each input op/layer. Type for each input is
|default| if not specified. For example: "data" image. Note that the quotes
should always be included in order to handle special characters, spaces, etc.
For multiple inputs specify multiple --input_type on the command line.
Eg:
--input_type "data1" image --input_type "data2" opaque
These options are used by the DSP runtime, and the following descriptions
state how the input will be handled for each option.
Image:
Input is float between 0-255; the input's mean is 0.0f and the input's
max is 255.0f. The floats are cast to uint8_t and the uint8_t values are
passed to the DSP.
Default:
Pass the input as floats to the DSP directly and the DSP will quantize it.
Opaque:
Assumes the input is float because the consumer layer (i.e. the next layer) requires
it as float, therefore it won't be quantized.
Choices supported:
image
default
opaque
--input_dtype INPUT_NAME INPUT_DTYPE
The names and datatype of the network input layers specified in the format
[input_name datatype], for example:
'data' 'float32'.
Default is float32 if not specified.
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For multiple inputs specify multiple --input_dtype on the command line like:
--input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
--input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
Usage: --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
[INPUT_ENCODING_OUT]
Input encoding of the network inputs. Default is bgr.
e.g.
--input_encoding "data" rgba
Quotes must wrap the input node name to handle special characters,
spaces, etc. To specify encodings for multiple inputs, invoke
--input_encoding for each one.
e.g.
--input_encoding "data1" rgba --input_encoding "data2" other
Optionally, an output encoding may be specified for an input node by
providing a second encoding. The default output encoding is bgr.
e.g.
--input_encoding "data3" rgba rgb
Input encoding types:
image color encodings: bgr, rgb, nv21, nv12, ...
time_series: for inputs of rnn models;
other: not available above or is unknown.
Supported encodings:
bgr
rgb
rgba
argb32
nv21
nv12
time_series
other
--input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
Layout of each input tensor. If not specified, it will use the default
based on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
NONTRIVIAL for everything else. For multiple inputs specify multiple
--input_layout on the command line.
Eg:
--input_layout "data1" NCHW --input_layout "data2" NCHW
--custom_io CUSTOM_IO
Use this option to specify a yaml file for custom IO
--show_unconsumed_nodes
Displays a list of unconsumed nodes, if any are found. Nodes which are
unconsumed do not violate the structural fidelity of the generated graph.
--saved_model_tag SAVED_MODEL_TAG
Specify the tag to select a MetaGraph from the SavedModel. ex:
--saved_model_tag serve. Default value will be 'serve' when it is not
assigned.
--saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY
Specify signature key to select input and output of the model. ex:
--saved_model_signature_key serving_default. Default value will be
'serving_default' when it is not assigned
--disable_batchnorm_folding
--expand_lstm_op_structure
Enables optimization that breaks the LSTM op to equivalent math ops
--keep_disconnected_nodes
Disable Optimization that removes Ops not connected to the main graph.
This optimization uses output names provided over commandline OR
inputs/outputs extracted from the Source model to determine the main graph
--debug [DEBUG] Run the converter in debug mode.
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path where the converted output model should be saved. If not specified, the
converted model will be written to a file with the same name as the input model.
--copyright_file COPYRIGHT_FILE
Path to copyright file. If provided, the content of the file will be added
to the output model.
--float_bitwidth FLOAT_BITWIDTH
Selects the bitwidth to use when using float for parameters (weights/bias)
and activations for all ops or a specific op (via encodings) selected
through encoding; 32 (default) or 16.
--float_bw FLOAT_BW Deprecated; use --float_bitwidth.
--float_bias_bw FLOAT_BIAS_BW
Deprecated; use --float_bias_bitwidth.
--overwrite_model_prefix
If this option is passed, the model generator will use the output path name as the
model prefix to name functions in <qnn_model_name>.cpp (useful for running
multiple models at once), e.g. ModelName_composeGraphs. Default is to use the
generic "QnnModel_".
--exclude_named_tensors
Do not use source framework tensor names; instead use a counter for naming
tensors. Note: This can potentially help to reduce the size of the final model library
that will be generated (recommended for deploying models). Default is False.
-h, --help show this help message and exit
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters to use for
quantization. These will override any quantization data carried from
conversion (e.g. TF fake quantization) or calculated during the normal
quantization process. Format defined as per AIMET specification.
--keep_quant_nodes Use this option to keep activation quantization nodes in the graph rather
than stripping them.
--input_list INPUT_LIST
Path to a file specifying the input data. This file should be a plain text
file, containing one or more absolute file paths per line. Each path is
expected to point to a binary file containing one input in the "raw" format,
ready to be consumed by the quantizer without any further preprocessing.
Multiple files per line separated by spaces indicate multiple inputs to the
network. See documentation for more details. Must be specified for
quantization. All subsequent quantization options are ignored when this is
not provided.
--param_quantizer PARAM_QUANTIZER
Optional parameter to indicate the weight/bias quantizer to use. Must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"adjusted": Deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
--act_quantizer ACT_QUANTIZER
Optional parameter to indicate the activation quantizer to use. Must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"adjusted": Deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
--algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms. Usage is:
--algorithms <algo_name1> ... The available optimization algorithms are:
"cle" - Cross layer equalization includes a number of methods for equalizing
weights and biases across layers in order to rectify imbalances that cause
quantization errors.
--bias_bitwidth BIAS_BITWIDTH
Selects the bitwidth to use when quantizing the biases; 8 (default) or 32.
--bias_bw BIAS_BW Deprecated; use --bias_bitwidth.
--act_bitwidth ACT_BITWIDTH
Selects the bitwidth to use when quantizing the activations; 8 (default) or 16.
--act_bw ACT_BW Deprecated; use --act_bitwidth.
--weights_bitwidth WEIGHTS_BITWIDTH
Selects the bitwidth to use when quantizing the weights; 4 or 8 (default).
--weight_bw WEIGHT_BW
Deprecated; use --weights_bitwidth.
--float_bias_bitwidth FLOAT_BIAS_BITWIDTH
Selects the bitwidth to use when biases are in float; 32 or 16.
--ignore_encodings Use only quantizer generated encodings, ignoring any user or model provided
encodings.
Note: Cannot use --ignore_encodings with --quantization_overrides
--use_per_channel_quantization
Enables per-channel quantization for convolution-based op weights.
This replaces the built-in model QAT encodings when used for a given weight.
--use_per_row_quantization
Enables row-wise quantization of MatMul and FullyConnected ops.
--enable_per_row_quantized_bias
Enables row-wise quantization of bias for the FullyConnected op, when weights are per-row quantized.
--float_fallback Enables fallback to floating point (FP) instead of fixed point.
This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
If this option is enabled, --input_list must not be provided and --ignore_encodings must not be provided.
The external quantization encodings (encoding file/FakeQuant encodings) might be missing
quantization parameters for some interim tensors. First it will try to fill the gaps by
propagating across math-invariant functions. If the quantization params are still missing,
it falls back to floating point for those nodes.
--use_native_input_files
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the
model. For ex., uint8_t.
--use_native_dtype Note: This option is deprecated; use the --use_native_input_files option
going forward.
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the
model. For ex., uint8_t.
--use_native_output_files
Use this option to indicate the data type of the output files
1. float (default): output the file as floats.
2. native: outputs the file that is native to the model. For ex.,
uint8_t.
--disable_relu_squashing
Disables squashing of ReLU against convolution-based ops for quantized models.
--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
Specifies the number of steps to use for computing quantization encodings
such that scale = (max - min) / number of quantization steps.
The option should be passed as a space separated pair of hexadecimal string
minimum and maximum values. i.e. --restrict_quantization_steps "MIN MAX".
Please note that this is a hexadecimal string literal and not a signed
integer; to supply a negative value an explicit minus sign is required.
E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8 bit range,
--restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16
bit range. This argument is required for 16-bit Matmul operations.
Custom Op Package Options:
--op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
Use this argument to pass an op package library for quantization. Must be in
the form
<op_package_lib_path:interfaceProviderName> and be separated by a
comma for multiple package libs
-p PACKAGE_NAME, --package_name PACKAGE_NAME
A global package name to be used for each node in the Model.cpp file.
Defaults to Qnn header defined package name
--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
Absolute path to converter op package library compiled by the OpPackage
generator. Must be separated by a comma for multiple package libraries.
Note: Libraries must follow the same order as the xml files.
E.g.1: --converter_op_package_lib absolute_path_to/libExample.so
E.g.2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
--op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...], -opc OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]
Path to a Qnn Op Package XML configuration file that contains user defined
custom operations.
Architecture Checker Options(Experimental):
--arch_checker Note: This option will soon be deprecated. Use the qnn-architecture-checker tool to achieve the same result.
Note: Only one of: {'package_name', 'op_package_config'} can be specified
Basic command line usage looks like:
$ qnn-tensorflow-converter -i <path>/frozen_graph.pb
-d <network_input_name> <dims>
--out_node <network_output_name>
-o <optional_output_path>
--allow_unconsumed_nodes # optional, but most likely will be needed for larger models
-p <optional_package_name> # Defaults to "qti.aisw"
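The quantizer options above derive fixed-point encodings from tensor ranges; the --restrict_quantization_steps description gives the relationship scale = (max - min) / number of quantization steps. As an illustration only (independent of the SDK, whose exact rounding rules may differ), the default "tf" min/max scheme can be sketched in Python:

```python
def tf_minmax_encoding(data_min, data_max, bitwidth=8):
    """Asymmetric min/max ('tf') encoding: (scale, zero-point offset).

    Illustrative sketch of the scheme described above -- not SDK code.
    """
    # Number of quantization steps for the given bitwidth,
    # so that scale = (max - min) / number of steps.
    steps = (1 << bitwidth) - 1
    scale = (data_max - data_min) / steps
    # Offset expressed in integer units, so real 0.0 maps near an integer.
    offset = round(data_min / scale)
    return scale, offset
```

For example, an image-like input spanning 0.0 to 255.0 at 8 bits yields a scale of exactly 1.0 with a zero offset.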
qnn-tflite-converter¶
The qnn-tflite-converter tool converts a TFLite model to a CPP file representing the model as a series of QNN API calls. Additionally, a binary file containing static weights of the model is produced.
usage: qnn-tflite-converter [-d INPUT_NAME INPUT_DIM] [--signature_name SIGNATURE_NAME]
[--out_node OUT_NAMES] [--input_type INPUT_NAME INPUT_TYPE]
[--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding ...]
[--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
[--dump_relay DUMP_RELAY]
[--quantization_overrides QUANTIZATION_OVERRIDES] [--keep_quant_nodes]
[--disable_batchnorm_folding] [--expand_lstm_op_structure]
[--keep_disconnected_nodes]
[--input_list INPUT_LIST] [--param_quantizer PARAM_QUANTIZER]
[--act_quantizer ACT_QUANTIZER]
[--algorithms ALGORITHMS [ALGORITHMS ...]]
[--bias_bitwidth BIAS_BITWIDTH] [--bias_bw BIAS_BW]
[--act_bitwidth ACT_BITWIDTH] [--act_bw ACT_BW]
[--weights_bitwidth WEIGHTS_BITWIDTH] [--weight_bw WEIGHT_BW]
[--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_encodings]
[--use_per_channel_quantization] [--use_per_row_quantization]
[--enable_per_row_quantized_bias]
[--float_fallback] [--use_native_input_files] [--use_native_dtype]
[--use_native_output_files] [--disable_relu_squashing]
[--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
--input_network INPUT_NETWORK [--debug [DEBUG]]
[-o OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
[--float_bitwidth FLOAT_BITWIDTH] [--float_bw FLOAT_BW]
[--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
[--exclude_named_tensors] [--op_package_lib OP_PACKAGE_LIB]
[--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
[-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
[-h] [--arch_checker]
Script to convert TFLite model into QNN
required arguments:
--input_network INPUT_NETWORK, -i INPUT_NETWORK
Path to the source framework model.
optional arguments:
-d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
The names and dimensions of the network input layers specified in the format
[input_name comma-separated-dimensions], for example:
'data' 1,224,224,3
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For multiple inputs specify multiple --input_dim on the command line like:
--input_dim 'data1' 1,224,224,3 --input_dim 'data2' 1,50,100,3
--signature_name SIGNATURE_NAME, -sn SIGNATURE_NAME
Specifies a specific subgraph signature to convert.
--out_node OUT_NAMES, --out_name OUT_NAMES
Names of the graph's output tensors. Multiple output names should be
provided separately like:
--out_name out_1 --out_name out_2
--input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
Type of data expected by each input op/layer. Type for each input is
|default| if not specified. For example: "data" image. Note that the quotes
should always be included in order to handle special characters, spaces, etc.
For multiple inputs specify multiple --input_type on the command line.
Eg:
--input_type "data1" image --input_type "data2" opaque
These options are used by the DSP runtime, and the following descriptions
state how the input will be handled for each option.
Image:
Input is float between 0-255; the input's mean is 0.0f and the input's
max is 255.0f. The floats are cast to uint8_t and the uint8_t values are
passed to the DSP.
Default:
Pass the input as floats to the DSP directly and the DSP will quantize it.
Opaque:
Assumes the input is float because the consumer layer (i.e. the next layer) requires
it as float, therefore it won't be quantized.
Choices supported:
image
default
opaque
--input_dtype INPUT_NAME INPUT_DTYPE
The names and datatype of the network input layers specified in the format
[input_name datatype], for example:
'data' 'float32'.
Default is float32 if not specified.
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For multiple inputs specify multiple --input_dtype on the command line like:
--input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
--input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
Usage: --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
[INPUT_ENCODING_OUT]
Input encoding of the network inputs. Default is bgr.
e.g.
--input_encoding "data" rgba
Quotes must wrap the input node name to handle special characters,
spaces, etc. To specify encodings for multiple inputs, invoke
--input_encoding for each one.
e.g.
--input_encoding "data1" rgba --input_encoding "data2" other
Optionally, an output encoding may be specified for an input node by
providing a second encoding. The default output encoding is bgr.
e.g.
--input_encoding "data3" rgba rgb
Input encoding types:
image color encodings: bgr, rgb, nv21, nv12, ...
time_series: for inputs of rnn models;
other: not available above or is unknown.
Supported encodings:
bgr
rgb
rgba
argb32
nv21
nv12
time_series
other
--input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
Layout of each input tensor. If not specified, it will use the default
based on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
NONTRIVIAL for everything else. For multiple inputs specify multiple
--input_layout on the command line.
Eg:
--input_layout "data1" NCHW --input_layout "data2" NCHW
--custom_io CUSTOM_IO
Use this option to specify a yaml file for custom IO.
--dump_relay DUMP_RELAY
Dump Relay ASM and Params at the path provided with the argument
Usage: --dump_relay <path_to_dump>
--show_unconsumed_nodes
Displays a list of unconsumed nodes, if any are
found. Nodes which are unconsumed do not violate the
structural fidelity of the generated graph.
--disable_batchnorm_folding
--expand_lstm_op_structure
Enables optimization that breaks the LSTM op to equivalent math ops
--keep_disconnected_nodes
Disable Optimization that removes Ops not connected to the main graph.
This optimization uses output names provided over commandline OR
inputs/outputs extracted from the Source model to determine the main graph
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path where the converted output model should be saved. If not specified, the
converted model will be written to a file with the same name as the input model.
--copyright_file COPYRIGHT_FILE
Path to copyright file. If provided, the content of the file will be added
to the output model.
--float_bitwidth FLOAT_BITWIDTH
Selects the bitwidth to use when using float for parameters (weights/bias)
and activations for all ops or a specific op (via encodings) selected
through encoding; 32 (default) or 16.
--float_bw FLOAT_BW Deprecated; use --float_bitwidth.
--float_bias_bw FLOAT_BIAS_BW
Deprecated; use --float_bias_bitwidth.
--overwrite_model_prefix
If this option is passed, the model generator will use the output path name as the
model prefix to name functions in <qnn_model_name>.cpp (useful for running
multiple models at once), e.g. ModelName_composeGraphs. Default is to use the
generic "QnnModel_".
--exclude_named_tensors
Do not use source framework tensor names; instead use a counter for naming
tensors. Note: This can potentially help to reduce the size of the final model library
that will be generated (recommended for deploying models). Default is False.
-h, --help show this help message and exit
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters to use for
quantization. These will override any quantization data carried from
conversion (e.g. TF fake quantization) or calculated during the normal
quantization process. Format defined as per AIMET specification.
--keep_quant_nodes Use this option to keep activation quantization nodes in the graph rather
than stripping them.
--input_list INPUT_LIST
Path to a file specifying the input data. This file should be a plain text
file, containing one or more absolute file paths per line. Each path is
expected to point to a binary file containing one input in the "raw" format,
ready to be consumed by the quantizer without any further preprocessing.
Multiple files per line separated by spaces indicate multiple inputs to the
network. See documentation for more details. Must be specified for
quantization. All subsequent quantization options are ignored when this is
not provided.
--param_quantizer PARAM_QUANTIZER
Optional parameter to indicate the weight/bias quantizer to use. Must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"adjusted": Deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
--act_quantizer ACT_QUANTIZER
Optional parameter to indicate the activation quantizer to use. Must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"adjusted": Deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
--algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms. Usage is:
--algorithms <algo_name1> ... The available optimization algorithms are:
"cle" - Cross layer equalization includes a number of methods for equalizing
weights and biases across layers in order to rectify imbalances that cause
quantization errors.
--bias_bitwidth BIAS_BITWIDTH
Selects the bitwidth to use when quantizing the biases; 8 (default) or 32.
--bias_bw BIAS_BW Deprecated; use --bias_bitwidth.
--act_bitwidth ACT_BITWIDTH
Selects the bitwidth to use when quantizing the activations; 8 (default) or 16.
--act_bw ACT_BW Deprecated; use --act_bitwidth.
--weights_bitwidth WEIGHTS_BITWIDTH
Selects the bitwidth to use when quantizing the weights; 4 or 8 (default).
--weight_bw WEIGHT_BW
Deprecated; use --weights_bitwidth.
--float_bias_bitwidth FLOAT_BIAS_BITWIDTH
Selects the bitwidth to use when biases are in float; 32 or 16.
--ignore_encodings Use only quantizer generated encodings, ignoring any user or model provided
encodings.
Note: Cannot use --ignore_encodings with --quantization_overrides
--use_per_channel_quantization
Enables per-channel quantization for convolution-based op weights.
This replaces the built-in model QAT encodings when used for a given weight.
--use_per_row_quantization
Enables row-wise quantization of MatMul and FullyConnected ops.
--enable_per_row_quantized_bias
Enables row-wise quantization of bias for the FullyConnected op, when weights are per-row quantized.
--float_fallback Enables fallback to floating point (FP) instead of fixed point.
This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
If this option is enabled, --input_list must not be provided and --ignore_encodings must not be provided.
The external quantization encodings (encoding file/FakeQuant encodings) might be missing
quantization parameters for some interim tensors. First it will try to fill the gaps by
propagating across math-invariant functions. If the quantization params are still missing,
it falls back to floating point for those nodes.
--use_native_input_files
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the
model. For ex., uint8_t.
--use_native_dtype Note: This option is deprecated; use the --use_native_input_files option
going forward.
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the
model. For ex., uint8_t.
--use_native_output_files
Use this option to indicate the data type of the output files
1. float (default): output the file as floats.
2. native: outputs the file that is native to the model. For ex.,
uint8_t.
--disable_relu_squashing
Disables squashing of ReLU against convolution-based ops for quantized models.
--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
Specifies the number of steps to use for computing quantization encodings
such that scale = (max - min) / number of quantization steps.
The option should be passed as a space separated pair of hexadecimal string
minimum and maximum values. i.e. --restrict_quantization_steps "MIN MAX".
Please note that this is a hexadecimal string literal and not a signed
integer; to supply a negative value an explicit minus sign is required.
E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8 bit range,
--restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16
bit range.
Custom Op Package Options:
--op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
Use this argument to pass an op package library for quantization. Must be in
the form <op_package_lib_path:interfaceProviderName> and be separated by a
comma for multiple package libs
--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
Absolute path to converter op package library compiled by the OpPackage
generator. Must be separated by a comma for multiple package libraries.
Note: Libraries must follow the same order as the xml files.
E.g.1: --converter_op_package_lib absolute_path_to/libExample.so
E.g.2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
-p PACKAGE_NAME, --package_name PACKAGE_NAME
A global package name to be used for each node in the Model.cpp file.
Defaults to Qnn header defined package name
--op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...], -opc OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]
Path to a Qnn Op Package XML configuration file that contains user defined
custom operations.
Architecture Checker Options(Experimental):
--arch_checker Note: This option will soon be deprecated. Use the qnn-architecture-checker tool to achieve the same result.
Note: Only one of: {'package_name', 'op_package_config'} can be specified
Basic command line usage looks like:
$ qnn-tflite-converter -i <path>/model.tflite
-d <optional_network_input_name> <dims>
-o <optional_output_path>
-p <optional_package_name> # Defaults to "qti.aisw"
qnn-pytorch-converter¶
The qnn-pytorch-converter tool converts a PyTorch model to a CPP file representing the model as a series of QNN API calls. Additionally, a binary file containing static weights of the model is produced.
usage: qnn-pytorch-converter -d INPUT_NAME INPUT_DIM [--out_node OUT_NAMES]
[--input_type INPUT_NAME INPUT_TYPE]
[--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding ...]
[--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
[--preserve_io [PRESERVE_IO [PRESERVE_IO ...]]]
[--dump_relay DUMP_RELAY] [--dry_run] [--dump_out_names]
[--pytorch_custom_op_lib PYTORCH_CUSTOM_OP_LIB]
[--quantization_overrides QUANTIZATION_OVERRIDES] [--keep_quant_nodes]
[--disable_batchnorm_folding] [--expand_lstm_op_structure]
[--keep_disconnected_nodes]
[--input_list INPUT_LIST] [--param_quantizer PARAM_QUANTIZER]
[--act_quantizer ACT_QUANTIZER]
[--algorithms ALGORITHMS [ALGORITHMS ...]]
[--bias_bitwidth BIAS_BITWIDTH] [--bias_bw BIAS_BW]
[--act_bitwidth ACT_BITWIDTH] [--act_bw ACT_BW]
[--weights_bitwidth WEIGHTS_BITWIDTH] [--weight_bw WEIGHT_BW]
[--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_encodings]
[--use_per_channel_quantization] [--use_per_row_quantization]
[--enable_per_row_quantized_bias]
[--float_fallback] [--use_native_input_files] [--use_native_dtype]
[--use_native_output_files] [--disable_relu_squashing]
[--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
--input_network INPUT_NETWORK [--debug [DEBUG]]
[-o OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
[--float_bitwidth FLOAT_BITWIDTH] [--float_bw FLOAT_BW]
[--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
[--exclude_named_tensors] [--op_package_lib OP_PACKAGE_LIB]
[--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
[-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
[-h] [--arch_checker]
Script to convert PyTorch model into QNN
required arguments:
-d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
The names and dimensions of the network input layers specified in the format
[input_name comma-separated-
dimensions], for example:
'data' 1,3,224,224
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For multiple inputs specify multiple --input_dim on the command line like:
--input_dim 'data1' 1,3,224,224 --input_dim 'data2' 1,50,100,3
--input_network INPUT_NETWORK, -i INPUT_NETWORK
Path to the source framework model.
optional arguments:
--out_node OUT_NAMES, --out_name OUT_NAMES
Names of the graph's output tensors. Multiple output names should be
provided separately, like:
--out_name out_1 --out_name out_2
--input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
Type of data expected by each input op/layer. Type for each input is
|default| if not specified. For example: "data" image. Note that the quotes
should always be included in order to handle special characters, spaces, etc.
For multiple inputs specify multiple --input_type on the command line.
Eg:
--input_type "data1" image --input_type "data2" opaque
These options are used by the DSP runtime, and the following descriptions
state how the input will be handled for each option.
Image:
Input is float between 0-255; the input's mean is 0.0f and the input's
max is 255.0f. The floats are cast to uint8_t values, which are passed
to the DSP.
Default:
Pass the input as floats to the DSP directly; the DSP will quantize it.
Opaque:
Assumes input is float because the consumer layer (i.e. the next layer)
requires it as float; therefore it won't be quantized.
Choices supported:
image
default
opaque
--input_dtype INPUT_NAME INPUT_DTYPE
The names and datatype of the network input layers specified in the format
[input_name datatype], for example:
'data' 'float32'
Default is float32 if not specified
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For multiple inputs specify multiple --input_dtype on the command line like:
--input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
--input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
Usage: --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
[INPUT_ENCODING_OUT]
Input encoding of the network inputs. Default is bgr.
e.g.
--input_encoding "data" rgba
Quotes must wrap the input node name to handle special characters,
spaces, etc. To specify encodings for multiple inputs, invoke
--input_encoding for each one.
e.g.
--input_encoding "data1" rgba --input_encoding "data2" other
Optionally, an output encoding may be specified for an input node by
providing a second encoding. The default output encoding is bgr.
e.g.
--input_encoding "data3" rgba rgb
Input encoding types:
image color encodings: bgr,rgb, nv21, nv12, ...
time_series: for inputs of rnn models;
other: not available above or is unknown.
Supported encodings:
bgr
rgb
rgba
argb32
nv21
nv12
time_series
other
--input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
Layout of each input tensor. If not specified, it will use the default
based on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
NONTRIVIAL for everything else. For multiple inputs specify multiple
--input_layout on the command line.
Eg:
--input_layout "data1" NCHW --input_layout "data2" NCHW
--custom_io CUSTOM_IO
Use this option to specify a yaml file for custom IO.
--preserve_io [PRESERVE_IO [PRESERVE_IO ...]]
Use this option to preserve IO layout and datatype. The different ways of
using this option are as follows:
--preserve_io layout <space separated list of names of inputs and
outputs of the graph>
--preserve_io datatype <space separated list of names of inputs and
outputs of the graph>
In this case, the user should also specify the string "layout" or "datatype"
in the command to indicate which property the converter needs to preserve,
e.g.
--preserve_io layout input1 input2 output1
--preserve_io datatype input1 input2 output1
Optionally, the user may choose to preserve the layout and/or datatype for
all the inputs and outputs of the graph.
This can be done in the following two ways:
--preserve_io layout
--preserve_io datatype
Additionally, the user may choose to preserve both layout and datatypes for
all IO tensors by just passing the option as follows:
--preserve_io
Note: Only one of the above usages is allowed at a time.
Note: --custom_io gets higher precedence than --preserve_io.
--dump_relay DUMP_RELAY
Dump Relay ASM and Params at the path provided with the argument
Usage: --dump_relay <path_to_dump>
--dry_run Evaluates the model without actually converting any ops, and
returns unsupported ops if any.
--dump_out_names Dump output names mapped from QNN CPP stored names to converter used
names and save to file 'model_output_names.json'.
--pytorch_custom_op_lib PYTORCH_CUSTOM_OP_LIB, -pcl PYTORCH_CUSTOM_OP_LIB
Absolute path to the PyTorch library containing the custom op definition.
Multiple custom op libraries must be comma-separated.
For PyTorch custom op details, refer to:
https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html
For custom C++ extension details, refer to:
https://pytorch.org/tutorials/advanced/cpp_extension.html
Eg. 1: --pytorch_custom_op_lib absolute_path_to/Example.so
Eg. 2: -pcl absolute_path_to/Example1.so,absolute_path_to/Example2.so
--disable_batchnorm_folding
--expand_lstm_op_structure
Enables an optimization that breaks the LSTM op into equivalent math ops.
--keep_disconnected_nodes
Disables the optimization that removes ops not connected to the main graph.
This optimization uses output names provided over the command line OR
inputs/outputs extracted from the source model to determine the main graph.
--debug [DEBUG] Run the converter in debug mode.
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path where the converted output model should be saved. If not specified, the
converted model will be written to a file with the same name as the input model.
--copyright_file COPYRIGHT_FILE
Path to copyright file. If provided, the content of the file will be added
to the output model.
--float_bitwidth FLOAT_BITWIDTH
Selects the bitwidth to use when using float for parameters (weights/bias)
and activations for all ops or a specific op (via encodings) selected
through encoding; 32 (default) or 16.
--float_bw FLOAT_BW Deprecated; use --float_bitwidth.
--float_bias_bw FLOAT_BIAS_BW
Deprecated; use --float_bias_bitwidth.
--overwrite_model_prefix
If this option is passed, the model generator will use the output path name
as the model prefix to name functions in <qnn_model_name>.cpp (useful for
running multiple models at once), e.g. ModelName_composeGraphs. Default is
the generic "QnnModel_".
--exclude_named_tensors
Stop using source framework tensor names; instead, use a counter for naming
tensors. Note: This can potentially help reduce the final model library
that will be generated (recommended for deploying the model). Default is False.
-h, --help show this help message and exit
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters to use for
quantization. These will override any quantization data carried from
conversion (eg TF fake quantization) or calculated during the normal
quantization process. Format defined as per AIMET specification.
--keep_quant_nodes Use this option to keep activation quantization nodes in the graph rather
than stripping them.
--input_list INPUT_LIST
Path to a file specifying the input data. This file should be a plain text
file, containing one or more absolute file paths per line. Each path is
expected to point to a binary file containing one input in the "raw" format,
ready to be consumed by the quantizer without any further preprocessing.
Multiple files per line separated by spaces indicate multiple inputs to the
network. See documentation for more details. Must be specified for
quantization. All subsequent quantization options are ignored when this is
not provided.
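As a hedged illustration of the expected file format, this Python sketch (file names, directory, and tensor shape are hypothetical) writes two flat-binary float32 input files and an input list referencing them, one absolute path per line:

```python
import array
import os
import tempfile

tmpdir = tempfile.mkdtemp()
shape = (1, 3, 224, 224)  # example input shape
count = 1
for d in shape:
    count *= d

paths = []
for i in range(2):
    # "raw" format: flat binary float32 values, no header or metadata
    data = array.array("f", [0.0] * count)
    path = os.path.join(tmpdir, "input_%d.raw" % i)
    with open(path, "wb") as f:
        data.tofile(f)
    paths.append(path)

# One absolute path per line; one line per inference
with open(os.path.join(tmpdir, "input_list.txt"), "w") as f:
    f.write("\n".join(paths) + "\n")
```

For a network with multiple inputs, the corresponding raw-file paths would go on the same line separated by spaces, as described above.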
--param_quantizer PARAM_QUANTIZER
Optional parameter to indicate the weight/bias quantizer to use. Must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"adjusted": Deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
--act_quantizer ACT_QUANTIZER
Optional parameter to indicate the activation quantizer to use. Must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"adjusted": Deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
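The difference between the "tf" and "symmetric" schemes can be sketched in Python. The formulas below are a simplified illustration of min/max-based encodings, not the quantizer's exact implementation:

```python
def tf_encoding(data_min, data_max, bitwidth=8):
    # "tf": use the real min/max of the data; the offset is generally non-zero.
    steps = 2 ** bitwidth - 1
    scale = (data_max - data_min) / steps
    offset = round(data_min / scale)
    return scale, offset

def symmetric_encoding(data_min, data_max, bitwidth=8):
    # "symmetric": min and max share the same magnitude about zero, so the
    # stored int#_t offset is always 0.
    abs_max = max(abs(data_min), abs(data_max))
    steps = 2 ** bitwidth - 2  # grid spans -(2^(bw-1)-1) .. +(2^(bw-1)-1)
    scale = (2.0 * abs_max) / steps
    return scale, 0

print(tf_encoding(-0.5, 1.0))         # non-zero offset
print(symmetric_encoding(-0.5, 1.0))  # offset is always 0
```

Symmetric storage trades a slightly coarser scale for a guaranteed zero offset, which some kernels exploit.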
--algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms. Usage is:
--algorithms <algo_name1> ... The available optimization algorithms are:
"cle" - Cross layer equalization includes a number of methods for equalizing
weights and biases across layers in order to rectify imbalances that cause
quantization errors.
--bias_bitwidth BIAS_BITWIDTH
Selects the bitwidth to use when quantizing the biases; 8 (default) or 32.
--bias_bw BIAS_BW Deprecated; use --bias_bitwidth.
--act_bitwidth ACT_BITWIDTH
Selects the bitwidth to use when quantizing the activations; 8 (default) or 16.
--act_bw ACT_BW Deprecated; use --act_bitwidth.
--weights_bitwidth WEIGHTS_BITWIDTH
Selects the bitwidth to use when quantizing the weights; 4 or 8 (default).
--weight_bw WEIGHT_BW
Deprecated; use --weights_bitwidth.
--float_bias_bitwidth FLOAT_BIAS_BITWIDTH
Selects the bitwidth to use when biases are in float; 32 or 16.
--ignore_encodings Use only quantizer generated encodings, ignoring any user or model provided
encodings.
Note: Cannot use --ignore_encodings with --quantization_overrides
--use_per_channel_quantization
Enables per-channel quantization for convolution-based op weights.
This replaces the built-in model QAT encodings when used for a given weight.
--use_per_row_quantization
Enables row-wise quantization of MatMul and FullyConnected ops.
--enable_per_row_quantized_bias
Enables row-wise quantization of bias for the FullyConnected op, when weights are per-row quantized.
--float_fallback Enables fallback to floating point (FP) instead of fixed point.
This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
If this option is enabled, neither --input_list nor --ignore_encodings may be provided.
The external quantization encodings (encoding file/FakeQuant encodings) might be missing
quantization parameters for some interim tensors. First the quantizer will try to fill the
gaps by propagating encodings across math-invariant functions. If the quantization params
are still missing, it falls back to floating point for those nodes.
--use_native_input_files
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the model,
e.g. uint8_t.
--use_native_dtype Note: This option is deprecated; use the --use_native_input_files option
instead.
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the model,
e.g. uint8_t.
--use_native_output_files
Use this option to indicate the data type of the output files
1. float (default): output the file as floats.
2. native: outputs the file in the data type native to the model, e.g.
uint8_t.
--disable_relu_squashing
Disables squashing of ReLU against convolution-based ops for quantized models.
--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
Specifies the number of steps to use for computing quantization encodings
such that scale = (max - min) / number of quantization steps.
The option should be passed as a space separated pair of hexadecimal string
minimum and maximum values. i.e. --restrict_quantization_steps "MIN MAX".
Please note that this is a hexadecimal string literal, not a signed
integer; to supply a negative value an explicit minus sign is required.
E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8-bit range,
--restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16-bit
range.
Custom Op Package Options:
--op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
Use this argument to pass an op package library for quantization. Must be in
the form <op_package_lib_path:interfaceProviderName> and be separated by a
comma for multiple package libs
--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
Absolute path to converter op package library compiled by the OpPackage
generator. Must be separated by a comma for multiple package libraries.
Note: Libraries must follow the same order as the xml files.
E.g.1: --converter_op_package_lib absolute_path_to/libExample.so
E.g.2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
-p PACKAGE_NAME, --package_name PACKAGE_NAME
A global package name to be used for each node in the Model.cpp file.
Defaults to Qnn header defined package name
--op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...], -opc CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]
Path to a Qnn Op Package XML configuration file that contains user defined
custom operations.
Architecture Checker Options(Experimental):
--arch_checker Note: This option will soon be deprecated. Use the qnn-architecture-checker tool to achieve the same result.
Note: Only one of: {'package_name', 'op_package_config'} can be specified
Basic command line usage looks like:
$ qnn-pytorch-converter -i <path>/model.pt
-d <network_input_name> <dims>
-o <optional_output_path>
-p <optional_package_name> # Defaults to "qti.aisw"
qnn-onnx-converter¶
The qnn-onnx-converter tool converts a model from the ONNX framework to a CPP file representing the model as a series of QNN API calls. Additionally, a binary file containing static weights of the model is produced.
usage: qnn-onnx-converter [--out_node OUT_NAMES] [--input_type INPUT_NAME INPUT_TYPE]
[--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding [ ...]]
[--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
[--preserve_io [PRESERVE_IO ...]]
[--dump_qairt_io_config_yaml [DUMP_QAIRT_IO_CONFIG_YAML]]
[--enable_framework_trace] [--dry_run [DRY_RUN]] [-d INPUT_NAME INPUT_DIM]
[-n] [-b BATCH] [-s SYMBOL_NAME VALUE]
[--dump_custom_io_config_template DUMP_CUSTOM_IO_CONFIG_TEMPLATE]
[--quantization_overrides QUANTIZATION_OVERRIDES] [--keep_quant_nodes]
[--disable_batchnorm_folding] [--expand_lstm_op_structure]
[--keep_disconnected_nodes] [--preserve_onnx_output_order]
[--apply_masked_softmax {compressed,uncompressed}]
[--packed_masked_softmax_inputs PACKED_MASKED_SOFTMAX_INPUTS [PACKED_MASKED_SOFTMAX_INPUTS ...]]
[--packed_max_seq PACKED_MAX_SEQ] [--input_list INPUT_LIST]
[--param_quantizer PARAM_QUANTIZER] [--act_quantizer ACT_QUANTIZER]
[--algorithms ALGORITHMS [ALGORITHMS ...]] [--bias_bitwidth BIAS_BITWIDTH]
[--bias_bw BIAS_BITWIDTH] [--act_bitwidth ACT_BITWIDTH]
[--act_bw ACT_BITWIDTH] [--weights_bitwidth WEIGHTS_BITWIDTH]
[--weight_bw WEIGHTS_BITWIDTH] [--ignore_encodings]
[--use_per_channel_quantization] [--use_per_row_quantization]
[--enable_per_row_quantized_bias] [--float_fallback]
[--use_native_input_files] [--use_native_dtype]
[--use_native_output_files] [--disable_relu_squashing]
[--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
[--pack_4_bit_weights] [--keep_weights_quantized]
[--act_quantizer_calibration ACT_QUANTIZER_CALIBRATION]
[--param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION]
[--act_quantizer_schema ACT_QUANTIZER_SCHEMA]
[--param_quantizer_schema PARAM_QUANTIZER_SCHEMA]
[--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
[--dump_qairt_quantizer_command DUMP_QAIRT_QUANTIZER_COMMAND]
[--quantizer_log QUANTIZER_LOG]
[--quantizer_log_level {LogLevel.NONE,LogLevel.TRACE,LogLevel.INFO}]
--input_network INPUT_NETWORK [--debug [DEBUG]] [-o OUTPUT_PATH]
[--copyright_file COPYRIGHT_FILE] [--float_bitwidth FLOAT_BITWIDTH]
[--float_bw FLOAT_BW] [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH]
[--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
[--exclude_named_tensors] [--model_version MODEL_VERSION]
[--op_package_lib OP_PACKAGE_LIB]
[--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
[-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
[--arch_checker] [-h] [--validate_models]
Script to convert ONNX model into QNN
required arguments:
--input_network INPUT_NETWORK, -i INPUT_NETWORK
Path to the source framework model.
optional arguments:
--out_node OUT_NAMES, --out_name OUT_NAMES
Names of the graph's output tensors. Multiple output names should be
provided separately, like:
--out_name out_1 --out_name out_2
--input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
Type of data expected by each input op/layer. Type for each input is
|default| if not specified. For example: "data" image. Note that the quotes
should always be included in order to handle special characters, spaces, etc.
For multiple inputs specify multiple --input_type on the command line.
Eg:
--input_type "data1" image --input_type "data2" opaque
These options are used by the DSP runtime, and the following descriptions
state how the input will be handled for each option.
Image:
Input is float between 0-255; the input's mean is 0.0f and the input's
max is 255.0f. The floats are cast to uint8_t values, which are passed
to the DSP.
Default:
Pass the input as floats to the DSP directly; the DSP will quantize it.
Opaque:
Assumes input is float because the consumer layer (i.e. the next layer)
requires it as float; therefore it won't be quantized.
Choices supported:
image
default
opaque
--input_dtype INPUT_NAME INPUT_DTYPE
The names and datatype of the network input layers specified in the format
[input_name datatype], for example:
'data' 'float32'
Default is float32 if not specified
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For multiple inputs specify multiple --input_dtype on the command line like:
--input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
--input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
Usage: --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
[INPUT_ENCODING_OUT]
Input encoding of the network inputs. Default is bgr.
e.g.
--input_encoding "data" rgba
Quotes must wrap the input node name to handle special characters,
spaces, etc. To specify encodings for multiple inputs, invoke
--input_encoding for each one.
e.g.
--input_encoding "data1" rgba --input_encoding "data2" other
Optionally, an output encoding may be specified for an input node by
providing a second encoding. The default output encoding is bgr.
e.g.
--input_encoding "data3" rgba rgb
Input encoding types:
image color encodings: bgr,rgb, nv21, nv12, ...
time_series: for inputs of rnn models;
other: not available above or is unknown.
Supported encodings:
bgr
rgb
rgba
argb32
nv21
nv12
time_series
other
--input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
Layout of each input tensor. If not specified, it will use the default
based on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, HWIO, OIHW, NFC, NCF, NTF, TNF, NF, NC, F,
NONTRIVIAL
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T =
Time
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
NONTRIVIAL for everything else. For multiple inputs specify multiple
--input_layout on the command line.
Eg:
--input_layout "data1" NCHW --input_layout "data2" NCHW
--custom_io CUSTOM_IO
Use this option to specify a yaml file for custom IO.
--preserve_io [PRESERVE_IO ...]
Use this option to preserve IO layout and datatype. The different ways of
using this option are as follows:
--preserve_io layout <space separated list of names of inputs and
outputs of the graph>
--preserve_io datatype <space separated list of names of inputs and
outputs of the graph>
In this case, the user should also specify the string "layout" or "datatype"
in the command to indicate which property the converter needs to preserve,
e.g.
--preserve_io layout input1 input2 output1
--preserve_io datatype input1 input2 output1
Optionally, the user may choose to preserve the layout and/or datatype for
all the inputs and outputs of the graph.
This can be done in the following two ways:
--preserve_io layout
--preserve_io datatype
Additionally, the user may choose to preserve both layout and datatypes for
all IO tensors by just passing the option as follows:
--preserve_io
Note: Only one of the above usages is allowed at a time.
Note: --custom_io gets higher precedence than --preserve_io.
--dump_qairt_io_config_yaml [DUMP_QAIRT_IO_CONFIG_YAML]
Use this option to dump a yaml file which contains the equivalent I/O
configurations of QAIRT Converter along with the QAIRT Converter Command and
can be passed to QAIRT Converter using the option --io_config.
--enable_framework_trace
Use this option to enable converter to trace the op/tensor change
information.
Currently framework op trace is supported only for ONNX converter.
--dry_run [DRY_RUN] Evaluates the model without actually converting any ops, and returns
unsupported ops/attributes as well as unused inputs and/or outputs if any.
Leave empty or specify "info" to see the dry run as a table, or specify
"debug" to show more detailed messages.
-d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
The name and dimension of all the input buffers to the network specified in
the format [input_name comma-separated-dimensions],
for example: 'data' 1,224,224,3.
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For scalar inputs, use a single dimension `0` to indicate that the input is a scalar value.
For multiple inputs specify multiple --input_dim on the command line like:
--input_dim 'data1' 1,224,224,3 --input_dim 'data2' 0
NOTE: This feature works only with ONNX 1.6.0 and above.
-n, --no_simplification
Do not attempt to simplify the model automatically. This may prevent some
models from properly converting
when sequences of unsupported static operations are present.
-b BATCH, --batch BATCH
The batch dimension override. This will take the first dimension of all
inputs and treat it as a batch dim, overriding it with the value provided
here. For example:
--batch 6
will result in a shape change from [1,3,224,224] to [6,3,224,224].
If there are inputs without batch dim this should not be used and each input
should be overridden independently using -d option for input dimension
overrides.
-s SYMBOL_NAME VALUE, --define_symbol SYMBOL_NAME VALUE
This option allows overriding specific input dimension symbols. For instance
you might see input shapes specified with variables such as :
data: [1,3,height,width]
To override these simply pass the option as:
--define_symbol height 224 --define_symbol width 448
which results in dimensions that look like:
data: [1,3,224,448]
--dump_custom_io_config_template DUMP_CUSTOM_IO_CONFIG_TEMPLATE
Dumps the yaml template for Custom I/O configuration. This file can be
edited as per the custom requirements and passed using the option
--custom_io. Use this option to specify a yaml file to which the custom IO
config template is dumped.
--disable_batchnorm_folding
--expand_lstm_op_structure
Enables an optimization that breaks the LSTM op into equivalent math ops.
--keep_disconnected_nodes
Disables the optimization that removes ops not connected to the main graph.
This optimization uses output names provided over the command line OR
inputs/outputs extracted from the source model to determine the main graph.
--preserve_onnx_output_order
Preserve the ONNX output order in the converted graph. Note: This may
slightly impact performance.
--debug [DEBUG] Run the converter in debug mode.
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path where the converted output model should be saved. If not specified, the
converted model will be written to a file with the same name as the input model.
--copyright_file COPYRIGHT_FILE
Path to copyright file. If provided, the content of the file will be added
to the output model.
--float_bitwidth FLOAT_BITWIDTH
Use the --float_bitwidth option to convert the graph to the specified float
bitwidth, either 32 (default) or 16.
--float_bw FLOAT_BW Note: --float_bw is deprecated, use --float_bitwidth.
--float_bias_bitwidth FLOAT_BIAS_BITWIDTH
Use the --float_bias_bitwidth option to select the bitwidth to use for the
float bias tensor, either 32 or 16.
--float_bias_bw FLOAT_BIAS_BW
Note: --float_bias_bw is deprecated, use --float_bias_bitwidth.
--overwrite_model_prefix
If this option is passed, the model generator will use the output path name
as the model prefix to name functions in <qnn_model_name>.cpp (useful for
running multiple models at once), e.g. ModelName_composeGraphs. Default is
the generic "QnnModel_".
--exclude_named_tensors
Stop using source framework tensor names; instead, use a counter for naming
tensors. Note: This can potentially help reduce the final model library
that will be generated (recommended for deploying the model). Default is False.
--model_version MODEL_VERSION
User-defined ASCII string to identify the model, only first 64 bytes will be
stored
-h, --help show this help message and exit
--validate_models Validate the original onnx model against optimized onnx model.
Constant inputs with all value 1s will be generated and will be used
by both models and their outputs are checked against each other.
The % average error and 90th percentile of output differences will be
calculated for this.
Note: Usage of this flag will incur extra time due to inference of the
models.
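The reported metrics can be approximated as follows. This Python sketch is only an illustration of a "% average error" and a 90th percentile of output differences, not the tool's exact computation:

```python
import statistics

def output_diff_stats(ref_outputs, test_outputs):
    # Absolute element-wise differences between the two models' outputs.
    diffs = [abs(r - t) for r, t in zip(ref_outputs, test_outputs)]
    # Percent average error, normalized here by the reference's peak
    # magnitude (the normalization choice is an assumption).
    denom = max(abs(r) for r in ref_outputs) or 1.0
    pct_avg_error = 100.0 * statistics.mean(diffs) / denom
    # 90th percentile of the differences (last decile cut point).
    p90 = statistics.quantiles(diffs, n=10)[-1]
    return pct_avg_error, p90
```

Identical outputs yield zero for both metrics; any divergence introduced by the optimization pass shows up in the average, while the 90th percentile highlights outlier elements.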
Custom Op Package Options:
--op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
Use this argument to pass an op package library for quantization. Must be in
the form <op_package_lib_path:interfaceProviderName> and be separated by a
comma for multiple package libs
--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
Absolute path to converter op package library compiled by the OpPackage
generator. Must be separated by a comma for multiple package libraries.
Note: Order of converter op package libraries must follow the order of xmls.
Ex1: --converter_op_package_lib absolute_path_to/libExample.so
Ex2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
-p PACKAGE_NAME, --package_name PACKAGE_NAME
A global package name to be used for each node in the Model.cpp file.
Defaults to Qnn header defined package name
--op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...], -opc CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]
Path to a Qnn Op Package XML configuration file that contains user defined
custom operations.
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters to use for
quantization. These will override any quantization data carried from
conversion (eg TF fake quantization) or calculated during the normal
quantization process. Format defined as per AIMET specification.
--keep_quant_nodes Use this option to keep activation quantization nodes in the graph rather
than stripping them.
--input_list INPUT_LIST
Path to a file specifying the input data. This file should be a plain text
file, containing one or more absolute file paths per line. Each path is
expected to point to a binary file containing one input in the "raw" format,
ready to be consumed by the quantizer without any further preprocessing.
Multiple files per line separated by spaces indicate multiple inputs to the
network. See documentation for more details. Must be specified for
quantization. All subsequent quantization options are ignored when this is
not provided.
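As a sketch of the expected file format, the snippet below writes a minimal input list. The helper name and the raw file names are illustrative only, not part of the SDK:

```python
import os

def write_input_list(entries, list_path="input_list.txt"):
    # One line per inference. Multiple raw files on one line are the
    # multiple inputs of a multi-input network, separated by spaces.
    # Paths are made absolute, as the quantizer expects.
    with open(list_path, "w") as f:
        for paths in entries:
            f.write(" ".join(os.path.abspath(p) for p in paths) + "\n")

# two inferences, one input each (hypothetical file names)
write_input_list([["img0.raw"], ["img1.raw"]])
```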
--param_quantizer PARAM_QUANTIZER
Optional parameter to indicate the weight/bias quantizer to use. Must be
followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails
present in the weight distribution.
"adjusted": Note: "adjusted" mode is deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
Note: the legacy option --param_quantizer will be deprecated; use
--param_quantizer_calibration instead
--act_quantizer ACT_QUANTIZER
Optional parameter to indicate the activation quantizer to use. Must be
followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails
present in the weight distribution.
"adjusted": Note: "adjusted" mode is deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
Note: the legacy option --act_quantizer will be deprecated; use
--act_quantizer_calibration instead
--algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms. Usage is:
--algorithms <algo_name1> ... The available optimization algorithms are:
"cle" - Cross layer equalization includes a number of methods for equalizing
weights and biases across layers in order to rectify imbalances that cause
quantization errors.
--bias_bitwidth BIAS_BITWIDTH
Use the --bias_bitwidth option to select the bitwidth to use when quantizing
the biases, either 8 (default) or 32.
--bias_bw BIAS_BITWIDTH
Note: --bias_bw is deprecated, use --bias_bitwidth.
--act_bitwidth ACT_BITWIDTH
Use the --act_bitwidth option to select the bitwidth to use when quantizing
the activations, either 8 (default) or 16.
--act_bw ACT_BITWIDTH
Note: --act_bw is deprecated, use --act_bitwidth.
--weights_bitwidth WEIGHTS_BITWIDTH
Use the --weights_bitwidth option to select the bitwidth to use when
quantizing the weights, either 4 or 8 (default).
--weight_bw WEIGHTS_BITWIDTH
Note: --weight_bw is deprecated, use --weights_bitwidth.
--ignore_encodings Use only quantizer generated encodings, ignoring any user or model provided
encodings.
Note: Cannot use --ignore_encodings with --quantization_overrides
--use_per_channel_quantization
Use this option to enable per-channel quantization for convolution-based op
weights.
Note: This will replace built-in model QAT encodings when used for a given
weight.
--use_per_row_quantization
Use this option to enable rowwise quantization of Matmul and FullyConnected
ops.
--enable_per_row_quantized_bias
Use this option to enable rowwise quantization of bias for FullyConnected
op, when weights are per-row quantized.
--float_fallback Use this option to enable fallback to floating point (FP) instead of fixed
point.
This option can be paired with --float_bitwidth to indicate the bitwidth for
FP (by default 32).
If this option is enabled, then input list must not be provided and
--ignore_encodings must not be provided.
The external quantization encodings (encoding file/FakeQuant encodings)
might be missing quantization parameters for some interim tensors.
First it will try to fill the gaps by propagating across math-invariant
functions. If the quantization params are still missing,
then it will apply fallback to nodes to floating point.
--use_native_input_files
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the
model. For ex., uint8_t.
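The difference between the two modes can be illustrated with a small reader sketch. The helper below is hypothetical and uses machine byte order; the quantizer's actual reader may differ:

```python
import array

def read_raw(path, native=False):
    # float (default): interpret the file as raw float32 values
    # (machine byte order), to be quantized later if necessary.
    # native: interpret the file as the model's native type, e.g. uint8_t.
    data = open(path, "rb").read()
    if native:
        return list(data)          # one uint8 value per byte
    vals = array.array("f")        # 4-byte float32
    vals.frombytes(data)
    return list(vals)
```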
--use_native_dtype Note: This option is deprecated, use --use_native_input_files option in
future.
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the
model. For ex., uint8_t.
--use_native_output_files
Use this option to indicate the data type of the output files
1. float (default): output the file as floats.
2. native: outputs the file that is native to the model. For ex.,
uint8_t.
--disable_relu_squashing
Disables squashing of Relu against Convolution based ops for quantized
models
--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
Specifies the number of steps to use for computing quantization encodings
such that scale = (max - min) / number of quantization steps.
The option should be passed as a space-separated pair of hexadecimal string
minimum and maximum values, i.e. --restrict_quantization_steps "MIN MAX".
Please note that these are hexadecimal string literals and not signed
integers; to supply a negative value an explicit minus sign is required.
E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8 bit
range,
--restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16
bit range.
This argument is required for 16-bit Matmul operations.
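The scale formula above can be checked with a few lines; `quant_scale` is an illustrative helper, not an SDK function:

```python
def quant_scale(step_min_hex, step_max_hex, data_min, data_max):
    # The number of quantization steps comes from the hex string pair;
    # scale = (max - min) / number of quantization steps.
    steps = int(step_max_hex, 16) - int(step_min_hex, 16)
    return (data_max - data_min) / steps

# "-0x80 0x7F" spans 255 steps, the usual 8-bit range
eight_bit_scale = quant_scale("-0x80", "0x7F", 0.0, 2.55)
```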
--pack_4_bit_weights Store 4-bit quantized weights in packed format in a single byte i.e. two
4-bit quantized tensors can be stored in one byte
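A minimal sketch of the packing scheme follows. The low-nibble-first byte order shown here is an assumption for illustration; the actual on-disk layout is defined by the SDK:

```python
def pack_4bit(vals):
    # Pairs of 4-bit quantized values (0..15) share one byte,
    # low nibble first (assumed order).
    assert len(vals) % 2 == 0 and all(0 <= v <= 15 for v in vals)
    return bytes(vals[i] | (vals[i + 1] << 4) for i in range(0, len(vals), 2))

def unpack_4bit(packed):
    # Inverse of pack_4bit: split each byte back into two 4-bit values.
    out = []
    for b in packed:
        out += [b & 0xF, b >> 4]
    return out
```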
--keep_weights_quantized
Use this option to keep the weights quantized even when the output of the op
is in floating point. Bias will be converted to floating point as per the
output of the op. Required to enable wFxp_actFP configurations according to
the provided bitwidth for weights and activations
Note: These modes are not supported by all runtimes. Please check
corresponding Backend OpDef supplement if these are supported
--act_quantizer_calibration ACT_QUANTIZER_CALIBRATION
Specify which quantization calibration method to use for activations
supported values: min-max (default), sqnr, entropy, mse, percentile
This option can be paired with --act_quantizer_schema to override the
quantization schema to use for activations; otherwise the default schema
(asymmetric) will be used
--param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION
Specify which quantization calibration method to use for parameters
supported values: min-max (default), sqnr, entropy, mse, percentile
This option can be paired with --param_quantizer_schema to override the
quantization schema to use for parameters; otherwise the default schema
(asymmetric) will be used
--act_quantizer_schema ACT_QUANTIZER_SCHEMA
Specify which quantization schema to use for activations
supported values: asymmetric (default), symmetric, unsignedsymmetric
This option cannot be used with legacy quantizer option --act_quantizer
--param_quantizer_schema PARAM_QUANTIZER_SCHEMA
Specify which quantization schema to use for parameters
supported values: asymmetric (default), symmetric, unsignedsymmetric
This option cannot be used with legacy quantizer option --param_quantizer
--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
Specify the percentile value to be used with Percentile calibration method
The specified float value must lie within 90 and 100, default: 99.99
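As a rough illustration of the two schemas, the sketch below derives (scale, offset) from a calibrated min/max. The exact rounding and symmetric range convention used by the quantizer may differ:

```python
def encoding_from_range(data_min, data_max, bitwidth=8, symmetric=False):
    # asymmetric (default): use the full [min, max] range, non-zero offset
    # symmetric: range centred on zero, offset fixed at 0
    n_steps = 2 ** bitwidth - 1
    if symmetric:
        abs_max = max(abs(data_min), abs(data_max))
        return 2 * abs_max / n_steps, 0
    scale = (data_max - data_min) / n_steps
    return scale, round(-data_min / scale)
```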
--dump_qairt_quantizer_command DUMP_QAIRT_QUANTIZER_COMMAND
Use this option to dump a file which contains the equivalent Commandline
input for QAIRT Quantizer
--quantizer_log QUANTIZER_LOG
Enable logging in quantizer v2, logging to the file <QUANTIZER_LOG>.
E.g., --quantizer_log my_model_name.csv will produce the file
my_model_name.csv. See --quantizer_log_level.
--quantizer_log_level {LogLevel.NONE,LogLevel.TRACE,LogLevel.INFO}
Sets the logging level in quantizer v2.
INFO: Emits a file in the CSV format. Requires --quantizer_log
<file_name.csv> to be set. Warnings and errors are emitted to the console.
TRACE: Emits a file in the TXT format. Requires --quantizer_log
<file_name.txt> to be set. Warnings and errors are emitted to the console.
NONE: Default value. No file is emitted. Warnings and errors are emitted to
the console.
Masked Softmax Optimization Options:
--apply_masked_softmax {compressed,uncompressed}
This flag enables the pass that creates a MaskedSoftmax Op and
rewrites the graph to include this Op. MaskedSoftmax Op may not
be supported by all the QNN backends. Please check the
supplemental backend XML for the targeted backend.
This argument takes a string parameter input that selects
the mode of MaskedSoftmax Op.
'compressed' value rewrites the graph with the compressed version of
MaskedSoftmax Op.
'uncompressed' value rewrites the graph with the uncompressed version of
MaskedSoftmax Op.
--packed_masked_softmax_inputs PACKED_MASKED_SOFTMAX_INPUTS [PACKED_MASKED_SOFTMAX_INPUTS ...]
Name of the input ids tensor that will be packed in a single
inference.
This is applicable only for Compressed MaskedSoftmax Op.
This will create a new input to the graph named 'position_ids'
with same shape as the provided input name in this flag.
During runtime, this input shall be provided with the token
locations for individual sequences so that the same will be
internally passed to positional embedding layer.
E.g. If 2 sequences of length 20 and 30 are packed together
in single batch of 64 tokens then this new input 'position_ids' should have
value [0, 1, ..., 19, 0, 1, ..., 29, 0, 0, 0, ..., 0]
Usage: --packed_masked_softmax input_ids
Packed model will enable the user to pack multiple sequences into
single batch of inference.
--packed_max_seq PACKED_MAX_SEQ
Number of sequences packed in the single input ids and
single attention mask inputs. Applicable only for
Compressed MaskedSoftmax Op.
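The position_ids layout described above can be generated as follows; `make_position_ids` is an illustrative helper, not part of the SDK:

```python
def make_position_ids(seq_lens, total_len):
    # Positions restart at 0 for every packed sequence; any remaining
    # padding positions in the batch are filled with 0.
    ids = [i for n in seq_lens for i in range(n)]
    return ids + [0] * (total_len - len(ids))

# two sequences of length 20 and 30 packed into a batch of 64 tokens
ids = make_position_ids([20, 30], 64)
```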
Architecture Checker Options(Experimental):
--arch_checker Pass this option to enable architecture checker tool.
This is an experimental option for models that are intended to run on HTP
backend.
Note: Only one of: {'op_package_config', 'package_name'} can be specified
qairt-converter¶
The qairt-converter tool converts a model from one of the Onnx/TensorFlow/TFLite/PyTorch frameworks to a DLC file representing the QNN graph format, enabling inference on Qualcomm AI IP/HW. The converter auto-detects the framework based on the source model extension.
Basic command line usage looks like:
usage: qairt-converter [--source_model_input_shape INPUT_NAME INPUT_DIM]
[--out_tensor_node OUT_NAMES]
[--source_model_input_datatype INPUT_NAME INPUT_DTYPE]
[--source_model_input_layout INPUT_NAME INPUT_LAYOUT]
[--desired_input_layout INPUT_NAME DESIRED_INPUT_LAYOUT]
[--source_model_output_layout OUTPUT_NAME OUTPUT_LAYOUT]
[--desired_output_layout OUTPUT_NAME DESIRED_OUTPUT_LAYOUT]
[--desired_input_color_encoding [ ...]]
[--preserve_io_datatype [PRESERVE_IO_DATATYPE ...]]
[--dump_config_template DUMP_IO_CONFIG_TEMPLATE] [--config IO_CONFIG]
[--dry_run [DRY_RUN]] [--enable_framework_trace] [--remove_unused_inputs]
[--gguf_config GGUF_CONFIG] [--quantizer_log QUANTIZER_LOG]
[--quantizer_log_level {LogLevel.NONE,LogLevel.TRACE,LogLevel.INFO}]
[--quantization_overrides QUANTIZATION_OVERRIDES]
[--lora_weight_list LORA_WEIGHT_LIST]
[--quant_updatable_mode {none,adapter_only,all}] [--onnx_skip_simplification]
[--onnx_override_batch BATCH] [--onnx_define_symbol SYMBOL_NAME VALUE]
[--onnx_validate_models] [--onnx_summary]
[--onnx_perform_sequence_construct_optimizer] [--tf_summary]
[--tf_override_batch BATCH] [--tf_disable_optimization]
[--tf_show_unconsumed_nodes] [--tf_saved_model_tag SAVED_MODEL_TAG]
[--tf_saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY]
[--tf_validate_models] [--tflite_signature_name SIGNATURE_NAME]
[--dump_exported_onnx] --input_network INPUT_NETWORK [--debug [DEBUG]]
[--output_path OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
[--float_bitwidth FLOAT_BITWIDTH] [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH]
[--set_model_version MODEL_VERSION] [--export_format EXPORT_FORMAT]
[--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
[--package_name PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
[--target_backend BACKEND] [--target_soc_model SOC_MODEL] [-h]
required arguments:
--input_network INPUT_NETWORK, -i INPUT_NETWORK
Path to the source framework model.
optional arguments:
--source_model_input_shape INPUT_NAME INPUT_DIM, -s INPUT_NAME INPUT_DIM
The name and dimension of all the input buffers to the network specified in
the format [input_name comma-separated-dimensions],
for example: --source_model_input_shape 'data' 1,224,224,3.
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For scalar inputs, use a single dimension `0` to indicate that the input is
a scalar value. This representation is supported for ONNX models only.
For multiple inputs specify multiple --source_model_input_shape on the commandline like:
--source_model_input_shape 'data1' 1,224,224,3 --source_model_input_shape 'data2' 0
NOTE: Required for TensorFlow and PyTorch. Optional for Onnx and Tflite
In case of Onnx, this feature works only with Onnx 1.6.0 and above
--out_tensor_node OUT_NAMES, --out_tensor_name OUT_NAMES
Name of the graph's output Tensor Names. Multiple output names should be
provided separately like:
--out_tensor_name out_1 --out_tensor_name out_2
NOTE: Required for TensorFlow. Optional for Onnx, Tflite and PyTorch
--source_model_input_datatype INPUT_NAME INPUT_DTYPE
The names and datatype of the network input layers specified in the format
[input_name datatype], for example:
'data' 'float32'
Default is float32 if not specified
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For multiple inputs specify multiple --source_model_input_datatype on the
command line like:
--source_model_input_datatype 'data1' 'float32'
--source_model_input_datatype 'data2' 'float32'
--source_model_input_layout INPUT_NAME INPUT_LAYOUT
Layout of each input tensor. If not specified, it will use the default based
on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, HWIO, OIHW, NFC, NCF, NTF, TNF, NF, NC, F
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature,
T = Time, I = Input, O = Output
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
HWIO/IOHW used for Weights of Conv Ops
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
For multiple inputs specify multiple --source_model_input_layout on the
command line.
Eg:
--source_model_input_layout "data1" NCHW --source_model_input_layout
"data2" NCHW
--desired_input_layout INPUT_NAME DESIRED_INPUT_LAYOUT
Desired Layout of each input tensor. If not specified, it will use the
default based on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, HWIO, OIHW, NFC, NCF, NTF, TNF, NF, NC, F
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature,
T = Time, I = Input, O = Output
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
HWIO/IOHW used for Weights of Conv Ops
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
For multiple inputs specify multiple --desired_input_layout on the command
line.
Eg:
--desired_input_layout "data1" NCHW --desired_input_layout "data2" NCHW
--source_model_output_layout OUTPUT_NAME OUTPUT_LAYOUT
Layout of each output tensor. If not specified, it will use the default
based on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, HWIO, OIHW, NFC, NCF, NTF, TNF, NF, NC, F
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T =
Time
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
For multiple inputs specify multiple --source_model_output_layout on the
command line.
Eg:
--source_model_output_layout "data1" NCHW --source_model_output_layout
"data2" NCHW
--desired_output_layout OUTPUT_NAME DESIRED_OUTPUT_LAYOUT
Desired Layout of each output tensor. If not specified, it will use the
default based on the Source Framework.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, HWIO, OIHW, NFC, NCF, NTF, TNF, NF, NC, F
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T =
Time
NDHWC/NCDHW used for 5d outputs
NHWC/NCHW used for 4d image-like outputs
NFC/NCF used for outputs to Conv1D or other 1D ops
NTF/TNF used for outputs with time steps like the ones used for LSTM op
NF used for 2D outputs, like the outputs to Dense/FullyConnected layers
NC used for 2D outputs with 1 for batch and other for Channels (rarely used)
F used for 1D outputs, e.g. Bias tensor
For multiple outputs specify multiple --desired_output_layout on the command
line.
Eg:
--desired_output_layout "data1" NCHW --desired_output_layout "data2"
NCHW
--desired_input_color_encoding [ ...], -e [ ...]
Usage: --input_color_encoding "INPUT_NAME" INPUT_ENCODING_IN
[INPUT_ENCODING_OUT]
Input encoding of the network inputs. Default is bgr.
e.g.
--input_color_encoding "data" rgba
Quotes must wrap the input node name to handle special characters,
spaces, etc. To specify encodings for multiple inputs, invoke
--input_color_encoding for each one.
e.g.
--input_color_encoding "data1" rgba --input_color_encoding "data2" other
Optionally, an output encoding may be specified for an input node by
providing a second encoding. The default output encoding is bgr.
e.g.
--input_color_encoding "data3" rgba rgb
Input encoding types:
image color encodings: bgr, rgb, nv21, nv12, ...
time_series: for inputs of RNN models;
other: encoding not listed above or unknown.
Supported encodings:
bgr
rgb
rgba
argb32
nv21
nv12
--preserve_io_datatype [PRESERVE_IO_DATATYPE ...]
Use this option to preserve IO datatype. The different ways of using this
option are as follows:
--preserve_io_datatype <space separated list of names of inputs and
outputs of the graph>
e.g.
--preserve_io_datatype input1 input2 output1
The user may choose to preserve the datatype for all the inputs and outputs
of the graph.
--preserve_io_datatype
Note: --config gets higher precedence than --preserve_io_datatype.
--dump_config_template DUMP_IO_CONFIG_TEMPLATE
Dumps the yaml template for I/O configuration. This file can be edited as
per the custom requirements and passed using the option --config. Use this
option to specify a yaml file to which the IO config template is dumped.
--config IO_CONFIG Use this option to specify a yaml file for input and output options.
--dry_run [DRY_RUN] Evaluates the model without actually converting any ops, and returns
unsupported ops/attributes as well as unused inputs and/or outputs if any.
--enable_framework_trace
Use this option to enable converter to trace the op/tensor change
information.
Currently framework op trace is supported only for ONNX converter.
--remove_unused_inputs
Use this option to remove the disconnected graph input nodes after the
conversion
--gguf_config GGUF_CONFIG
This is an optional argument that can be used when the input network is a
GGUF file. It specifies the path to the config file for building the GenAI
model (the config.json file generated when saving the huggingface model).
--quantizer_log QUANTIZER_LOG
Valid for use with v2.0.0 JSON schema for quantization overrides or when
--use_quantize_v2 is provided. Enable logging in the quantizer, logging to
the file <QUANTIZER_LOG>.
E.g., --quantizer_log my_model_name.csv will produce the file
my_model_name.csv. See --quantizer_log_level.
--quantizer_log_level {LogLevel.NONE,LogLevel.TRACE,LogLevel.INFO}
Sets the logging level in the quantizer. See --quantizer_log.
INFO: Emits a file in the CSV format. Requires --quantizer_log
<file_name.csv> to be set. Warnings and errors are emitted to the console.
TRACE: Emits a file in the TXT format. Requires --quantizer_log
<file_name.txt> to be set. Warnings and errors are emitted to the console.
NONE: Default value. No file is emitted. Warnings and errors are emitted to
the console.
--debug [DEBUG] Run the converter in debug mode.
--output_path OUTPUT_PATH, -o OUTPUT_PATH
Path where the converted output model should be saved. If not specified, the
converted model will be written to a file with the same name as the input model
--copyright_file COPYRIGHT_FILE
Path to copyright file. If provided, the content of the file will be added
to the output model.
--float_bitwidth FLOAT_BITWIDTH
Use the --float_bitwidth option to convert the graph to the specified float
bitwidth, either 32 (default) or 16.
--float_bias_bitwidth FLOAT_BIAS_BITWIDTH
Use the --float_bias_bitwidth option to select the bitwidth to use for float
bias tensor, either 32 or 16 (default '0' if not provided).
--set_model_version MODEL_VERSION
User-defined ASCII string to identify the model, only first 64 bytes will be
stored
--export_format EXPORT_FORMAT
DLC_DEFAULT (default)
- Produce a Float graph given a Float Source graph
- Produce a Quant graph given a Source graph with provided Encodings
DLC_STRIP_QUANT
- Produce a Float Graph, discarding Quant data
-h, --help show this help message and exit
Custom Op Package Options:
--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
Absolute path to converter op package library compiled by the OpPackage
generator. Must be separated by a comma for multiple package libraries.
Note: Order of converter op package libraries must follow the order of xmls.
Ex1: --converter_op_package_lib absolute_path_to/libExample.so
Ex2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
--package_name PACKAGE_NAME, -p PACKAGE_NAME
A global package name to be used for each node in the Model.cpp file.
Defaults to Qnn header defined package name
--op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...], -opc CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]
Path to a Qnn Op Package XML configuration file that contains user defined
custom operations.
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES, -q QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters to use for
quantization. These will override any quantization data carried from
conversion (eg TF fake quantization) or calculated during the normal
quantization process. Format defined as per AIMET specification.
LoRA Converter Options:
--lora_weight_list LORA_WEIGHT_LIST
Path to a file specifying a list of tensor names that should be updateable.
--quant_updatable_mode {none,adapter_only,all}
Specify whether/for which tensors the quantization encodings change across
use-cases. In none mode, no quantization encodings are updatable. In
adapter_only mode quantization encodings for only lora/adapter branch
(Conv->Mul->Conv) change across use-case, the base branch quantization
encodings remain the same. In all mode, all quantization encodings are
updatable.
Onnx Converter Options:
--onnx_skip_simplification, -oss
Do not attempt to simplify the model automatically. This may prevent some
models from
properly converting when sequences of unsupported static operations are
present.
--onnx_override_batch BATCH
The batch dimension override. This will take the first dimension of all
inputs and treat it as a batch dim, overriding it with the value provided
here. For example:
--onnx_override_batch 6
will result in a shape change from [1,3,224,224] to [6,3,224,224].
If there are inputs without batch dim this should not be used and each input
should be overridden independently using -s option for input dimension
overrides.
--onnx_define_symbol SYMBOL_NAME VALUE
This option allows overriding specific input dimension symbols. For instance
you might see input shapes specified with variables such as :
data: [1,3,height,width]
To override these simply pass the option as:
--onnx_define_symbol height 224 --onnx_define_symbol width 448
which results in dimensions that look like:
data: [1,3,224,448]
--onnx_validate_models
Validate the original ONNX model against the optimized ONNX model.
Constant inputs with all values set to 1 will be generated and used
by both models, and their outputs are checked against each other.
The average % error and 90th percentile of output differences will be
calculated for this.
Note: Usage of this flag will incur extra time due to inference of the
models.
--onnx_summary Summarize the original onnx model and optimized onnx model.
Summary will print the model information such as number of parameters,
number of operators and their count, input-output tensor name, shape and
dtypes.
--onnx_perform_sequence_construct_optimizer
This option allows optimization on SequenceConstruct Op.
When SequenceConstruct op is one of the outputs of the graph, it removes
SequenceConstruct op and makes its inputs as graph outputs to replace the
original output of SequenceConstruct.
--tf_summary Summarize the original TF model and optimized TF model.
Summary will print the model information such as number of parameters,
number of operators and their count, input-output tensor name, shape and
dtypes.
TensorFlow Converter Options:
--tf_override_batch BATCH
The batch dimension override. This will take the first dimension of all
inputs and treat it as a batch dim, overriding it with the value provided
here. For example:
--tf_override_batch 6
will result in a shape change from [1,224,224,3] to [6,224,224,3].
If there are inputs without batch dim this should not be used and each input
should be overridden independently using -s option for input dimension
overrides.
--tf_disable_optimization
Do not attempt to optimize the model automatically.
--tf_show_unconsumed_nodes
Displays a list of unconsumed nodes, if any are found. Nodes which are
unconsumed do not violate the structural fidelity of the generated graph.
--tf_saved_model_tag SAVED_MODEL_TAG
Specify the tag to select a MetaGraph from the SavedModel. ex:
--saved_model_tag serve. Default value will be 'serve' when it is not
assigned.
--tf_saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY
Specify signature key to select input and output of the model. ex:
--tf_saved_model_signature_key serving_default. Default value will be
'serving_default' when it is not assigned
--tf_validate_models Validate the original TF model against the optimized TF model.
Constant inputs with all values set to 1 will be generated and used
by both models, and their outputs are checked against each other.
The average % error and 90th percentile of output differences will be
calculated for this.
Note: Usage of this flag will incur extra time due to inference of the
models.
Tflite Converter Options:
--tflite_signature_name SIGNATURE_NAME
Use this option to specify a specific Subgraph signature to convert
PyTorch Converter Options:
--dump_exported_onnx Dump the exported Onnx model from input Torchscript model
Backend Options:
--target_backend BACKEND
Use this option to specify the backend on which the model needs to run.
Providing this option will generate a graph optimized for the given backend
and this graph may not run on other backends. The default backend is HTP.
Supported backends are CPU,GPU,DSP,HTP,HTA,LPAI.
--target_soc_model SOC_MODEL
Use this option to specify the SOC on which the model needs to run.
This can be found from SOC info of the device and it starts with strings
such as SDM, SM, QCS, IPQ, SA, QC, SC, SXR, SSG, STP, QRB, or AIC.
NOTE: --target_backend option must be provided to use --target_soc_model
option.
Note: Only one of: {'package_name', 'op_package_config'} can be specified
Model Preparation¶
Quantization Support¶
Quantization is supported through the converter interface and is performed at conversion time. The only required option to enable quantization along with conversion is the --input_list option, which provides the quantizer with the required input data for the given model. The following options are available in each converter listed above to enable and configure quantization:
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters
to use for quantization. These will override any
quantization data carried from conversion (eg TF fake
quantization) or calculated during the normal
quantization process. Format defined as per AIMET
specification.
--input_list INPUT_LIST
Path to a file specifying the input data. This file
should be a plain text file, containing one or more
absolute file paths per line. Each path is expected to
point to a binary file containing one input in the
"raw" format, ready to be consumed by the quantizer
without any further preprocessing. Multiple files per
line separated by spaces indicate multiple inputs to
the network. See documentation for more details. Must
be specified for quantization. All subsequent
quantization options are ignored when this is not
provided.
--param_quantizer PARAM_QUANTIZER
Optional parameter to indicate the weight/bias
quantizer to use. Must be followed by one of the
following options: "tf": Uses the real min/max of the
data and specified bitwidth (default) "enhanced": Uses
an algorithm useful for quantizing models with long
tails present in the weight distribution "adjusted":
Uses an adjusted min/max for computing the range,
particularly good for denoise models "symmetric":
Ensures min and max have the same absolute values
about zero. Data will be stored as int#_t data such
that the offset is always 0.
--act_quantizer ACT_QUANTIZER
Optional parameter to indicate the activation
quantizer to use. Must be followed by one of the
following options: "tf": Uses the real min/max of the
data and specified bitwidth (default) "enhanced": Uses
an algorithm useful for quantizing models with long
tails present in the weight distribution "adjusted":
Uses an adjusted min/max for computing the range,
particularly good for denoise models "symmetric":
Ensures min and max have the same absolute values
about zero. Data will be stored as int#_t data such
that the offset is always 0.
--algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms.
Usage is: --algorithms <algo_name1> ... The
available optimization algorithms are: "cle" - Cross
layer equalization includes a number of methods for
equalizing weights and biases across layers in order
to rectify imbalances that cause quantization errors.
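As a toy illustration of the idea behind CLE (not the quantizer's implementation), the sketch below rescales each channel shared by two consecutive layers so that both ends have the same weight range:

```python
import math

def equalize_pair(w1_rows, w2_cols):
    # Per channel i: divide the layer-1 weights by s and multiply the
    # layer-2 weights by s, with s = sqrt(r1/r2) so both ends end up
    # with range sqrt(r1 * r2). The network function is unchanged for
    # scale-equivariant activations such as ReLU.
    eq1, eq2 = [], []
    for r1_w, r2_w in zip(w1_rows, w2_cols):
        r1 = max(abs(v) for v in r1_w)
        r2 = max(abs(v) for v in r2_w)
        s = math.sqrt(r1 / r2)
        eq1.append([v / s for v in r1_w])
        eq2.append([v * s for v in r2_w])
    return eq1, eq2
```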
--bias_bitwidth BIAS_BITWIDTH
Use the --bias_bitwidth option to select the bitwidth to use
when quantizing the biases, either 8 (default) or 32.
--act_bitwidth ACT_BITWIDTH
Use the --act_bitwidth option to select the bitwidth to use
when quantizing the activations, either 8 (default) or
16.
--weight_bitwidth WEIGHT_BITWIDTH
Use the --weight_bitwidth option to select the bitwidth to
use when quantizing the weights, either 4, 8 (default) or 16.
--float_bitwidth FLOAT_BITWIDTH
Use the --float_bitwidth option to select the bitwidth to use for float
tensors, either 32 (default) or 16.
--float_bias_bitwidth FLOAT_BIAS_BITWIDTH
Use the --float_bias_bitwidth option to select the bitwidth to
use when biases are in float, either 32 or 16.
--ignore_encodings Use only quantizer generated encodings, ignoring any
user or model provided encodings. Note: Cannot use
--ignore_encodings with --quantization_overrides
--use_per_channel_quantization [USE_PER_CHANNEL_QUANTIZATION [USE_PER_CHANNEL_QUANTIZATION ...]]
Use per-channel quantization for
convolution-based op weights. Note: This will replace
built-in model QAT encodings when used for a given
weight. Usage: "--use_per_channel_quantization" to
enable or "--use_per_channel_quantization false"
(default) to disable.
--use_per_row_quantization [USE_PER_ROW_QUANTIZATION [USE_PER_ROW_QUANTIZATION ...]]
Use this option to enable rowwise quantization of Matmul and
FullyConnected op. Usage "--use_per_row_quantization" to enable
or "--use_per_row_quantization false" (default) to
disable. This option may not be supported by all backends.
Basic command line usage to convert and quantize a model using the TF converter would look like:
$ qnn-tensorflow-converter -i <path>/frozen_graph.pb
-d <network_input_name> <dims>
--out_node <network_output_name>
-o <optional_output_path>
--allow_unconsumed_nodes # optional, but most likely will be needed for larger models
-p <optional_package_name> # Defaults to "qti.aisw"
--input_list input_list.txt
This will quantize the network using the default quantizer and bitwidths (8 bits for activations, weights, and biases).
For more detailed information on quantization, options, and algorithms please refer to Quantization.
qairt-quantizer¶
The qairt-quantizer tool converts non-quantized DLC models into quantized DLC models.
Basic command line usage looks like:
usage: qairt-quantizer --input_dlc INPUT_DLC [--output_dlc OUTPUT_DLC] [--input_list INPUT_LIST]
[--enable_float_fallback] [--apply_algorithms ALGORITHMS [ALGORITHMS ...]]
[--bias_bitwidth BIAS_BITWIDTH] [--act_bitwidth ACT_BITWIDTH]
[--weights_bitwidth WEIGHTS_BITWIDTH] [--float_bitwidth FLOAT_BITWIDTH]
[--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_quantization_overrides]
[--use_per_channel_quantization] [--use_per_row_quantization]
[--enable_per_row_quantized_bias]
[--preserve_io_datatype [PRESERVE_IO_DATATYPE ...]]
[--use_native_input_files] [--use_native_output_files]
[--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
[--keep_weights_quantized] [--adjust_bias_encoding]
[--act_quantizer_calibration ACT_QUANTIZER_CALIBRATION]
[--param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION]
[--act_quantizer_schema ACT_QUANTIZER_SCHEMA]
[--param_quantizer_schema PARAM_QUANTIZER_SCHEMA]
[--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
[--use_aimet_quantizer] [--op_package_lib OP_PACKAGE_LIB]
[--dump_encoding_json] [--config CONFIG_FILE] [--export_stripped_dlc] [-h]
[--target_backend BACKEND] [--target_soc_model SOC_MODEL] [--debug [DEBUG]]
required arguments:
--input_dlc INPUT_DLC, -i INPUT_DLC
Path to the dlc container containing the model for which fixed-point
encoding metadata should be generated. This argument is required
optional arguments:
--output_dlc OUTPUT_DLC, -o OUTPUT_DLC
Path at which the metadata-included quantized model container should be
written. If this argument is omitted, the quantized model will be written at
<unquantized_model_name>_quantized.dlc
--input_list INPUT_LIST, -l INPUT_LIST
Path to a file specifying the input data. This file should be a plain text
file, containing one or more absolute file paths per line. Each path is
expected to point to a binary file containing one input in the "raw" format,
ready to be consumed by the quantizer without any further preprocessing.
Multiple files per line separated by spaces indicate multiple inputs to the
network. See documentation for more details. Must be specified for
quantization. All subsequent quantization options are ignored when this is
not provided.
--enable_float_fallback, -f
Use this option to enable fallback to floating point (FP) instead of fixed
point.
This option can be paired with --float_bitwidth to indicate the bitwidth for
FP (by default 32).
If this option is enabled, then input list must not be provided and
--ignore_quantization_overrides must not be provided.
The external quantization encodings (encoding file/FakeQuant encodings)
might be missing quantization parameters for some interim tensors.
The quantizer first tries to fill the gaps by propagating encodings across
math-invariant functions. If quantization parameters are still missing,
the affected nodes fall back to floating point.
--apply_algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms. Usage is:
--apply_algorithms <algo_name1> ... The available optimization algorithms
are: "cle" - Cross layer equalization includes a number of methods for
equalizing weights and biases across layers in order to rectify imbalances
that cause quantization errors.
--bias_bitwidth BIAS_BITWIDTH
Use the --bias_bitwidth option to select the bitwidth to use when quantizing
the biases, either 8 (default) or 32.
--act_bitwidth ACT_BITWIDTH
Use the --act_bitwidth option to select the bitwidth to use when quantizing
the activations, either 8 (default) or 16.
--weights_bitwidth WEIGHTS_BITWIDTH
Use the --weights_bitwidth option to select the bitwidth to use when
quantizing the weights, either 4, 8 (default) or 16.
--float_bitwidth FLOAT_BITWIDTH
Use the --float_bitwidth option to select the bitwidth to use for float
tensors, either 32 (default) or 16.
--float_bias_bitwidth FLOAT_BIAS_BITWIDTH
Use the --float_bias_bitwidth option to select the bitwidth to use when
biases are in float, either 32 or 16 (default '0' if not provided).
--ignore_quantization_overrides
Use only quantizer generated encodings, ignoring any user or model provided
encodings.
Note: Cannot use --ignore_quantization_overrides with
--quantization_overrides (argument of Qairt Converter)
--use_per_channel_quantization
Use this option to enable per-channel quantization for convolution-based op
weights.
Note: This will only be used if built-in model Quantization-Aware Trained
(QAT) encodings are not present for a given weight.
--use_per_row_quantization
Use this option to enable rowwise quantization of Matmul and FullyConnected
ops.
--enable_per_row_quantized_bias
Use this option to enable rowwise quantization of bias for FullyConnected
ops, when weights are per-row quantized.
--preserve_io_datatype [PRESERVE_IO_DATATYPE ...]
Use this option to preserve IO datatype. The different ways of using this
option are as follows:
--preserve_io_datatype <space separated list of names of inputs and
outputs of the graph>
e.g.
--preserve_io_datatype input1 input2 output1
The user may choose to preserve the datatype for all the inputs and outputs
of the graph by passing the flag without any arguments:
--preserve_io_datatype
--use_native_input_files
Boolean flag to indicate how to read input files.
If not provided, reads inputs as floats and quantizes if necessary based on
quantization parameters in the model. (default)
If provided, reads inputs assuming the data type to be native to the model,
e.g. uint8_t.
--use_native_output_files
Boolean flag to indicate the data type of the output files.
If not provided, outputs the files as floats. (default)
If provided, outputs files in the data type native to the model, e.g. uint8_t.
--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
Specifies the number of steps to use for computing quantization encodings
such that scale = (max - min) / number of quantization steps.
The option should be passed as a space separated pair of hexadecimal string
minimum and maximum values, i.e. --restrict_quantization_steps "MIN MAX".
Please note that this is a hexadecimal string literal and not a signed
integer, to supply a negative value an explicit minus sign is required.
E.g.--restrict_quantization_steps "-0x80 0x7F" indicates an example 8 bit
range,
--restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16
bit range.
This argument is required for 16-bit Matmul operations.
--keep_weights_quantized
Use this option to keep the weights quantized even when the output of the op
is in floating point. Bias will be converted to floating point as per the
output of the op. Required to enable wFxp_actFP configurations according to
the provided bitwidth for weights and activations
Note: These modes are not supported by all runtimes. Please check
corresponding Backend OpDef supplement if these are supported
--adjust_bias_encoding
Use --adjust_bias_encoding option to modify bias encoding and weight
encoding to ensure that the bias value is in the range of the bias encoding.
This option is only applicable for per-channel quantized weights.
NOTE: This may result in clipping of the weight values
--act_quantizer_calibration ACT_QUANTIZER_CALIBRATION
Specify which quantization calibration method to use for activations
supported values: min-max (default), sqnr, entropy, mse, percentile
This option can be paired with --act_quantizer_schema to override the
quantization schema to use for activations; otherwise the default schema
(asymmetric) will be used.
--param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION
Specify which quantization calibration method to use for parameters
supported values: min-max (default), sqnr, entropy, mse, percentile
This option can be paired with --param_quantizer_schema to override the
quantization schema to use for parameters; otherwise the default schema
(asymmetric) will be used.
--act_quantizer_schema ACT_QUANTIZER_SCHEMA
Specify which quantization schema to use for activations
supported values: asymmetric (default), symmetric, unsignedsymmetric
--param_quantizer_schema PARAM_QUANTIZER_SCHEMA
Specify which quantization schema to use for parameters
supported values: asymmetric (default), symmetric, unsignedsymmetric
--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
Specify the percentile value to be used with Percentile calibration method
The specified float value must lie between 90 and 100, default: 99.99
--use_aimet_quantizer
Use AIMET for Quantization instead of QNN IR quantizer
--op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
Use this argument to pass an op package library for quantization. Must be in
the form <op_package_lib_path:interfaceProviderName> and be separated by a
comma for multiple package libs
--dump_encoding_json Use this argument to dump encoding of all the tensors in a json file
--config CONFIG_FILE, -c CONFIG_FILE
Use this argument to pass the path of the config YAML file with quantizer
options
--export_stripped_dlc
Use this argument to export a DLC which strips out data not needed for graph
composition
-h, --help show this help message and exit
--debug [DEBUG] Run the quantizer in debug mode.
Backend Options:
--target_backend BACKEND
Use this option to specify the backend on which the model needs to run.
Providing this option will generate a graph optimized for the given backend
and this graph may not run on other backends. The default backend is HTP.
Supported backends are CPU, GPU, DSP, HTP, HTA, and LPAI.
--target_soc_model SOC_MODEL
Use this option to specify the SOC on which the model needs to run.
This can be found from SOC info of the device and it starts with strings
such as SDM, SM, QCS, IPQ, SA, QC, SC, SXR, SSG, STP, QRB, or AIC.
NOTE: --target_backend option must be provided to use --target_soc_model
option.
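As a sketch of typical usage, the options above can be combined into a single invocation. The file names below (model.dlc, input_list.txt, model_quantized.dlc) are hypothetical placeholders; the command is written to a helper script so it can be reviewed and then run in a shell where the SDK tools are on the PATH:

```shell
# Write a typical qairt-quantizer invocation to a helper script.
# All file names are hypothetical placeholders for your own artifacts.
cat > quantize.sh << 'EOF'
qairt-quantizer \
  --input_dlc model.dlc \
  --input_list input_list.txt \
  --act_bitwidth 8 \
  --weights_bitwidth 8 \
  --bias_bitwidth 32 \
  --output_dlc model_quantized.dlc
EOF
cat quantize.sh
```

Omitting --input_list skips calibration-based quantization, so it must be supplied for a fixed-point DLC to be produced.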
For more information on usage, please refer to SNPE documentation on the snpe-dlc-quant tool.
qnn-model-lib-generator¶
The qnn-model-lib-generator tool compiles QNN model source code into artifacts for a specific target.
usage: qnn-model-lib-generator [-h] [-c <QNN_MODEL>.cpp] [-b <QNN_MODEL>.bin]
[-t LIB_TARGETS ] [-l LIB_NAME] [-o OUTPUT_DIR]
Script compiles provided Qnn Model artifacts for specified targets.
Required argument(s):
-c <QNN_MODEL>.cpp Filepath for the qnn model .cpp file
optional argument(s):
-b <QNN_MODEL>.bin Filepath for the qnn model .bin file
(Note: if not passed, runtime will fail if .cpp needs any items from a .bin file.)
-t LIB_TARGETS Specifies the targets to build the models for. Default: aarch64-android x86_64-linux-clang
-l LIB_NAME Specifies the name to use for libraries. Default: uses name in <model.bin> if provided,
else generic qnn_model.so
-o OUTPUT_DIR Location for saving output libraries.
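A minimal sketch of compiling converter output into model libraries for two targets follows. The input file names (qnn_model.cpp, qnn_model.bin) are hypothetical converter outputs; the command is written to a helper script for review before running it in an SDK-activated shell:

```shell
# Write a typical qnn-model-lib-generator invocation to a helper script.
# qnn_model.cpp / qnn_model.bin are hypothetical converter outputs.
cat > build_model_libs.sh << 'EOF'
qnn-model-lib-generator \
  -c qnn_model.cpp \
  -b qnn_model.bin \
  -t aarch64-android x86_64-linux-clang \
  -o model_libs
EOF
cat build_model_libs.sh
```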
Note
For Windows users, please execute this tool with python3.
qnn-op-package-generator¶
The qnn-op-package-generator tool is used to generate skeleton code for a QNN op package using an XML config file that describes the attributes of the package. The tool creates the package as a directory containing skeleton source code and makefiles that can be compiled to create a shared library object.
usage: qnn-op-package-generator [-h] --config_path CONFIG_PATH [--debug]
[--output_path OUTPUT_PATH] [-f]
optional arguments:
-h, --help show this help message and exit
required arguments:
--config_path CONFIG_PATH, -p CONFIG_PATH
The path to a config file that defines a QNN Op
package(s).
optional arguments:
--debug Returns debugging information from generating the
package
--output_path OUTPUT_PATH, -o OUTPUT_PATH
Path where the package should be saved
-f, --force-generation
This option will delete the entire existing package
Note appropriate file permissions must be set to use
this option.
--converter_op_package, -cop
Generates Converter Op Package skeleton code needed
by the output shape inference for converters
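As a sketch of typical usage (the XML config name MyOpPackage.xml is a hypothetical placeholder), skeleton generation can be scripted as follows and run wherever the SDK tools are on the PATH:

```shell
# Write a typical qnn-op-package-generator invocation to a helper script.
# MyOpPackage.xml is a hypothetical op package config file.
cat > gen_op_package.sh << 'EOF'
qnn-op-package-generator \
  --config_path MyOpPackage.xml \
  --output_path ./op_packages
EOF
cat gen_op_package.sh
```

The generated directory can then be compiled with the included makefiles to produce the op package shared library.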
qnn-context-binary-generator¶
The qnn-context-binary-generator tool is used to create a context binary by using a particular backend and consuming a model library created by the qnn-model-lib-generator.
usage: qnn-context-binary-generator --model QNN_MODEL.so --backend QNN_BACKEND.so
--binary_file BINARY_FILE_NAME
[--model_prefix MODEL_PREFIX]
[--output_dir OUTPUT_DIRECTORY]
[--op_packages ONE_OR_MORE_OP_PACKAGES]
[--config_file CONFIG_FILE.json]
[--profiling_level PROFILING_LEVEL]
[--verbose] [--version] [--help]
REQUIRED ARGUMENTS:
-------------------
--model <FILE> Path to the <qnn_model_name.so> file containing a QNN network.
To create a context binary with multiple graphs, use
comma-separated list of model.so files. The syntax is
<qnn_model_name_1.so>,<qnn_model_name_2.so>.
--backend <FILE> Path to a QNN backend .so library to create the context binary.
--binary_file <VAL> Name of the binary file to save the context binary to, with
a .bin file extension. If an absolute path is provided, the binary
is saved at that path. Else the binary is saved in the path given
by the --output_dir option.
OPTIONAL ARGUMENTS:
-------------------
--model_prefix Function prefix to use when loading <qnn_model_name.so> file
containing a QNN network. Default: QnnModel.
--output_dir <DIR> The directory to save output to. Defaults to ./output.
--op_packages <VAL> Provide a comma separated list of op packages
and interface providers to register. The syntax is:
op_package_path:interface_provider[,op_package_path:interface_provider...]
--profiling_level <VAL> Enable profiling. Valid Values:
1. basic: captures execution and init time.
2. detailed: in addition to basic, captures per Op timing
for execution.
3. backend: backend-specific profiling level specified
in the backend extension related JSON config file.
--profiling_option <VAL> Set profiling options:
1. optrace: Generates an optrace of the run.
--config_file <FILE> Path to a JSON config file. The config file currently
supports options related to backend extensions and
context priority. Please refer to SDK documentation
for more details.
--enable_intermediate_outputs Enable all intermediate nodes to be output along with
default outputs in the saved context.
Note that options --enable_intermediate_outputs and --set_output_tensors
are mutually exclusive. Only one of the options can be specified at a time.
--set_output_tensors <VAL> Provide a comma-separated list of intermediate output tensor names, for which the outputs
will be written in addition to final graph output tensors.
Note that options --enable_intermediate_outputs and --set_output_tensors
are mutually exclusive. Only one of the options can be specified at a time.
The syntax is: graphName0:tensorName0,tensorName1;graphName1:tensorName0,tensorName1.
In case of a single graph, its name is not necessary and a list of comma separated tensor
names can be provided, e.g.: tensorName0,tensorName1.
The same format can be provided in a .txt file.
--backend_binary <VAL> Name of the binary file to save a backend-specific context binary to with
.bin file extension. If not provided, no backend binary is created.
If absolute path is provided, binary is saved in this path.
Else binary is saved in the same path as --output_dir option.
--log_level Specifies max logging level to be set. Valid settings:
"error", "warn", "info" and "verbose"
--dlc_path <VAL> Paths to a comma separated list of Deep Learning Containers (DLC) from which to load the models.
Necessitates libQnnModelDlc.so as the --model argument.
To compose multiple graphs in the context, use comma-separated list of DLC files.
The syntax is <qnn_model_name_1.dlc>,<qnn_model_name_2.dlc>
Default: None
--input_output_tensor_mem_type <VAL> Specifies mem type to be used for input and output tensors during graph creation.
Valid settings:"raw" and "memhandle"
--platform_options <VAL> Specifies values to pass as platform options. Multiple platform options can be provided
using the syntax: key0:value0;key1:value1;key2:value2
--data_format_config <VAL> Path to a JSON config file, specifying the data formats of certain tensors.
Please refer to SDK documentation for more details.
--adapter_weight_config <VAL> Path to a YAML config file containing adapter weight information for LoRA.
Config should specify the use case name, graph name, the location of safetensor weights and encodings,
and optionally whether the use case should be encodings and/or weights only, e.g.
use_case:
- name: <use_case>
graph: <graph>
weights: <path_to_safetensors>.safetensors
encodings: <path_to_encodings>.encodings
encodings_only: <true/false>
weights_only: <true/false>
--soc_model <VAL> Specifies simulated soc model value.
A valid soc model value can be chosen from :ref:`Supported Snapdragon Devices<general/overview:Supported Snapdragon devices>`
Default: 0 (use default soc model set by the backend).
--version Print the QNN SDK version.
--help Show this help message.
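A minimal sketch of serializing a model library into an HTP context binary follows; libqnn_model.so stands in for the output of qnn-model-lib-generator and the other names are hypothetical. The command is written to a helper script for review before execution in an SDK-activated shell:

```shell
# Write a typical qnn-context-binary-generator invocation to a helper script.
# libqnn_model.so is a hypothetical output of qnn-model-lib-generator.
cat > make_context.sh << 'EOF'
qnn-context-binary-generator \
  --model libqnn_model.so \
  --backend libQnnHtp.so \
  --binary_file model_context.bin \
  --output_dir ./output
EOF
cat make_context.sh
```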
See qnn-net-run section for more details about --op_packages and --config_file options.
Execution¶
qnn-net-run¶
The qnn-net-run tool is used to consume a model library compiled from the output of the QNN converter, and run it on a particular backend.
DESCRIPTION:
------------
Example application demonstrating how to load and execute a neural network
using QNN APIs.
REQUIRED ARGUMENTS:
-------------------
--model <FILE> Path to the model containing a QNN network.
To compose multiple graphs, use comma-separated list of
model.so files. The syntax is
<qnn_model_name_1.so>,<qnn_model_name_2.so>.
--backend <FILE> Path to a QNN backend to execute the model.
--input_list <FILE> Path to a file listing the inputs for the network.
If there are multiple graphs in model.so, this has
to be comma-separated list of input list files.
When multiple graphs are present, to skip execution of a graph use
"__"(double underscore without quotes) as the file name in the
comma-separated list of input list files.
--retrieve_context <VAL> Path to cached binary from which to load a saved
context from and execute graphs. --retrieve_context and
--model are mutually exclusive. Only one of the options
can be specified at a time.
OPTIONAL ARGUMENTS:
-------------------
--model_prefix Function prefix to use when loading <qnn_model_name.so>.
Default: QnnModel
--debug Specifies that output from all layers of the network
will be saved. Note that options --debug and --set_output_tensors
are mutually exclusive. Only one of the options can be specified
at a time. This option cannot be used when loading a saved context
through --retrieve_context or --retrieve_context_list option.
--output_dir <DIR> The directory to save output to. Defaults to ./output.
--use_native_output_files Specifies that the output files will be generated in the data
type native to the graph. If not specified, output files will
be generated in floating point.
--use_native_input_files Specifies that the input files will be parsed in the data
type native to the graph. If not specified, input files will
be parsed in floating point. Note that options --use_native_input_files
and --native_input_tensor_names are mutually exclusive.
Only one of the options can be specified at a time.
--native_input_tensor_names <VAL> Provide a comma-separated list of input tensor names,
for which the input files would be read/parsed in native format.
Note that options --use_native_input_files and
--native_input_tensor_names are mutually exclusive.
Only one of the options can be specified at a time.
The syntax is: graphName0:tensorName0,tensorName1;graphName1:tensorName0,tensorName1
--op_packages <VAL> Provide a comma-separated list of op packages, interface
providers, and, optionally, targets to register. Valid values
for target are CPU and HTP. The syntax is:
op_package_path:interface_provider:target[,op_package_path:interface_provider:target...]
--profiling_level <VAL> Enable profiling. Valid Values:
1. basic: captures execution and init time.
2. detailed: in addition to basic, captures per Op timing
for execution, if a backend supports it.
3. client: captures only the performance metrics
measured by qnn-net-run.
4. backend: backend-specific profiling level
specified in the backend extension
related JSON config file.
--profiling_option <VAL> Set profiling options:
1. optrace: Generates an optrace of the run.
--perf_profile <VAL> Specifies performance profile to be used. Valid settings are
low_balanced, balanced, default, high_performance,
sustained_high_performance, burst, low_power_saver,
power_saver, high_power_saver, extreme_power_saver
and system_settings.
Note: perf_profile option will override any existing performance settings from backend config.
--config_file <FILE> Path to a JSON config file. The config file currently
supports options related to backend extensions,
context priority and graph configs. Please refer to SDK
documentation for more details.
--log_level <VAL> Specifies max logging level to be set. Valid settings:
error, warn, info, debug, and verbose.
--shared_buffer Specifies creation of shared buffers for graph I/O between the application
and the device/coprocessor associated with a backend directly.
--synchronous Specifies that graphs should be executed synchronously rather than asynchronously.
If a backend does not support asynchronous execution, this flag is unnecessary.
--num_inferences <VAL> Specifies the number of inferences. Loops over the input_list until
the number of inferences has transpired.
--duration <VAL> Specifies the duration of the graph execution in seconds.
Loops over the input_list until this amount of time has transpired.
--keep_num_outputs <VAL> Specifies the number of outputs to be saved.
Once the number of outputs reaches the limit, subsequent outputs are discarded.
--batch_multiplier <VAL> Specifies the value by which the batch value in input and output tensor dimensions
will be multiplied. The modified input and output tensors are used only during
graph execution. Composed graphs still use the tensor dimensions from the model.
--timeout <VAL> Specifies the value of the timeout for graph execution in microseconds. Please note
using this option with a backend that does not support timeout signals results in an error.
--retrieve_context_timeout <VAL> Specifies the value of the timeout for graph initialization in microseconds. Please note
using this option with a backend that does not support timeout signals results in an error.
Also note that this option can only be used when loading a saved context through
--retrieve_context or --retrieve_context_list option.
--max_input_cache_tensor_sets <VAL> Specifies the maximum number of input tensor sets that can be cached.
Use value "-1" to cache all the input tensors created.
Note that options --max_input_cache_tensor_sets and --max_input_cache_size_mb are mutually exclusive.
Only one of the options can be specified at a time.
--max_input_cache_size_mb <VAL> Specifies the maximum cache size in megabytes (MB).
Note that options --max_input_cache_tensor_sets and --max_input_cache_size_mb are mutually exclusive.
Only one of the options can be specified at a time.
--set_output_tensors <VAL> Provide a comma-separated list of intermediate output tensor names, for which the outputs
will be written in addition to final graph output tensors. Note that options --debug and
--set_output_tensors are mutually exclusive. Only one of the options can be specified at a time.
Also note that this option cannot be used when the graph is retrieved from a context binary,
since the graph is already finalized when retrieved from context binary.
The syntax is: graphName0:tensorName0,tensorName1;graphName1:tensorName0,tensorName1.
In case of a single graph, its name is not necessary and a list of comma separated tensor
names can be provided, e.g.: tensorName0,tensorName1.
The same format can be provided in a .txt file.
--use_mmap Specifies that the context binary that is being read should be loaded
using the Memory-mapped (MMAP) file I/O. Please note some platforms
may not support this due to OS limitations in which case an error
is thrown when this option is used.
--validate_binary Specifies that the context binary will be validated before creating a context.
This option can only be used with backends that support binary validation.
--platform_options <VAL> Specifies values to pass as platform options. Multiple platform options can be provided
using the syntax: key0:value0;key1:value1;key2:value2
--graph_profiling_start_delay <VAL> Specifies graph profiling start delay in seconds. Please note that this option can only be used
in conjunction with graph-level profiling handles.
--dlc_path <VAL> Paths to a comma separated list of Deep Learning Containers (DLC) from which to load the models.
Necessitates libQnnModelDlc.so as the --model argument.
To compose multiple graphs in the context, use comma-separated list of DLC files.
The syntax is <qnn_model_name_1.dlc>,<qnn_model_name_2.dlc>
Default: None
--graph_profiling_num_executions <VAL> Specifies the maximum number of QnnGraph_execute/QnnGraph_executeAsync calls to be profiled.
Please note that this option can only be used in conjunction with graph-level profiling handles.
--io_tensor_mem_handle_type <VAL> Specifies mem handle type to be used for Input and output tensors during graph execution.
Valid settings: "ion" and "dma_buf".
--device_options <VAL> Specifies values to pass as device options. Multiple device options can be provided using the
syntax: key0:value0;key1:value1;key2:value2
Currently supported options:
device_id:<n> - selects a particular hardware device by ID to execute on. This ID will be used
during QnnDevice creation. A default device will be chosen by the backend if
an ID is not provided. This value will override a device ID selected in a
backend config file.
core_id:<n> - selects a particular core by ID to execute on the selected device. This ID will
be used during QnnDevice creation. A default core will be chosen by the backend
if an ID is not provided. This value will override a core ID selected in a
backend config file.
--retrieve_context_list <VAL> Provide the path to a YAML file that contains info regarding multiple contexts. --retrieve_context_list
is mutually exclusive with --retrieve_context, --model and --dlc_path. Please refer to SDK documentation
for more details.
--binary_updates <VAL> Path to a YAML file that contains paths to binary updates.
Updates are applied after initial graph execution on
a per-graph basis.
--version Print the QNN SDK version.
--help Show this help message.
EXIT CODES:
------------
List of exit codes used in qnn-net-run application.
Exit codes 1, 2, 126 - 165 and 255 should be avoided for user-defined exit codes since they have
special purposes, as below:
1, 2 : Abnormal termination of a program.
126 - 165 are specifically used to indicate seg faults, bus errors, etc.
3 - Application failure reason unknown. See DSP logs (logcat).
4 - Application failure due to invalid application argument.
6 - Application failure during setting log level.
7 - Application failure due to null or invalid function pointer etc.
9 - Application failure during qnn_net_run_HtpVXXHexagon initialization.
10 - Application failure during backend creation.
11 - Application failure during device creation.
12 - Application failure during Op Package registration.
13 - Application failure during creating context.
14 - Application failure during graph prepare.
15 - Application failure during graph finalize.
16 - Application failure during create from binary.
17 - Application failure during graph execution.
18 - Application failure during context free.
19 - Application failure during device free.
20 - Application failure during backend termination.
21 - Application failure during graph execution abort.
22 - Application failure during graph execution timeout.
23 - Application failure during the create from binary with suboptimal cache.
24 - Application failure during backend termination.
25 - Application failure during processing binary section or updating binary section etc.
26 - Application failure during binary update/execution.
See <QNN_SDK_ROOT>/examples/QNN/NetRun folder for reference example on how to use qnn-net-run tool.
Typical arguments:
--backend - The appropriate argument depends on what target and backend you want to run on
Android (aarch64):
    <QNN_SDK_ROOT>/lib/aarch64-android/
    CPU - libQnnCpu.so
    GPU - libQnnGpu.so
    HTA - libQnnHta.so
    DSP (Hexagon v65) - libQnnDspV65Stub.so
    DSP (Hexagon v66) - libQnnDspV66Stub.so
    DSP - libQnnDsp.so
    HTP (Hexagon v68) - libQnnHtp.so
    [Deprecated] HTP Alternate Prepare (Hexagon v68) - libQnnHtpAltPrepStub.so
    LPAI (Stub library) - libQnnLpaiStub.so
    LPAI - libQnnLpai.so
    Saver - libQnnSaver.so
Linux x86:
    <QNN_SDK_ROOT>/lib/x86_64-linux-clang/
    CPU - libQnnCpu.so
    HTP (Hexagon v68) - libQnnHtp.so
    LPAI - libQnnLpai.so
    Saver - libQnnSaver.so
Windows x86:
    <QNN_SDK_ROOT>/lib/x86_64-windows-msvc/
    CPU - QnnCpu.dll
    LPAI - QnnLpai.dll
    Saver - QnnSaver.dll
WoS:
    <QNN_SDK_ROOT>/lib/aarch64-windows-msvc/
    CPU - QnnCpu.dll
    DSP (Hexagon v66) - QnnDspV66Stub.dll
    DSP - QnnDsp.dll
    HTP (Hexagon v68) - QnnHtp.dll
    Saver - QnnSaver.dll
Note
Hexagon based backend libraries are emulations on x86_64 platforms
--input_list - This argument provides a file containing paths to input files to be used for graph
execution. Input files can be specified with the below format:
<input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>] [<input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]] ...
Below is an example containing 3 sets of inputs with layer names “Input_1” and “Input_2”, and files located in the relative path “Placeholder_1/real_input_inputs_1/”:
Input_1:=Placeholder_1/real_input_inputs_1/0-0#e6fb51.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/0-1#8a171b.rawtensor
Input_1:=Placeholder_1/real_input_inputs_1/1-0#67c965.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/1-1#54f1ff.rawtensor
Input_1:=Placeholder_1/real_input_inputs_1/2-0#b42dc6.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/2-1#346a0e.rawtensor
Note: If the batch dimension of the model is greater than 1, the number of batch elements in the input file has to either match the batch dimension specified in the model or it has to be one. In the latter case, qnn-net-run will combine multiple lines into a single input tensor.
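The input list format above can be produced with a few lines of shell. The layer names (Input_1, Input_2) and .raw file paths below are hypothetical; the qnn-net-run command is written to a helper script rather than executed, so it can be run later with the SDK libraries in place:

```shell
# Build an input list for a hypothetical two-input graph:
# one line per inference set, space-separated name:=path pairs.
cat > input_list.txt << 'EOF'
Input_1:=inputs/0-0.raw Input_2:=inputs/0-1.raw
Input_1:=inputs/1-0.raw Input_2:=inputs/1-1.raw
EOF
# Write a typical CPU-backend qnn-net-run invocation to a helper script.
cat > run_net.sh << 'EOF'
qnn-net-run \
  --model libqnn_model.so \
  --backend libQnnCpu.so \
  --input_list input_list.txt \
  --output_dir ./output
EOF
cat input_list.txt run_net.sh
```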
--op_packages - This argument is only needed if you are using custom op packages. The native QNN
ops are already included as part of the backend libraries.
When using custom op packages, each provided op package requires a colon-separated command line argument containing the path to the op package shared library (.so) file and the name of the interface provider, formatted as <op_package_path>:<interface_provider>. The interface_provider argument must be the name of the function in the op package library that satisfies the QnnOpPackage_InterfaceProvider_t interface. In the skeleton code created by qnn-op-package-generator, this function will be named <package_name><backend>InterfaceProvider. See Generating Op Packages for more information.
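As a small illustration of the two naming conventions above, the argument string and the skeleton-generated provider name could be assembled like this (the package library, package name, and backend below are hypothetical examples):

```python
# Sketch: build a --op_packages value of the form
# <op_package_path>:<interface_provider>, where the skeleton provider is
# named <package_name><backend>InterfaceProvider.
def op_package_arg(package_path, package_name, backend):
    provider = f"{package_name}{backend}InterfaceProvider"
    return f"{package_path}:{provider}"

arg = op_package_arg("libMyOpPackage.so", "MyPackage", "Cpu")
```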
--config_file - This argument is only needed if you need to specify context priority or provide backend extensions
related parameters. These parameters are specified through a JSON file. The template of the JSON file is shown below:
{
    "backend_extensions" : {
        "shared_library_path" : "path_to_shared_library",
        "config_file_path" : "path_to_config_file"
    },
    "context_configs" : {
        "context_priority" : "low | normal | normal_high | high",
        "async_execute_queue_depth" : uint32_value,
        "enable_graphs" : ["<graph_name_1>", "<graph_name_2>", ...],
        "memory_limit_hint" : uint64_value,
        "is_persistent_binary" : boolean_value,
        "cache_compatibility_mode" : "permissive | strict",
        "spill_fill_buffer" : int64_value,
        "weights_buffer" : int64_value
    },
    "graph_configs" : [
        {
            "graph_name" : "graph_name_1",
            "graph_priority" : "low | normal | normal_high | high",
            "graph_profiling_start_delay" : double_value,
            "graph_profiling_num_executions" : uint64_value
        }
    ],
    "profile_configs" : {
        "num_max_events" : uint64_value
    },
    "async_graph_execution_config" : {
        "input_tensors_creation_tasks_limit" : uint32_value,
        "execute_enqueue_tasks_limit" : uint32_value
    },
    "soc_configs" : {
        "soc_model" : int32_value
    }
}
All the options in the JSON file are optional. context_priority is used to specify the priority of the context as a context config. async_execute_queue_depth is used to specify the number of executions that can be in the queue at a given time. While using a context binary, enable_graphs is used to implement the graph selection functionality. memory_limit_hint is used to set the peak memory limit hint of a deserialized context in MBs. is_persistent_binary indicates that the context binary pointer is available from QnnContext_createFromBinary until QnnContext_free is called. spill_fill_buffer is used to store spill-fill values in a buffer shared between application and backend. weights_buffer is used to store weights in a buffer shared between application and backend.
Set Cache Compatibility Mode : cache_compatibility_mode specifies the mode used to check whether cache record is optimal for the device. The available modes indicate binary cache compatibility:
“permissive”: Binary cache is compatible if it could run on the device; default.
“strict”: Binary cache is compatible if it could run on the device and fully utilize hardware capability. If it cannot fully utilize hardware, selecting this option results in a recommendation to prepare the cache again. This option returns an error if it is not supported by the selected backend.
Graph Selection : Allows specifying a subset of graphs in a context to be loaded and executed. If enable_graphs is specified, only those graphs are loaded. Selecting a graph name that does not exist is an error. If enable_graphs is not specified or is passed as an empty list, the default behaviour applies and all graphs in the context are loaded.
graph_configs can be used to specify asynchronous execution order and depth, if a backend supports asynchronous execution. Every set of graph configs has to be specified along with a graph name. graph_profiling_start_delay is used to set the profiling start delay time in seconds. graph_profiling_num_executions is used to set the maximum number of QnnGraph_execute/QnnGraph_executeAsync calls that will be profiled.
profile_configs can be used to specify the max profile events per profiling handle.
async_graph_execution_config can be used to specify limits on the number of tasks that run in parallel when graphs are executed asynchronously using graphExecuteAsync. input_tensors_creation_tasks_limit specifies the maximum number of tasks in which input tensor sets are populated, which can be used for graph execution. execute_enqueue_tasks_limit specifies the maximum number of tasks in which the backend graphExecuteAsync will be called using the pre-populated input tensors. If unspecified, these values are set to the specified async_execute_queue_depth, or to 10, which is the default for async_execute_queue_depth.
backend_extensions is used to exercise custom options in a particular backend. This can be done by providing an extensions shared library (.so) and a config file, if necessary. This is also required to enable the various performance modes that are exercised using the backend config. Currently, HTP supports it through the libQnnHtpNetRunExtensions.so shared library, DSP through libQnnDspNetRunExtensions.so, and GPU through libQnnGpuNetRunExtensions.so. For the different custom options that can be enabled with HTP, see HTP Backend Extensions.
soc_configs can be used to specify the simulated soc model listed in Supported Snapdragon Devices.
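As a concrete illustration, a --config_file that loads the HTP net-run extensions library and raises the context priority could be generated like this; the config file name is a placeholder, and all option values are examples:

```python
import json

# Sketch: emit a minimal qnn-net-run --config_file enabling HTP backend
# extensions and setting a context priority. Paths are placeholders.
config = {
    "backend_extensions": {
        "shared_library_path": "libQnnHtpNetRunExtensions.so",
        "config_file_path": "htp_backend_config.json",
    },
    "context_configs": {
        "context_priority": "high",
        "cache_compatibility_mode": "permissive",
    },
}
config_json = json.dumps(config, indent=2)
```

Writing `config_json` to a file then gives a valid value for `--config_file`.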
--shared_buffer - This argument instructs qnn-net-run to use shared buffers for the zero-copy use case with
a device/coprocessor associated with a particular backend (for example, the DSP with the HTP backend) for graph input and output tensor data.
This option is supported on Android only. qnn-net-run implements this feature using rpcmem APIs, which create shared
buffers using the ION/DMA-BUF memory allocator on Android, available through the shared library libcdsprpc.so. In addition to
specifying this option, for qnn-net-run to be able to discover libcdsprpc.so, the path in which the shared library is present
needs to be appended to the LD_LIBRARY_PATH variable.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/vendor/lib64
--retrieve_context_list - This argument is used to specify a YAML file that contains information about multiple contexts, each with its
associated binary path, context configuration, and input files, enabling streamlined setup of contexts.
The template of the YAML file is shown below:
version: 1
contexts:
  - name: <context_name_1>
    binaryFilePath: <binary_file_path>
    contextConfig:
      context_priority: <low | normal | normal_high | high>
      async_execute_queue_depth: <uint32_value>
      enable_graphs: ["<graph_name_1>", "<graph_name_2>", ...]
      memory_limit_hint: <uint64_value>
      is_persistent_binary: <boolean_value>
      cache_compatibility_mode: <permissive | strict>
      spill_fill_buffer: <int64_value>
      weights_buffer: <int64_value>
    inputFileList:
      - graphName: <string_value>
        inputFilePath: <input_list_file_path>
  - name: <context_name_2>
    binaryFilePath: <binary_file_path>
    contextConfig:
      context_priority: <low | normal | normal_high | high>
      async_execute_queue_depth: <uint32_value>
      enable_graphs: ["<graph_name_1>", "<graph_name_2>", ...]
      memory_limit_hint: <uint64_value>
      is_persistent_binary: <boolean_value>
      cache_compatibility_mode: <permissive | strict>
      spill_fill_buffer: <int64_value>
      weights_buffer: <int64_value>
    inputFileList:
      - graphName: <string_value>
        inputFilePath: <input_list_file_path>
version is used to specify the version of the configuration file. contexts is used to specify a list of context configurations. name is used to specify the name of the context. binaryFilePath is used to specify the path to the serialized binary file for the context. contextConfig is used to specify a dictionary containing context configuration options; check context_config for more details. inputFileList is used to specify a list of graphName and inputFilePath entries for the context, where graphName specifies the name of the graph and inputFilePath specifies the path to the input list file for that graph.
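As a minimal concrete example, a single-context file following this template could be produced as below; the context name, binary path, and input list file are placeholders:

```python
from textwrap import dedent

# Sketch: a minimal --retrieve_context_list YAML with one context.
# All names and paths here are illustrative placeholders.
context_list_yaml = dedent("""\
    version: 1
    contexts:
      - name: context_1
        binaryFilePath: qnngraph.serialized.bin
        contextConfig:
          context_priority: normal
          cache_compatibility_mode: permissive
        inputFileList:
          - graphName: qnn_model
            inputFilePath: input_list.txt
    """)
```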
Running Quantized Model on HTP backend with qnn-net-run¶
The HTP backend currently allows finalizing / creating an optimized version of a quantized QNN model
offline, on a Linux development host (using the x86_64-linux-clang backend library), and then executing
the finalized model on device (using the hexagon-v68 backend libraries).
First, configure the environment by following instructions in Setup section. Next,
build QNN Model library from your network, using artifacts produced by one of QNN converters.
See Building Example Model for reference.
Lastly, use the qnn-context-binary-generator utility to generate a serialized representation of the
finalized graph to execute the serialized binary on device.
# Generate the optimized serialized representation of QNN Model on Linux development host.
$ qnn-context-binary-generator --binary_file qnngraph.serialized.bin \
      --model <path_to_model_library>/libQnnModel.so \      # a x86_64-linux-clang built quantized QNN model
      --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
      --output_dir <output_dir_for_result_and_qnngraph_serialized_binary>
To use produced serialized representation of the finalized graph (qnngraph.serialized.bin)
ensure the below binaries are available on the android device:
libQnnHtpV68Stub.so (ARM)
libQnnHtpPrepare.so (ARM)
libQnnModel.so (ARM)
libQnnHtpV68Skel.so (cDSP v68)
qnngraph.serialized.bin (serialized binary from run on Linux development host)
See <QNN_SDK_ROOT>/examples/QNN/NetRun/android/android-qnn-net-run.sh script for reference
on how to use qnn-net-run tool on android device.
# Run the optimized graph on HTP target
$ qnn-net-run --retrieve_context qnngraph.serialized.bin \
      --backend <path_to_model_library>/libQnnHtp.so \
      --output_dir <output_dir_for_result> \
      --input_list <path_to_input_list.txt>
Running Float Model on HTP backend with qnn-net-run¶
The QNN HTP backend can support running float32 models on select Qualcomm SoCs using float16 math.
First, configure the environment by following instructions in Setup section. Next, build QNN Model library from your network, using artifacts produced by one of QNN converters. See Building Example Model for reference.
Lastly, configure backend_extensions parameters through a JSON file and set custom options for the HTP backend.
Pass this file to qnn-net-run using the --config_file argument. backend_extensions takes two parameters: an extensions shared library (.so) (for HTP, use libQnnHtpNetRunExtensions.so) and
a config file for the backend.
Below is the template for the JSON file:
{
    "backend_extensions" : {
        "shared_library_path" : "path_to_shared_library",
        "config_file_path" : "path_to_config_file"
    }
}
For HTP backend extensions configurations, you can set "vtcm_mb" and "graph_names" through a config file.
Here is an example of the config file:
{
    "graphs": [
        {
            "vtcm_mb": 8,                   // Provides performance infrastructure configuration options that are memory specific.
                                            // Optional; if not set, QNN HTP defaults to 4.

            "graph_names": [ "qnn_model" ]  // Provide the list of names of the graphs for the inference, as specified when using the qnn converter tools.
                                            // "qnn_model" must be the name of the .cpp file generated during the model conversion (without the .cpp file extension).
            .....
        },
        {
            .....                           // Other graph object
        }
    ]
}
Note
“fp16_relaxed_precision” is deprecated starting from 2.35 release.
See <QNN_SDK_ROOT>/examples/QNN/NetRun/android/android-qnn-net-run.sh script for reference
on how to use qnn-net-run tool on android device.
# Run the optimized graph on HTP target
$ qnn-net-run --model <path_to_model_library>/libQnnModel.so \      # a x86_64-linux-clang built float QNN model
      --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
      --config_file <path_to_JSON_file.json> \
      --output_dir <output_dir_for_result> \
      --input_list <path_to_input_list.txt>
qnn-throughput-net-run¶
The qnn-throughput-net-run tool is used to exercise the execution of multiple models on a QNN backend or on different backends in a multi-threaded fashion. It allows repeated execution of models on a specified backend for a specified duration or number of iterations.
Usage:
------
qnn-throughput-net-run [--config <config_file>.json]
[--output <results>.json]
REQUIRED argument(s):
--config <FILE>.json Path to the json config file.
OPTIONAL argument(s):
--output <FILE>.json Specify the json file used to save the performance test results.
--version Print the QNN SDK version.
--help Show help message.
Configuration JSON File:
qnn-throughput-net-run uses a configuration file as input to run the models on the backends. The configuration json file comprises four required objects - backends, models, contexts and testCase.
Below is an example of a json configuration file. Please refer to the following sections for detailed information on the four configuration objects backends, models, contexts and testCase.
{
"backends": [
{
"backendName": "cpu_backend",
"backendPath": "libQnnCpu.so",
"profilingLevel": "BASIC",
"backendExtensions": "libQnnHtpNetRunExtensions.so",
"perfProfile": "high_performance"
},
{
"backendName": "gpu_backend",
"backendPath": "libQnnGpu.so",
"profilingLevel": "OFF"
}
],
"models": [
{
"modelName": "model_1",
"modelPath": "libqnn_model_1.so",
"loadFromCachedBinary": false,
"inputPath": "model_1-input_list.txt",
"inputDataType": "FLOAT",
"postProcessor": "MSE",
"outputPath": "model_1-output",
"outputDataType": "FLOAT_ONLY",
"saveOutput": "NATIVE_ALL",
"groundTruthPath": "model_1-golden_list.txt"
},
{
"modelName": "model_2",
"modelPath": "libqnn_model_2.so",
"loadFromCachedBinary": false,
"inputPath": "model_2-input_list.txt",
"inputDataType": "FLOAT",
"postProcessor": "MSE",
"outputPath": "model_2-output",
"outputDataType": "FLOAT_ONLY",
"saveOutput": "NATIVE_LAST"
}
],
"contexts": [
{
"contextName": "cpu_context_1"
},
{
"contextName": "gpu_context_1"
}
],
"testCase": {
"iteration": 5,
"logLevel": "error",
"threads": [
{
"threadName": "cpu_thread_1",
"backend": "cpu_backend",
"context": "cpu_context_1",
"model": "model_1",
"interval": 10,
"loopUnit": "count",
"loop": 1
},
{
"threadName": "gpu_thread_1",
"backend": "gpu_backend",
"context": "gpu_context_1",
"model": "model_2",
"interval": 0,
"loopUnit": "count",
"loop": 10
}
]
}
}
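A quick structural check of such a config can be sketched in a few lines: it verifies the four required top-level objects and that every thread references a defined backend, context, and model. The key names mirror the example above; the validation helper itself is not part of the SDK.

```python
# Sketch: sanity-check a qnn-throughput-net-run configuration dict.
def validate_config(cfg):
    for key in ("backends", "models", "contexts", "testCase"):
        if key not in cfg:
            raise ValueError(f"missing required object: {key}")
    backends = {b["backendName"] for b in cfg["backends"]}
    models = {m["modelName"] for m in cfg["models"]}
    contexts = {c["contextName"] for c in cfg["contexts"]}
    # Each thread must point at names defined above.
    for thread in cfg["testCase"].get("threads", []):
        for field, defined in (("backend", backends),
                               ("context", contexts),
                               ("model", models)):
            if thread[field] not in defined:
                raise ValueError(f"thread references unknown {field}: {thread[field]}")
    return True
```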
backends : Property value is an array of json objects, where each object contains the needed backend information on which the models are executed. Each object of the array has the following properties as key/value pairs.
Key |
Value Type |
Default Value |
Optional / Required |
Description |
|---|---|---|---|---|
|
|
|
|
Is a unique identifier for the testcase to designate on which backend the model should be run. |
|
|
|
|
Specifies the on device backend .so library file path. |
|
|
|
|
Sets the QNN profiling level for the backend. Possible values: OFF, BASIC, DETAILED.
|
|
|
|
|
Enables backend specific options through optional backend extensions
shared library and config file.
This is required to enable various performance modes which are
exercised using |
|
|
|
|
Specifies performance profile to set. Possible values: |
|
|
|
|
Comma separated list of custom op packages and interface providers for registration.
|
|
|
|
|
Enables backend specific platform options through QnnBackend_Config_t.
|
models : Property value is an array of json objects, where each object contains details about a model and corresponding input data and post-processing information. Each object of the array has the following properties as key/value pairs.
Key |
Value Type |
Default Value |
Optional / Required |
Description |
|---|---|---|---|---|
|
|
|
|
Is a unique identifier for the testcase to designate which model to run. |
|
|
|
|
Specifies the <model>.so / <serialized_context>.bin file path. |
|
|
|
|
Set to |
|
|
|
|
Path to a file listing the inputs for the model. If there are multiple graphs in the <model>.so / <serialized_context>.bin, this has to be comma-separated list of input path of individual graph. Syntax: Graph1_input_path[,Graph2_input_path,…] If not set, Random Input Data is used. |
|
|
|
|
Possible values: NATIVE, FLOAT. |
|
|
|
|
Possible values: NONE, MSE, MSE_FLOAT32, MSE_INT8, MSE_INT16. If there are multiple graphs in the <model>.so / <serialized_context>.bin, this has to be comma-separated list of postProcessor values. Syntax: MSE[,NONE,…] MSE will output a mean squared error result for each execution with the golden file specified by the parameter
|
|
|
|
|
If |
|
|
|
|
Possible values: NATIVE_ONLY, FLOAT_ONLY, FLOAT_AND_NATIVE. |
|
|
|
|
Possible values: NONE, NATIVE_LAST,NATIVE_ALL.
|
|
|
|
|
Specifies the golden file path for computing the MSE. If there are multiple graphs in the <model>.so / <serialized_context>.bin, this has to be comma-separated list of ground truth path of individual graph. Syntax: Graph1_ground_truth_path_[,Graph2_ground_truth_path_,…] |
contexts : Property value is an array of json objects, where each object contains all the context information. Each object of the array has the following properties as key/value pairs.
Key |
Value Type |
Default Value |
Optional / Required |
Description |
|---|---|---|---|---|
|
|
|
|
Is a unique identifier for the testcase to designate the context in which a model should be created. |
|
|
|
|
Specifies the priority of the context. Possible values: DEFAULT, LOW, NORMAL, HIGH. |
|
|
|
|
Specifies the queue depth for async execution. |
|
|
|
|
Specifies the cache compatibility check mode; valid values are: “permissive” (default), and “strict”. |
testCase : Property value is a json object that specifies the testing configuration that controls multi-threaded execution.
Key |
Value Type |
Default Value |
Optional / Required |
Description |
|---|---|---|---|---|
|
|
|
|
Number of times the entire use case is repeated. If the value is |
|
|
|
|
Specifies max logging level to be set. Valid settings: |
|
|
|
|
Property value is an array of json objects, where each object contains all the thread details, that are to be executed by the qnn-throughput-net-run. Each object of the array has the below properties listed under threads as key/value pairs. |
threads : Property value is an array containing all the threads and corresponding backend, context
and models information.
Each element of the array can have the following required/optional property.
Key |
Value Type |
Default Value |
Optional / Required |
Description |
|---|---|---|---|---|
|
|
|
|
Is a unique identifier for the testcase to identify the thread and save the output results. |
|
|
|
|
Specifies the backend to be used when this thread executes the graph.
The value specified should match with one of the |
|
|
|
|
Specifies the context to be used when this thread executes the graph.
The value specified should match with one of the |
|
|
|
|
Specifies the model to be used by the thread for execution.
The value specified should match with one of the |
|
|
|
|
Set it to |
|
|
|
|
Set it to |
|
|
|
|
Set it to |
|
|
|
|
Represents the interval (in microseconds) between each graph execution in the thread. |
|
|
|
|
Possible values: count, second. |
|
|
|
|
Value is taken either as seconds or count based on the value for the |
|
|
|
|
Set it to |
|
|
|
|
Specifies the backend config file to enable backend specific options through |
An example json file sample_config.json file can be found at <QNN_SDK_ROOT>/examples/QNN/ThroughputNetRun.
Analysis¶
qairt-accuracy-evaluator (Beta)¶
The qairt-accuracy-evaluator tool provides a framework to evaluate end-to-end accuracy metrics for a model on a given dataset. In addition, the tool can be used to identify the best quantization options for a model on a given set of inputs.
Dependencies
The QNN Accuracy Evaluator assumes that the platform dependencies and environment setup instructions have been followed as outlined in the Setup page. Certain additional python packages are required by this tool; refer to Optional Python packages.
Note: The qairt-accuracy-evaluator currently supports only ONNX models.
Usage¶
Users need to set the QNN_SDK_ROOT environment variable to the root directory of the QNN SDK. The following environment variables might need to be set with appropriate values:
QNN_MODEL_ZOO : Path to the model zoo. If not set, an absolute model path must be provided explicitly. Note: This environment variable is required only if the model path supplied is not absolute and is relative to the set model zoo path.
ADB_PATH : Path to the ADB binary. If not set, it is queried and set from its executable path.
To conduct an accuracy analysis of a given model using a specific dataset, the user must create a configuration that specifies the backends, quantization options, and reference inference frameworks. Sample config files can be found at ${QNN_SDK_ROOT}/lib/python/qti/aisw/accuracy_evaluator/configs/samples/model_configs.
The high-level structure of a model config is shown below:
model
info
globals
dataset
preprocessing
inference-engine
adapter # This is only applicable when use_memory_plugins is enabled
postprocessing
verifier
metrics
Users can utilize the info section of the model configuration to provide a brief description of the model or dataset being evaluated and specify the maximum number of calibration inputs for quantization. These fields are optional and default to None.
Additionally, users can define constants to be used throughout their configuration. These variables can be overridden from the CLI using the -set_global option, offering convenience and flexibility. Note that the values provided are applicable only within the model configuration and are not accessible within the script itself. The evaluator replaces the strings (variable names) within the configuration with the user-defined values before the start of the evaluation.
Users can also enable the memory pipeline by setting the memory_pipeline field in the info section. This approach is recommended for x86-based evaluations or the AIC backend due to its optimized performance. The following parameters control the multi-threading and processing behavior of the memory pipeline, and can be provided under the info section of the evaluator configuration:
memory_pipeline: Flag to enable or disable memory pipeline.
dump_stages: Users can specify the list of stages they want to dump. For example: dump_stages: [ ‘preproc’, ‘infer’,’postproc’ ]. Note: When an Android-based schema is present for evaluation, we will always dump the preprocessed files to the disk.
max_parallel_evaluations: Users can control the number of parallel evaluations they want to perform during evaluation. By default, num_parallel_evaluations is max(number of CPU cores / 2, number of targets/devices supplied).
max_parallel_compilation: Users can control the number of parallel compilations they want to perform during evaluation. By default, num_parallel_compilation is number of CPU cores / 2.
data_chunk_size: Users can specify the number of samples they want to evaluate at a time for Android/remote targets, which might be resource-constrained (storage/timeout). By default, data_chunk_size is the same as the number of samples in the configured dataset.
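The documented parallelism defaults can be written out directly; the helper names below are illustrative, not SDK API:

```python
# Sketch of the documented memory-pipeline parallelism defaults.
def default_parallel_evaluations(num_cpu_cores, num_targets):
    # max(number of CPU cores / 2, number of targets/devices supplied)
    return max(num_cpu_cores // 2, num_targets)

def default_parallel_compilations(num_cpu_cores):
    # number of CPU cores / 2
    return num_cpu_cores // 2
```

So on an 8-core host with 2 devices, up to 4 evaluations and 4 compilations run in parallel by default.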
Users need to provide all dataset information under the dataset section in the model config file; otherwise, an error is thrown. An example is shown below:
dataset:
name: COCO2014
path: '/home/ml-datasets/COCO/2014/'
inputlist_file: inputlist.txt
calibration:
type: index
file: calibration-index.txt
Details of the dataset fields are as follows:
Field |
Description |
|---|---|
name |
Name of the dataset |
path |
Base directory of the dataset files |
inputlist_file |
Text file containing all the pre-processed input files relative to the path field, one input per line.
For models having multiple inputs, the inputs in each line have to be comma separated
|
calibration |
|
The inference engine is used to run the model on multiple inference schemas. A sample inference engine section is shown below, followed by the description of the different configurable entries in the inference section.
inference-engine:
model_path: MLPerfModels/ResNetV1.5/modelFiles/ONNX/resnet50_v1.onnx
simplify_model : True
inference_schemas:
- inference_schema:
name: qnn
precision: quant
target_arch: x86_64-linux-clang
backend: htp
tag: qnn_int8_htp_x86
converter_params:
float_bias_bitwidth: 32
quantizer_params:
param_quantizer_schema: symmetric
act_quantizer_calibration: min-max
use_per_channel_quantization: True
backend_extensions:
vtcm_mb: 4
rpc_control_latency: 100
dsp_arch: v75 #mandatory
inputs_info:
- input_tensor_0:
type: float32
shape: ["*", 3, 224, 224]
outputs_info:
- ArgMax_0:
type: int64
shape: ["*"]
- softmax_tensor_0:
type: float32
shape: ["*", 1001]
Details of each configurable entry are given below:
Field |
Description |
|---|---|
model_path |
Absolute or relative path of the model. If the path is relative, it is taken relative to MODEL_ZOO_PATH if that is set; otherwise, an absolute path is needed. |
simplify_model |
Flag to enable or disable model simplification for ONNX models. By default, this flag is set to True and the model would be simplified. Note: Model simplification would be skipped for models having custom operators or for inference schemas having quantization_overrides parameter configured. |
inference_schemas |
|
input_info |
|
output_info |
|
Note
Command line options available for config mode are as follows:
qairt-acc-evaluator options
options:
-config CONFIG path to model config yaml
-work_dir WORK_DIR working directory path. default is ./qacc_temp
-onnx_symbol ONNX_SYMBOL [ONNX_SYMBOL ...]
Replace onnx symbols in input/output shapes. Can be passed as list of multiple items.
Default replaced by 1. Example: __unk_200:1
-device_id DEVICE_ID Target device id to be provided
-inference_schema_type INFERENCE_SCHEMA_TYPE
run only the inference schemas with this name. Example: qnn, onnxrt
-inference_schema_tag INFERENCE_SCHEMA_TAG
run only this inference schema tag
-cleanup CLEANUP end: deletes the files after all stages are completed.
intermediate: deletes after previous stage outputs are used. (default:'')
-use_memory_plugins Flag to enable memory plugins.
-use_memory_pipeline Flag to enable memory pipeline. use_memory_plugins is ignored.
-silent Run in silent mode. Do not expect any CLI input from user.
-debug Enable debug logs on console and the file. (default: False)
-set_global SET_GLOBAL [SET_GLOBAL ...]
Option used to override global variables provided in the model configuration. Multiple global variables can be specified.
Example: -set_global count:10 -set_global calib:5 (default: None)
Note
Users can accelerate their evaluations using memory pipeline to minimize unnecessary reading and writing of data during evaluation by passing the -use_memory_pipeline flag to the evaluator command. This feature is currently supported for Linux only.
Config file options
- inference_schema:
name: qnn
target_arch: x86_64-linux-clang
backend: cpu
precision: fp32
tag: qnn_cpu_x86
- inference_schema:
name: qnn
target_arch: aarch64-android
backend: cpu
precision: fp32
tag: qnn_cpu_android
- inference_schema:
name: qnn
target_arch: wos
backend: cpu
precision: fp32
tag: qnn_cpu_x86
- inference_schema:
name: qnn
target_arch: aarch64-android
backend: gpu
precision: fp32
tag: qnn_gpu_android
- inference_schema:
name: qnn
target_arch: x86_64-linux-clang
backend: htp
precision: quant
tag: htp_int8
converter_params:
quantization_overrides: "path to the ext quant json"
quantizer_params:
param_quantizer_calibration: min-max | sqnr
param_quantizer_schema: asymmetric | symmetric
use_per_channel_quantization: True | False
use_per_row_quantization: True | False
act_bitwidth: 8 | 16
bias_bitwidth: 8 | 32
weights_bitwidth: 8 | 4
backend_extensions:
dsp_arch: v79 # mandatory
vtcm_mb: 4
rpc_control_latency: 100
- inference_schema:
name: qnn
target_arch: aarch64-android
backend: htp
precision: quant
tag: htp_int8
converter_params:
quantization_overrides: "path to the ext quant json"
quantizer_params:
param_quantizer_calibration: min-max | sqnr
param_quantizer_schema: asymmetric | symmetric
use_per_channel_quantization: True | False
use_per_row_quantization: True | False
act_bitwidth: 8 | 16
bias_bitwidth: 8 | 32
weights_bitwidth: 8 | 4
backend_extensions:
dsp_arch: v79 # mandatory
vtcm_mb: 4
rpc_control_latency: 100
- inference_schema:
name: qnn
target_arch: wos
backend: htp
precision: quant
tag: htp_int8
converter_params:
quantization_overrides: "path to the ext quant json"
quantizer_params:
param_quantizer_calibration: min-max | sqnr
param_quantizer_schema: asymmetric | symmetric
use_per_channel_quantization: True | False
use_per_row_quantization: True | False
act_bitwidth: 8 | 16
bias_bitwidth: 8 | 32
weights_bitwidth: 8 | 4
backend_extensions:
dsp_arch: v79 # mandatory
vtcm_mb: 4
rpc_control_latency: 100
Verifiers
The verifier section provides information about the verifier used to compare the inference outputs when there are multiple inference schemas. A sample verifier section is shown below, followed by a description of the configurable entries in the section.
verifier:
enabled: True
fetch_top: 1
type: average
tol: 0.01
Details of each configurable entry are given below:
Field |
Description |
|---|---|
verifier |
|
Following are the verifiers that can be used to compare the outputs.
cosine - Comparison between two tensors based on the Cosine Similarity score
average - Comparison between two tensors based on the average difference between the two tensors
l1_norm - Comparison between two tensors based on the L1 Norm of the difference
l2_norm - Comparison between two tensors based on the L2 Norm of the difference
standard_deviation - Comparison between two tensors based on the standard deviation difference
mse - Comparison between two tensors based on the Mean Square Error between the tensors
snr - Signal to Noise Ratio between the two tensors
kl_divergence - KL Divergence value between the two tensors
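Three of the listed verifiers can be sketched in pure Python on flat lists of floats; the SDK's exact scoring conventions (e.g. normalization, dB reference) may differ, so treat these as reference formulas only.

```python
import math

# Sketch: reference implementations of three verifier metrics.
def mse(a, b):
    # Mean squared error between two equal-length tensors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cosine(a, b):
    # Cosine similarity score between two tensors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def snr_db(reference, actual):
    # Signal power over error power, expressed in decibels.
    err_power = sum((x - y) ** 2 for x, y in zip(reference, actual)) or 1e-20
    sig_power = sum(x * x for x in reference)
    return 10 * math.log10(sig_power / err_power)
```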
Plugins
Plugins are Python classes used to implement different stages of the inference pipeline, such as dataset handling, preprocessing, postprocessing, and metrics logic.
Dataset and pre-processing plugins perform transformations to the input before they are passed to inference.
Adapter plugins convert the model’s inference outputs into standard formats for use by subsequent postprocessor or metric plugins. Note: This is applicable only when use_memory_plugins is enabled.
Post-processing plugins transform inference outputs.
Metric plugins analyze inference outputs to assess their accuracy.
Sample plugins are provided in the SDK at ${QNN_SDK_ROOT}/lib/python/qti/aisw/accuracy_evaluator/plugins.
Users can implement their own plugins (custom plugins) to meet their specific requirements. To include custom plugins, export the CUSTOM_PLUGIN_PATH environment variable pointing to the location of the custom plugin(s), so that they are also included while registering the plugin(s).
export CUSTOM_PLUGIN_PATH=/path/to/custom/plugins/directory
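How the registry consumes this variable is internal to the SDK; only the environment-variable contract comes from the text above. A hypothetical discovery step could look like:

```python
import os

# Sketch: list candidate plugin modules under CUSTOM_PLUGIN_PATH.
# The actual registration mechanism in the evaluator may differ.
def custom_plugin_files(env=None):
    env = os.environ if env is None else env
    path = env.get("CUSTOM_PLUGIN_PATH")
    if not path or not os.path.isdir(path):
        return []
    return sorted(f for f in os.listdir(path) if f.endswith(".py"))
```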
In the model configuration file, plugins are defined as a transformation chain, as shown below:
transformations:
- plugin:
name: resize
params:
dims: 416,416
channel_order: RGB
type: letterbox
- plugin:
name: normalize
- plugin:
name: convert_nchw
Plugins required for dataset transformation are configured in the dataset section as shown below.
dataset:
name: ILSVRC2012
path: '/home/ml-datasets/imageNet/'
inputlist_file: inputlist.txt
annotation_file: ground_truth.txt
calibration:
type: dataset
file: calibration.txt
transformations:
- plugin:
name: filter_dataset
params:
random: False
max_inputs: -1
max_calib: -1
The preprocessing and postprocessing plugins that the user wishes to use are configured in the processing section as shown below:
preprocessing:
transformations:
- plugin:
name: resize
params:
dims: 416,416
channel_order: RGB
type: letterbox
- plugin:
name: normalize
postprocessing:
squash_results: True
transformations:
- plugin:
name: object_detection
params:
dims: 416,416
type: letterbox
dtypes: [float32, float32, float32, float32]
Metric calculation plugins are configured in the metrics section as shown below.
metrics:
transformations:
- plugin:
name: topk
params:
kval: 1,5
softmax_index: 1
round: 7
label_offset: 1
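The top-k metric configured above can be sketched as follows; this is an illustrative reference implementation on per-sample score vectors, not the SDK's topk plugin itself (it ignores softmax_index, round, and label_offset).

```python
# Sketch: top-k classification accuracy, mirroring the kval parameter.
def topk_accuracy(scores, labels, kvals=(1, 5)):
    """scores: list of per-sample score vectors; labels: int class labels."""
    results = {}
    for k in kvals:
        hits = 0
        for s, label in zip(scores, labels):
            # Indices of the k highest-scoring classes.
            topk = sorted(range(len(s)), key=lambda i: s[i], reverse=True)[:k]
            hits += label in topk
        results[k] = hits / len(labels)
    return results
```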
Plugins that need to be executed for a pipeline stage are listed under ‘transformations’ and preceded by the ‘plugin’ keyword. The following table lists details of each configurable entry for a plugin.
| Field | Description |
|---|---|
| name | Name of the plugin |
| params | Parameters expected and required by the plugin |
A complete list of all plugins and their parameters can be found at Accuracy Evaluator Plugins.
Sample Command
qairt-accuracy-evaluator -config {path to configs}/qnn_resnet50_config.yaml
Results
The tool displays a table of quantization options ordered by output match according to the selected verifier, and also generates a CSV file with the same data. The comparator column shows the output match percentage/value for the selected verifier. The quant params column displays the quantization parameters used for that run; other columns show the backend and the runtime/compile parameters used. This information is also stored in a CSV file at {work_dir}/metrics-info.csv.
Artifacts associated with each of the configured quantization options are stored at `{work_dir}/infer/schema{i}_qnn_{backend}_{precision}_{j}`. Model outputs are stored at `{work_dir}/infer/schema{i}_qnn_{backend}_{precision}_{j}/Result_{k}`.
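The per-run summary can be inspected programmatically with the standard csv module. The column names and values below are hypothetical; the real metrics-info.csv column set (comparator, quant params, backend, runtime/compile params) may differ.

```python
import csv
import io

# Hypothetical slice of a {work_dir}/metrics-info.csv for illustration only.
sample = """schema,comparator,quant_params,backend
schema0_qnn_htp_int8_0,0.91,tf/act8,htp
schema1_qnn_htp_int8_1,0.97,enhanced/act8,htp
"""
rows = list(csv.DictReader(io.StringIO(sample)))
best = max(rows, key=lambda r: float(r["comparator"]))
print(best["schema"])  # the run with the highest comparator score
```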
Note
Snapshot of console log has been added for clarity.
Note
Snapshot of csv file has been added for clarity.
qnn-architecture-checker (Beta)¶
The Architecture Checker is a tool for models running on the HTP backend, including quantized 8-bit, quantized 16-bit, and FP16 models. It outputs a list of issues that prevent the model from achieving better performance on the HTP backend. The tool can also be invoked with the modifier feature, which applies the recommended modifications for these issues; this helps visualize the changes that can be applied to make the model a better fit for the HTP backend.
X86-Linux/ WSL Usage:
$ qnn-architecture-checker -i <path>/model.json
-b <optional_path>/model.bin
-o <optional_output_path>
-m <optional_modifier_argument>
X86-Windows/ Windows on Snapdragon Usage:
$ python qnn-architecture-checker -i <path>/model.json
-b <optional_path>/model.bin
-o <optional_output_path>
-m <optional_modifier_argument>
required arguments:
-i INPUT_JSON, --input_json INPUT_JSON
Path to json file
optional arguments:
-b BIN, --bin BIN
Path to a bin file
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path where the output csv should be saved. If not specified, the output csv will be written to the same path as the input file
-m MODIFY, --modify MODIFY
The query to select the modifications to apply.
--modify or --modify show - To see all the possible modifications. Display list of rule names and details of the modifications.
--modify all - To apply all the possible modifications found for the model.
--modify apply=rule_name1,rule_name2 - To apply modifications for specified rule names. The list of rules should be comma separated without spaces
The output is a CSV file saved as <optional_output_path>/<model_name>_architecture_checker.csv. An example output is shown below:
| | Graph/Node_name | Issue | Recommendation | Type | Input_tensor_name:[dims] | Output_tensor_name:[dims] | Parameters | Previous node | Next nodes | Modification | Modification_info |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Graph | This model uses 16-bit activation data, which takes twice the memory of 8-bit activation data. | Try to use a smaller datatype to get better performance, e.g. 8-bit. | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| 2 | Node_name_1 | The number of channels in the input/output tensor of this convolution node is low (smaller than 32). | Try increasing the number of channels in the input/output tensor to 32 or greater to get better performance. | Conv2d | input_1:[1, 250, 250, 3], __param_1:[5, 5, 3, 32], convolution_0_bias:[32] | output_1:[1, 123, 123, 32] | {'package': 'qti.aisw', 'type': 'Conv2d', ...} | ['previous_node_name'] | ['next_node_name1', 'next_node_name2'] | N/A | N/A |
Sample Command
qnn-architecture-checker --input_json ./model_net.json
--bin ./model.bin
--output_path ./archCheckerOutput
Architecture Checker - Model Modifier
To apply modifications to the model, the Architecture Checker can be invoked with "--modify" or "--modify show", which displays a list of possible modifications. In this mode, the tool only shows the rule names and modification details; it runs without making any changes to the model and generates the CSV output. Using the rule names from that run, the Architecture Checker can then be invoked with "--modify all" or "--modify apply=rule_name1,rule_name2". In this mode, the rule-specific changes are applied to the model, and the changes can be viewed in the updated model JSON. Additionally, the output CSV will contain information related to the modifications.
Consider the CSV output below, generated after applying the "--modify apply=elwisediv" modification to an example model.
| | Graph/Node_name | Issue | Recommendation | Type | Input_tensor_name:[dims] | Output_tensor_name:[dims] | Parameters | Previous node | Next nodes | Modification | Modification_info |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Node_name_1 | ElementWiseDivide usually has poor performance compared to ElementWiseMultiply. | Try replacing ElementWiseDivide with ElementWiseMultiply using the reciprocal value to get better performance. | Eltwise_Binary | input_1:[1, 52, 52, 6], input_2:[1] | output_1:[1, 52, 52, 6] | {'package': 'qti.aisw', 'eltwise_type': 'ElementWiseDivide', ...} | ['previous_node_name'] | ['next_node_name1', 'next_node_name2'] | Done | ElementWiseDivide has been replaced by ElementWiseMultiply using the reciprocal value |
| 2 | Node_name_2 | The number of channels in the input/output tensor of this convolution node is low (smaller than 32). | Try increasing the number of channels in the input/output tensor to 32 or greater to get better performance. | Conv2d | input_3:[1, 250, 250, 3], __param_1:[5, 5, 3, 32], convolution_1_bias:[32] | output_2:[1, 123, 123, 32] | {'package': 'qti.aisw', 'type': 'Conv2d', ...} | ['previous_node_name'] | ['next_node_name1', 'next_node_name2'] | N/A | N/A |
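The elwisediv modification replaces x / c with x * (1 / c), trading a divide for a multiply. A quick numerical check that the two forms agree for a scalar divisor (values here are arbitrary):

```python
c = 255.0
xs = [0.0, 12.5, 200.0]
div = [x / c for x in xs]
mul = [x * (1.0 / c) for x in xs]  # the reciprocal-multiply form
print(all(abs(a - b) < 1e-12 for a, b in zip(div, mul)))  # True
```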
The following commands invoke the Architecture Checker with the modifier to display the list of possible modifications:
Sample Command
qnn-architecture-checker --input_json ./model_net.json
--bin ./model.bin
--output_path ./archCheckerOutput
--modify
Sample Command
qnn-architecture-checker --input_json ./model_net.json
--bin ./model.bin
--output_path ./archCheckerOutput
--modify show
The following commands apply either all possible modifications or only those for specific rules:
Sample Command
qnn-architecture-checker --input_json ./model_net.json
--bin ./model.bin
--output_path ./archCheckerOutput
--modify all
Sample Command
qnn-architecture-checker --input_json ./model_net.json
--bin ./model.bin
--output_path ./archCheckerOutput
--modify apply=prelu,elwisediv
qnn-accuracy-debugger (Beta)¶
Dependencies
The Accuracy Debugger depends on the setup outlined in Setup. In particular, the following are required:
Platform dependencies need to be met as per Platform Dependencies
The desired ML frameworks need to be installed. The Accuracy Debugger is verified to work with the ML framework versions mentioned at Environment Setup
The following environment variables are used in this guide (users may change these paths to suit their needs):
RESOURCESPATH = {Path to the directory where all models and input files reside}
PROJECTREPOPATH = {Path to your accuracy debugger project directory}
Supported models
The qnn-accuracy-debugger currently supports ONNX, TFLite, and TensorFlow 1.x models. PyTorch models are supported only by the tool's oneshot-layerwise debugging algorithm.
Overview
The accuracy-debugger tool finds inaccuracies in a neural network at the layer level. The tool compares the golden outputs produced by running a model through a specific ML framework (i.e. TensorFlow, ONNX, TFLite) with the results produced by running the same model through Qualcomm's QNN Inference Engine. The inference engine can run on a variety of compute targets, including GPU, CPU, and DSP.
The following features are available in Accuracy Debugger. Each feature can be run with its corresponding option; for example, qnn-accuracy-debugger --{option}.
qnn-accuracy-debugger --framework_runner This feature uses an ML framework, e.g. TensorFlow, TFLite, or ONNX, to run the model and obtain intermediate outputs. Note: The argument --framework_diagnosis has been replaced by --framework_runner; --framework_diagnosis will be deprecated in a future release.
qnn-accuracy-debugger --inference_engine This feature uses the QNN engine to run a model and retrieve intermediate outputs.
qnn-accuracy-debugger --verification This feature compares the outputs generated by the framework runner and inference engine features using verifiers such as CosineSimilarity, RtolAtol, etc.
qnn-accuracy-debugger --compare_encodings This feature extracts encodings from a given QNN net JSON file, compares them with the given AIMET encodings, and outputs an Excel sheet highlighting mismatches.
qnn-accuracy-debugger --tensor_inspection This feature compares given target outputs with reference outputs.
qnn-accuracy-debugger --quant_checker This feature analyzes the activations, weights, and biases under all of the quantization options available in the qnn-converters for each layer of a given model.
- Tip:
You can use --help after the bin commands to see what other options (required or optional) you can add.
If no option is provided, Accuracy Debugger runs framework_runner, inference_engine, and verification sequentially.
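As an illustration of the kind of score a verifier such as CosineSimilarity reports when comparing golden and inference outputs (a value near 1.0 indicates a close match), the sketch below uses the textbook formula; it is not the SDK's verifier code, and the tensors are illustrative.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

golden = [0.1, 0.8, 0.1]    # e.g. flattened framework output
target = [0.12, 0.78, 0.10] # e.g. flattened inference-engine output
print(round(cosine_similarity(golden, target), 4))
```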
Below are the instructions for running the Accuracy Debugger:
Framework Runner¶
The Framework Runner feature runs a selected model with a specific ML framework (e.g. TensorFlow) to produce golden outputs for later comparison with inference results from the Inference Engine step.
Usage¶
usage: qnn-accuracy-debugger --framework_runner [-h]
-f FRAMEWORK [FRAMEWORK ...]
-m MODEL_PATH
-i INPUT_TENSOR [INPUT_TENSOR ...]
-o OUTPUT_TENSOR
[-w WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[-v]
[--disable_graph_optimization]
[--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB]
[--add_layer_outputs ADD_LAYER_OUTPUTS]
[--add_layer_types ADD_LAYER_TYPES]
[--skip_layer_types SKIP_LAYER_TYPES]
[--skip_layer_outputs SKIP_LAYER_OUTPUTS]
[--start_layer START_LAYER]
[--end_layer END_LAYER]
[--use_native_output_files]
Script to generate intermediate tensors from an ML Framework.
optional arguments:
-h, --help show this help message and exit
required arguments:
-f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
Framework type and version, version is optional. Currently
supported frameworks are ["tensorflow","onnx","tflite"] case
insensitive but spelling sensitive
-m MODEL_PATH, --model_path MODEL_PATH
Path to the model file(s).
-i INPUT_TENSOR [INPUT_TENSOR ...], --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
The name, dimensions, raw data, and optionally data
type of the network input tensor(s) specified in the
format "input_name" comma-separated-dimensions path-
to-raw-file, for example: "data" 1,224,224,3 data.raw
float32. Note that the quotes should always be
included in order to handle special characters,
spaces, etc. For multiple inputs specify multiple
--input_tensor on the command line like:
--input_tensor "data1" 1,224,224,3 data1.raw
--input_tensor "data2" 1,50,100,3 data2.raw float32.
-o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
Name of the graph's specified output tensor(s).
optional arguments:
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory for the framework_runner to store
temporary files. Creates a new directory if the
specified working directory does not exist
--output_dirname OUTPUT_DIRNAME
output directory name for the framework_runner to
store temporary files under
<working_dir>/framework_runner. Creates a new
directory if the specified working directory does not
exist
-v, --verbose Verbose printing
--disable_graph_optimization
Disables basic model optimization
--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB
path to onnx custom operator library
(below options are supported only for onnx and ignored for other frameworks)
--add_layer_outputs ADD_LAYER_OUTPUTS
Output layers to be dumped. example:1579,232
--add_layer_types ADD_LAYER_TYPES
outputs of layer types to be dumped. e.g
:Resize,Transpose. All enabled by default.
--skip_layer_types SKIP_LAYER_TYPES
comma delimited layer types to skip snooping. e.g
:Resize, Transpose
--skip_layer_outputs SKIP_LAYER_OUTPUTS
comma delimited layer output names to skip debugging.
e.g :1171, 1174
--start_layer START_LAYER
save all intermediate layer outputs from provided
start layer to bottom layer of model
--end_layer END_LAYER
save all intermediate layer outputs from top layer to
provided end layer of model
--use_native_output_files
Dumps outputs as per framework model's actual data types.
Please note: all command line arguments should be provided either through the command line or through the config file; where they overlap, command line values will not override those in the config file.
Sample Commands
qnn-accuracy-debugger \
--framework_runner \
--framework tensorflow \
--model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
--input_tensor "input:0" 1,299,299,3 $RESOURCESPATH/samples/InceptionV3Model/data/chairs.raw \
--output_tensor InceptionV3/Predictions/Reshape_1:0
qnn-accuracy-debugger \
--framework_runner \
--framework onnx \
--model_path $RESOURCESPATH/samples/dlv3onnx/dlv3plus_mbnet_513-513_op9_mod_basic.onnx \
--input_tensor Input 1,3,513,513 $RESOURCESPATH/samples/dlv3onnx/data/00000_1_3_513_513.raw \
--output_tensor Output
To run model with custom operator:
qnn-accuracy-debugger \
--framework_runner \
--framework onnx \
--input_tensor "image" 1,3,640,640 $RESOURCESPATH/models/yolov3/batched-inp-107-0.raw \
--model_path $RESOURCESPATH/models/yolov3/yolov3_640_640_with_abp_qnms.onnx \
--output_tensor detection_boxes \
--onnx_custom_op_lib $RESOURCESPATH/models/libCustomQnmsYoloOrt.so
- TIP:
If not otherwise specified, a working_directory is generated wherever you call the script from; it is recommended to call all scripts from the same directory so that all outputs and results are stored under one directory.
For TensorFlow it is sometimes necessary to add :0 after the input and output node names to signify the index of the node. Note that the :0 is dropped for ONNX models.
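The .raw files passed to --input_tensor are flat binary dumps of the tensor data, commonly float32 unless --use_native_input_files is used. A hedged sketch of producing one with only the standard library (the filename and random contents are purely illustrative):

```python
import array
import os
import random

# Write a float32 raw file sized for a 1x299x299x3 input tensor.
count = 1 * 299 * 299 * 3
values = array.array('f', (random.random() for _ in range(count)))
with open('input.raw', 'wb') as f:
    values.tofile(f)

print(os.path.getsize('input.raw') == count * 4)  # float32 is 4 bytes each
```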
Output
The program also creates a directory named latest in working_directory/framework_runner which is symbolically linked to the most recently generated directory. In the example below, latest is symlinked to the data in the most recent YYYY-MM-DD_HH:mm:ss directory. Users may override the directory name by passing it to --output_dirname (i.e. --output_dirname myTest1Output).
The float data produced by the Framework Runner step offers precise reference material for the Verification component to diagnose the accuracy of the network generated by the Inference Engine. Unless a path is otherwise specified, the Accuracy Debugger will create directories within the working_directory/framework_runner directory found in the current working directory. The directories will be named with the date and time of the program's execution, and contain tensor data. Depending on the tensor naming convention of the model, there may be numerous sub-directories within the new directory. This occurs when tensor names include a slash "/". For example, for the tensor names 'inception_3a/1x1/bn/sc', 'inception_3a/1x1/bn/sc_internal' and 'inception_3a/1x1/bn', subdirectories will be generated.
The figure above shows a sample output from a framework_runner run. InceptionV3 and Logits contain the outputs of each layer before the last layer. Each output directory contains the .raw files corresponding to each node. Every raw file that can be seen is the output of an operation. The outputs of the final layer are saved inside the Predictions directory. The file framework_runner_options.json contains all the options used to run this feature.
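The mapping from slash-containing tensor names to nested output directories can be sketched with pathlib; the output directory name here is illustrative.

```python
from pathlib import Path

# A tensor named 'inception_3a/1x1/bn/sc' lands as .../inception_3a/1x1/bn/sc.raw.
out_dir = Path("framework_runner_output")
tensor_name = "inception_3a/1x1/bn/sc"
raw_path = out_dir / (tensor_name + ".raw")
raw_path.parent.mkdir(parents=True, exist_ok=True)  # creates the subdirectories
raw_path.write_bytes(b"")  # a real run writes the tensor's raw data here
print(raw_path.as_posix())
```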
Inference Engine¶
The Inference Engine feature is designed to find the outputs for a QNN model. The output produced by this step can be compared with the golden outputs produced by the framework runner step.
Usage¶
usage: qnn-accuracy-debugger --inference_engine [-h]
-l INPUT_LIST
-r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,htp}
-a {x86_64-linux-clang,aarch64-android,wos-remote,x86_64-windows-msvc,wos}
[--stage {source,converted,compiled}]
[-i INPUT_TENSOR [INPUT_TENSOR ...]]
[-o OUTPUT_TENSOR] [-m MODEL_PATH]
[-f FRAMEWORK [FRAMEWORK ...]]
[-qmcpp QNN_MODEL_CPP_PATH]
[-qmbin QNN_MODEL_BIN_PATH]
[-qmb QNN_MODEL_BINARY_PATH] [-p ENGINE_PATH]
[-e ENGINE_NAME [ENGINE_VERSION ...]]
[--deviceId DEVICEID] [-v]
[--host_device {x86,x86_64-windows-msvc,wos}]
[-w WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[--debug_mode_off] [-bbw {8,32}] [-abw {8,16}]
[-wbw {8,16}] [-nif] [-nof]
[-qo QUANTIZATION_OVERRIDES]
[--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY]
[-mn MODEL_NAME] [--args_config ARGS_CONFIG]
[--print_version PRINT_VERSION]
[--perf_profile {low_balanced,balanced,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
[--offline_prepare]
[--extra_converter_args EXTRA_CONVERTER_ARGS]
[--extra_runtime_args EXTRA_RUNTIME_ARGS]
[--remote_server REMOTE_SERVER]
[--remote_username REMOTE_USERNAME]
[--remote_password REMOTE_PASSWORD]
[--float_fallback]
[--profiling_level {basic,detailed,backend}]
[--lib_name LIB_NAME] [-bd BINARIES_DIR]
[-pq {tf,enhanced,adjusted,symmetric}]
[--act_quantizer {tf,enhanced,adjusted,symmetric}]
[--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
[-fbw {16,32}] [-rqs RESTRICT_QUANTIZATION_STEPS]
[--algorithms ALGORITHMS] [--ignore_encodings]
[--per_channel_quantization]
[--log_level {error,warn,info,debug,verbose}]
[--qnn_model_net_json QNN_MODEL_NET_JSON]
[--qnn_netrun_config_file QNN_NETRUN_CONFIG_FILE]
[--compiler_config COMPILER_CONFIG]
[--context_config_params CONTEXT_CONFIG_PARAMS]
[--graph_config_params GRAPH_CONFIG_PARAMS]
[--start_layer START_LAYER]
[--end_layer END_LAYER]
[--add_layer_outputs ADD_LAYER_OUTPUTS]
[--add_layer_types ADD_LAYER_TYPES]
[--skip_layer_types SKIP_LAYER_TYPES]
[--skip_layer_outputs SKIP_LAYER_OUTPUTS]
[--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS]
[--precision {int8,fp16,fp32}]
Script to run QNN inference engine.
options:
-h, --help show this help message and exit
Core Arguments:
--stage {source,converted,compiled}
Specifies the starting stage in the Accuracy Debugger
pipeline.
Source: starting with a source framework.
Converted: starting with a model's .cpp and .bin files.
Compiled: starting with a model's .so binary
-l INPUT_LIST, --input_list INPUT_LIST
Path to the input list text.
-r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,htp}, --runtime {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,htp}
Runtime to be used. Please use htp runtime for
emulation on x86 host
-a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}, --architecture {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
Name of the architecture to use for inference engine.
Arguments required for SOURCE stage:
-i INPUT_TENSOR [INPUT_TENSOR ...], --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
The name, dimension, and raw data of the network input
tensor(s) specified in the format "input_name" comma-
separated-dimensions path-to-raw-file, for example:
"data" 1,224,224,3 data.raw. Note that the quotes
should always be included in order to handle special
characters, spaces, etc. For multiple inputs specify
multiple --input_tensor on the command line like:
--input_tensor "data1" 1,224,224,3 data1.raw
--input_tensor "data2" 1,50,100,3 data2.raw.
-o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
Name of the graph's output tensor(s).
-m MODEL_PATH, --model_path MODEL_PATH
Path to the model file(s).
-f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
Framework type to be used, followed optionally by
framework version.
Arguments required for CONVERTED stage:
-qmcpp QNN_MODEL_CPP_PATH, --qnn_model_cpp_path QNN_MODEL_CPP_PATH
Path to the qnn model .cpp file
-qmbin QNN_MODEL_BIN_PATH, --qnn_model_bin_path QNN_MODEL_BIN_PATH
Path to the qnn model .bin file
Arguments required for COMPILED stage:
-qmb QNN_MODEL_BINARY_PATH, --qnn_model_binary_path QNN_MODEL_BINARY_PATH
Path to the qnn model .so binary.
Optional Arguments:
-p ENGINE_PATH, --engine_path ENGINE_PATH
Path to the inference engine.
-e ENGINE_NAME [ENGINE_VERSION ...], --engine ENGINE_NAME [ENGINE_VERSION ...]
Name of engine that will be running inference,
optionally followed by the engine version. Used here
for tensor_mapping.
--deviceId DEVICEID The serial number of the device to use. If not
available, the first in a list of queried devices will
be used for validation.
-v, --verbose Verbose printing
--host_device {x86,x86_64-windows-msvc,wos}
The device that will be running conversion. Set to x86
by default.
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory for the inference_engine to store
temporary files. Creates a new directory if the
specified working directory does not exist
--output_dirname OUTPUT_DIRNAME
output directory name for the inference_engine to
store temporary files under
<working_dir>/inference_engine .Creates a new
directory if the specified working directory does not
exist
--debug_mode_off Specifies whether to turn off debug mode.
-bbw {8,32}, --bias_bitwidth {8,32}
option to select the bitwidth to use when quantizing
the bias. default 8
-abw {8,16}, --act_bitwidth {8,16}
option to select the bitwidth to use when quantizing
the activations. default 8
-wbw {8,16}, --weights_bitwidth {8,16}
option to select the bitwidth to use when quantizing
the weights. default 8
-nif, --use_native_input_files
Specifies that the input files will be parsed in the
data type native to the graph. If not specified, input
files will be parsed in floating point.
-nof, --use_native_output_files
Specifies that the output files will be generated in
the data type native to the graph. If not specified,
output files will be generated in floating point.
-qo QUANTIZATION_OVERRIDES, --quantization_overrides QUANTIZATION_OVERRIDES
Path to quantization overrides json file.
--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --golden_dir_for_mapping GOLDEN_OUTPUT_REFERENCE_DIRECTORY
Optional parameter to indicate the directory of the
goldens, it's used for tensor mapping without
framework.
-mn MODEL_NAME, --model_name MODEL_NAME
Name of the desired output sdk specific model
--args_config ARGS_CONFIG
Path to a config file with arguments. This can be used
to feed arguments to the AccuracyDebugger as an
alternative to supplying them on the command line.
--print_version PRINT_VERSION
Print the QNN SDK version alongside the output.
--perf_profile {low_balanced,balanced,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
--offline_prepare Use offline prepare to run QNN model.
--extra_converter_args EXTRA_CONVERTER_ARGS
additional converter arguments in a quoted string.
example: --extra_converter_args 'input_dtype=data
float;input_layout=data1 NCHW'
--extra_runtime_args EXTRA_RUNTIME_ARGS
additional net runner arguments in a quoted string.
example: --extra_runtime_args
'arg1=value1;arg2=value2'
--remote_server REMOTE_SERVER
ip address of remote machine
--remote_username REMOTE_USERNAME
username of remote machine
--remote_password REMOTE_PASSWORD
password of remote machine
--float_fallback Use this option to enable fallback to floating point
(FP) instead of fixed point. This option can be paired
with --float_bitwidth to indicate the bitwidth for FP
(by default 32). If this option is enabled, then input
list must not be provided and --ignore_encodings must
not be provided. The external quantization encodings
(encoding file/FakeQuant encodings) might be missing
quantization parameters for some interim tensors.
First it will try to fill the gaps by propagating
across math-invariant functions. If the quantization
params are still missing, then it will apply fallback
to nodes to floating point.
--profiling_level {basic,detailed,backend}
Enables profiling and sets its level.
--lib_name LIB_NAME Name to use for model library (.so file or .dll file)
-bd BINARIES_DIR, --binaries_dir BINARIES_DIR
Directory to which to save model binaries, if they
don't yet exist.
-pq {tf,enhanced,adjusted,symmetric}, --param_quantizer {tf,enhanced,adjusted,symmetric}
Param quantizer algorithm used.
--act_quantizer {tf,enhanced,adjusted,symmetric}
Optional parameter to indicate the activation
quantizer to use
--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use
for activations. This option has to be paired with
--act_quantizer_schema.
--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use
for parameters. This option has to be paired with
--param_quantizer_schema.
--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
Specify which quantization schema to use for
activations. Can not be used together with
act_quantizer. Note: This argument mandates
--act_quantizer_calibration to be passed
--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
Specify which quantization schema to use for
parameters. Can not be used together with
param_quantizer. Note: This argument mandates
--param_quantizer_calibration to be passed
--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
Value must lie between 90-100
-fbw {16,32}, --float_bias_bitwidth {16,32}
option to select the bitwidth to use when biases are
in float. default 32
-rqs RESTRICT_QUANTIZATION_STEPS, --restrict_quantization_steps RESTRICT_QUANTIZATION_STEPS
ENCODING_MIN, ENCODING_MAX Specifies the number of
steps to use for computing quantization encodings such
that scale = (max - min) / number of quantization
steps. The option should be passed as a space
separated pair of hexadecimal string minimum and
maximum values. i.e. --restrict_quantization_steps
'MIN MAX'. Please note that this is a hexadecimal
string literal and not a signed integer, to supply a
negative value an explicit minus sign is required.
E.g.--restrict_quantization_steps '-0x80 0x7F'
indicates an example 8 bit range,
--restrict_quantization_steps '-0x8000 0x7F7F'
indicates an example 16 bit range.
--algorithms ALGORITHMS
Use this option to enable new optimization algorithms.
Usage is: --algorithms <algo_name1> ... The available
optimization algorithms are: 'cle ' - Cross layer
equalization includes a number of methods for
equalizing weights and biases across layers in order
to rectify imbalances that cause quantization errors.
--ignore_encodings Use only quantizer generated encodings, ignoring any
user or model provided encodings.
--per_channel_quantization
Use per-channel quantization for convolution-based op
weights.
--log_level {error,warn,info,debug,verbose}
Enable verbose logging.
--qnn_model_net_json QNN_MODEL_NET_JSON
Path to the qnn model net json. Only necessary if it's
being run from the converted stage. It has information
about what structure the data is in within the
framework_runner and inference_engine steps. This file
is required to generate the model_graph_struct.json
file which is used by the verification stage.
--qnn_netrun_config_file QNN_NETRUN_CONFIG_FILE
allow backend_extention features to be applied during
qnn-net-run
--compiler_config COMPILER_CONFIG
Path to the compiler config file.
--context_config_params CONTEXT_CONFIG_PARAMS
optional context config params in a quoted string.
example: --context_config_params
'context_priority=high;
cache_compatibility_mode=strict'
--graph_config_params GRAPH_CONFIG_PARAMS
optional graph config params in a quoted string.
example: --graph_config_params 'graph_priority=low;
graph_profiling_num_executions=10'
--start_layer START_LAYER
save all intermediate layer outputs from provided
start layer to bottom layer of model. Can be used in
conjunction with --end_layer.
--end_layer END_LAYER
save all intermediate layer outputs from top layer to
provided end layer of model. Can be used in
conjunction with --start_layer.
--add_layer_outputs ADD_LAYER_OUTPUTS
Output layers to be dumped. example:1579,232
--add_layer_types ADD_LAYER_TYPES
outputs of layer types to be dumped. e.g
:Resize,Transpose. All enabled by default.
--skip_layer_types SKIP_LAYER_TYPES
comma delimited layer types to skip snooping. e.g
:Resize, Transpose
--skip_layer_outputs SKIP_LAYER_OUTPUTS
comma delimited layer output names to skip debugging.
e.g :1171, 1174
--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS
additional context binary generator arguments in a
quoted string. example: --extra_contextbin_args
'arg1=value1;arg2=value2'
--precision {int8,fp16,fp32}
Choose the precision. Default is int8. Note: This
option is not applicable when --stage is set to
converted or compiled.
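One plausible reading of the --restrict_quantization_steps example in the help above, sketched numerically: the hexadecimal pair bounds the quantization steps, so '-0x80 0x7F' gives 0x7F - (-0x80) = 255 steps, and scale = (max - min) / number of steps. The encoding range below is hypothetical; this is an interpretation of the help text, not SDK code.

```python
# Parse the hexadecimal step bounds from the 8-bit example.
lo, hi = int('-0x80', 16), int('0x7F', 16)
steps = hi - lo                      # 255 for the 8-bit example
enc_min, enc_max = -1.0, 1.0         # hypothetical encoding min/max
scale = (enc_max - enc_min) / steps  # per the formula in the help text
print(steps, round(scale, 6))
```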
Please note: all command line arguments should be provided either through the command line or through the config file; where they overlap, command line values will not override those in the config file.
The inference engine config file can be found in {accuracy_debugger tool root directory}/python/qti/aisw/accuracy_debugger/lib/inference_engine/configs/config_files and is a JSON file. This config file stores information that helps the inference engine determine which tool and parameters to read in.
Sample Command
qnn-accuracy-debugger \
--inference_engine \
--framework tensorflow \
--runtime dspv73 \
--model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
--input_tensor "input:0" 1,299,299,3 $RESOURCESPATH/samples/InceptionV3Model/data/chairs.raw \
--output_tensor InceptionV3/Predictions/Reshape_1 \
--architecture x86_64-linux-clang \
--input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
--verbose
Sample Command
qnn-accuracy-debugger \
--inference_engine \
--framework tensorflow \
--runtime dspv73 \
--host_device wos \
--model_path <RESOURCESPATH>\InceptionV3Model\inception_v3_2016_08_28_frozen.pb \
--input_tensor "input:0" 1,299,299,3 <RESOURCESPATH>\samples\InceptionV3Model\data\chairs.raw \
--output_tensor InceptionV3\Predictions\Reshape_1 \
--architecture wos \
--input_list <RESOURCESPATH>\samples\InceptionV3Model\data\image_list.txt \
--verbose
Sample Command
qnn-accuracy-debugger \
--inference_engine \
--framework tensorflow \
--runtime cpu \
--host_device x86_64-windows-msvc \
--model_path <RESOURCESPATH>\InceptionV3Model\inception_v3_2016_08_28_frozen.pb \
--input_tensor "input:0" 1,299,299,3 <RESOURCESPATH>\samples\InceptionV3Model\data\chairs.raw \
--output_tensor InceptionV3\Predictions\Reshape_1 \
--architecture x86_64-windows-msvc \
--input_list <RESOURCESPATH>\samples\InceptionV3Model\data\image_list.txt \
--verbose
Sample Command
qnn-accuracy-debugger \
--inference_engine \
--deviceId 357415c4 \
--framework tensorflow \
--runtime dspv73 \
--architecture aarch64-android \
--model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
--input_tensor "input:0" 1,299,299,3 $RESOURCESPATH/samples/InceptionV3Model/data/chairs.raw \
--output_tensor InceptionV3/Predictions/Reshape_1 \
--input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
--verbose
- Tip:
For --runtime (choose from ‘cpu’, ‘gpu’, ‘dsp’, ‘dspv65’, ‘dspv66’, ‘dspv68’, ‘dspv69’, ‘dspv73’, ‘htp’), make sure the runtime matches the target SoC, e.g., dspv73 for Kailua, dspv69 for Waipio, etc. Choose the HTP runtime for emulation on an x86 host.
The input_tensor (-i) and output_tensor (-o) arguments do not need the ‘:0’ index suffix, unlike when running the TensorFlow framework runner.
Two files, tensor_mapping.json and qnn_model_graph_struct.json, are generated for use in verification. Be sure to locate these two files in working_directory/inference_engine/latest.
Before running qnn-accuracy-debugger on a Windows x86 system or a Windows on Snapdragon system, ensure that you have configured the environment, and specify the host and target machine as x86_64-windows-msvc or wos, respectively.
Note that qnn-accuracy-debugger on Windows x86 systems is currently tested only for the CPU runtime.
More example commands running from different stages:
Sample Command
source file stage: same as example from above section (stage default is "source")
running from converted stage (x86):
qnn-accuracy-debugger \
--inference_engine \
--stage converted \
-qmcpp $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.cpp \
-qmbin $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.bin \
--runtime dspv73 \
--architecture x86_64-linux-clang \
--input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
--qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
--verbose \
--framework tensorflow \
--golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/
Android devices (i.e., MTP):
qnn-accuracy-debugger \
--inference_engine \
--stage converted \
-qmcpp $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.cpp \
-qmbin $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.bin \
--deviceId f366ce60 \
--runtime dspv73 \
--architecture aarch64-android \
--input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
--qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
--verbose \
--framework tensorflow \
--golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/
running in compiled stage (x86):
qnn-accuracy-debugger \
--inference_engine \
--stage compiled \
--qnn_model_binary $RESOURCESPATH/samples/InceptionV3Model/qnn_model_binaries/x86_64-linux-clang/libqnn_model.so \
--runtime dspv73 \
--architecture x86_64-linux-clang \
--input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
--verbose \
--qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
--golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/
running in compiled stage (wos):
qnn-accuracy-debugger \
--inference_engine \
--stage compiled \
--qnn_model_binary <RESOURCESPATH>\samples\InceptionV3Model\qnn_model_binaries\x86_64-linux-clang\libqnn_model.so \
--runtime dspv73 \
--architecture wos \
--input_list <RESOURCESPATH>\samples\InceptionV3Model\data\image_list.txt \
--verbose \
--qnn_model_net_json <RESOURCESPATH>\samples\InceptionV3Model\inception_v3_2016_08_28_frozen_qnn_model_net.json \
--golden_output_reference_directory <RESOURCESPATH>\samples\InceptionV3Model\golden_from_framework_runner\
Android devices (i.e., MTP):
qnn-accuracy-debugger \
--inference_engine \
--stage compiled \
--qnn_model_binary $RESOURCESPATH/samples/InceptionV3Model/qnn_model_binaries/aarch64-android/libqnn_model.so \
--runtime dspv73 \
--architecture aarch64-android \
--input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
--verbose \
--qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
--framework tensorflow \
--golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/
To run an ONNX model with a custom operator:
qnn-accuracy-debugger \
--inference_engine \
--framework onnx \
--runtime dspv75 \
--architecture aarch64-android \
--model_path $RESOURCESPATH/AISW-77095/model.onnx \
--input_tensor "image" 1,3,640,1794 $RESOURCESPATH/inputs/image.raw \
--output_tensor uncertainty_jacobian_bb \
--input_list $RESOURCESPATH/input_list.txt \
--default_verifier mse \
--engine QNN \
--engine_path $QNN_SDK_ROOT \
--extra_converter_args 'op_package_config=$RESOURCESPATH/CustomPreTopKOpPackageCPU_v2.xml;op_package_lib=$RESOURCESPATH/libCustomPreTopKOpPackageHtp.so:CustomPreTopKOpPackageHtpInterfaceProvider:' \
--extra_contextbin_args 'op_packages=$RESOURCESPATH/libQnnCustomPreTopKOpPackageHtp.so:CustomPreTopKOpPackageHtpInterfaceProvider:' \
--extra_runtime_args 'op_packages=$RESOURCESPATH/AISW-77095/libQnnCustomPreTopKOpPackageHtp_v75.so:CustomPreTopKOpPackageHtpInterfaceProvider' \
--debug_mode_off \
--offline_prepare \
--verbose
- Tip:
The qnn_model_net_json file is not required to run this step. However, it is needed to build qnn_model_graph_struct.json, which can be used in the Verification step. The model_net.json file is generated when the original model is converted. Hence, if you are debugging from the converted model stage, it is recommended to obtain this model_net.json file.
The framework together with golden_dir_for_mapping, or golden_dir_for_mapping alone, can be provided as an alternative to the original model for generating tensor_mapping.json. However, when only golden_dir_for_mapping is provided, the get_tensor_mapping module maps names on a best-effort basis, and the mapping is not guaranteed to be 100% accurate.
Output
Once the inference engine has finished running, it stores its output files in the specified directory, or by default in working_directory/inference_engine under the current working directory.
The figure above shows sample output from one run of the inference engine step. The following details what each file contains.
The output directory contains .raw files; each .raw file is the output of one operation in the network.
model.bin and model.cpp are created by the model converter.
The qnn_model_binaries directory contains the .so file generated by the model library generator utility.
image_list.txt contains the paths of the sample test images.
inference_engine_options.json contains all the options with which this run was launched.
In addition to the .raw files, the inference engine also generates the model’s graph structure in a .json file whose name matches the protobuf model file. This model_graph_struct.json provides structure-related information about the converted model graph during the verification step; specifically, it helps order the nodes (beginning nodes come before ending nodes).
model_net.json records the data layout used in the framework_runner and inference_engine steps (data can be in different formats, e.g., channels-first vs. channels-last). The verification step uses this information so that data can be properly transposed and compared. It is an optional parameter that can be provided during the inference engine step to generate the model_graph_struct.json file (mandatory only when running the inference engine from the converted stage).
Finally, the tensor_mapping file maps the intermediate output file names generated by the framework runner step to those generated by the inference engine step.
The created .raw files are organized in the same manner as framework_runner (see above).
Verification¶
The Verification step compares the output (from the intermediate tensors of a given model) produced by the framework runner step with the output produced by the inference engine step. Once the comparison is complete, the verification results are compiled and displayed visually in a format that can be easily interpreted by the user.
There are different types of verifiers, e.g., CosineSimilarity, RtolAtol, etc. To see the available verifiers, use the --help option (qnn-accuracy-debugger --verification --help). Each verifier compares the Framework Runner and Inference Engine outputs using an error metric. It also prepares reports and/or visualizations to help the user analyze the network’s error data.
Usage¶
usage: qnn-accuracy-debugger --verification [-h]
--default_verifier DEFAULT_VERIFIER
[DEFAULT_VERIFIER ...]
--golden_output_reference_directory
GOLDEN_OUTPUT_REFERENCE_DIRECTORY
--inference_results INFERENCE_RESULTS
[--tensor_mapping TENSOR_MAPPING]
[--qnn_model_json_path QNN_MODEL_JSON_PATH]
[--dlc_path DLC_PATH]
[--verifier_config VERIFIER_CONFIG]
[--graph_struct GRAPH_STRUCT] [-v]
[-w WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[--args_config ARGS_CONFIG]
[--target_encodings TARGET_ENCODINGS]
[-e ENGINE [ENGINE ...]]
[--use_native_output_files]
[--disable_layout_transform]
Script to run verification.
required arguments:
--default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
Default verifier used for verification. The options
"RtolAtol", "AdjustedRtolAtol", "TopK", "L1Error",
"CosineSimilarity", "MSE", "MAE", "SQNR", "ScaledDiff"
are supported. An optional list of hyperparameters can
be appended. For example: --default_verifier
rtolatol,rtolmargin,0.01,atolmargin,0.01 An optional
list of placeholders can be appended. For example:
--default_verifier CosineSimilarity param1 1 param2 2.
to use multiple verifiers, add additional
--default_verifier CosineSimilarity
--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --framework_results GOLDEN_OUTPUT_REFERENCE_DIRECTORY
Path to root directory of golden output files. Paths
may be absolute, or relative to the working directory.
--inference_results INFERENCE_RESULTS
Path to root directory generated from inference engine
diagnosis. Paths may be absolute, or relative to the
working directory.
optional arguments:
--tensor_mapping TENSOR_MAPPING
Path to the file describing the tensor name mapping
between inference and golden tensors.
--qnn_model_json_path QNN_MODEL_JSON_PATH
Path to the qnn model net json, used for transforming
axes of golden outputs w.r.t. qnn outputs. Note:
Applicable only for QNN
--dlc_path DLC_PATH Path to the dlc file, used for transforming axes of
golden outputs w.r.t. target outputs. Note:
Applicable for QAIRT/SNPE
--verifier_config VERIFIER_CONFIG
Path to the verifiers' config file
--graph_struct GRAPH_STRUCT
Path to the inference graph structure .json file. This
file aids in providing structure-related information
of the converted model graph during this stage. Note:
This file is mandatory when using the ScaledDiff verifier
-v, --verbose Verbose printing
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory for the verification to store
temporary files. Creates a new directory if the
specified working directory does not exist
--output_dirname OUTPUT_DIRNAME
output directory name for the verification to store
temporary files under <working_dir>/verification.
Creates a new directory if the specified working
directory does not exist
--args_config ARGS_CONFIG
Path to a config file with arguments. This can be used
to feed arguments to the AccuracyDebugger as an
alternative to supplying them on the command line.
--target_encodings TARGET_ENCODINGS
Path to target encodings json file.
--use_native_output_files
Loads given outputs as per framework model's actual data types.
--disable_layout_transform
Disables layout transformation of Target outputs. This
option has to be used when Golden/Framework
outputs and Target outputs are already in the same
layout.
Arguments for generating Tensor mapping (required when --tensor_mapping is not specified):
-e ENGINE [ENGINE ...], --engine ENGINE [ENGINE ...]
Name of engine(qnn/snpe) that is used for running
inference.
Please note: all command line arguments can be provided either on the command line or through the config file. Arguments supplied on the command line will not override those in the config file if there is overlap.
The main verification process, run using qnn-accuracy-debugger --verification, optionally uses --tensor_mapping and --graph_struct to find files to compare. These files are generated by the inference engine step and should be supplied to verification for best results. By default they are named tensor_mapping.json and {model name}_graph_struct.json, and can be found in the output directory of the inference engine results.
Sample Command
# Compare output of framework runner with inference engine:
qnn-accuracy-debugger \
--verification \
--default_verifier CosineSimilarity \
--default_verifier mse \
--golden_output_reference_directory $PROJECTREPOPATH/working_directory/framework_runner/2022-10-31_17-07-58/ \
--inference_results $PROJECTREPOPATH/working_directory/inference_engine/latest/output/Result_0/ \
--tensor_mapping $PROJECTREPOPATH/working_directory/inference_engine/latest/tensor_mapping.json \
--graph_struct $PROJECTREPOPATH/working_directory/inference_engine/latest/qnn_model_graph_struct.json \
--qnn_model_json_path $PROJECTREPOPATH/working_directory/inference_engine/latest/qnn_model_net.json
# Compare outputs of two different inference engine outputs:
qnn-accuracy-debugger \
--verification \
--default_verifier mse \
--golden_output_reference_directory $PROJECTREPOPATH/working_directory/framework_runner/2022-10-31_17-07-58/ \
--inference_results $PROJECTREPOPATH/working_directory/inference_engine/latest/output/Result_0/ \
--graph_struct $PROJECTREPOPATH/working_directory/inference_engine/latest/qnn_model_graph_struct.json \
--disable_layout_transform
- Tip:
If you passed multiple images in image_list.txt when running the inference engine diagnosis, you will receive multiple output/Result_x directories. Choose the result that matches the input you used for the framework runner (i.e., if the framework run used chair.raw and chair.raw was the first item in image_list.txt, choose output/Result_0; if chair.raw was the second item, choose output/Result_1).
It is recommended to always supply ‘graph_struct’ and ‘tensor_mapping’ to the command, as they are used to line up the report and find the corresponding files for comparison. If tensor_mapping was not generated by previous steps, you can supply ‘model_path’, ‘engine’, and ‘framework’ to have the module generate ‘tensor_mapping’ at runtime.
You can also compare inference_engine outputs to inference_engine outputs by passing the /output of one inference_engine run as ‘framework_results’. If the output names match exactly, you do not need to provide a tensor_mapping file.
Note that if you need to generate a tensor mapping instead of providing a path to a preexisting tensor mapping file, you can provide the ‘model_path’ option.
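The Result_x selection described in the tip above can be sketched as follows; the helper and file names are illustrative, not part of the tool:

```python
def result_dir_for_input(input_list_lines, input_name):
    # Result directories are indexed by position in image_list.txt:
    # the first input maps to output/Result_0, the second to
    # output/Result_1, and so on.
    for index, line in enumerate(input_list_lines):
        if input_name in line:
            return f"output/Result_{index}"
    raise ValueError(f"{input_name} not found in input list")

lines = ["data/chairs.raw", "data/table.raw"]
print(result_dir_for_input(lines, "table.raw"))  # output/Result_1
```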
Verifier uses two optional config files. The first file is used to set parameters for specific verifiers, as well as which tensors to use these verifiers on. The second file is used to map tensor names from framework_runner to the inference_engine, since certain tensors generated by framework_runner may have different names than tensors generated by inference_engine.
Verifier Config:
The verifier config file is a JSON file that tells verification which verifiers (aside from the default verifier) to use, with which parameters, and on which specific tensors. If no config file is provided, the tool will only use the default verifier specified on the command line, with its default parameters, on all tensors. The JSON file is keyed by verifier names, with each verifier as its own dictionary keyed by “parameters” and “tensors”.
Config File
```json
{
"MeanIOU": {
"parameters": {
"background_classification": 1.0
},
"tensors": [["Postprocessor/BatchMultiClassNonMaxSuppression_boxes", "detection_classes:0"]]
},
"TopK": {
"parameters": {
"k": 5,
"ordered": false
},
"tensors": [["Reshape_1:0"], ["detection_classes:0"]]
}
}
```
Note that the “tensors” field is a list of lists. This is because certain verifiers run on two tensors at a time, so the two tensors are placed together in one list. If a verifier runs on only one tensor, the field is a list of lists with a single tensor name in each inner list. Note that MeanIOU is not supported as a verifier in the Debugger.
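A sketch of how such a config could be consumed (the iteration shape below is illustrative, not the tool’s actual parser):

```python
import json

# A trimmed-down verifier config mirroring the file above.
config_text = """
{"TopK": {"parameters": {"k": 5, "ordered": false},
          "tensors": [["Reshape_1:0"], ["detection_classes:0"]]}}
"""

config = json.loads(config_text)
for verifier, spec in config.items():
    for tensor_group in spec["tensors"]:
        # A one-element inner list applies the verifier to a single
        # tensor; a two-element list applies it to a pair of tensors.
        print(verifier, tensor_group)
```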
Tensor Mapping:
Tensor mapping is a JSON file keyed by inference tensor names, with framework tensor names as values. If the tensor mapping is not provided, the tool assumes the inference and golden tensor names are identical.
Tensor Mapping File
```json
{
"Postprocessor/BatchMultiClassNonMaxSuppression_boxes": "detection_boxes:0",
"Postprocessor/BatchMultiClassNonMaxSuppression_scores": "detection_scores:0"
}
```
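The lookup behavior described above, including the identical-name fallback when no mapping is supplied, can be sketched as (illustrative, not the tool’s code):

```python
def golden_name_for(inference_name, tensor_mapping=None):
    # tensor_mapping mirrors the JSON file above: inference tensor names
    # as keys, framework (golden) tensor names as values. With no mapping,
    # names are assumed identical.
    if tensor_mapping is None:
        return inference_name
    return tensor_mapping.get(inference_name, inference_name)

mapping = {"Postprocessor/BatchMultiClassNonMaxSuppression_boxes": "detection_boxes:0"}
print(golden_name_for("Postprocessor/BatchMultiClassNonMaxSuppression_boxes", mapping))
# detection_boxes:0
print(golden_name_for("unmapped_tensor"))  # unmapped_tensor
```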
Output
Verification’s output is divided by verifier. For example, if both the RtolAtol and TopK verifiers are used, there will be two sub-directories named “RtolAtol” and “TopK”. The available verifiers can be listed with the --help option.
Under each sub-directory, the verification analysis for each tensor is organized similarly to the framework_runner (see above) and inference_engine outputs. For each tensor, a CSV and an HTML file are generated. In addition to the tensor-specific analysis, the tool also generates a summary CSV and HTML file that summarizes the data from all verifiers and their tensors. The following figure shows a sample summary generated in the verification step. Each row in this summary corresponds to one tensor name identified by the framework runner and inference engine steps. The final column shows the CosineSimilarity score, which can vary between 0 and 1 (this range may differ for other verifiers). Higher scores denote similarity, while lower scores indicate variance. The developer can then further investigate those specific tensor details. Tensors should be inspected in top-to-bottom order: if a tensor is broken at an earlier node, anything generated after that node is unreliable until the earlier node is fixed.
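As a rough illustration of the CosineSimilarity score mentioned above (a minimal sketch of the metric, not the verifier’s implementation):

```python
import math

def cosine_similarity(golden, target):
    # Cosine similarity between two flattened tensors: scores near 1
    # indicate similarity, scores near 0 indicate variance.
    dot = sum(g * t for g, t in zip(golden, target))
    norm_g = math.sqrt(sum(g * g for g in golden))
    norm_t = math.sqrt(sum(t * t for t in target))
    return dot / (norm_g * norm_t)

print(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
```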
Compare Encodings¶
The Compare Encodings feature is designed to compare QNN and AIMET encodings. It takes the QNN model net JSON and AIMET encodings JSON files as inputs, and executes in the following order.
Extracts encodings from the given QNN model net JSON.
Compares extracted QNN encodings with given AIMET encodings.
Writes results to an Excel file that highlights mismatches.
Throws warnings if some encodings are present in QNN but not in AIMET and vice-versa.
Writes the extracted QNN encodings JSON file (for reference).
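The comparison logic above can be sketched as follows; the dictionary shapes, the helper, and its return format are illustrative, not the feature’s actual implementation:

```python
def compare_encodings(qnn_enc, aimet_enc, precision=17):
    # qnn_enc / aimet_enc: {tensor_name: {"min": ..., "max": ...}};
    # values are rounded to `precision` decimal places before comparison.
    mismatches, qnn_only, aimet_only = [], [], []
    for name in sorted(set(qnn_enc) | set(aimet_enc)):
        if name not in aimet_enc:
            qnn_only.append(name)      # warn: present in QNN but not AIMET
        elif name not in qnn_enc:
            aimet_only.append(name)    # warn: present in AIMET but not QNN
        else:
            for key, value in qnn_enc[name].items():
                a = round(value, precision)
                b = round(aimet_enc[name][key], precision)
                if a != b:
                    mismatches.append((name, key, a, b))
    return mismatches, qnn_only, aimet_only

qnn = {"conv1": {"min": 0.0, "max": 1.0}}
aimet = {"conv1": {"min": 0.0, "max": 1.5}, "conv2": {"min": 0.0, "max": 2.0}}
print(compare_encodings(qnn, aimet))
```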
Usage¶
usage: qnn-accuracy-debugger --compare_encodings [-h]
--input INPUT
--aimet_encodings_json AIMET_ENCODINGS_JSON
[--precision PRECISION]
[--params_only]
[--activations_only]
[--specific_node SPECIFIC_NODE]
[--working_dir WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[-v]
Script to compare QNN encodings with AIMET encodings
optional arguments:
-h, --help Show this help message and exit
required arguments:
--input INPUT
Path to QNN model net JSON file
--aimet_encodings_json AIMET_ENCODINGS_JSON
Path to AIMET encodings JSON file
optional arguments:
--precision PRECISION
Number of decimal places up to which comparison will be done (default: 17)
--params_only Compare only parameters in the encodings
--activations_only Compare only activations in the encodings
--specific_node SPECIFIC_NODE
Display encoding differences for the given node
--working_dir WORKING_DIR
Working directory for the compare_encodings to store temporary files.
Creates a new directory if the specified working directory does not exist.
--output_dirname OUTPUT_DIRNAME
Output directory name for the compare_encodings to store temporary files
under <working_dir>/compare_encodings. Creates a new directory if the
specified working directory does not exist.
-v, --verbose Verbose printing
Sample Commands
# Compare both params and activations
qnn-accuracy-debugger \
--compare_encodings \
--input QNN_model_net.json \
--aimet_encodings_json aimet_encodings.json
# Compare only params
qnn-accuracy-debugger \
--compare_encodings \
--input QNN_model_net.json \
--aimet_encodings_json aimet_encodings.json \
--params_only
# Compare only activations
qnn-accuracy-debugger \
--compare_encodings \
--input QNN_model_net.json \
--aimet_encodings_json aimet_encodings.json \
--activations_only
# Compare only a specific encoding
qnn-accuracy-debugger \
--compare_encodings \
--input QNN_model_net.json \
--aimet_encodings_json aimet_encodings.json \
--specific_node _2_22_Conv_output_0
Tip
A working_directory is generated in the location from which this script is called, unless otherwise specified.
Output
The program creates a directory named latest in working_directory/compare_encodings which is symbolically linked to the most recently generated directory. In the example below, latest will have data that is symlinked to the data in the most recent directory YYYY-MM-DD_HH:mm:ss. Users may choose to override the directory name by passing it to --output_dirname, e.g., --output_dirname myTest.
The figure above shows a sample output from a compare_encodings run. The following details what each file contains.
compare_encodings_options.json contains all the options used to run this feature
encodings_diff.xlsx contains comparison results with mismatches highlighted
log.txt contains log statements for the run
extracted_encodings.json contains extracted QNN encodings
Tensor inspection¶
Tensor inspection compares given reference output and target output tensors and dumps various statistics to represent differences between them.
The Tensor inspection feature can:
Plot histograms for golden and target tensors
Plot a graph indicating deviation between golden and target tensors
Plot a cumulative distribution graph (CDF) for golden vs target tensors
Plot a density (KDE) graph for target tensor highlighting target min/max and calibrated min/max values
Create a CSV file containing information about: target min/max; calibrated min/max; golden output min/max; target/calibrated min/max differences; and computed metrics (verifiers).
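The min/max bookkeeping in that summary CSV can be sketched roughly as follows (the column names and helper are illustrative, not the tool’s code):

```python
def summary_row(golden, target, calib_min, calib_max):
    # One summary row per tensor: golden min/max, target min/max,
    # calibrated min/max, and target-vs-calibrated min/max differences.
    t_min, t_max = min(target), max(target)
    return {
        "golden_min": min(golden), "golden_max": max(golden),
        "target_min": t_min, "target_max": t_max,
        "calibrated_min": calib_min, "calibrated_max": calib_max,
        "min_diff": t_min - calib_min, "max_diff": t_max - calib_max,
    }

row = summary_row([0.1, 0.9], [0.0, 1.5], calib_min=0.0, calib_max=1.0)
print(row["max_diff"])  # 0.5
```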
Usage¶
usage: qnn-accuracy-debugger --tensor_inspection [-h]
--golden_data GOLDEN_DATA
--target_data TARGET_DATA
--verifier VERIFIER [VERIFIER ...]
[-w WORKING_DIR]
[--data_type {int8,uint8,int16,uint16,float32}]
[--target_encodings TARGET_ENCODINGS]
[-v]
Script to inspect tensors.
required arguments:
--golden_data GOLDEN_DATA
Path to golden/framework outputs folder. Paths may be absolute or
relative to the working directory.
--target_data TARGET_DATA
Path to target outputs folder. Paths may be absolute or relative to the
working directory.
--verifier VERIFIER [VERIFIER ...]
Verifier used for verification. The options "RtolAtol",
"AdjustedRtolAtol", "TopK", "L1Error", "CosineSimilarity", "MSE", "MAE",
"SQNR", "ScaledDiff" are supported.
An optional list of hyperparameters can be appended, for example:
--verifier rtolatol,rtolmargin,0.01,atolmargin,0.01.
To use multiple verifiers, add additional --verifier CosineSimilarity
optional arguments:
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory to save results. Creates a new directory if the
specified working directory does not exist
--data_type {int8,uint8,int16,uint16,float32}
DataType of the output tensor.
--target_encodings TARGET_ENCODINGS
Path to target encodings json file.
-v, --verbose Verbose printing
Sample Commands
# Basic run
qnn-accuracy-debugger --tensor_inspection \
--golden_data golden_tensors_dir \
--target_data target_tensors_dir \
--verifier sqnr
# Pass target encodings file and enable multiple verifiers
qnn-accuracy-debugger --tensor_inspection \
--golden_data golden_tensors_dir \
--target_data target_tensors_dir \
--verifier mse \
--verifier sqnr \
--verifier rtolatol,rtolmargin,0.01,atolmargin,0.01 \
--target_encodings qnn_encoding.json
Tip
A working_directory is generated in the location from which this script is called, unless otherwise specified.
The figure above shows a sample output from a Tensor inspection run. The following details what each file contains.
Each tensor will have its own directory; the directory name matches the tensor name.
CDF_plots.html – Golden vs target CDF graph
Diff_plots.html – Golden and target deviation graph
Distribution_min-max.png – Density plot for target tensor highlighting target vs calibrated min/max values
Histograms.html – Golden and target histograms
golden_data.csv – Golden tensor data
target_data.csv – Target tensor data
log.txt – Log statements from the entire run
summary.csv – Target min/max, calibrated min/max, golden output min/max, target vs calibrated min/max differences, and verifier outputs
Histogram Plots
Comparison: We compare histograms for both the golden data and the target data.
Overlay: To enhance clarity, we overlay the histograms bin by bin.
Binned Ranges: Each bin represents a value range, showing the frequency of occurrence.
Visual Insight: Overlapping histograms reveal differences or similarities between the datasets.
Interactive: Hover over histograms to get tensor range and frequencies for the dataset.
Cumulative Distribution Function (CDF) Plots
Overview: CDF plots display the cumulative probability distribution.
Overlay: We superimpose CDF plots for golden and target data.
Percentiles: These plots illustrate data distribution across different percentiles.
Hover Details: Exact cumulative probabilities are available on hover.
Tensor Difference Plots
Inspection: We generate plots highlighting differences between golden and target data tensors.
Scatter and Line: Scatter plots represent tensor values, while line plots show differences at each index.
Interactive: Hover over points to access precise values.
Run QNN Accuracy Debugger E2E¶
This feature is designed to run the framework runner, inference engine, and verification features sequentially with a single command to debug the model. The following debugging algorithms are available.
- Oneshot-layerwise (default):
- This algorithm debugs all layers of the model at once by performing the steps below:
Execute the framework runner to collect reference outputs in fp32.
Execute the inference engine to collect backend outputs in the specified target precision.
Execute verification to compare the intermediate outputs from the above two steps.
Execute tensor inspection (when --enable_tensor_inspection is passed) to dump various plots (e.g., scatter, line, CDF) for the intermediate outputs.
It provides a quick analysis to identify the layers of the model causing accuracy deviation.
Users can choose cumulative-layerwise (below) for a deeper analysis of accuracy deviation.
- Cumulative-layerwise:
- This algorithm debugs one layer at a time by performing the steps below:
Execute the framework runner to collect reference outputs from all layers of the model in fp32.
- Execute the inference engine and verification iteratively to:
collect backend outputs in target precision for each layer while removing the effect of its preceding layers on the final output;
compare the intermediate outputs from the framework runner and the inference engine.
It provides a deeper analysis to identify all layers of the model causing accuracy deviation.
Currently this option supports only ONNX models.
- Layerwise:
- This algorithm debugs a single-layer model at a time by performing the following steps:
Get golden reference per-layer outputs from an external tool or, if a golden reference is not given, run the framework runner to collect intermediate layer outputs.
- Iteratively execute the inference engine and verification to:
Collect backend outputs in target precision for each single-layer model by removing the preceding and following layers
Compare the intermediate output from the golden reference with the inference engine's single-layer model output
Layerwise snooping provides a deeper analysis to identify all model layers causing accuracy deviation on hardware with respect to framework/simulation outputs.
Layerwise snooping only supports ONNX models.
Usage¶
usage: qnn-accuracy-debugger [--framework_runner] [--inference_engine] [--verification] [-h]
Script that runs Framework Runner, Inference Engine or Verification.
Arguments to select which component of the tool to run. Arguments are mutually exclusive (at
most 1 can be selected). If none are selected, then all components are run:
--framework_runner Run framework
--inference_engine Run inference engine
--verification Run verification
optional arguments:
-h, --help Show this help message. To show help for any of the components, run
script with --help and --<component>. For example, to show the help
for Framework Runner, run script with the following: --help
--framework_runner
usage: qnn-accuracy-debugger [-h] -f FRAMEWORK [FRAMEWORK ...] -m MODEL_PATH -i INPUT_TENSOR
[INPUT_TENSOR ...] -o OUTPUT_TENSOR -r RUNTIME -a
{aarch64-android,x86_64-linux-clang,aarch64-android-clang6.0}
-l INPUT_LIST --default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
[--debugging_algorithm {layerwise,cumulative-layerwise,oneshot-layerwise}]
Options for running the Accuracy Debugger components
optional arguments:
-h, --help show this help message and exit
Arguments required by both Framework Runner and Inference Engine:
-f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
Framework type and version, version is optional. Currently supported
frameworks are [tensorflow, tflite, onnx]. For example, tensorflow
2.3.0
-m MODEL_PATH, --model_path MODEL_PATH
Path to the model file(s).
-i INPUT_TENSOR [INPUT_TENSOR ...], --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
The name, dimensions, raw data, and optionally data type of the
network input tensor(s) specified in the format "input_name" comma-
separated-dimensions path-to-raw-file, for example: "data"
1,224,224,3 data.raw float32. Note that the quotes should always be
included in order to handle special characters, spaces, etc. For
multiple inputs specify multiple --input_tensor on the command line
like: --input_tensor "data1" 1,224,224,3 data1.raw --input_tensor
"data2" 1,50,100,3 data2.raw float32.
-o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
Name of the graph's specified output tensor(s).
Arguments required by Inference Engine:
-r RUNTIME, --runtime RUNTIME
Runtime to be used for inference.
-a {aarch64-android,x86_64-linux-clang,aarch64-android-clang6.0}, --architecture {aarch64-android,x86_64-linux-clang,aarch64-android-clang6.0}
Name of the architecture to use for inference engine.
-l INPUT_LIST, --input_list INPUT_LIST
Path to the input list text.
Arguments required by Verification:
--default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
Default verifier used for verification. The options "RtolAtol",
"AdjustedRtolAtol", "TopK", "L1Error", "CosineSimilarity", "MSE",
"MAE", "SQNR", "ScaledDiff" are supported. An optional
list of hyperparameters can be appended. For example:
--default_verifier rtolatol,rtolmargin,0.01,atolmargin,0.01. An
optional list of placeholders can be appended. For example:
--default_verifier CosineSimilarity param1 1 param2 2. to use
multiple verifiers, add additional --default_verifier
CosineSimilarity
optional arguments:
--debugging_algorithm {layerwise,cumulative-layerwise,oneshot-layerwise}
Performs model debugging layerwise, cumulative-layerwise or in oneshot-
layerwise based on choice. Default is oneshot-layerwise.
-v, --verbose Verbose printing
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory for the wrapper to store temporary files. Creates
a new directory if the specified working directory does not exist.
--output_dirname OUTPUT_DIRNAME
output directory name for the wrapper to store temporary files under
<working_dir>/wrapper. Creates a new directory if the specified
working directory does not exist
--deep_analyzer {modelDissectionAnalyzer}
Deep Analyzer to perform deep analysis
--golden_output_reference_directory
Optional parameter to indicate the directory of the golden reference outputs.
When this option is provided, the framework runner stage is skipped.
In the inference stage, it's used for tensor mapping without a framework.
In the verification stage, it's used as a reference to compare
outputs produced in the inference engine stage.
--enable_tensor_inspection
Plots graphs (line, scatter, CDF, etc.) for each
layer's output. Additionally, the summary sheet will have
more details such as golden min/max and target min/max.
--step_size
Number of layers to skip in each iteration of debugging.
Applicable only for cumulative-layerwise algorithm.
--step_size (> 1) should not be used along with --add_layer_outputs,
--add_layer_types, --skip_layer_outputs, --skip_layer_types,
--start_layer, --end_layer
(the options below are ignored by the framework_runner component in case of layerwise and cumulative-layerwise runs)
--add_layer_outputs ADD_LAYER_OUTPUTS
Output layers to be dumped, e.g., 1579,232
--add_layer_types ADD_LAYER_TYPES
Outputs of layer types to be dumped, e.g., Resize, Transpose; all enabled by default
--skip_layer_types SKIP_LAYER_TYPES
Comma delimited layer types to skip snooping, e.g., Resize, Transpose
--skip_layer_outputs SKIP_LAYER_OUTPUTS
Comma delimited layer output names to skip debugging, e.g., 1171, 1174
--start_layer START_LAYER
Extracts the given model from mentioned start layer
output name
--end_layer END_LAYER
Extracts the given model from mentioned end layer
output name
--use_native_output_files
Specifies that the output files will be generated in
the data type native to the graph. If not specified,
output files will be generated in floating point.
--disable_layout_transform
Disables layout transformation of Target outputs. This
option has to be used when Golden/Framework
outputs and Target outputs are already in the same
layout.
Note: --start_layer and --end_layer options are allowed only for Layerwise and Cumulative-layerwise runs
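For intuition, three of the supported verifiers can be sketched as follows. This is an illustrative sketch only; the tool's exact implementations may differ:

```python
import numpy as np

def cosine_similarity(golden, target):
    # Cosine of the angle between the flattened tensors; ~1.0 is a good match
    g, t = golden.ravel(), target.ravel()
    return float(np.dot(g, t) / (np.linalg.norm(g) * np.linalg.norm(t)))

def mse(golden, target):
    # Mean squared error between golden and target outputs
    return float(np.mean((golden - target) ** 2))

def sqnr(golden, target):
    # Signal-to-quantization-noise ratio in dB; higher is better
    noise = np.mean((golden - target) ** 2)
    return float(10.0 * np.log10(np.mean(golden ** 2) / noise))

golden = np.array([0.1, 0.5, -0.3, 0.9])
target = np.array([0.11, 0.49, -0.31, 0.88])
print(cosine_similarity(golden, target))
print(mse(golden, target), sqnr(golden, target))
```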
Sample Command for oneshot-layerwise
Command for Oneshot-layerwise using DSP backend:
qnn-accuracy-debugger \
--architecture aarch64-android \
--runtime dspv73 \
--framework tensorflow \
--model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
--input_tensor "input:0" 1,299,299,3 $PATHTOGOLDENI/samples/InceptionV3Model/data/chairs.raw \
--output_tensor InceptionV3/Predictions/Reshape_1:0 \
--debugging_algorithm oneshot-layerwise \
--input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
--default_verifier CosineSimilarity \
--enable_tensor_inspection \
--verbose
Command for Oneshot-layerwise using HTP emulation on x86 host:
qnn-accuracy-debugger \
--framework onnx \
--runtime htp \
--model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
--input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
--output_tensor 1597 \
--architecture x86_64-linux-clang \
--input_list /local/mnt/workspace/models/vit/list.txt \
--default_verifier CosineSimilarity \
--offline_prepare \
--debugging_algorithm oneshot-layerwise \
--enable_tensor_inspection \
--verbose
Running pre-quantized models (tflite model example):
qnn-accuracy-debugger \
--debugging_algorithm oneshot-layerwise \
--runtime dspv75 \
--architecture aarch64-android \
--framework tflite \
--model_path hand_regressor_random_weights.tflite \
--input_list sample.txt \
--input_tensor "serving_default_features:0" 1,160,160,1 1.raw uint8 \
--output_tensor "StatefulPartitionedCall:4" \
--output_tensor "StatefulPartitionedCall:3" \
--output_tensor "StatefulPartitionedCall:5" \
--output_tensor "StatefulPartitionedCall:0" \
--output_tensor "StatefulPartitionedCall:2" \
--output_tensor "StatefulPartitionedCall:1" \
--default_verifier mse \
--engine QNN \
--engine_path $QNN_SDK_ROOT \
--use_native_input_files \
--use_native_output_files \
--float_fallback
Example for using external golden outputs dumped by any frameworks like ONNX, TF:
qnn-accuracy-debugger \
--debugging_algorithm cumulative-layerwise \
--architecture aarch64-android \
--runtime dspv75 \
--framework onnx \
--model_path /path/to/model.onnx \
--input_tensor "input.1" 1,3,224,224 /path/to/input.raw \
--output_tensor 1597 \
--input_list /path/to/list.txt \
--default_verifier CosineSimilarity \
--offline_prepare \
--golden_output_reference_directory /path/to/goldens
Example for using external golden outputs dumped by QNN:
qnn-accuracy-debugger \
--debugging_algorithm cumulative-layerwise \
--architecture aarch64-android \
--runtime dspv75 \
--framework onnx \
--model_path /path/to/model.onnx \
--input_tensor "input.1" 1,3,224,224 /path/to/input.raw \
--output_tensor 1597 \
--input_list /path/to/list.txt \
--default_verifier CosineSimilarity \
--offline_prepare \
--golden_output_reference_directory /path/to/goldens \
--disable_layout_transform
Note
The --enable_tensor_inspection argument significantly increases overall execution time when used with large models. To speed up execution, omit this argument.
Output
The program creates framework_runner, inference_engine, verification, and wrapper output directories as below:
framework_runner – Contains a timestamped directory that contains the intermediate layer outputs (framework) stored in .raw format as described in the framework runner step.
inference_engine – Contains a timestamped directory that contains the intermediate layer outputs (inference engine) stored in .raw format as described in the inference engine step.
verification directory – Contains a timestamped directory that contains the following:
A directory for each verifier specified while running oneshot; it contains CSV and HTML files with metric details for each layer output
tensor_inspection – Individual directories for each layer’s output with the following contents:
CDF_plots.png – Golden vs target CDF graph
Diff_plots.png – Golden and target deviation graph
Histograms.png – Golden and target histograms
golden_data.csv – Golden tensor data
target_data.csv – Target tensor data
summary.csv – Report for verification results of each layer's output
Wrapper directory containing log.txt with the entire log for the run.
Note: Except for the wrapper directory, all other directories contain a folder called latest, which is a symlink to the corresponding timestamped directory of the latest run.
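The intermediate .raw dumps can be inspected directly. A minimal sketch, assuming the outputs were generated in floating point (the default when --use_native_output_files is not passed); the file name and shape in the commented usage line are hypothetical and should come from the Tensor_dims column of summary.csv:

```python
import numpy as np

def load_raw(path, shape, dtype=np.float32):
    """Read a flat binary layer dump and restore its shape."""
    data = np.fromfile(path, dtype=dtype)
    assert data.size == np.prod(shape), "shape does not match file size"
    return data.reshape(shape)

# Hypothetical file name and shape for illustration:
# t = load_raw("framework_runner/latest/conv1_out.raw", (1, 112, 112, 32))
```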
Snapshot of summary.csv file:
Understanding the oneshot-layerwise report:
Column |
Description |
|---|---|
Name |
Output name of the current layer |
Layer Type |
Type of the current layer |
Size |
Size of this layer’s output |
Tensor_dims |
Shape of this layer’s output |
<Verifier name> |
Verifier value of the current layer output compared to reference output |
golden_min |
Minimum value in the reference output for the current layer |
golden_max |
Maximum value in the reference output for the current layer |
target_min |
Minimum value in the target output for the current layer |
target_max |
Maximum value in the target output for the current layer |
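Since summary.csv is a plain CSV with the columns above, the most deviating layers can be surfaced with a short script. A sketch, assuming CosineSimilarity was the verifier used (adjust the column name to the verifier you actually ran):

```python
import csv

def worst_layers(summary_csv, verifier_col="CosineSimilarity", n=10, ascending=True):
    """Return the n (name, score) pairs with the worst verifier scores.

    ascending=True suits similarity metrics where a low score is bad; pass
    ascending=False for error metrics such as MSE where a high score is bad.
    """
    with open(summary_csv, newline="") as f:
        rows = [r for r in csv.DictReader(f) if r.get(verifier_col)]
    rows.sort(key=lambda r: float(r[verifier_col]), reverse=not ascending)
    return [(r["Name"], r[verifier_col]) for r in rows[:n]]
```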
Sample Command for cumulative-layerwise
Command for Cumulative-layerwise using DSP backend:
qnn-accuracy-debugger \
--framework onnx \
--runtime dspv73 \
--model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
--input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
--output_tensor 1597 \
--architecture x86_64-linux-clang \
--input_list /local/mnt/workspace/models/vit/list.txt \
--default_verifier CosineSimilarity \
--offline_prepare \
--debugging_algorithm cumulative-layerwise \
--engine QNN \
--verbose
Command for Cumulative-layerwise using HTP emulation on x86 host:
qnn-accuracy-debugger \
--framework onnx \
--runtime htp \
--model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
--input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
--output_tensor 1597 \
--architecture x86_64-linux-clang \
--input_list /local/mnt/workspace/models/vit/list.txt \
--default_verifier CosineSimilarity \
--offline_prepare \
--debugging_algorithm cumulative-layerwise \
--engine QNN \
--verbose
Output
The program creates framework_runner, cumulative_layerwise_snooping, and wrapper output directories as below:
framework_runner – Contains a timestamped directory with the intermediate layer outputs stored in .raw format, as described in the Framework Runner step.
cumulative_layerwise_snooping – Contains the intermediate outputs obtained from the inference engine step, stored in separate directories named after the respective layers. It also contains the final report, cumulative_layerwise.csv, with verifier scores for each layer. Layers with the most deviating scores can be identified as problematic nodes.
wrapper – Contains log.txt with the entire log for the run.
Understanding the cumulative-layerwise report
At the end of a cumulative-layerwise run, the tool generates a .csv file with the following information for each layer:
Column |
Description |
|---|---|
O/P Name |
Output name of the current layer. |
Status |
|
Layer Type |
Type of the current layer. |
Shape |
Shape of this layer’s output. |
Activations |
The Min, Max and Median of the outputs at this layer taken from reference execution. |
<Verifier name> |
Absolute verifier value of the current layer compared to reference platform. |
Orig outputs |
Displays the verifier score of the original model outputs observed when the model was run with the current
layer's output enabled, starting from the last partitioned layer.
|
Info |
Displays information for the output verifiers, if the values are abnormal. |
Command for Layerwise:
qnn-accuracy-debugger \
--framework onnx \
--runtime dspv73 \
--model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
--input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
--output_tensor 1597 \
--architecture x86_64-linux-clang \
--input_list /local/mnt/workspace/models/vit/list.txt \
--default_verifier CosineSimilarity \
--offline_prepare \
--debugging_algorithm layerwise \
--quantization_overrides /local/mnt/workspace/layer_output_dump/vit_base_16_224.encodings \
--engine QNN \
--verbose
Output
The program creates layerwise_snooping and wrapper output directories, as well as framework_runner if a golden reference is not provided (as described for cumulative-layerwise).
layerwise_snooping directory – Contains each single layer model outputs obtained from the inference engine stage stored in separate directories and the final report named layerwise.csv which contains verifier scores for each layer model. Users can identify layers with the most deviating scores as problematic nodes.
wrapper directory – Contains log.txt which stores the full logs for the run.
The output .csv is similar to the cumulative-layerwise output, but the original outputs column is not present in layerwise snooping, since the final outputs of the model are not involved.
Debugging Accuracy issue with Quantized model using Cumulative Layerwise Snooping
With quantized models, some mismatch is expected at the most data-intensive layers, arising from quantization error.
The debugger can be used to identify the most sensitive operators (those with high verifier scores) and run them at higher precision to improve overall accuracy.
Sensitivity is determined by the verifier score seen at that layer with respect to the reference platform (such as ONNX Runtime).
Note that cumulative-layerwise debugging takes considerable time, since the partitioned model must be quantized and compiled at every layer that does not have a 100% match with the reference.
Below is one strategy to debug larger models:
Run Oneshot-layerwise on the model which helps to identify the starting point of sensitivity in the model.
Run Cumulative-layerwise on different parts of the model using the start-layer and end-layer options (if the model has 100 nodes, use the starting node from the Oneshot-layerwise run as the start layer and the 25th node as the end layer for run 1, the 26th and 50th nodes for run 2, the 51st and 75th nodes for run 3, and so on). The final reports of all runs help identify the most sensitive layers in the model. Say nodes A, B, and C have high verifier scores, indicating high sensitivity.
Run the original model with those specific layers (A/B/C, one at a time or in combinations) in FP16 and observe the improvement in accuracy.
Debugging accuracy issues for models exhibiting a discrepancy between golden reference output (e.g., AIMET/framework runtime output) and target output using Layerwise Snooping
- One of the popular use cases for layerwise snooping is debugging the accuracy difference between AIMET and target.
Although tools like AIMET create a close simulation of the hardware, a very small mismatch is still expected due to environment differences. This is because the simulation executes on GPU FP32 kernels and simulates quantization noise, rather than actually executing on integer kernels as on hardware.
If there is a higher deviation between simulation and hardware, layerwise snooping can be used to point out the nodes with the highest deviations. The nodes showing higher deviation per layerwise.csv can be identified as the erroneous nodes.
Other use cases include debugging deviations between a framework runtime's FP32 output and the target INT16 output.
Binary Snooping¶
The binary snooping tool debugs the given ONNX graph in a binary search fashion.
For the graph under analysis, it quantizes half of the graph and lets the other half run in fp16/32. The final model output is used to measure the quantization effect of each subgraph. If a subgraph has a high effect on the final model output due to quantization (its verifier score is greater than 60% of the sum of the two subgraphs' scores), the process repeats on that subgraph until it is smaller than min_graph_size or cannot be divided further. If both subgraphs have similar scores (each greater than 40% of the sum), both subgraphs are investigated further.
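The partitioning logic described above can be sketched as a recursive binary search. This is an illustrative sketch, not the tool's implementation; score() is a placeholder for "quantize only this subgraph, run the model, and measure the verifier error on the final output":

```python
def binary_snoop(start, end, score, min_graph_size=16, culprits=None):
    """Recursively narrow down quantization-sensitive subgraphs.

    A child is recursed into when it holds more than 60% of the combined
    error; when both children land between 40% and 60%, both halves are
    investigated further.
    """
    culprits = culprits if culprits is not None else []
    if end - start + 1 <= min_graph_size:
        culprits.append((start, end))
        return culprits
    mid = (start + end) // 2
    left, right = (start, mid), (mid + 1, end)
    s_left, s_right = score(*left), score(*right)
    total = s_left + s_right
    if total == 0:
        return culprits
    if s_left > 0.6 * total:
        binary_snoop(*left, score, min_graph_size, culprits)
    elif s_right > 0.6 * total:
        binary_snoop(*right, score, min_graph_size, culprits)
    else:
        # Both halves carry more than 40% of the error: investigate both
        binary_snoop(*left, score, min_graph_size, culprits)
        binary_snoop(*right, score, min_graph_size, culprits)
    return culprits
```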
usage
usage: qnn-accuracy-debugger --binary_snooping \
-m MODEL_PATH \
-l INPUT_LIST \
-i INPUT_TENSOR \
-f FRAMEWORK \
-o OUTPUT_TENSOR \
-e ENGINE_NAME \
-qo QUANTIZATION_OVERRIDES \
[--verifier VERIFIER] \
[-a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}] \
[--host_device {x86,x86_64-windows-msvc,wos}] \
[-r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic,htp}] \
[--deviceId DEVICEID] \
[--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY] \
[--bias_bitwidth BIAS_BITWIDTH] \
[--use_per_channel_quantization USE_PER_CHANNEL_QUANTIZATION] \
[--weights_bitwidth WEIGHTS_BITWIDTH] \
[--act_bitwidth {8,16}] [-fbw {16,32}] \
[-rqs RESTRICT_QUANTIZATION_STEPS] \
[-w WORKING_DIR] \
[--output_dirname OUTPUT_DIRNAME] \
[-p ENGINE_PATH] \
[--min_graph_size MIN_GRAPH_SIZE] \
[--extra_converter_args EXTRA_CONVERTER_ARGS] \
[--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}] \
[--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}] \
[--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}] \
[--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}] \
[--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE] \
[--param_quantizer {tf,enhanced,adjusted,symmetric}] \
[--act_quantizer {tf,enhanced,adjusted,symmetric}] \
[--per_channel_quantization] \
[--algorithms ALGORITHMS] \
[--verifier_config VERIFIER_CONFIG] \
[--start_layer START_LAYER] \
[--end_layer END_LAYER] [--precision {int8,fp16}] \
[--compiler_config COMPILER_CONFIG] \
[--ignore_encodings] \
[--extra_runtime_args EXTRA_RUNTIME_ARGS] \
[--add_layer_outputs ADD_LAYER_OUTPUTS] \
[--add_layer_types ADD_LAYER_TYPES] \
[--skip_layer_types SKIP_LAYER_TYPES] \
[--skip_layer_outputs SKIP_LAYER_OUTPUTS] \
[--remote_server REMOTE_SERVER] \
[--remote_username REMOTE_USERNAME] \
[--remote_password REMOTE_PASSWORD] [-nif] [-nof]
Sample Commands
Sample command to run binary snooping on mv2 large model
qnn-accuracy-debugger \
--binary_snooping \
--framework onnx \
--model_path models/mv2/mobilenet-v2.onnx \
--architecture aarch64-android \
--input_list models/mv2/inputs/input_list_1.txt \
--input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/harsraj/models/mv2/inputs/data1.raw \
--output_tensor "473" \
--engine_path $QNN_SDK_ROOT \
--working_dir working_directory/QNN/BINARY_MV2_DSP \
--runtime dspv75 \
--engine QNN \
--verifier mse \
--extra_converter_args "float_bitwidth=32;preserve_io=layout" \
--quantization_overrides /local/mnt/workspace/harsraj/models/mv2/quantized_encoding.json \
--min_graph_size 16
Outputs
The algorithm provides two JSON files:
graph_result.json (for each subgraph) - Contains verifier scores for two child subgraphs; for example 318_473 has child subgraphs 318_392 and 393_473.
subgraph_result.json (for each subgraph) - Contains the corresponding and sorted verifier scores.
Keys in both files look like “subgraph_start_node_activation_name” + _ + “subgraph_end_node_activation_name”.
For example, 318_473 means a subgraph starts at node activation 318 and ends at node activation 473. Only the subgraph from 318 to 473 is quantized while the rest of the model runs in fp16/32.
Debugging accuracy issues with binary snooping results
Subgraphs with the maximum verifier scores in subgraph_result.json are the culprit subgraphs.
One subgraph can be a subset of another. In this case, prioritize a subgraph size you are comfortable debugging. The details of a subset can be found in graph_result.json.
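A sketch of reading subgraph_result.json and splitting its keys into start/end activation names. This assumes numeric activation names (as in the 318_473 example) and a flat name-to-score JSON layout; the exact schema may vary by SDK version:

```python
import json

def ranked_subgraphs(path):
    """Return (start_activation, end_activation, score) tuples, worst first."""
    with open(path) as f:
        results = json.load(f)
    ranked = []
    for key, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
        start, end = key.rsplit("_", 1)  # safe for numeric activation names
        ranked.append((start, end, score))
    return ranked
```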
Quantization Checker¶
The quantization checker analyzes the activations, weights, and biases of a given model. It provides:
Comparison between quantized and unquantized weights and biases
Analysis of unquantized weights, biases, and activations
Results as CSV, HTML, or plots
Problematic weights and biases for a given quantization bitwidth
Usage
usage: qnn-accuracy-debugger --quant_checker [-h] \
--model_path \
--input_tensor \
--config_file \
--framework \
--input_list \
--output_tensor \
[--engine_path] \
[--working_dir] \
[--quantization_overrides] \
[--extra_converter_args] \
[--bias_width] \
[--weights_width] \
[--host_device] \
[--deviceId] \
[--generate_csv] \
[--generate_plots] \
[--per_channel_plots] \
[--golden_output_reference_directory] \
[--output_dirname]
[--verbose]
Sample quant_checker_config_file
{
    "WEIGHT_COMPARISON_ALGORITHMS": [
        {"algo_name": "minmax", "threshold": "10"},
        {"algo_name": "maxdiff", "threshold": "10"},
        {"algo_name": "sqnr", "threshold": "26"},
        {"algo_name": "stats", "threshold": "2"},
        {"algo_name": "data_range_analyzer"},
        {"algo_name": "data_distribution_analyzer", "threshold": "0.6"}
    ],
    "BIAS_COMPARISON_ALGORITHMS": [
        {"algo_name": "minmax", "threshold": "10"},
        {"algo_name": "maxdiff", "threshold": "10"},
        {"algo_name": "sqnr", "threshold": "26"},
        {"algo_name": "stats", "threshold": "2"},
        {"algo_name": "data_range_analyzer"},
        {"algo_name": "data_distribution_analyzer", "threshold": "0.6"}
    ],
    "ACT_COMPARISON_ALGORITHMS": [
        {"algo_name": "minmax", "threshold": "10"},
        {"algo_name": "data_range_analyzer"}
    ],
    "INPUT_DATA_ANALYSIS_ALGORITHMS": [
        {"algo_name": "stats", "threshold": "2"}
    ],
    "QUANTIZATION_ALGORITHMS": ["cle", "None"],
    "QUANTIZATION_VARIATIONS": ["tf", "enhanced", "symmetric", "asymmetric"]
}
Output
Outputs are available in the <working-directory>/results directory.
Results are provided in:
HTML
CSV
Histogram
A log is provided in the <working-directory>/quant_checker directory.
HTML
Each HTML file contains a summary of the results for each quantization option and for each input file provided.
The following example provides additional guidance on the contents of the HTML files.
CSV Results Files
Each CSV file contains detailed computation results for a specific node type (activation/weight/bias) and quantization option. Each row in the csv file displays the op name, node name, passes accuracy (True/False), computation result (accuracy differences), threshold used for each algorithm, and the algorithm name. The format of the computation results (accuracy differences) differs according to the algorithms/metrics used.
The following table provides additional notes about the different algorithms and the information in each csv row.
Algorithm |
Description |
Computation result |
|---|---|---|
minmax |
Indicates the difference between the unquantized minimum and the dequantized minimum value and, correspondingly, the same difference for the unquantized and dequantized maximum values. |
"min: #VALUE max: #VALUE" |
maxdiff |
Calculates the absolute difference between the unquantized and dequantized data for all data points and displays the maximum value of the result. |
"#VALUE" |
sqnr |
Calculates the signal to quantization noise ratio between the two tensors of unquantized and dequantized data. |
"#VALUE" |
data_range_analyzer |
Calculates the difference between the maximum and minimum values in a tensor and compares that to the maximum value supported by the bit-width used, to determine whether the range of values can be reasonably represented by the selected quantization bit width. |
"unique dec places: #INT_VALUE data range: #VALUE". This includes how many unique decimal places are needed to express the unquantized data in quantized format, and the actual data range. |
data_distribution_analyzer |
Calculates the clustering of the data to find whether a large number of unique unquantized values are quantized to the same value. |
"Distribution of pixels above threshold: #VALUE" |
stats |
Calculates basic statistics on the received data such as the min, max, median, variance, standard deviation, mode, and skew. The skew indicates how symmetric the data is. |
"skew: #VALUE min: #VALUE max: #VALUE median: #VALUE variance: #VALUE stdDev: #VALUE mode: #VALUE" |
The following CSV example shows weight data for one of the quantization options.
Separate .csv files are available for activations, weights and biases for each quantization option. The activation related results also include analysis for each input file provided.
Histogram
A histogram is generated for each quantization variation and for each weight and bias tensor in the model. The following example illustrates the generated histograms.
Logs
The log files contain the following information.
The commands executed as part of the script’s run, including different runs of the snpe-converter tool with different quantization options
Analysis failures for activations, weights, and biases
The following example shows a sample log output.
<====ACTIVATIONS ANALYSIS FAILURES====>
Results for the enhanced quantization:
| Op Name | Activation Node | Passes Accuracy | Accuracy Difference | Threshold Used | Algorithm Used |
| conv_tanh_comp1_conv0 | ReLU_6919 | False | minabs_diff: 0.59 maxabs_diff: 17.16 | 0.05 | minmax |
where,
Op Name : Op name as expressed in corresponding qnn artifacts
Activation Node : Activation node name in the operation
Passes Accuracy : True if the quantized activation (or weight or bias) meets threshold when compared with values from float32 graph; false otherwise
Accuracy Difference : Details about the accuracy per the algorithm used
Threshold Used : The threshold used to influence the result of the “Passes Accuracy” column
Algorithm Used : Metric used to compare actual quantized activations/weights/biases against unquantized float data or analyze the quality of unquantized float data. Metrics can be minmax, maxdiff, sqnr, stats, data_range_analyzer, data_distribution_analyzer.
qairt-accuracy-debugger (Beta)¶
Dependencies
The Accuracy Debugger depends on the setup outlined in Setup. In particular, the following are required:
Platform dependencies need to be met as per Platform Dependencies
The desired ML frameworks need to be installed. The Accuracy Debugger is verified to work with the ML framework versions mentioned in Environment Setup
Supported models
The qairt-accuracy-debugger currently supports ONNX.
Overview
The Accuracy Debugger tool finds inaccuracies in a neural network at the layer level. The primary functionality of this tool is to compare the golden outputs produced by running a model through an ML framework with the results produced by running the same model on target devices (HTP, CPU, GPU, etc.).
The following components are available in Accuracy Debugger. Each component can be run with its corresponding subcommand; for example, qairt-accuracy-debugger {component}.
qairt-accuracy-debugger framework_runner uses an ML framework, e.g., ONNX, to run the model and get intermediate outputs.
qairt-accuracy-debugger inference_engine uses an inference engine to run a model on the target device and retrieve intermediate outputs.
qairt-accuracy-debugger verification compares the output generated by the framework runner and inference engine features using verifiers such as CosineSimilarity, RtolAtol, etc.
qairt-accuracy-debugger compare_encodings compares target encodings with the AIMET encodings, and outputs an Excel sheet highlighting mismatches.
qairt-accuracy-debugger tensor_visualizer compares given target outputs with golden outputs.
qairt-accuracy-debugger snooping runs chosen snooping algorithm to investigate accuracy issues.
- Tip:
You can use --help with a component name to see the options (required or optional) available for that component.
Below are the instructions for running various components available in Accuracy Debugger:
Framework Runner¶
The Framework Runner component is designed to run models with different machine learning frameworks (e.g., TensorFlow, ONNX, TFLite). A given model is run with a specific ML framework. Golden outputs are produced for future comparison with inference results from the Inference Engine step.
Usage
usage: qairt-accuracy-debugger framework_runner [-h] -m INPUT_MODEL --input_sample INPUT_SAMPLE [INPUT_SAMPLE ...] [--working_directory WORKING_DIRECTORY]
[-o OUTPUT_TENSOR] [--log_level {info,debug,warning,error}]
options:
-h, --help show this help message and exit
required arguments:
-m INPUT_MODEL, --input_model INPUT_MODEL
path to the model file
--input_sample INPUT_SAMPLE [INPUT_SAMPLE ...]
Path to text file containing input sample. Refer to qnn-net-run input_list for format of input_sample file.
--onnx_define_symbol SYMBOL VALUE
Option to override specific input dimension symbols.
optional arguments:
--working_directory WORKING_DIRECTORY
Path to working directory. If not specified, a directory named working_directory will be created in the current directory.
-o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
Name of the graph's specified output tensor(s).
--log_level {info,debug,warning,error}
Log level. Default is info.
Sample Commands
qairt-accuracy-debugger framework_runner \
--input_model dlv3onnx/dlv3plus_mbnet_513-513_op9_mod_basic.onnx \
--input_sample input_sample.txt \
--output_tensor Output
- Tip:
If not otherwise specified, a working_directory is created in the directory from which you call the script.
Outputs
Once the Framework Runner has finished running, it stores the outputs in the specified working directory. It creates an output directory with a timestamp of the format YYYY-MM-DD_HH-mm-ss in working_directory/framework_runner. The following figure shows a sample output folder from a Framework Runner run using an ONNX model.
working_directory
└── framework_runner
├── 2025-07-07_22-01-02
│ ├── mobilenetv20_features_batchnorm0_fwd.raw
│ ├── .
│ ├── .
│ └── profile_info.json
The output directory contains the outputs of each layer in the model saved as .raw files; each raw file is the output of an operation in the model.
The intermediate outputs produced by the Framework Runner step offer a precise golden reference for the Verification component to diagnose the accuracy of the network outputs generated by the Inference Engine.
Inference Engine¶
The Inference Engine component is designed to dump the intermediate outputs of the model when run on target devices such as CPU, DSP, and GPU. The output produced by this step can be compared with the golden outputs produced by the Framework Runner step.
Usage
usage: qairt-accuracy-debugger inference_engine [-h] --input_model INPUT_MODEL
[--desired_input_shape DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...]]
[--output_tensor OUTPUT_TENSOR]
[--converter_float_bitwidth {32,16}]
[--float_bias_bitwidth {32,16}]
[--quantization_overrides QUANTIZATION_OVERRIDES]
[--onnx_define_symbol SYMBOL VALUE]
[--onnx_defer_loading]
[--enable_framework_trace]
[--op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]]
[--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
[--package_name PACKAGE_NAME]
[--calibration_input_list CALIBRATION_INPUT_LIST]
[--bias_bitwidth {8,32}]
[--act_bitwidth {8,16}]
[--weights_bitwidth {8,4}]
[--quantizer_float_bitwidth {32,16}]
[--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
[--use_per_channel_quantization]
[--use_per_row_quantization]
[--float_fallback]
[--quantization_algorithms QUANTIZATION_ALGORITHMS [QUANTIZATION_ALGORITHMS ...]]
[--restrict_quantization_steps RESTRICT_QUANTIZATION_STEPS]
[--dump_encodings_json]
[--ignore_encodings]
[--op_package_lib OP_PACKAGE_LIB]
[--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
[--profiling_level PROFILING_LEVEL]
[--input_list INPUT_LIST]
[--netrun_backend_extension_config NETRUN_BACKEND_EXTENSION_CONFIG]
[--offline_prepare_backend_extension_config OFFLINE_PREPARE_BACKEND_EXTENSION_CONFIG]
[--backend {CPU,GPU,HTP,}]
[--platform {aarch64-android,linux-embedded,qnx,wos,x86_64-linux-clang,x86_64-windows-msvc}]
[--offline_prepare]
[--working_directory WORKING_DIRECTORY]
[--deviceId DEVICEID]
[--log_level {ERROR,WARN,INFO,DEBUG,VERBOSE}]
[--op_packages OP_PACKAGES]
Script to run inference engine.
options:
-h, --help show this help message and exit
required arguments:
--input_model INPUT_MODEL
Path to the source model/dlc/bin file
optional arguments:
--backend {CPU,GPU,HTP,}
Backend type for inference to be run
--platform {aarch64-android,linux-embedded,qnx,wos,x86_64-linux-clang,x86_64-windows-msvc}
The type of device platform to be used for inference
--offline_prepare Boolean to indicate offline prepare of the graph
--working_directory WORKING_DIRECTORY
Path to the directory to store the output result
--deviceId DEVICEID The serial number of the device to use. If not available, the first in a list of queried devices will be used for inference.
--log_level {ERROR,WARN,INFO,DEBUG,VERBOSE}
Enable verbose logging.
--op_packages OP_PACKAGES
Provide a comma separated list of op package and interface providers to register during graph preparation. Usage: op_package_path:interface_provider[,op_package_path:interface_provider...]
converter arguments:
--desired_input_shape DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...], --input_tensor DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...]
The name, dimension, datatype and layout of all the input buffers to the network specified in the format [input_name comma-separated-dimensions data-type layout]. Dimension, datatype and layout are optional. For example: 'data' 1,224,224,3. Note that the
quotes should always be included in order to handle special characters, spaces, etc. For multiple inputs, specify multiple --desired_input_shape on the command line like: --desired_input_shape "data1" 1,224,224,3 float32 --desired_input_shape "data2"
1,50,100,3 int64
--output_tensor OUTPUT_TENSOR
Name of the graph's specified output tensor(s).
--converter_float_bitwidth {32,16}
Use this option to convert the graph to the specified float bitwidth, either 32 (default) or 16.
--float_bias_bitwidth {32,16}
Option to select the bitwidth to use for float bias tensor, either 32(default) or 16
--quantization_overrides QUANTIZATION_OVERRIDES
Path to quantization overrides json file.
--onnx_define_symbol SYMBOL VALUE
Option to override specific input dimension symbols.
--onnx_defer_loading Option to have the model not load weights. If False, the model will be loaded eagerly.
--enable_framework_trace
Use this option to enable converter to trace the o/p tensor change information.
--op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]
Absolute paths to Qnn Op Package XML configuration file that contains user defined custom operations.Note: Only one of: {'op_package_config', 'package_name'} can be specified.
--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB
Absolute path to converter op package library compiled by the OpPackage generator. Must be separated by a comma for multiple package libraries. Note: Libraries must follow the same order as the xml files. E.g.1: --converter_op_package_lib
absolute_path_to/libExample.so E.g.2: --converter_op_package_lib absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
--package_name PACKAGE_NAME
A global package name to be used for each node in the Model.cpp file. Defaults to Qnn header defined package name. Note: Only one of: {'op_package_config', 'package_name'} can be specified.
quantizer_arguments:
--calibration_input_list CALIBRATION_INPUT_LIST
Path to the inputs list text file to run quantization (used with qairt-quantizer)
--bias_bitwidth {8,32}
Option to select the bitwidth to use when quantizing the bias. default 8
--act_bitwidth {8,16}
Option to select the bitwidth to use when quantizing the activations. default 8
--weights_bitwidth {8,4}
Option to select the bitwidth to use when quantizing the weights. default 8
--quantizer_float_bitwidth {32,16}
Use this option to select the bitwidth to use for float tensors, either 32 (default) or 16.
--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use for activations. Supported values: min-max (default), sqnr, entropy, mse, percentile. This option can be paired with --act_quantizer_schema to override the quantization schema to use for activations; otherwise the default schema (asymmetric) will be used.
--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use for parameters. Supported values: min-max (default), sqnr, entropy, mse, percentile. This option can be paired with --param_quantizer_schema to override the quantization schema to use for parameters; otherwise the default schema (asymmetric) will be used.
--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
Specify which quantization schema to use for activations. Note: Default is asymmetric.
--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
Specify which quantization schema to use for parameters. Note: Default is asymmetric.
--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
Value must lie between 90 and 100. Default is 99.99
--use_per_channel_quantization
Use per-channel quantization for convolution-based op weights. Note: This will replace built-in model QAT encodings when used for a given weight.
--use_per_row_quantization
Use this option to enable rowwise quantization of Matmul and FullyConnected ops.
--float_fallback Use this option to enable fallback to floating point (FP) instead of fixed point. This option can be paired with --quantizer_float_bitwidth to indicate the bitwidth for FP (by default 32). If this option is enabled, then neither an input list nor --ignore_encodings may be provided. The external quantization encodings (encoding file/FakeQuant encodings) might be missing quantization parameters for some interim tensors. First it will try to fill the gaps by propagating across math-invariant
functions. If the quantization params are still missing, it will fall back to floating point for those nodes.
--quantization_algorithms QUANTIZATION_ALGORITHMS [QUANTIZATION_ALGORITHMS ...]
Use this option to select quantization algorithms. Usage is: --quantization_algorithms <algo_name1> ...
--restrict_quantization_steps RESTRICT_QUANTIZATION_STEPS
Specifies the number of steps to use for computing quantization encodings. E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8-bit range.
--dump_encodings_json
Dump encoding of all the tensors in a json file
--ignore_encodings Use only quantizer generated encodings, ignoring any user or model provided encodings.
--op_package_lib OP_PACKAGE_LIB
Use this argument to pass an op package library for quantization. Must be in the form <op_package_lib_path:interfaceProviderName> and be separated by a comma for multiple package libs
netrun arguments:
--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
Specifies the perf profile to set. Valid settings are "low_balanced", "balanced", "default", "high_performance", "sustained_high_performance", "burst", "low_power_saver", "power_saver", "high_power_saver", "extreme_power_saver", and "system_settings". Note:
the perf_profile argument is now deprecated for the HTP backend; users can specify the performance profile through the backend extension config instead.
--profiling_level PROFILING_LEVEL
Enables profiling and sets its level. For the QNN executor, valid settings are "basic", "detailed", and "client". Default is detailed.
--input_list INPUT_LIST
Path to the input list text file to run inference (used with qnn-net-run).
--netrun_backend_extension_config NETRUN_BACKEND_EXTENSION_CONFIG
Path to config to be used with qnn-net-run
offline prepare arguments:
--offline_prepare_backend_extension_config OFFLINE_PREPARE_BACKEND_EXTENSION_CONFIG
Path to config to be used with qnn-context-binary-generator.
Sample Commands
# Example for running on Linux host's CPU without quantization encodings
qairt-accuracy-debugger inference_engine \
--backend cpu \
--platform x86_64-linux-clang \
--input_model source_model/mobilenet.onnx \
--input_list inputs/input_list.txt \
--calibration_input_list inputs/calibration_list.txt \
--param_quantizer_schema symmetric \
--act_quantizer_schema asymmetric \
--param_quantizer_calibration sqnr \
--act_quantizer_calibration percentile \
--percentile_calibration_value 99.995 \
--bias_bitwidth 32
# Example for running on Android DSP target
qairt-accuracy-debugger inference_engine \
--backend htp \
--platform aarch64-android \
--deviceId 357415c4 \
--input_model source_model/mobilenet.onnx \
--input_list inputs/input_list.txt \
--quantization_overrides AIMET_quantization_encodings.json
# Example for running on a WoS HTP target
qairt-accuracy-debugger inference_engine ^
--backend htp ^
--platform wos ^
--input_model source_model/mobilenet.onnx ^
--input_list inputs/input_list.txt ^
--quantization_overrides AIMET_quantization_encodings.json
# Example for running on a WoS CPU target
qairt-accuracy-debugger inference_engine ^
--backend cpu ^
--platform wos ^
--input_model source_model/mobilenet.onnx ^
--input_list inputs/input_list.txt ^
--calibration_input_list inputs/calib_list.txt
# Example for running on Android GPU target with fp16 precision
qairt-accuracy-debugger inference_engine \
--backend gpu \
--platform aarch64-android \
--input_model mobilenet.onnx \
--input_tensor "data" 1,3,224,224 inputs/data.raw \
--output_tensor mobilenetv20_output_flatten0_reshape0 \
--input_list inputs/input_list.txt \
--converter_float_bitwidth 16
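The quantizer options above (calibration method, schema, bitwidth) together determine how a scale and offset are derived for each tensor. As a rough illustration only, not the qairt-quantizer implementation, here is how an asymmetric min-max calibration could produce 8-bit parameters:

```python
# Illustrative sketch only: how an asymmetric min-max calibration could
# derive 8-bit quantization parameters. The actual qairt-quantizer
# implementation is not shown in this document and may differ.

def minmax_asymmetric_params(values, bitwidth=8):
    """Derive (scale, offset) from the observed min/max of `values`."""
    qmax = (1 << bitwidth) - 1                   # e.g. 255 for 8-bit
    tmin = min(min(values), 0.0)                 # range must include zero
    tmax = max(max(values), 0.0)
    scale = (tmax - tmin) / qmax
    if scale == 0.0:
        scale = 1.0                              # degenerate all-zero range
    offset = round(tmin / scale)                 # zero-point shift, <= 0
    return scale, offset

def quantize(x, scale, offset, bitwidth=8):
    """Map a float to the integer grid defined by (scale, offset)."""
    q = round(x / scale) - offset
    return max(0, min((1 << bitwidth) - 1, q))

scale, offset = minmax_asymmetric_params([-0.5, 0.1, 1.5])
# here scale = 2.0/255 and offset = -64, so -0.5 maps to 0 and 1.5 to 255
```

A symmetric schema would instead center the range on zero, which is why --act_quantizer_schema and --param_quantizer_schema can change the resulting encodings for the same calibration data.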
- Tip:
Although tool can quantize the given model using data provided through –calibration_input_list argument, it is recommended to pass quantization encodings through –quantization_overrides argument to speed-up the execution
More example commands with different stage configurations:
Sample Commands
# source stage: same as examples from above section
# Running from converted stage (Android DSP):
qairt-accuracy-debugger inference_engine \
--input_model converted_model.dlc \
--backend htp \
--deviceId f366ce60 \
--platform aarch64-android \
--input_list inputs/input_list.txt \
--quantization_overrides AIMET_quantization_encodings.json
# Running from quantized stage (x86 CPU):
qairt-accuracy-debugger inference_engine \
--input_model quantized_model.dlc \
--backend cpu \
--platform x86_64-linux-clang \
--input_list inputs/input_list.txt
Outputs
Once the Inference Engine has finished running, it stores the outputs in the specified working directory. By default, it stores the output in working_directory/inference_engine in the current working directory, creating an output directory with a timestamp of the format YYYY-MM-DD_HH-mm-ss. Below is the output directory structure:
working_directory
├── inference_engine
│ └── 2025-07-07_22-05-54
│ ├── base.dlc
│ ├── base_quantized.dlc
│ └── Output
│ └── Result_0
│ ├── data_0231.raw
│ ├── .
│ ├── .
The Output directory contains raw files; each raw file is the output of an operation in the network.
The base_quantized_encoding.json file contains the quantization encodings used by the model.
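Each .raw file under Output/Result_N is a flat binary tensor dump. A minimal sketch of loading one with NumPy follows; the dtype and shape are assumptions that must match your model's actual output:

```python
import numpy as np

def load_raw_tensor(path, dtype=np.float32, shape=None):
    """Load a flat binary tensor dump; reshape if the tensor shape is known."""
    data = np.fromfile(path, dtype=dtype)
    return data.reshape(shape) if shape is not None else data

# Hypothetical usage; the file name and shape depend on your model:
# logits = load_raw_tensor("Output/Result_0/data_0231.raw", shape=(1, 1000))
```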
Verification¶
The Verification step compares the output (from the intermediate tensors of a given model) produced by the framework runner step with the output produced by the inference engine step. Once the comparison is complete, the verification results are compiled and displayed visually in a format that can be easily interpreted by the user.
There are different types of verifiers, e.g. l1_norm, rtol_atol, etc. To see the available verifiers, use the --help option (qairt-accuracy-debugger verification --help). Each verifier compares the Framework Runner and Inference Engine outputs using an error metric, and prepares reports and/or visualizations to help the user analyze the network's error data.
Usage
usage: qairt-accuracy-debugger verification [-h] --inference_tensor INFERENCE_TENSOR --reference_tensor REFERENCE_TENSOR
[--comparators {l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} [{l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} ...]]
[--reference_dtype REFERENCE_DTYPE] [--inference_dtype INFERENCE_DTYPE] [--dlc_file DLC_FILE] [--graph_info GRAPH_INFO]
[--is_qnn_golden_reference] [--working_directory WORKING_DIRECTORY] [--log_level {info,debug,warning,error}]
options:
-h, --help show this help message and exit
required arguments:
--inference_tensor INFERENCE_TENSOR
Directory path of inference tensor files.
--reference_tensor REFERENCE_TENSOR
Directory path of reference tensor files.
optional arguments:
--comparators {l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} [{l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} ...]
Comparator to use to compare tensors. For multiple comparators, specify as follows: --comparators mse std. The default comparator is mse
--reference_dtype REFERENCE_DTYPE
Data type of reference tensor files.
--inference_dtype INFERENCE_DTYPE
Data type of inference tensor files.
--dlc_file DLC_FILE Path to dlc file.
--graph_info GRAPH_INFO
Path to a json file containing graph information such as tensor mapping, graph structure, and layout information, in the following format:
{'tensor_mapping':{}, graph_structure:{}, layout_info:{}}
--is_qnn_golden_reference
Specifies that outputs passed with --reference_tensor are dumped by QNN.
--working_directory WORKING_DIRECTORY
Path to working directory. If not specified a directory with name working_directory will be created in the current directory.
--log_level {info,debug,warning,error}
Log level. Default is info.
Sample Commands
# Compare output of framework runner with inference engine
qairt-accuracy-debugger verification \
--comparators cosine mse \
--reference_tensor working_directory/framework_runner_output/ \
--inference_tensor working_directory/inference_engine_output/ \
--graph_info working_directory/graph_info.json \
--dlc_file working_directory/inference_engine/base.dlc
# Compare outputs of two different inference engine outputs:
qairt-accuracy-debugger verification \
--comparators mse \
--reference_tensor working_directory/inference_engine_output1/ \
--inference_tensor working_directory/inference_engine_output2/
- Tip:
If you passed multiple images in the image_list.txt to the inference engine step, you will receive multiple output/Result_x directories. Choose the result that matches the input you used for the framework runner (e.g., if you used chair.raw in the framework runner and chair.raw was the first item in image_list.txt, choose output/Result_0; if chair.raw was the second item, choose output/Result_1).
It is recommended to always supply dlc_file or graph_info to the command, as it is used to line up the report and find the corresponding files for comparison.
If the target and golden output names match exactly, you do not need to provide a tensor_mapping file.
Tensor Mapping:
Tensor mapping is a JSON file that maps inference tensor names (keys) to framework tensor names (values). If the tensor mapping is not provided, the tool generates it from dlc_file. If dlc_file is not provided, it assumes the inference and golden tensor names are identical.
Tensor Mapping File
```json
{
"Postprocessor/BatchMultiClassNonMaxSuppression_boxes": "detection_boxes:0",
"Postprocessor/BatchMultiClassNonMaxSuppression_scores": "detection_scores:0"
}
```
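The identical-names fallback described above can be sketched as follows (hypothetical helper, not part of the tool):

```python
def identity_tensor_mapping(inference_names, reference_names):
    """Fallback mapping when no dlc_file is available: pair each inference
    tensor with the identically named reference tensor; names without a
    match are left out of the mapping."""
    reference = set(reference_names)
    return {name: name for name in inference_names if name in reference}
```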
Outputs
Once Verification has finished running, it stores the outputs in the specified working directory. By default, it stores the output in working_directory/verification in the current working directory, creating an output directory with a timestamp of the format YYYY-MM-DD_HH-mm-ss.
Below is the output directory structure:
working_directory
└── verification
├── 2025-07-07_22-10-10
└── verification.csv
Verification generates a summary CSV file which consolidates the data from all verifiers and their corresponding tensors. The following figure shows a sample summary generated in the verification step. Each row in this summary corresponds to one tensor name identified by the framework runner and inference engine steps. The final column shows the cosine similarity score, which can vary between 0 and 1 (this range may differ for other verifiers). Higher scores denote similarity, while lower scores indicate divergence. The developer can then further investigate those specific tensors. Tensors should be inspected in top-to-bottom order: if a tensor is broken at an earlier node, everything generated after that node is unreliable until the earlier node is fixed.
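For intuition, the cosine similarity score described above can be computed as follows. This is a sketch only, not the verifier's exact implementation (e.g. zero-vector handling may differ):

```python
import numpy as np

def cosine_similarity(golden, target):
    """Cosine similarity between two tensors, flattened. Scores near 1
    mean the tensors are closely aligned; lower scores indicate divergence."""
    g = np.asarray(golden, dtype=np.float64).ravel()
    t = np.asarray(target, dtype=np.float64).ravel()
    denom = np.linalg.norm(g) * np.linalg.norm(t)
    return float(np.dot(g, t) / denom) if denom else 0.0
```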
Compare Encodings¶
- The Compare Encodings feature is designed to compare two encoding files. It provides the following capabilities:
One-to-Many Mapping: It maps a given tensor in one encoding file to all tensors in the other encoding file with which it shares similar encodings (encodings that can be algebraically converted to each other), and vice versa.
Unmapped Tensors: It lists the tensors in one encoding file that do not have similar encodings to any tensor in the other encoding file. These appear in the .csv file with the “Status” field set to “UNMAPPED”.
Incorrect consumption of AIMET encodings by QAIRT: It lists the AIMET tensors whose encodings were not consumed by QAIRT. These appear in the .csv file with the “Status” field set to “ERROR”.
Supergroup Mapping: It helps in identifying fusions.
- It supports the following for encoding comparisons:
QAIRT vs QAIRT
QAIRT vs AIMET
AIMET vs AIMET
- It supports the following encoding schema versions:
LEGACY AIMET encoding format
“1.0.0” AIMET encoding format
Usage
usage: qairt-accuracy-debugger compare_encodings [-h] --encoding1_file_path ENCODING1_FILE_PATH --encoding2_file_path ENCODING2_FILE_PATH
[--quantized_dlc1_path QUANTIZED_DLC1_PATH] [--quantized_dlc2_path QUANTIZED_DLC2_PATH]
[--framework_model_path FRAMEWORK_MODEL_PATH] [--scale_threshold SCALE_THRESHOLD] [--working_directory WORKING_DIRECTORY]
[--log_level {info,debug,warning,error}]
options:
-h, --help show this help message and exit
required arguments:
--encoding1_file_path ENCODING1_FILE_PATH
Path to either QAIRT or AIMET encodings file
--encoding2_file_path ENCODING2_FILE_PATH
Path to either QAIRT or AIMET encodings file
optional arguments:
--quantized_dlc1_path QUANTIZED_DLC1_PATH
Path to the quantized dlc file related to encoding1_file_path. If passed alongside the framework model, the tool performs the following operations on the QAIRT encodings file: 1. Propagates convert op encodings to their parent op, provided the parent op exists in the framework model. 2. Resolves any activation name changes, e.g. matmul+add in the framework model becomes fc in the dlc graph and the tensor name gets an _fc suffix. It also performs supergroup mapping.
--quantized_dlc2_path QUANTIZED_DLC2_PATH
Path to the quantized dlc file related to encoding2_file_path. If passed alongside the framework model, the tool performs the following operations on the QAIRT encodings file: 1. Propagates convert op encodings to their parent op, provided the parent op exists in the framework model. 2. Resolves any activation name changes, e.g. matmul+add in the framework model becomes fc in the dlc graph and the tensor name gets an _fc suffix. It also performs supergroup mapping.
--framework_model_path FRAMEWORK_MODEL_PATH
Path to the framework model. If passed alongside a quantized dlc for either encoding, the tool performs the following operations on the QAIRT encodings file: 1. Propagates convert op encodings to their parent op, provided the parent op exists in the framework model. 2. Resolves any activation name changes, e.g. matmul+add in the framework model becomes fc in the dlc graph and the tensor name gets an _fc suffix. It also performs supergroup mapping.
--scale_threshold SCALE_THRESHOLD
Threshold for the scale comparison of two encodings, e.g. scale1=0.5, scale2=0.01. scale1 and scale2 are compared as: abs(scale1-scale2) < (min(scale1, scale2) * scale_threshold). This ensures the bound is determined by the lower of the two scale values.
--working_directory WORKING_DIRECTORY
Path to working directory. Default: working_directory
--log_level {info,debug,warning,error}
Log level. Default is info
Sample Commands
# Comparing two encodings with no dlc file
qairt-accuracy-debugger compare_encodings \
--encoding1_file_path encoding1.json \
--encoding2_file_path encoding2.json
# Comparing two encodings with quantized_dlc being passed for encoding1
qairt-accuracy-debugger compare_encodings \
--encoding1_file_path encoding1.json \
--quantized_dlc1_path quantized_dlc.dlc \
--encoding2_file_path encoding2.json \
--framework_model_path framework_model.onnx
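The --scale_threshold check described earlier amounts to the following comparison (a direct transcription of the documented formula):

```python
def scales_match(scale1, scale2, scale_threshold):
    """Two scales match when their absolute difference stays within a
    bound set by the smaller of the two scales."""
    return abs(scale1 - scale2) < min(scale1, scale2) * scale_threshold
```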
Tip
A working_directory is generated from wherever this script is called from unless otherwise specified.
Outputs
Once Compare Encodings has finished running, it stores the outputs in the specified working directory. By default, it stores the output in working_directory/compare_encodings in the current working directory. A directory named latest is created in working_directory/compare_encodings, symbolically linked to the most recent run (YYYY-MM-DD_HH-mm-ss). Users may override the directory name by passing it to --output_dirname (e.g. --output_dirname myTest1).
The analysis report consists of .csv and .json files.
CSV Files:
The tool produces two .csv files. Each file has 10 columns:

| Column | Description |
|---|---|
| Tensor Name (encoding file1) | A tensor name in encoding file1. |
| Tensor Name (encoding file2) | A tensor name in encoding file2. |
| Status | One of “UNMAPPED”, “SUCCESS”, “WARNING”, “ERROR”. UNMAPPED: a tensor in one encoding file is not mapped to any tensor in the other encoding file. SUCCESS: a tensor in one encoding file is mapped to one or more tensors in the other encoding file. WARNING: a tensor in one encoding file is mapped to one or more tensors in the other encoding file but does not have the exact same bitwidth or is_symm value. ERROR: a tensor of the same name in both encoding files does not have the same encoding (scale, offset, channels, dtype). |
| dtype | One of “SAME”, “NOT_COMPARED”, or other info. SAME: the value is the same in both encoding files. NOT_COMPARED: the field was not compared between the pair of tensors (the tensors are not mapped, the field is not present in the encoding, or the channels differ, in which case scale and offset need not be compared). Otherwise, info on why the comparison failed due to a mismatch. |
| is_symm | Same semantics as dtype. |
| bitwidth | Same semantics as dtype. |
| channels | Same semantics as dtype. |
| scale | Same semantics as dtype. |
| offset | Same semantics as dtype. |
| Total Mappings | Number of tensors in the other file with which the given tensor shares its encoding. |
It produces two .csv files:
param_comparison.csv
activation_comparison.csv
JSON Files
Encoding comparison .json files
Because the tool produces a “one-to-many” map of tensors sharing the same encodings between two files, CSV is not a conclusive format for representing the whole data. CSV gives overall information at a glance, but JSON provides in-depth details about the “one-to-many” maps. This is useful when “Total Mappings” for a tensor is greater than 1 and that was not expected.
For example, for the tensor name /rms_norm_0/Cast_1_output_0, the entry contains:
“compare_info”: all the tensor names in the other encoding file along with their comparison info
“Status”: the status of the comparison with tensor names in the other encoding file
“Mapping”: a list of tensor names in the other encoding file that are mapped to this tensor
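Based on the fields described above, a hypothetical entry could look like this (tensor names and field values are illustrative, not taken from a real comparison):

```json
{
  "/rms_norm_0/Cast_1_output_0": {
    "compare_info": {
      "/rms_norm_0/Mul_output_0": {
        "scale": "SAME",
        "offset": "SAME",
        "bitwidth": "SAME"
      }
    },
    "Status": "SUCCESS",
    "Mapping": ["/rms_norm_0/Mul_output_0"]
  }
}
```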
It generates 4 .json files:
<encoding1 file name>_param.json: comparison of params in encoding file1 against the params in encoding file2
<encoding1 file name>_activation.json: comparison of activations in encoding file1 against the activations in encoding file2
<encoding2 file name>_param.json: comparison of params in encoding file2 against the params in encoding file1
<encoding2 file name>_activation.json: comparison of activations in encoding file2 against the activations in encoding file1
Supergroup Info .json files:
When both quantized_dlc_path and framework_model_path are provided for either encoding, the tool also dumps a supergroup mapping. For example, if quantized_dlc1_path and framework_model_path are provided, each activation tensor in encoding file2 is mapped to a supergroup in the dlc file belonging to encoding file1.
In such a mapping, the keys in the .json file are the activation names in encoding file2, and each value represents a supergroup's info (inputs, outputs, and tensors) in the dlc file belonging to encoding file1. When a quantized dlc and a framework model are provided for both encodings, two such supergroup mappings are generated.
Tensor visualizer¶
Tensor visualizer compares given reference output and target output tensors and plots various statistics to represent differences between them.
The Tensor visualizer feature can:
Plot histograms for golden and target tensors
Plot a graph indicating deviation between golden and target tensors
Plot a cumulative distribution function (CDF) graph for golden vs. target tensors
Usage
usage: qairt-accuracy-debugger tensor_visualizer [-h] --target_tensors TARGET_TENSORS --golden_tensors GOLDEN_TENSORS [-dt DATA_TYPE] [-wd WORKING_DIRECTORY] [--log_level {info,debug,warning,error}]
options:
-h, --help show this help message and exit
required arguments:
--target_tensors TARGET_TENSORS
Directory path to Target tensor files
--golden_tensors GOLDEN_TENSORS
Directory path to Golden tensor files
optional arguments:
-dt DATA_TYPE, --data_type DATA_TYPE
Data type to load the tensor file in. Default: float32
-wd WORKING_DIRECTORY, --working_directory WORKING_DIRECTORY
Path to output directory. Default: tensor_visualizer_output_dir
--log_level {info,debug,warning,error}
Log level. Default is info
Sample Commands
# Basic run
qairt-accuracy-debugger tensor_visualizer \
--golden_tensors golden_tensors_dir \
--target_tensors target_tensors_dir
Tip
A working_directory is generated from wherever this script is called from unless otherwise specified.
Outputs
Once the Tensor Visualizer has finished running, it stores the outputs in the specified working directory. It creates an output directory with a timestamp of the format YYYY-MM-DD_HH-mm-ss in working_directory/tensor_visualizer.
Below is the output directory structure:
working_directory
├── tensor_visualizer
│ └── 2025-07-07_08-18-24
│ ├── mobilenetv20_features_batchnorm0_fwd
│ ├── CDF_plots.jpeg
│ ├── Diff_plots.jpeg
│ └── Histograms.jpeg
The following details what each file contains.
Each tensor will have its own directory; the directory name matches the tensor name.
Histograms.html – Golden and target histograms
CDF_plots.html – Golden vs. target CDF graph
Diff_plots.html – Golden and target deviation graph
Histogram Plots
Comparison: We compare histograms for both the golden data and the target data.
Overlay: To enhance clarity, we overlay the histograms bin by bin.
Binned Ranges: Each bin represents a value range, showing the frequency of occurrence.
Visual Insight: Overlapping histograms reveal differences or similarities between the datasets.
Cumulative Distribution Function (CDF) Plots
Overview: CDF plots display the cumulative probability distribution.
Overlay: We superimpose CDF plots for golden and target data.
Percentiles: These plots illustrate data distribution across different percentiles.
Tensor Difference Plots
Inspection: We generate plots highlighting differences between golden and target data tensors.
Scatter and Line: Scatter plots represent tensor values, while line plots show differences at each index.
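The statistics behind these plots can be sketched with NumPy. The snippet below computes overlaid histograms over shared bins and the empirical CDFs, leaving the actual plotting (and the tool's exact binning choices, which are not documented here) aside:

```python
import numpy as np

def histogram_and_cdf(golden, target, bins=32):
    """Shared-bin histograms and empirical CDFs for golden vs. target data."""
    g = np.asarray(golden, dtype=np.float64).ravel()
    t = np.asarray(target, dtype=np.float64).ravel()
    # Shared binned ranges so the two histograms can be overlaid bin by bin
    edges = np.linspace(min(g.min(), t.min()), max(g.max(), t.max()), bins + 1)
    g_hist, _ = np.histogram(g, bins=edges)      # per-bin frequencies
    t_hist, _ = np.histogram(t, bins=edges)
    g_cdf = np.cumsum(g_hist) / g.size           # cumulative probability
    t_cdf = np.cumsum(t_hist) / t.size
    return edges, g_hist, t_hist, g_cdf, t_cdf
```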
Snooping¶
Snooping algorithms help find inaccuracies in a neural network at the layer level. The following snooping options are available:
oneshot
layerwise
cumulative_layerwise
Usage
usage: qairt-accuracy-debugger snooping [-h] [--algorithm {oneshot,layerwise,cumulative_layerwise}] [--desired_input_shape DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...]] [--converter_float_bitwidth {32,16}] [--float_bias_bitwidth {32,16}] [--quantization_overrides QUANTIZATION_OVERRIDES] [--onnx_define_symbol SYMBOL VALUE] [--onnx_defer_loading] [--enable_framework_trace] [--calibration_input_list CALIBRATION_INPUT_LIST] [--bias_bitwidth {8,32}] [--act_bitwidth {8,16}] [--weights_bitwidth {8,4}] [--quantizer_float_bitwidth {32,16}] [--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}] [--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}] [--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}] [--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}] [--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE] [--use_per_channel_quantization] [--use_per_row_quantization] [--float_fallback] [--quantization_algorithms QUANTIZATION_ALGORITHMS [QUANTIZATION_ALGORITHMS ...]] [--restrict_quantization_steps RESTRICT_QUANTIZATION_STEPS] [--dump_encodings_json] [--ignore_encodings] [--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}] [--profiling_level PROFILING_LEVEL] [--netrun_backend_extension_config NETRUN_BACKEND_EXTENSION_CONFIG] [--offline_prepare_backend_extension_config OFFLINE_PREPARE_BACKEND_EXTENSION_CONFIG] [--device_id DEVICE_ID] [--soc_model SOC_MODEL] -m INPUT_MODEL --input_sample INPUT_SAMPLE [INPUT_SAMPLE ...] 
[--working_directory WORKING_DIRECTORY] [-o OUTPUT_TENSOR] [--log_level {info,debug,warning,error}] --backend {HTP,CPU,GPU} --platform {aarch64-android,x86_64-linux-clang,wos} [--golden_reference GOLDEN_REFERENCE] [--is_qnn_golden_reference] [--retain_compilation_artifacts] [--comparator {l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} [{l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} ...]] [--offline_prepare] [--debug_subgraph_inputs DEBUG_SUBGRAPH_INPUTS] [--debug_subgraph_outputs DEBUG_SUBGRAPH_OUTPUTS]
options:
-h, --help show this help message and exit
required arguments:
-m INPUT_MODEL, --input_model INPUT_MODEL
Path to the model file
--input_sample INPUT_SAMPLE [INPUT_SAMPLE ...]
Path to text file containing an input sample. Refer to qnn-net-run input_list for the format of the input_sample file.
--backend {HTP,CPU,GPU}
Backend type for inference to be run
--platform {aarch64-android,x86_64-linux-clang,wos}
The type of device platform to be used for inference
optional arguments:
--algorithm {oneshot,layerwise,cumulative_layerwise}
Algorithm to use to debug the model.
--device_id DEVICE_ID
The serial number of the device to use. If not available, the first in a list of queried devices will be used for inference.
--soc_model SOC_MODEL
Option to specify the SOC on which the model needs to run. This can be found from the SOC info of the device and starts with strings such as SDM, SM, QCS, IPQ, SA, QC, SC, SXR, SSG, STP, or QRB.
--working_directory WORKING_DIRECTORY
Path to working directory. If not specified, a directory named working_directory will be created in the current directory.
-o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
Name of the graph's specified output tensor(s).
--log_level {info,debug,warning,error}
Log level. Default is info
--golden_reference GOLDEN_REFERENCE
The path of the directory where golden reference tensor files are saved.
--is_qnn_golden_reference
Specifies that outputs passed with --golden_reference are dumped by QNN. This option should be used only when --golden_reference is supplied.
--retain_compilation_artifacts
Flag to retain compilation artifacts.
--comparator {l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} [{l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} ...]
Comparator to use to compare tensors. For multiple comparators, specify as follows: --comparator mse std
--offline_prepare Boolean to indicate offline prepare of the graph
--debug_subgraph_inputs DEBUG_SUBGRAPH_INPUTS
Provide comma-separated inputs for the subgraph to be debugged. Currently, only the layerwise and cumulative algorithms are supported.
--debug_subgraph_outputs DEBUG_SUBGRAPH_OUTPUTS
Provide comma-separated outputs for the subgraph to be debugged. Currently, only the layerwise and cumulative algorithms are supported.
converter arguments:
--desired_input_shape DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...], --input_tensor DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...]
The name, dimension, datatype, and layout of all the input buffers to the network, specified in the format [input_name comma-separated-dimensions data-type layout]. Dimension, datatype, and layout are optional. For example: 'data' 1,224,224,3. Note that the quotes should always be included in order to handle special characters, spaces, etc. For multiple inputs, specify multiple --desired_input_shape options on the command line, e.g.: --desired_input_shape "data1" 1,224,224,3 float32 --desired_input_shape "data2" 1,50,100,3 int64
--converter_float_bitwidth {32,16}
Use this option to convert the graph to the specified float bitwidth, either 32 (default) or 16.
--float_bias_bitwidth {32,16}
Option to select the bitwidth to use for float bias tensors, either 32 (default) or 16
--quantization_overrides QUANTIZATION_OVERRIDES
Path to quantization overrides json file.
--onnx_define_symbol SYMBOL VALUE
Option to override specific input dimension symbols.
--onnx_defer_loading Option to have the model not load weights. If False, the model will be loaded eagerly.
--enable_framework_trace
Use this option to enable the converter to trace the output tensor change information.
quantizer_arguments:
--calibration_input_list CALIBRATION_INPUT_LIST
Path to the inputs list text file to run quantization (used with qairt-quantizer)
--bias_bitwidth {8,32}
Option to select the bitwidth to use when quantizing the bias. default 8
--act_bitwidth {8,16}
Option to select the bitwidth to use when quantizing the activations. default 8
--weights_bitwidth {8,4}
Option to select the bitwidth to use when quantizing the weights. default 8
--quantizer_float_bitwidth {32,16}
Use this option to select the bitwidth to use for float tensors, either 32 (default) or 16.
--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use for activations. Supported values: min-max (default), sqnr, entropy, mse, percentile. This option can be paired with --act_quantizer_schema to override the quantization schema to use for activations; otherwise the default schema (asymmetric) will be used.
--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use for parameters. Supported values: min-max (default), sqnr, entropy, mse, percentile. This option can be paired with --param_quantizer_schema to override the quantization schema to use for parameters; otherwise the default schema (asymmetric) will be used.
--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
Specify which quantization schema to use for activations. Note: Default is asymmetric.
--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric} Specify which quantization schema to use for parameters. Note: Default is asymmetric. --percentile_calibration_value PERCENTILE_CALIBRATION_VALUE Value must lie between 90 and 100. Default is 99.99 --use_per_channel_quantization Use per-channel quantization for convolution-based op weights. Note: This will replace built-in model QAT encodings when used for a given weight. --use_per_row_quantization Use this option to enable rowwise quantization of Matmul and FullyConnected ops. --float_fallback Use this option to enable fallback to floating point (FP) instead of fixed point. This option can be paired with --quantizer_float_bitwidth to indicate the bitwidth for FP (by default 32). If this option is enabled, then input list must not be provided and --ignore_encodings must not be provided. The external quantization encodings (encoding file/FakeQuant encodings) might be missing quantization parameters for some interim tensors. First it will try to fill the gaps by propagating across math-invariant functions. If the quantization parameters are still missing, then it will apply fallback to nodes to floating point. --quantization_algorithms QUANTIZATION_ALGORITHMS [QUANTIZATION_ALGORITHMS ...] Use this option to select quantization algorithms. Usage is: --quantization_algorithms <algo_name1> ... --restrict_quantization_steps RESTRICT_QUANTIZATION_STEPS Specifies the number of steps to use for computingquantization encodings E.g.--restrict_quantization_steps "-0x80 0x7F" indicates an example 8 bit range, --dump_encodings_json Dump encoding of all the tensors in a json file --ignore_encodings Use only quantizer generated encodings, ignoring any user or model provided encodings. netrun arguments: --perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings} Specifies performance profile to set. 
Valid settings are "low_balanced" , "balanced" , "default", high_performance" ,"sustained_high_performance", "burst", "low_power_saver", "power_saver", "high_power_saver", "extreme_power_saver", and "system_settings". Note: perf_profile argument is now deprecated for HTP backend. User can specify performance profile through backend extension config now. --profiling_level PROFILING_LEVEL Enables profiling and sets its level. For QNN executor, valid settings are "basic", "detailed" and "client" Default is detailed. --netrun_backend_extension_config NETRUN_BACKEND_EXTENSION_CONFIG Path to config to be used with qnn-net-run offline prepare arguments: --offline_prepare_backend_extension_config OFFLINE_PREPARE_BACKEND_EXTENSION_CONFIG Path to config to be used with qnn-context-binary-generator.
oneshot-layerwise Snooping¶
This algorithm debugs all layers of the model at once by performing the following steps:
Execute the framework runner to collect reference outputs from all intermediate tensors of the model in fp32 precision
Execute the inference engine to collect target outputs from all intermediate tensors of the model in the provided target precision
Execute verification to compare the intermediate outputs from the above two steps
This algorithm provides a quick analysis of whether layers in the model are quantization sensitive.
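The `--input_sample` option expects a text file that points at raw tensor data in the qnn-net-run input_list format. A minimal sketch of preparing such a pair of files is shown below; the tensor name `data`, the shape, and the file names are hypothetical placeholders that you should replace with the actual input names and shapes of your converted model.

```python
# Sketch: generate a float32 input tensor in .raw format plus an input-sample
# list file for --input_sample. Tensor name, shape, and file names are
# hypothetical; use the names/shapes reported for your converted model.
import numpy as np

shape = (1, 224, 224, 3)                     # example NHWC input shape
data = np.random.rand(*shape).astype(np.float32)
data.tofile("input_1.raw")                   # raw little-endian float32 bytes

# One line per inference; each entry maps an input name to a raw file.
with open("input_sample.txt", "w") as f:
    f.write("data:=input_1.raw\n")
```

The `.raw` file carries no dtype or shape metadata, so the tool relies on the model definition (and any `--desired_input_shape` overrides) to interpret it.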
Sample Commands
# Example for executing the oneshot algorithm on an Android HTP device hosted on a Linux machine:
qairt-accuracy-debugger snooping \
--algorithm oneshot \
--backend htp \
--platform aarch64-android \
--input_model artifacts/mobilenet-v2.onnx \
--input_sample input_sample.txt \
--comparator mse \
--quantization_overrides artifacts/quantized_encoding.json
# Example for executing oneshot snooping on a WoS HTP target:
qairt-accuracy-debugger snooping ^
--algorithm oneshot ^
--backend htp ^
--platform wos ^
--input_model artifacts/mobilenet-v2.onnx ^
--input_sample input_sample.txt ^
--comparator mse ^
--calibration_input_list calib_list.txt
# Example for using external golden outputs dumped by any frameworks like ONNX:
qairt-accuracy-debugger snooping \
--algorithm oneshot \
--backend htp \
--platform aarch64-android \
--input_model artifacts/mobilenet-v2.onnx \
--input_sample input_sample.txt \
--comparator mse \
--quantization_overrides artifacts/quantized_encoding.json \
--golden_reference /path/to/goldens
# Example for using external golden outputs dumped by QNN:
qairt-accuracy-debugger snooping \
--algorithm oneshot \
--backend htp \
--platform aarch64-android \
--input_model artifacts/mobilenet-v2.onnx \
--input_sample input_sample.txt \
--comparator mse \
--quantization_overrides artifacts/quantized_encoding.json \
--golden_reference /path/to/goldens \
--is_qnn_golden_reference
Tip
Refer to inference-engine sample commands to understand usage of different platforms/backends
Output
Below is the output directory structure:
working_directory
└── oneshot_snooping
├── 2025-07-02_11-02-58
│ ├── inference_engine
│ ├── oneshot_layerwise.csv
│ ├── plots
│ └── reference_output
Once oneshot snooping completes, a timestamped directory is generated under working_directory/oneshot_snooping containing:
The inference_engine directory, with intermediate layer outputs generated by QNN stored in .raw format.
The reference_output directory, with intermediate layer outputs generated by the framework stored in .raw format.
oneshot_layerwise.csv, the report of verification results for each layer output.
The plots directory, with HTML plots of the verification results for each layer output.
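Once the report exists, a small script can surface the most quantization-sensitive layers. This is a sketch, not part of the SDK: the column names `"Source Name"` and `"mse"` are assumptions — match them to the actual header row of your generated oneshot_layerwise.csv.

```python
# Sketch: rank layers in an oneshot_layerwise.csv-style report by a verifier
# column to find quantization-sensitive layers. Column names are assumptions;
# match them to the header of your generated report.
import csv

def worst_layers(report_path, verifier_col="mse", top_n=5):
    """Return the top_n (layer name, verifier score) pairs, worst first."""
    with open(report_path, newline="") as f:
        rows = list(csv.DictReader(f))
    rows.sort(key=lambda r: float(r[verifier_col]), reverse=True)
    return [(r["Source Name"], float(r[verifier_col])) for r in rows[:top_n]]
```

For verifiers where higher is better (e.g. SQNR or cosine), sort ascending instead.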
Snapshot of the oneshot_layerwise.csv file:
Understanding the oneshot-layerwise summary report:
Column |
Description |
|---|---|
Source Name |
Output name of the current layer in the framework graph. |
Target Name |
Output name of the current layer in the target graph. |
Layer type |
Type of current layer |
Shape |
Shape of this layer’s output |
<Verifier name> |
Verifier value of the current layer output compared to reference output |
cumulative-layerwise Snooping¶
This algorithm debugs one layer at a time by performing the following steps:
Execute the framework runner to collect reference outputs from all intermediate tensors of the model in fp32 precision
Execute the inference engine and verification steps iteratively to:
Collect target outputs in target precision for each layer, removing the effect of its preceding layers on the final output
Compare the intermediate outputs from the framework runner and the inference engine
It provides deeper analysis to identify which layers of the model cause accuracy deviation, and can be used to measure the quantization sensitivity of each layer/op in the model with regard to the model's final output.
Debugging Accuracy issue with Quantized model using Cumulative Layerwise Snooping
With quantized models, some mismatch is expected at the most data-intensive layers, arising from quantization error.
The debugger can be used to identify the most sensitive operators (those with high verifier scores) and run them at higher precision to improve overall accuracy.
Sensitivity is determined by the verifier score seen at that layer with respect to the reference platform (e.g., ONNX Runtime).
Note that cumulative-layerwise debugging takes considerable time, since the partitioned model must be quantized and compiled at every layer that does not have a 100% match with the reference.
Below is one strategy to debug larger models:
Run oneshot-layerwise on the model to identify the starting point of sensitivity in the model.
Run cumulative-layerwise on different parts of the model using the start-layer and end-layer options (if the model has 100 nodes, use the starting node from the oneshot-layerwise run as the start layer and the 25th node as the end layer for run 1, the 26th and 50th nodes for run 2, the 51st and 75th nodes for run 3, and so on). The final reports of all runs help identify the most sensitive layers in the model. Suppose nodes A, B, and C have high verifier scores, indicating high sensitivity.
Run the original model with those specific layers (A/B/C, one at a time or in combinations) in FP16 and observe the improvement in accuracy.
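The partitioning step of the strategy above can be sketched as a small helper that splits an ordered node list into start/end pairs for separate cumulative-layerwise runs. The node names are hypothetical placeholders; use the actual node names from your model graph.

```python
# Sketch: split an ordered model node list into (start_layer, end_layer)
# ranges for separate cumulative-layerwise runs. Node names are hypothetical.
def partition_nodes(nodes, chunk=25):
    """Yield (start_layer, end_layer) pairs covering the node list in order."""
    for i in range(0, len(nodes), chunk):
        part = nodes[i:i + chunk]
        yield part[0], part[-1]
```

For a 100-node model with the default chunk size this yields four ranges, matching the run-1 through run-4 split described above.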
Sample Commands
# Example for executing cumulative-layerwise on an HTP Android device hosted on a Linux machine:
qairt-accuracy-debugger snooping \
--algorithm cumulative_layerwise \
--backend htp \
--platform aarch64-android \
--input_model artifacts/mobilenet-v2.onnx \
--calibration_input_list artifacts/list.txt \
--input_sample input_sample.txt \
--output_tensor "473" \
--comparator mse \
--quantization_overrides artifacts/quantized_encoding.json
# Example for executing cumulative-layerwise snooping on a WoS HTP target:
qairt-accuracy-debugger snooping ^
--algorithm cumulative_layerwise ^
--backend htp ^
--platform wos ^
--input_model artifacts/mobilenet-v2.onnx ^
--input_sample input_sample.txt ^
--comparator mse ^
--calibration_input_list calib_list.txt
# Example for using external golden outputs dumped by frameworks like ONNX:
qairt-accuracy-debugger snooping \
--algorithm cumulative_layerwise \
--backend htp \
--platform aarch64-android \
--input_model artifacts/mobilenet-v2.onnx \
--calibration_input_list artifacts/list.txt \
--input_sample input_sample.txt \
--output_tensor "473" \
--comparator mse \
--quantization_overrides artifacts/quantized_encoding.json \
--golden_reference /path/to/goldens
# Example for using external golden outputs dumped by QNN:
qairt-accuracy-debugger snooping \
--algorithm cumulative_layerwise \
--backend htp \
--platform aarch64-android \
--input_model artifacts/mobilenet-v2.onnx \
--calibration_input_list artifacts/list.txt \
--input_sample input_sample.txt \
--output_tensor "473" \
--comparator mse \
--quantization_overrides artifacts/quantized_encoding.json \
--golden_reference /path/to/goldens \
--is_qnn_golden_reference
Tip
Refer to inference-engine sample commands to understand usage of different platforms/backends
Output
Below is the output directory structure:
working_directory
└── cumulative_layerwise_snooping
└── 2025-07-07_06-00-17
├── all_subgraphs.json
├── cumulative_layerwise.csv
├── encodings_converter
├── inference_engine
├── plots
├── reference_output
└── sub_graph_node_precision_files
The inference_engine directory contains intermediate outputs obtained from the inference engine step, stored in separate directories named after the respective layers. The final report, cumulative_layerwise.csv, contains verifier scores for each layer; layers with the most deviating scores can be identified as problematic nodes.
The reference_output directory contains a timestamped directory with the intermediate layer outputs stored in .raw format, as described in the Framework Runner step.
The plots directory contains HTML plots of the verification results for each layer output.
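Since the intermediate outputs are dumped as bare `.raw` files, they carry no shape or dtype metadata. A minimal sketch for loading one into numpy is shown below; the path, shape, and float32 dtype are assumptions — take the real shape from the report's Framework/Target Shape columns.

```python
# Sketch: load an intermediate-layer output dumped in .raw format. The file
# stores only raw bytes; dtype and shape must come from the report (the
# values used here are hypothetical examples).
import numpy as np

def load_raw(path, shape, dtype=np.float32):
    """Read a raw tensor dump and reshape it to the reported shape."""
    arr = np.fromfile(path, dtype=dtype)
    return arr.reshape(shape)
```

Loading both the reference and target dump of the same layer this way lets you re-run any comparator offline on a layer flagged in the report.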
Snapshot of cumulative_layerwise.csv:
Understanding the cumulative-layerwise report:
Column |
Description |
|---|---|
Source Name |
Output name of the current layer in the framework graph. |
Target Name |
Output name of the current layer in the target graph. |
Status |
|
Layer Type |
Type of the current layer. |
Framework Shape |
Shape of this framework layer’s output. |
Target Shape |
Shape of this target layer’s output. |
Framework(Min, Max, Median) |
The Min, Max and Median of the outputs at this layer taken from reference execution. |
Target(Min, Max, Median) |
The Min, Max and Median of the outputs at this layer taken from target execution. |
<Verifier name>(current_layer) |
Absolute verifier value of the current layer compared to reference platform. |
<Verifier name>(original model output name) |
For each original model output, absolute verifier value of the original model output compared to reference platform. |
Info |
Displays information for the output verifiers, if the values are abnormal. |
layerwise Snooping¶
This algorithm debugs a single layer of the model at a time by performing the following steps:
Obtain golden reference per-layer outputs from an external tool or, if a golden reference is not given, run the framework runner to collect reference outputs from all intermediate tensors of the model in fp32 precision
Execute the inference engine and verification steps iteratively to:
Collect target outputs in target precision for the layer under investigation and the final model output, by quantizing the specific subgraph and running the rest of the model in floating point
Compare the intermediate output from the golden reference with the target execution
Layerwise snooping provides deeper analysis to identify all model layers causing accuracy deviation on hardware with respect to framework/simulation outputs. It can be used to identify kernel issues for layers/ops present in the model and for sensitivity analysis.
Note
Debugging accuracy issues for models exhibiting a discrepancy between the golden reference (e.g., AIMET/framework runtime output) and the target output using layerwise snooping
One popular use case for layerwise snooping is debugging accuracy differences between AIMET and the target.
Although tools like AIMET create a close simulation of the hardware, a small mismatch is still expected due to environment differences: the simulation executes on GPU FP32 kernels and simulates quantization noise, rather than actually executing on integer kernels as happens on hardware.
If there is a larger deviation between simulation and hardware, layerwise snooping can point to the nodes with the highest deviations; the nodes showing high deviation in layerwise.csv can be identified as the erroneous nodes.
Other use cases include debugging deviations between a framework runtime's FP32 output and the target's INT16 output.
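When a layer is flagged, it can help to recompute comparator scores offline on the dumped golden/target tensors. The sketch below reproduces two of the listed comparators (mse and cosine); the tool's exact internal formulas may differ, so treat these as illustrative approximations.

```python
# Sketch: offline approximations of two comparators (mse, cosine) for a pair
# of golden/target tensors, e.g. to sanity-check a layer flagged in
# layerwise.csv. The tool's exact formulas may differ.
import numpy as np

def mse(golden, target):
    """Mean squared error between two tensors."""
    g, t = golden.ravel(), target.ravel()
    return float(np.mean((g - t) ** 2))

def cosine(golden, target):
    """Cosine similarity between two flattened tensors (1.0 = identical direction)."""
    g, t = golden.ravel(), target.ravel()
    return float(np.dot(g, t) / (np.linalg.norm(g) * np.linalg.norm(t)))
```

Identical tensors give an mse of 0 and a cosine similarity of 1, so large mse or a cosine well below 1 indicates a deviating layer.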
Sample Commands
# Example for executing layerwise snooping on an HTP Android device hosted on a Linux machine:
qairt-accuracy-debugger snooping \
--algorithm layerwise \
--backend htp \
--platform aarch64-android \
--input_model artifacts/mobilenet-v2.onnx \
--calibration_input_list artifacts/list.txt \
--input_sample input_sample.txt \
--output_tensor "473" \
--comparator mse \
--quantization_overrides artifacts/quantized_encoding.json
# Example for executing layerwise snooping on a WoS HTP target:
qairt-accuracy-debugger snooping ^
--algorithm layerwise ^
--backend htp ^
--platform wos ^
--input_model artifacts/mobilenet-v2.onnx ^
--input_sample input_sample.txt ^
--comparator mse ^
--calibration_input_list calib_list.txt
# Example for using external golden outputs dumped by any frameworks like ONNX, TF:
qairt-accuracy-debugger snooping \
--algorithm layerwise \
--backend htp \
--platform aarch64-android \
--input_model artifacts/mobilenet-v2.onnx \
--calibration_input_list artifacts/list.txt \
--input_sample input_sample.txt \
--output_tensor "473" \
--comparator mse \
--quantization_overrides artifacts/quantized_encoding.json \
--golden_reference /path/to/goldens
# Example for using external golden outputs dumped by QNN:
qairt-accuracy-debugger snooping \
--algorithm layerwise \
--backend htp \
--platform aarch64-android \
--input_model artifacts/mobilenet-v2.onnx \
--calibration_input_list artifacts/list.txt \
--input_sample input_sample.txt \
--output_tensor "473" \
--comparator mse \
--quantization_overrides artifacts/quantized_encoding.json \
--golden_reference /path/to/goldens \
--is_qnn_golden_reference
Tip
Refer to inference-engine sample commands to understand usage of different runtimes/backends
Output
Below is the output directory structure:
working_directory
└── layerwise_snooping
└──2025-07-07_05-58-26
├── all_subgraphs.json
├── encodings_converter
├── inference_engine
├── layerwise.csv
├── plots
├── reference_output
└── sub_graph_node_precision_files
The reference_output directory contains a timestamped directory with the intermediate layer outputs stored in .raw format, as described in the Framework Runner step.
The inference_engine directory contains the outputs of each single-layer model obtained from the inference engine stage, stored in separate directories. The final report, layerwise.csv, contains verifier scores for each single-layer model; layers with the most deviating scores can be identified as problematic nodes.
layerwise.csv is similar to the cumulative-layerwise report (cumulative_layerwise.csv), except that the original-model-output columns are not present in layerwise snooping. Please refer to the cumulative-layerwise report for more details.
The plots directory contains HTML plots of the verification results for each layer output.
Snapshot of layerwise.csv:
Understanding the layerwise report:
Column |
Description |
|---|---|
Source Name |
Output name of the current layer in the framework graph. |
Target Name |
Output name of the current layer in the target graph. |
Status |
|
Layer Type |
Type of the current layer. |
Framework Shape |
Shape of this framework layer’s output. |
Target Shape |
Shape of this target layer’s output. |
Framework(Min, Max, Median) |
The Min, Max and Median of the outputs at this layer taken from reference execution. |
Target(Min, Max, Median) |
The Min, Max and Median of the outputs at this layer taken from target execution. |
<Verifier name>(current_layer) |
Absolute verifier value of the current layer compared to reference platform. |
<Verifier name>(original model output name) |
For each original model output, absolute verifier value of the original model output compared to reference platform. |
Info |
Displays information for the output verifiers, if the values are abnormal. |
qnn-platform-validator¶
qnn-platform-validator checks the QNN compatibility/capability of a device. The output is saved as a CSV file in the “output” directory, and basic logs are also displayed on the console.
DESCRIPTION:
------------
Helper script to set up the environment for and launch the qnn-platform-
validator executable.
REQUIRED ARGUMENTS:
-------------------
--backend <BACKEND>        Specify the backend to validate: <gpu>, <dsp>, or <all>.
--directory <DIR> Path to the root of the unpacked SDK directory containing
the executable and library files
--dsp_type <DSP_VERSION> Specify DSP variant: v66 or v68
OPTIONAL ARGUMENTS:
--------------------
--buildVariant <TOOLCHAIN> Specify the build variant
aarch64-android or aarch64-windows-msvc to be validated.
Default: aarch64-android
--testBackend              Runs a small program on the runtime and checks if QNN is supported for
                           the backend.
--deviceId <DEVICE_ID>     Uses this device for running adb commands.
                           Defaults to the first device in the adb devices list.
--coreVersion Outputs the version of the runtime that is present on the target.
--libVersion Outputs the library version of the runtime that is present on the target.
--targetPath <DIR> The path to be used on the device.
Defaults to /data/local/tmp/platformValidator
--remoteHost <REMOTEHOST> Run on remote host through remote adb server.
Defaults to localhost.
--debug Set to turn on Debug log
The following files need to be pushed to the device for the DSP to pass the validator test. Note that the stub and skel libraries are specific to the DSP architecture version (e.g., v73):
// Android
bin/aarch64-android/qnn-platform-validator
lib/aarch64-android/libQnnHtpV73CalculatorStub.so
lib/hexagon-${DSP_ARCH}/unsigned/libCalculator_skel.so

// Windows
bin/aarch64-windows-msvc/qnn-platform-validator.exe
lib/aarch64-windows-msvc/QnnHtpV73CalculatorStub.dll
lib/hexagon-${DSP_ARCH}/unsigned/libCalculator_skel.so

The following example pushes the aarch64-android variant to /data/local/tmp/platformValidator:
adb push $SNPE_ROOT/bin/aarch64-android/snpe-platform-validator /data/local/tmp/platformValidator/bin/qnn-platform-validator
adb push $SNPE_ROOT/lib/aarch64-android/ /data/local/tmp/platformValidator/lib
adb push $SNPE_ROOT/lib/dsp /data/local/tmp/platformValidator/dsp
qnn-profile-viewer¶
The qnn-profile-viewer tool parses profiling data generated by qnn-net-run. The same data can additionally be saved to a CSV file.
usage: qnn-profile-viewer --input_log PROFILING_LOG [--help] [--output=CSV_FILE] [--extract_opaque_objects] [--reader=CUSTOM_READER_SHARED_LIB] [--schematic=SCHEMATIC_BINARY] [--standardized_json_output]
Reads profiling logs and outputs the contents to stdout
Note: The IPS calculation takes the following into account: graph execute time, tensor file IO time, and misc. time for quantization, callbacks, etc.
required arguments:
--input_log PROFILING_LOG1,PROFILING_LOG2
Provides a comma-separated list of Profiling log files
optional arguments:
--output PATH
Output file with processed profiling data. File formats vary depending upon the reader used
(see --reader). If not provided, no output is created.
--help Displays this help message.
--reader CUSTOM_READER_SHARED_LIB
Path to a reader library. If not specified, the default reader outputs a CSV file.
--schematic SCHEMATIC_BINARY
Path to the schematic binary file.
Please note that this option is specific to the QnnHtpOptraceProfilingReader library.
--config CONFIG_JSON_FILE
Path to the config json file.
Please note that this option is specific to the QnnHtpOptraceProfilingReader library.
--dlc DLC_FILE
Path to the dlc file.
Please note that this option is specific to the QnnHtpOptraceProfilingReader library.
--zoom_start PROFILE_SUBMODULE_START_NODE
Name of starting node for a profile submodule optrace. If you specify this option you must also specify --zoom_end.
Please note that this option is specific to the QnnHtpOptraceProfilingReader library.
--zoom_end PROFILE_SUBMODULE_END_NODE
Name of ending node for a profile submodule optrace. If you specify this option you must also specify --zoom_start.
Please note that this option is specific to the QnnHtpOptraceProfilingReader library.
--version Displays version information.
--extract_opaque_objects Specifies that the opaque objects will be dumped to output files
--standardized_json_output Specifies that the JSON output will be standardized for consumption by other tools within the SDK ecosystem.
Please note that this option is specific to the QnnJsonProfilingReader library.
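The IPS note above (graph execute time plus tensor file IO and miscellaneous overheads) amounts to dividing the inference count by the sum of those components. A minimal sketch, with hypothetical example timings, not the tool's internal code:

```python
# Sketch: inferences-per-second of the kind described in the IPS note above,
# accounting for execute time, tensor file IO time, and misc. overheads.
# All timings are hypothetical example values in seconds.
def inferences_per_second(num_inferences, execute_s, io_s, misc_s):
    total = execute_s + io_s + misc_s
    return num_inferences / total
```

This explains why an IPS figure reported from profiling logs can be lower than 1 / (average graph execute time): file IO and quantization/callback overheads are included in the denominator.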
Warning
qnn-netron Deprecation Notice: qnn-netron has been deprecated and will be removed in 2.40.
qnn-netron (Beta)¶
Overview¶
The QNN Netron tool makes model debugging and visualization less daunting. qnn-netron is an extension of the Netron graph tool, providing easier graph debugging and convenient runtime information. The tool currently has two key functionalities:
The Visualize section allows customers to view their desired models after using the QNN Converter by importing the JSON representation of the model
The Diff section allows customers to run networks of their choosing on different runtimes in order to compare network accuracy and performance
Launching Tool¶
Dependencies
The QNN Netron tool leverages the Electron JS framework for its GUI frontend and requires npm/Node.js to be available on the system. Python libraries for accuracy analysis are also required by the tool's backend. A convenience script is available in the QNN SDK to download the dependencies needed for building and running the tool.
# Note: the following command should be run as administrator/root to be able to install system libraries
$ sudo bash ${QNN_SDK_ROOT}/bin/check-linux-dependency.sh
$ ${QNN_SDK_ROOT}/bin/check-python-dependency
Launching Application
The qnn-netron script builds and launches the QNN Netron application. This script:
Clones the vanilla Netron git project
Applies custom patches enabling Netron for QNN
Builds the npm project
Launches the application
$ qnn-netron -h
usage: qnn-netron [-h] [-w <working_dir>]
Script to build and launch QNN Netron tool for visualizing and running analysis on Qnn Models.
Optional argument(s):
-w <working_dir> Location for building QNN Netron tool. Default: current_dir
# To build and run application use
$ qnn-netron -w <my_working_dir>
QNN Netron Visualize Deep Dive¶
First, the user is prompted to open a JSON file that represents their converted model. This JSON comes from the converter tool. Please refer to this Overview for more details.
Once the file is loaded into the tool, the graph should be displayed in the UI as shown below:
After loading in the model, the user can click on any of the nodes and a side pop-up section will display node information such as the type and name as well as vital parameter information such as inputs and outputs (datatypes, encodings, and shapes)
Netron Diff Customization Deep Dive¶
Limitations
Diff Tool comparison against source framework goldens only works for goldens in spatial-first axis order (NHWC).
For use cases where a source framework golden is used for comparison, the Diff Tool is only tested to work with TensorFlow and TensorFlow-variant frameworks.
To open the Diff Customization tool, the user can either click File and then "Open Diff…", or click "Diff…" on tool startup, as shown below:
Upon launch of the Diff Customization tool, at the top, the user is prompted to select a use case for the tool. There are 3 options to choose from:
For the purposes of this documentation, only inference vs inference will be detailed. The setup procedure for the other use cases is similar. The other two use cases are explained below:
Golden vs Inference: Used to test inference run using goldens from a particular ML framework and comparing against the output of a QNN backend
Output vs Output: Used to test existing inference results against ML framework goldens OR used to test differences between two existing inference results
Inference Vs Inference: Used to test inference between two converted QNN models or the same QNN model on different QNN backends
Inference vs Inference¶
If this use case is selected, the user is presented with various form fields for the purposes of running two jobs asynchronously with the option of choosing different runtimes for each QNN network being run.
A more detailed view of what the user is prompted is displayed below:
In order to execute the networks, the user has two options:
Running on Host machine
When the Target Device is selected as “host”, the user can only use the CPU as a runtime. In addition, the user can only select “x86_64-linux-clang” as the architecture in this use case.
Running On-Device
When the Target Device is selected as “on-device”, a Device ID is required to connect to the device via adb. Thereafter, the user can select any of the three QNN backend runtimes available (CPU, GPU, or DSPv[68, 69, 73]) and the user can select architecture “aarch64-android”
After choosing the desired target device and runtime configurations, the rest of the fields are explained in detail below:
Note
Users are able to click again and change the location to any of the path fields
Setup Parameters |
Configurations to Select |
|---|---|
The options for what verifier to run on the outputs of the model are (See Note below table for custom verifier (accuracy + performance) thresholds and see table below for providing custom accuracy verifier hyperparameters): |
RtolAtol, AdjustedRtolAtol, TopK, MeanIOU, L1Error, CosineSimilarity, MSE, SQNR |
Model JSON |
upload <model>_net.json file that was outputted from the QNN converters. |
Model Cpp |
upload <model>.cpp that was outputted from the QNN converters. |
Model Bin |
upload <model>.bin that was outputted from the QNN converters. |
NDK Path |
upload the path to your Android NDK |
Devices Engine Path |
upload the path to the top-level of the unzipped qnn-sdk |
Input List |
provide a path to the input file for the model |
Save Run Configurations |
provide a location where the inference and runtime results from the Diff customization tool will be stored |
Note
Users have the option of providing custom accuracy and performance verifier thresholds when running a diff. A custom accuracy threshold can be provided for any of the accuracy verifiers. By default, the verifier thresholds are 0.01. The custom thresholds can be provided in the text boxes labelled “Accuracy Threshold” and “Perf Threshold”.
Users can also enter accuracy-verifier-specific hyperparameters in text boxes. The default values are displayed inside the text boxes and can be customized as needed. The table below lists the hyperparameters that can be customized for each verifier.
Verifier |
Hyperparameters |
|---|---|
AdjustedRtolAtol |
Number of Levels |
RtolAtol |
Rtol Margin, Atol Margin |
Topk |
K, Ordered |
MeanIOU |
Background Classification |
L1Error |
Multiplier, Scale |
CosineSimilarity |
Multiplier, Scale |
MSE (Mean Square Error) |
N/A |
SQNR (Signal-To-Noise Ratio) |
N/A |
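As a rough illustration only (not the SDK's actual implementation), an RtolAtol-style check can be sketched in plain Python; `rtol` and `atol` here stand in for the Rtol Margin and Atol Margin hyperparameters listed above:

```python
# Hypothetical sketch of an RtolAtol-style accuracy check.
# A value pair passes if |a - b| <= atol + rtol * |b|, the common
# combined relative/absolute tolerance formula.
def rtol_atol_pass_rate(expected, actual, rtol=0.01, atol=0.01):
    passed = sum(
        1 for a, b in zip(actual, expected)
        if abs(a - b) <= atol + rtol * abs(b)
    )
    return passed / len(expected)

golden = [1.00, 2.00, 3.00, 4.00]       # reference run outputs
runtime = [1.001, 2.05, 3.001, 4.001]   # target run outputs
rate = rtol_atol_pass_rate(golden, runtime)  # 3 of 4 values pass -> 0.75
```

A verifier threshold of 0.01 would then flag a node whenever the mismatch fraction (here 0.25) exceeds that threshold.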
Below is an example of what the fields should look like once filled to completion:
After running the Diff Customization tool, the output directories and files should be present in the working directory path provided in the last field.
Results and Outputs:¶
After pressing the Run button as mentioned above, the visualization of the network should pop up. Nodes are highlighted if there are any accuracy and/or performance variations. Clicking on a node shows more information about the accuracy and performance differences, as shown below.
Performance and Accuracy Diff Visualizations:¶
As seen above, the performance and accuracy diff information is shown under the Diff section of any given node. The color of the node boundary in the viewer represents whether a performance or accuracy error (above the default verifier threshold of 0.01) was reported. For example, in the Conv2d node shown below, there are two boundaries of orange and red indicating that this node has both an accuracy and performance difference across the runs. The FullyConnected node shown only has a yellow boundary indicating that only a performance difference was found.
qnn-context-binary-utility¶
The qnn-context-binary-utility tool validates the metadata of a context binary and serializes it into a JSON file. This JSON file can then be used to inspect the context binary, which aids debugging. A QNN context can be serialized to a binary using the QNN APIs or the qnn-context-binary-generator tool.
usage: qnn-context-binary-utility --context_binary CONTEXT_BINARY_FILE --json_file JSON_FILE_NAME [--help] [--version]
Reads a serialized context binary and validates its metadata.
If --json_file is provided, it outputs the metadata to a json file
required arguments:
--context_binary CONTEXT_BINARY_FILE
Path to cached context binary from which the binary info will be extracted
and written to json.
--json_file JSON_FILE_NAME
Provide path along with the file name <DIR>/<FILE_NAME> to serialize
context binary info into json.
The directory path must exist. File with the FILE_NAME will be created at DIR.
optional arguments:
--help Displays this help message.
--version Displays version information.
Additional explanation¶
Accessing Graph Blob Info V2 Struct¶
The Graph Blob Info V2 struct is located in the serialized binary immediately after the V1 struct (in context binaries prepared with QNN SDK 2.37 or later) and can be accessed as follows:
uint8_t* array = static_cast<uint8_t*>(graphBlobInfo);
auto* v2 = reinterpret_cast<QnnHtpSystemContext_GraphBlobInfoV2_t*>(array + sizeof(QnnHtpSystemContext_GraphBlobInfo_t));
Note: Users must check v2 for a null pointer before dereferencing it.
Parameters Description¶
Below is a table representing the meanings of various parameters.
Parameters |
Description |
|---|---|
nativeKChannelSize |
The nativeK channel tile size used by each of the graphs |
nativeVChannelSize |
The nativeV channel tile size used by each of the graphs |
isSafeShareIO |
Whether it is safe to share a buffer between inputs and outputs, 1: True, 0: False.
When the flag is set, the client is responsible for ensuring inputs and outputs do not clash.
|
graphIOTensorSize |
Graph input/output tensor size (bytes) |
DDRTensorSize |
Size of DDR tensor (bytes) |
OpDataSize |
Memory size including op data such as runlists (bytes) |
constSize |
Size of const data in the graph (bytes) |
SharedWeightSize |
Shared weights size (bytes) |
spillFillBufferSize |
The spill-fill buffer size used by each of the graphs |
vtcmSize |
HTP vtcm size (MB) |
optimizationLevel |
Optimization level |
htpDlbc |
Htp Dlbc |
numHvxThreads |
Number of HVX Threads to reserve |
Memory Usage Scenarios¶
Use Case 1: Single Model Inference
Total RAM = OpDataSize + constSize + DDRTensorSize + spillFillBufferSize + graphIOTensorSize + vtcmSize
Use Case 2: Large Language Model (LLM) with Weight Sharing
Total RAM = (OpDataSize₁ + constSize₁ + DDRTensorSize₁ + spillFillBufferSize₁ + graphIOTensorSize₁ + vtcmSize₁) + SharedWeightSize + (OpDataSize₂ + constSize₂ + DDRTensorSize₂ + spillFillBufferSize₂ + graphIOTensorSize₂ + vtcmSize₂)
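The two formulas above can be sketched as a small Python helper. The field names mirror the parameter table; all byte values below are illustrative placeholders, not real measurements:

```python
# Sketch of the RAM accounting described above. Values are illustrative only.
def graph_footprint(op_data, const, ddr_tensor, spill_fill, graph_io, vtcm):
    # Use Case 1: single-model inference footprint for one graph.
    return op_data + const + ddr_tensor + spill_fill + graph_io + vtcm

def weight_sharing_total(graphs, shared_weights):
    # Use Case 2: per-graph footprints plus one shared-weights allocation.
    return sum(graph_footprint(**g) for g in graphs) + shared_weights

g1 = dict(op_data=10, const=5, ddr_tensor=2, spill_fill=1, graph_io=3, vtcm=8)
g2 = dict(op_data=12, const=6, ddr_tensor=2, spill_fill=1, graph_io=3, vtcm=8)
total = weight_sharing_total([g1, g2], shared_weights=100)  # 29 + 32 + 100
```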
Accuracy Evaluator plugins¶
File-based plugins¶
This section lists the built-in file-based plugins.
Dataset plugins¶
create_squad_examples - Extracts examples from a given SQuAD dataset file and saves them to a file.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
squad_version |
SQuAD version 1 or 2 |
Integer |
1 |
filter_dataset - Filters the dataset, including the input list, calibration, and annotation files.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
max_inputs |
Maximum number of inputs in inputlist to be considered for execution |
Integer |
Mandatory |
max_calib |
Maximum number of inputs in calibration to be considered for execution |
Integer |
Mandatory |
random |
Shuffles the inputlist and calibration files |
Boolean |
False |
gpt2_tokenizer - Tokenizes data from files using GPT2TokenizerFast.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
vocab_file |
Path to the vocabulary file |
String |
Mandatory |
merges_file |
Path to the merges file |
String |
Mandatory |
seq_length |
Sequence length for the generated model inputs |
Integer |
Mandatory |
past_seq_length |
Sequence length for the “past” inputs |
Integer |
Mandatory |
past_shape |
Shape of the ‘past’ inputs |
List |
|
num_past |
Number of ‘past’ inputs |
Integer |
0 |
split_txt_data - Saves each line of the given input text file as an individual text file.
Preprocessing plugins¶
centernet_preproc - Performs preprocessing on CenterNet dataset examples.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
dims |
Height and width; comma delimited, e.g., 416,416 |
String |
Mandatory |
scale |
Scale factor for image |
Float |
1.0 |
fix_res |
Resolution of the image |
Boolean |
True |
pad |
Image padding |
Integer |
0 |
convert_nchw - Transposes WHC to CHW or CHW to WHC and adds an extra N dimension.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
expand-dims |
Add the Nth dimension |
Boolean |
True |
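As a hedged sketch of the layout change convert_nchw performs (using nested lists in place of real tensors; the plugin itself presumably operates on numpy arrays):

```python
# Sketch of the convert_nchw layout change on a nested-list tensor:
# [H][W][C] -> [C][H][W], optionally prepending the N dimension.
def hwc_to_chw(img, expand_dims=True):
    h, w, c = len(img), len(img[0]), len(img[0][0])
    chw = [[[img[y][x][ch] for x in range(w)] for y in range(h)]
           for ch in range(c)]
    return [chw] if expand_dims else chw

img = [[[1, 2], [3, 4]],   # 2x2 image with 2 channels, HWC layout
       [[5, 6], [7, 8]]]
out = hwc_to_chw(img)      # NCHW: channel planes [[1,3],[5,7]] and [[2,4],[6,8]]
```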
create_batch - Concatenates raw input files into a single file using numpy.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
delete_prior |
To delete prior unbatched data to save space |
Boolean |
True |
truncate |
Whether to truncate leftover inputs in the last batch when the number of inputs is not a multiple of the batch size |
Boolean |
False |
crop - Center crops an image to the given dimensions using numpy or torchvision based on the library parameter.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
dims |
Height and width; comma delimited, e.g., 640,640 |
String |
Mandatory |
library |
Python library used to crop the given input; valid values are: numpy | torchvision |
String |
numpy |
typecasting_required |
To convert final output to numpy or not. Note: This option is specific to torchvision library |
Boolean |
True |
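A minimal sketch of the center-crop operation described above, assuming the numpy-style code path and using plain lists for illustration:

```python
# Sketch of a center crop to (crop_h, crop_w), matching the "dims"
# parameter above (height,width order).
def center_crop(img, crop_h, crop_w):
    h, w = len(img), len(img[0])
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]

img = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 index grid
patch = center_crop(img, 2, 2)  # central 2x2 window: [[5, 6], [9, 10]]
```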
expand_dims - Adds the N dimension for images, e.g., HWC to NHWC.
image_transformers_input - Creates input files with image and/or text for image transformer models like ViT and CLIP.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
dims |
Expected processed output dimension in CHW format |
String |
Mandatory |
num_base_class |
Number of base classes in classification; used in the scenario where text input is also provided |
Integer |
Total classes available |
num_prompt |
Number of prompts for text classes; used in the scenario where text input is also provided |
Integer |
Total classes available |
image_only |
Data type of raw data |
Boolean |
False |
normalize - Normalizes input per the given scheme; data must be of NHWC format.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
library |
Python library used to normalize the given input; valid values are: numpy | torchvision |
String |
numpy |
norm |
Normalization factor, all values divided by norm |
float32 |
255 |
means |
Dictionary of means to be subtracted, e.g., {“R”:0.485, “G”:0.456, “B”:0.406} |
RGB dictionary |
{“R”:0, “G”:0, “B”:0} |
std |
Dictionary of std-dev for rescaling the values, e.g., {“R”:0.229, “G”:0.224, “B”:0.225} |
RGB dictionary |
{“R”:1, “G”:1, “B”:1} |
channel_order |
Channel order to specify means and std values per channel - RGB | BGR |
String |
RGB |
normalize_first |
Whether to perform normalization before or after mean subtraction and standard-deviation scaling.
normalize_first=True performs normalization first.
Note: the torchvision library does not use this option
|
Boolean |
True |
typecasting_required |
To convert final output to numpy or not. Note: This option is specific to the Torchvision library |
Boolean |
True |
pil_to_tensor_input |
To convert input to tensor before normalization. Note: This option is specific to the Torchvision library |
Boolean |
True |
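A minimal sketch of the normalization scheme described above, applied to a single RGB pixel (plain Python in place of numpy; parameter names mirror the table):

```python
# Sketch of normalize: divide by `norm`, then subtract per-channel mean
# and divide by per-channel std (order controlled by normalize_first).
def normalize_pixel(pixel, norm=255.0,
                    means=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0),
                    normalize_first=True):
    if normalize_first:
        pixel = [v / norm for v in pixel]
        return [(v - m) / s for v, m, s in zip(pixel, means, std)]
    # normalize_first=False: mean/std first, then divide by norm
    pixel = [(v - m) / s for v, m, s in zip(pixel, means, std)]
    return [v / norm for v in pixel]

out = normalize_pixel([255.0, 0.0, 127.5])  # defaults scale into [0, 1]
```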
onmt_preprocess - Performs preprocessing on the WMT dataset for the FasterTransformer OpenNMT model.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
vocab_path |
Path to OpenNMT model vocabulary file (pickle file) |
String |
Mandatory |
src_seq_len |
The maximum total input sequence length |
Integer |
128 |
skip_sentencepiece |
Skip sentencepiece encoding |
Boolean |
True |
sentencepiece_model_path |
Path to sentencepiece model for WMT dataset (mandatory when “skip_sentencepiece” is False) |
String |
None |
pad - Pads an image with a constant pad size or based on target dimensions.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
type |
|
String |
Mandatory |
dims |
Height and width comma delimited, e.g., 416,416 for ‘target-dims’ type of padding |
String |
Mandatory |
pad_size |
Size of padding for ‘constant’ type of padding |
Integer |
None |
img_position |
Parameter to specify position of image, either ‘center’ or ‘corner’ (top-left). Padding is added accordingly. Currently used for ‘target_dims’ type padding |
String |
center |
color |
Padding value for all planes |
Integer |
114 |
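A rough sketch of 'target-dims' padding with the image in the 'corner' (top-left) position; this illustrates the behavior described above and is not the plugin's actual code:

```python
# Sketch of 'target-dims' padding: place the image at the top-left and
# fill the remainder of the target canvas with `color` (default 114).
def pad_to_target(img, target_h, target_w, color=114):
    h, w = len(img), len(img[0])
    padded = [[color] * target_w for _ in range(target_h)]
    for y in range(h):
        for x in range(w):
            padded[y][x] = img[y][x]
    return padded

img = [[1, 2], [3, 4]]
out = pad_to_target(img, 3, 3)  # 2x2 image padded to 3x3
```

With img_position set to 'center', the image would instead be offset by half the padding on each side.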
resize - Resizes an image using the library specified by the library parameter: opencv (default), pillow, or torchvision
Parameters |
Description |
Type |
Default |
|---|---|---|---|
dims |
Height and width; comma delimited, e.g., 640,640 |
String |
Mandatory |
library |
Python library to be used for resizing a given input; valid values are: opencv | pillow | torchvision |
String |
opencv |
channel_order |
Convert image to specified channel order. At present this parameter only takes the ‘RGB’ value |
String |
RGB |
interp |
|
String |
For opencv and torchvision: bilinear
For pillow: bicubic
|
type |
Type of resize to be done. Note: Torchvision does not use this option. Options:
|
String |
auto-resize |
resize_before_typecast |
To resize before or after conversion to target datatype e.g., fp32 |
Boolean |
True |
typecasting_required |
To convert final output to numpy or not. Note: This option is specific to the Torchvision library |
Boolean |
True |
mean |
Dictionary of means to be subtracted, e.g., {“R”:0.485, “G”:0.456, “B”:0.406}. Note: This option is specific to the Tensorflow library |
RGB dictionary |
{“R”:0, “G”:0, “B”:0} |
std |
Dictionary of std-dev for rescaling the values, e.g., {“R”:0.229, “G”:0.224, “B”:0.225}. Note: This option is specific to the Tensorflow library |
RGB dictionary |
{“R”:1, “G”:1, “B”:1} |
normalize_before_resize |
Whether to perform normalization before or after mean subtraction and standard-deviation scaling. Note: This option is specific to the TensorFlow library |
Boolean |
False |
crop_before_resize |
To perform cropping before resize. Note: This option is specific to the Tensorflow library |
Boolean |
False |
squad_read - Reads the SQuAD dataset JSON file. Preprocesses the question-context pairs into features for language models like BERT-Large
Parameters |
Description |
Type |
Default |
|---|---|---|---|
vocab_path |
Path for local directory containing vocabulary files |
String |
Mandatory |
max_seq_length |
The maximum total input sequence length after WordPiece tokenization. Sequences longer than this will be truncated, and sequences shorter than this will be padded |
Integer |
384 |
max_query_length |
The maximum number of tokens for the question. Questions longer than this will be truncated to this length |
Integer |
64 |
doc_stride |
When splitting up a long document into chunks, how much stride to take between chunks |
Integer |
128 |
packing_strategy |
Set this flag when using packing strategy for bert based models |
Boolean |
False |
max_sequence_per_pack |
The maximum number of sequences which can be packed together |
Integer |
3 |
mask_type |
This can take one of three values - ‘None’, ‘Boolean’, or ‘Compressed’ - depending on the masking to be applied to input_mask |
String |
None |
compressed_mask_length |
Set this value if mask_type is set to compressed |
Integer |
None |
Postprocessing plugins¶
bert_predict - Predicts answers for a SQuAD dataset given start and end logits.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
vocab_path |
Path for a local directory containing vocabulary files |
String |
Mandatory |
max_seq_length |
The maximum total input sequence length after WordPiece tokenization. Sequences longer than this will be truncated, and sequences shorter than this will be padded (optional if preprocessing is run) |
Integer |
384 |
doc_stride |
When splitting up a long document into chunks, how much stride to take between chunks (optional if preprocessing is run) |
Integer |
128 |
max_query_length |
The maximum number of tokens for the question. Questions longer than this will be truncated to this length (optional if preprocessing is run) |
Integer |
64 |
n_best_size |
The total number of n-best predictions to generate in the post.json output file |
Integer |
20 |
max_answer_length |
The maximum length of an answer that can be generated. This is needed because the start and end predictions are not conditioned on one another |
Integer |
30 |
packing_strategy |
This flag is set to True if using packing strategy |
Boolean |
False |
centerface_postproc - Processes the inference outputs to parse detections and generate a detections file for the metric evaluator. Used for processing the CenterFace face detector.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
dims |
Height and width; comma delimited, e.g., 640,640 |
String |
Mandatory |
dtypes |
List of datatypes to be used for bounding boxes, scores, and labels (in order), e.g., [float32, float32, int64]. Defaults to the datatypes fetched from the ‘outputs_info’ for the model’s config.yaml |
List |
Datatypes from the outputs_info section of the model config.yaml |
heatmap_threshold |
User input for heatmap threshold |
Float |
0.05 |
nms_threshold |
User input for nms threshold |
Float |
0.3 |
centernet_postprocess - Processes the inference outputs to parse detections and generate a detections file for the metric evaluator. Used for processing CenterNet detector.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
dtypes |
List of datatypes (at least 3) to be used to infer outputs |
String |
Mandatory |
output_dims |
Height and width; comma delimited, e.g., 640,640 |
String |
Mandatory |
top_k |
Top K proposals are given from the postprocess plugin |
Integer |
100 |
num_classes |
Number of classes |
Integer |
1 |
score |
Threshold for filtering the detections |
Integer |
1 |
lprnet_predict - Used for LPRNET license plate prediction.
object_detection - Processes the inference outputs to parse detections and generate a detections file for metric evaluator
Parameters |
Description |
Type |
Default |
|---|---|---|---|
dims |
Height and width; comma delimited, e.g., 640,640 |
String |
Mandatory |
type |
Type of post-processing (e.g., letterbox, stretch) |
String |
None |
label_offset |
Offset for the labels information |
Integer |
0 |
score_threshold |
Threshold limit for the detection scores |
Float |
0.001 |
xywh_to_xyxy |
Convert bounding box format from box center (xywh) to box corner (xyxy) format |
Boolean |
False |
xy_swap |
Swap the X and Y coordinates of bbox |
Boolean |
False |
dtypes |
List of datatypes used for bounding boxes, scores, and labels in order, e.g., [float32, float32, int64]. Defaults to the datatypes fetched from the ‘outputs_info’ for the model’s config.yaml. |
List |
Datatypes from the outputs_info section of the model config.yaml |
mask |
Do postprocessing on mask |
Boolean |
False |
mask_dims |
Output dims of model. Provide this only if mask = True. E.g., 100,80,28,28 |
String |
None |
padded_outputs |
Pad the outputs |
Boolean |
False |
scale |
Comma separated scale values |
String |
‘1’ |
skip_padding |
Skip padding while rescaling to original image shape |
Boolean |
False |
onmt_postprocess - Performs postprocessing on OpenNMT model outputs
Parameters |
Description |
Type |
Default |
|---|---|---|---|
sentencepiece_model_path |
Path to sentencepiece model for WMT dataset |
String |
Mandatory |
unrolled_count |
Upper limit on the unrolls required for the output (no. of output tokens to be considered for metric) |
Integer |
26 |
vocab_path |
Path to OpenNMT model vocabulary file (pickle file), optional if preprocessing is run |
String |
None |
skip_sentencepiece |
Skip sentencepiece encoding, optional if preprocessing is run |
Boolean |
None |
Metric plugins¶
bleu - Evaluates the BLEU score using the sacrebleu library
Parameters |
Description |
Type |
Default |
|---|---|---|---|
round |
Number of decimal places to round the result to |
Integer |
1 |
map_coco - Evaluates the mAP@0.5 and mAP@0.5:0.95 scores for the COCO dataset
Parameters |
Description |
Type |
Default |
|---|---|---|---|
map_80_to_90 |
Mapping of classes in range 0-80 to 0-90 |
Boolean |
False |
segm |
Flag to calculate mAP for mask |
Boolean |
False |
keypoint_map |
Flag to calculate mAP for keypoint |
Boolean |
False |
perplexity - Calculates the perplexity metric. Model outputs are expected to be logits of the proper shape. Ground truth data is expected in tokenized form, as token IDs. The ground truth is generated automatically when using the “gpt2_tokenizer” dataset plugin.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
logits_index |
Index of the logits output if the model has multiple outputs |
Integer |
0 |
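As a hedged illustration of the metric (not the plugin's actual code), perplexity can be computed from per-step logits and ground-truth token IDs as the exponential of the mean negative log-probability of the correct token:

```python
import math

# Sketch of perplexity: softmax each step's logits, take the negative
# log-probability of the true token, average, then exponentiate.
def perplexity(logits_per_step, token_ids):
    nll = 0.0
    for logits, tok in zip(logits_per_step, token_ids):
        denom = sum(math.exp(x) for x in logits)   # softmax denominator
        nll += -math.log(math.exp(logits[tok]) / denom)
    return math.exp(nll / len(token_ids))

# Uniform logits over 4 tokens -> perplexity equals the vocabulary size.
ppl = perplexity([[0.0, 0.0, 0.0, 0.0]] * 3, [0, 1, 2])  # 4.0
```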
precision - Calculates the precision metric, i.e., (correct predictions / total predictions). Ground truth data is expected in the format “filename <space> correct_text”. The postprocessed model outputs are expected to be text files with just the “predicted_text”.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
round |
Number of decimal places to round the result to |
Integer |
7 |
input_image_index |
For multi input models, the index of image file in input file list csv |
Integer |
0 |
squad_em - Calculates the exact match for SQuAD v1.1 dataset predictions and ground truth.
squad_f1 - Calculates F1 score for SQuAD v1.1 dataset predictions and ground truth.
topk - Evaluates topk value by comparing results and annotations.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
kval |
Top k values, e.g., 1,5 evaluates top1 and top5 |
String |
5 |
softmax_index |
Index of the softmax output in the results file list |
Integer |
0 |
label_offset |
Offset required in the label scores, e.g., if the shape is 1x1001, then label_offset=1 |
Integer |
0 |
round |
Number of decimal places to round the result to |
Integer |
3 |
input_image_index |
For multi input models, the index of image file in input file list csv |
Integer |
0 |
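A minimal sketch of the top-k computation the table describes (illustrative only): a sample counts as correct if the true label is among the k highest-scoring classes.

```python
# Sketch of top-k accuracy over softmax outputs, cf. the kval and
# label_offset parameters above.
def topk_accuracy(scores_list, labels, k=5, label_offset=0):
    hits = 0
    for scores, label in zip(scores_list, labels):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i],
                        reverse=True)
        if label + label_offset in ranked[:k]:
            hits += 1
    return hits / len(labels)

scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
acc1 = topk_accuracy(scores, [1, 2], k=1)  # only the first sample hits
```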
widerface_AP - Computes average precision for easy, medium, and hard cases.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
IoU_threshold |
User input for IoU threshold |
Float |
0.4 |
Memory-based plugins¶
This section lists the built-in memory-based plugins.
Dataset plugins¶
SQUADDataset - The Stanford Question Answering Dataset (SQuAD) is a widely used benchmark for question-answering tasks, featuring over 100,000 questions annotated on more than 500 Wikipedia articles. This plugin loads and extracts examples from a specified SQuAD dataset file.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
tokenizer_model_name_or_path |
|
os.PathLike | str |
Mandatory |
annotation_path |
Path to the SQUAD annotation file. |
Optional[os.PathLike | str] |
None |
calibration_path |
Path to the SQUAD calibration file. |
Optional[os.PathLike | str] |
None |
max_samples |
The maximum number of samples to load. |
Optional[int] |
None |
use_calibration |
Whether to use calibration data or not. |
Optional[bool] |
False |
max_seq_length |
The maximum sequence length. |
int |
384 |
max_query_length |
The maximum query length. |
int |
64 |
doc_stride |
The document stride. |
int |
128 |
threads |
The number of threads to use. |
int |
8 |
do_lower_case |
Whether to perform lower-casing on the data. |
bool |
True |
model_inputs_count |
The number of input fields in the PackedInputs tuple. |
int |
2 |
use_packing_strategy |
Whether to pack features or not. |
bool |
False |
max_sequence_per_pack |
The maximum number of sequences per pack. |
int |
3 |
mask_type |
The type of mask to use. |
Optional[Literal[‘boolean’, ‘compressed’]] |
None |
compressed_mask_length |
The length of the compressed mask. |
Optional[int] |
None |
squad_version |
The version of the SQUAD dataset. |
int |
1 |
WikiText2Dataset - The WikiText-2 dataset is a comprehensive collection of Wikipedia articles used to evaluate text generation and language modeling systems. It contains 17 million tokens from around 22,000 documents. This plugin tokenizes the WikiText-2 data from files into model inputs.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
tokenizer_model_name_or_path |
|
os.PathLike | str |
Mandatory |
input_list_path |
Path to the file containing text files. |
os.PathLike | str |
Mandatory |
sequence_length |
Length of each sequence. |
int |
Mandatory |
past_shape |
Shape of past sequences. |
List[int] |
None |
calibration_indices |
List containing the indices from input list to be used as calibration data. |
Optional[List[int]] |
None |
max_samples |
Maximum number of samples to be loaded. |
Optional[int] |
None |
use_calibration |
Flag to choose whether to use calibration data. |
bool |
False |
past_sequence_length |
Length of past sequences. |
int |
0 |
num_past |
Number of past sequences. |
int |
0 |
position_id_required |
Whether position IDs are required. |
bool |
True |
mask_dtype |
Data type for masks. |
Literal[“int64”, “float32”] |
“float32” |
ImagenetDataset - The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset, commonly referred to as the ImageNet dataset, is a vast and influential collection of over 14 million annotated images, making it one of the largest and most widely used benchmark datasets for computer vision research.
COCO2017Dataset - The Common Objects in Context (COCO) 2017 Dataset is a large-scale, fine-grained image dataset containing over 120,000 images and 2 million object instances from various categories, including animals, vehicles, furniture, and man-made objects, annotated with precise pixel-level masks.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
inputlist_path |
The path to the input list file. Paths in the file can be either relative to the file list’s location or an absolute path. |
Optional[os.PathLike| str] |
Mandatory |
annotation_path |
The path to the annotation file. If set to None, no annotations will be used. Annotation must be provided if metrics are to be computed. |
Optional[os.PathLike| str] |
None |
calibration_path |
The path to the calibration file. If set to None, no calibration data will be used. |
Optional[os.PathLike | str] |
None |
calibration_indices |
A list of indices from the input dataset that will be utilized for calibration purposes. User can provide a file containing comma-separated values representing the selected indices. |
Optional[list[int] | str] |
None |
use_calibration |
Flag to determine whether to use calibration data or not. |
bool |
False |
image_backend |
Image Backend to be used for loading images from disk. |
Literal[‘opencv’,’pillow’] |
‘opencv’ |
max_samples |
Maximum number of samples to be loaded. |
Optional[int] |
None |
SYN_CHINESE_LP_Dataset - The SYN_CHINESE_LP dataset is a synthetic collection of Chinese license plate images with varying levels of quality, noise, and distortion, designed to simulate real-world challenges in automatic license plate recognition (ALPR) tasks for traffic management applications.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
inputlist_path |
The path to the input list file. Paths in the file can be either relative to the file list’s location or an absolute path. |
Optional[os.PathLike| str] |
Mandatory |
annotation_path |
The path to the annotation file. If set to None, no annotations will be used. Annotation must be provided if metrics are to be computed. |
Optional[os.PathLike| str] |
None |
calibration_path |
The path to the calibration file. If set to None, no calibration data will be used. |
Optional[os.PathLike | str] |
None |
calibration_indices |
A list of indices from the input dataset that will be utilized for calibration purposes. User can provide a file containing comma-separated values representing the selected indices. |
Optional[list[int] | str] |
None |
use_calibration |
Flag to determine whether to use calibration data or not. |
bool |
False |
image_backend |
Image Backend to be used for loading images from disk. |
Literal[‘opencv’,’pillow’] |
‘opencv’ |
max_samples |
Maximum number of samples to be loaded. |
Optional[int] |
None |
WIDERFaceDataset - The WIDERFace dataset is a large-scale face detection benchmark with hundreds of thousands of annotated faces, making it one of the most comprehensive and challenging datasets for face localization tasks.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
inputlist_path |
The path to the input list file. Paths in the file can be either relative to the file list’s location or an absolute path. |
Optional[os.PathLike| str] |
Mandatory |
annotation_path |
The path to the folder containing annotation *.mat files. If set to None, no annotations will be used. Annotation must be provided if metrics are to be computed. |
Optional[DirectoryPath] |
None |
calibration_path |
The path to the calibration file. If set to None, no calibration data will be used. |
Optional[os.PathLike | str] |
None |
calibration_indices |
A list of indices from the input dataset that will be utilized for calibration purposes. User can provide a file containing comma-separated values representing the selected indices. |
Optional[list[int] | str] |
None |
use_calibration |
Flag to determine whether to use calibration data or not. |
bool |
False |
image_backend |
Image Backend to be used for loading images from disk. |
Literal[‘opencv’,’pillow’] |
‘opencv’ |
max_samples |
Maximum number of samples to be loaded. |
Optional[int] |
None |
WMT20Dataset - The WMT20 dataset is a collection of machine translation benchmarks, consisting of parallel corpora in 46 language pairs with millions of sentence pairs, used to evaluate and improve the performance of machine translation systems for multilingual applications.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
inputlist_path |
The path to the input list file. Paths in the file can be either relative to the file list’s location or an absolute path. |
Optional[os.PathLike| str] |
Mandatory |
annotation_path |
The path to the annotation file. If set to None, no annotations will be used. Annotation must be provided if metrics are to be computed. |
Optional[os.PathLike| str] |
None |
calibration_indices |
A list of indices from the input dataset that will be utilized for calibration purposes. User can provide a file containing comma-separated values representing the selected indices. |
Optional[list[int] | str] |
None |
use_calibration |
Flag to determine whether to use calibration data or not. |
bool |
False |
max_samples |
Maximum number of samples to be loaded. |
Optional[int] |
None |
Preprocessing memory plugins¶
CenternetPreprocessor - Performs preprocessing on CenterNet dataset examples.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
output_dimensions |
Output dimensions of the processed image output. Height and width; e.g., [640 , 640] |
list[int] |
Mandatory |
scale |
Scale factor for image |
Float |
1.0 |
ConvertNCHW - Transposes WHC to CHW or CHW to WHC and adds an extra N dimension.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
expand_dims |
Add the Nth dimension |
Boolean |
True |
CropImage - Center crops an image to the given dimensions using numpy or torchvision based on the library parameter.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
image_dimensions |
Output dimensions of the processed image output. Height and width; e.g., [640 , 640] |
list[int] |
Mandatory |
library |
Python library used to crop the given input; valid values are: numpy | torchvision |
String |
numpy |
typecasting_required |
To convert final output to numpy or not. Note: This option is specific to torchvision library |
Boolean |
True |
ExpandDimensions - Adds a new dimension for images at the given axis, e.g., HWC to NHWC.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
axis |
The index of the axis to expand |
Integer |
0 |
FlipImage - Flips the input image horizontally or vertically based on given axis.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
axis |
The axis along which the image is flipped. Default: 3, indicating a horizontal flip for RGB images. |
Integer |
3 |
MlCommonsRetinaNetPreprocessor - Preprocessor for the RetinaNet model. Normalizes the image using the mean and standard deviation and interpolates it to the provided image_size.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
image_size |
Expected size to which images should be resized in [Height, Width] format; e.g., [299 299] |
list[int, int] |
(800, 800) |
mean |
The mean values for normalization. |
list[float] |
[0.485, 0.456, 0.406] |
std |
The standard deviation values for normalization. |
list[float] |
[0.229, 0.224, 0.225] |
OpenNMTPreprocessor - A preprocessor for OpenNMT that reads text data and applies required preprocessing for ONMT models.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
vocab_path |
The path to the vocabulary file to be used for processing. |
os.PathLike |
Mandatory |
src_seq_len |
The source sequence length. |
Integer |
128 |
CLIPPreprocessor - Creates input files with image and/or text for image transformer models like ViT and CLIP. (Note: this plugin requires Pillow version 10.0.0)
Parameters |
Description |
Type |
Default |
|---|---|---|---|
image_dimensions |
Expected processed output dimension in [Height, Width] format; e.g., [299 299] |
list[int] |
Mandatory |
image_only |
Whether to process only image tokens |
Boolean |
True |
image_input_index |
Index of the input image data in the input provided |
Integer |
0 |
NormalizeImage - Normalizes input per the given scheme; data must be of NHWC format.
Parameters |
Description |
Type |
Default |
|---|---|---|---|
library |
Python library used to normalize the given input; valid values are: numpy | torchvision |
String |
numpy |
norm |
Normalization factor, all values divided by norm |
float32 |
255.0 |
means |
Dictionary of means to be subtracted, e.g., {“R”:0.485, “G”:0.456, “B”:0.406} |
RGB dictionary |
{“R”:0, “G”:0, “B”:0} |
std |
Dictionary of std-dev for rescaling the values, e.g., {“R”:0.229, “G”:0.224, “B”:0.225} |
RGB dictionary |
{“R”:1, “G”:1, “B”:1} |
channel_order |
Channel order to specify means and std values per channel - RGB | BGR |
String |
‘RGB’ |
normalize_first |
To perform normalization before or after mean subtraction and standard deviation.
normalize_first=True means perform normalization before.
Note: torchvision library does not use this option
|
Boolean |
True |
typecasting_required |
To convert final output to numpy or not. Note: This option is specific to the Torchvision library |
Boolean |
True |
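The interaction of norm, means, std, and normalize_first can be sketched per scalar value as follows (an illustrative reading, not the plugin's code; the ordering used for normalize_first=False is an assumed interpretation):

```python
# Illustrative sketch of NormalizeImage's per-value math (not the plugin's code).
# normalize_first=True: divide by norm first, then subtract mean and divide by std.
# normalize_first=False ordering below is an assumed interpretation.

def normalize(value, norm=255.0, mean=0.485, std=0.229, normalize_first=True):
    if normalize_first:
        return (value / norm - mean) / std
    return ((value - mean) / std) / norm

print(round(normalize(255.0), 4))  # (1.0 - 0.485) / 0.229, about 2.2489
```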
PadImage - Pads an image with a constant pad size or based on target dimensions.

| Parameters | Description | Type | Default |
|---|---|---|---|
| target_dimensions | Height and width of the processed image output, e.g., [640, 640], for the 'target-dims' type of padding | list[int] | Mandatory |
| pad_type |  | String | Mandatory |
| constant_pad_size | Size of padding for the 'constant' type of padding | Integer | None |
| image_position | Position of the image, either 'center' or 'corner' (top-left); padding is added accordingly. Currently used for 'target_dims' type padding | String | 'center' |
| color_value | Padding value for all planes | Integer | 114 |
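The 'target-dims' padding with image_position='center' can be sketched like this (illustrative only, not the plugin's implementation):

```python
# Illustrative 'target-dims' padding sketch (not the plugin's code): pad an
# H x W image up to target dimensions with a constant color_value, keeping
# the original image centered on the padded canvas.

def pad_to_target(image, target_h, target_w, color_value=114):
    h, w = len(image), len(image[0])
    top = (target_h - h) // 2
    left = (target_w - w) // 2
    canvas = [[color_value] * target_w for _ in range(target_h)]
    for r in range(h):
        for c in range(w):
            canvas[top + r][left + c] = image[r][c]
    return canvas

padded = pad_to_target([[1, 2], [3, 4]], 4, 4)
```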
ResizeImage - Resizes an image using the specified library: opencv (default), pillow, or torchvision.

| Parameters | Description | Type | Default |
|---|---|---|---|
| image_dimensions | Height and width of the processed image output, e.g., [640, 640] | list[int] | Mandatory |
| library | Python library to be used for resizing the given input; valid values are: opencv \| pillow \| torchvision | String | opencv |
| channel_order | Convert the image to the specified channel order. At present this parameter only accepts the value 'RGB' | String | RGB |
| interpolation_method |  | String | For opencv and torchvision: bilinear; for pillow: bicubic |
| resize_type |  | String | None |
| resize_before_typecast | Whether to resize before conversion to the target datatype, e.g., fp32 | Boolean | True |
| typecasting_required | Whether to convert the final output to numpy. Note: this option is specific to the torchvision library | Boolean | True |
| mean | Dictionary of means to be subtracted, e.g., {"R": 0.485, "G": 0.456, "B": 0.406}. Note: this option is specific to the TensorFlow library | RGB dictionary | {"R": 0, "G": 0, "B": 0} |
| std | Dictionary of standard deviations for rescaling the values, e.g., {"R": 0.229, "G": 0.224, "B": 0.225}. Note: this option is specific to the TensorFlow library | RGB dictionary | {"R": 0, "G": 0, "B": 0} |
| normalize_before_resize | Whether to perform normalization before resizing. Note: this option is specific to the TensorFlow library | Boolean | False |
| norm | Normalization factor; all values are divided by norm | float32 | 255.0 |
| normalize_first | Whether to perform normalization before mean subtraction and standard-deviation division; normalize_first=True performs normalization first. Note: the torchvision library does not use this option | Boolean | True |
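To show what image_dimensions controls, here is a deliberately simplified nearest-neighbor resize; the plugin itself delegates to opencv/pillow/torchvision with bilinear or bicubic interpolation, so this stand-in is illustrative only:

```python
# Simplified nearest-neighbor resize sketch (not the plugin's code, which
# uses bilinear/bicubic interpolation from opencv/pillow/torchvision).

def resize_nearest(image, out_h, out_w):
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

small = resize_nearest([[1, 2], [3, 4]], 1, 1)  # -> [[1]]
big = resize_nearest([[1, 2], [3, 4]], 4, 4)    # each pixel becomes a 2x2 block
```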
Adapter¶
ClassificationOutputAdapter - Transforms the output of a classification model into a single output (softmax only), assuming the model provides a list of outputs. Used along with TopKMetric for classification models.

| Parameters | Description | Type | Default |
|---|---|---|---|
| softmax_index | The index of the softmax output in the model's outputs. | Integer | 0 |
BoundingBoxOutputAdapter - Transforms the bounding box output of an object detection model based on user inputs. It allows conversion from (x, y, w, h) format to (x1, y1, x2, y2) format and swapping of X and Y coordinates. Used along with ObjectDetectionPostProcessor.

| Parameters | Description | Type | Default |
|---|---|---|---|
| xywh_to_xyxy | Whether to convert the output from box center (xywh) to box corner (xyxy) format. | Boolean | False |
| xy_swap | Whether to swap the X and Y coordinates of bounding boxes. | Boolean | False |
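The two transformations the adapter performs can be sketched as (illustrative, not the adapter's code):

```python
# Illustrative sketch of the adapter's two box transformations:
# center-format (x, y, w, h) -> corner-format (x1, y1, x2, y2), and an
# optional swap of the X and Y coordinates.

def xywh_to_xyxy(box):
    x, y, w, h = box
    return [x - w / 2, y - h / 2, x + w / 2, y + h / 2]

def swap_xy(box):
    x1, y1, x2, y2 = box
    return [y1, x1, y2, x2]

print(xywh_to_xyxy([10, 20, 4, 6]))  # [8.0, 17.0, 12.0, 23.0]
```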
Postprocessing memory plugins¶
SquadPostProcessor - Predicts answers for a SQuAD dataset from the given start and end scores.

| Parameters | Description | Type | Default |
|---|---|---|---|
| do_unpacking | Set this flag to True if using the packing strategy | Boolean | False |
CenterFacePostProcessor - Processes the inference outputs to parse detections and generates detections for metric evaluation. Used for processing the CenterFace face detector.

| Parameters | Description | Type | Default |
|---|---|---|---|
| image_dimensions | Output dimensions (height and width) of the model, e.g., [640, 640] | list[int] | Mandatory |
| heatmap_threshold | Minimum confidence score for a detection to be considered valid. | Float | 0.05 |
| nms_threshold | Non-maximum suppression threshold for suppressing multiple detections per object. | Float | 0.3 |
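The idea behind nms_threshold is greedy non-maximum suppression; a minimal sketch (not the plugin's implementation) over boxes in (x1, y1, x2, y2, score) format:

```python
# Illustrative greedy non-maximum suppression (not the plugin's code).
# A box is kept only if its IoU with every already-kept box is <= threshold.

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def nms(boxes, threshold=0.3):
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= threshold for k in kept):
            kept.append(box)
    return kept

dets = [(0, 0, 10, 10, 0.9), (1, 1, 10, 10, 0.8), (20, 20, 30, 30, 0.7)]
print(len(nms(dets)))  # the two overlapping boxes collapse to one -> 2
```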
CenterNetPostProcessor - Processes the inference outputs to parse detections and generate detections for metric evaluation. Used for processing the CenterNet detector.

| Parameters | Description | Type | Default |
|---|---|---|---|
| output_dimensions | Output dimensions (height and width) of the model, e.g., [640, 640] | list[int] | Mandatory |
| top_k | Number of top proposals returned by the postprocessing plugin | Integer | 100 |
| num_classes | Number of classes | Integer | 1 |
| score_threshold | Threshold used to filter the detections | Integer | 1 |
LPRNETPostProcessor - Used for LPRNET license plate prediction.

| Parameters | Description | Type | Default |
|---|---|---|---|
| class_axis | Axis along which the model output is expected. | Integer | -1 |
ObjectDetectionPostProcessor - Processes the inference outputs to parse detections and generate detections for metric evaluation.

| Parameters | Description | Type | Default |
|---|---|---|---|
| image_dimensions | Output dimensions (height and width) of the model, e.g., [640, 640] | list[int] | Mandatory |
| type | Type of post-processing (e.g., 'letterbox') | Literal['letterbox', 'stretch', 'aspect_ratio', 'orgimage'] | None |
| label_offset | The offset to apply to the label indices. | Integer | 0 |
| score_threshold | Threshold limit for the detection scores | Float | 0.001 |
| mask | Whether to run postprocessing on the mask | Boolean | False |
| mask_dims | Output dims of the model; provide this only if mask = True, e.g., 100,80,28,28 | String | None |
| scale | Comma-separated scale values | String | '1' |
| skip_padding | Skip padding while rescaling to the original image shape | Boolean | False |
OpenNMTPostprocessor - Postprocessor for the OpenNMT model with the WMT20 test dataset.

| Parameters | Description | Type | Default |
|---|---|---|---|
| sentencepiece_model_path | The path to the SentencePiece model. | str \| os.PathLike | Mandatory |
| unrolled_count | The count for unrolling | Optional[Integer] | 26 |
MlCommonsRetinaNetPostProcessor - Postprocessor for the MlCommons RetinaNet model.

| Parameters | Description | Type | Default |
|---|---|---|---|
| image_dimensions | Output dimensions (height and width) of the model, e.g., [1200, 1200] | list[int] | Mandatory |
| prior_boxes_file_path | Path to the file containing prior boxes. | os.PathLike | Mandatory |
| score_threshold | Minimum confidence score for a detection to be kept. | Float | Mandatory |
| nms_threshold | Non-maximum suppression IoU threshold. | Float | Mandatory |
| max_detections_per_image | Maximum number of detections retained per image. | Integer | Mandatory |
| num_classes_in_dataset | Number of classes in the dataset. | Integer | Mandatory |
| feature_map_dimensions | Dimensions of the feature maps from the FPN. | list[int] | Mandatory |
Metric memory plugins¶
MAP_COCOMetric - Evaluates the mAP scores at IoU 0.50 and 0.50:0.95 for the COCO dataset.

| Parameters | Description | Type | Default |
|---|---|---|---|
| map_80_to_90 | Map classes in the range 0-80 to 0-90 | Boolean | False |
| seg_map | Flag to calculate mAP for masks | Boolean | False |
| keypoint_map | Flag to calculate mAP for keypoints | Boolean | False |
| dataset_type | Dataset used for evaluation; must be one of 'openimages' or 'coco' | String | 'coco' |
perplexity - Calculates the perplexity metric. Model outputs are expected to be logits of the proper shape. Ground truth data is expected to be in tokenized format, as token IDs. The ground truth is generated automatically when using the "gpt2_tokenizer" dataset plugin.

| Parameters | Description | Type | Default |
|---|---|---|---|
| logits_index | Index of the logits output if the model has multiple outputs | Integer | 0 |
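The underlying computation is the exponential of the mean negative log-likelihood of the ground-truth token IDs under the model's logits; a minimal sketch (illustrative, not the plugin's code):

```python
import math

# Illustrative perplexity sketch (not the plugin's code): perplexity is
# exp of the mean negative log-likelihood of the ground-truth token IDs.

def perplexity(logits_rows, token_ids):
    nll = 0.0
    for logits, tok in zip(logits_rows, token_ids):
        log_z = math.log(sum(math.exp(x) for x in logits))
        nll += log_z - logits[tok]          # -log softmax(logits)[tok]
    return math.exp(nll / len(token_ids))

# Uniform logits over a 4-token vocabulary give perplexity 4.
print(perplexity([[0.0, 0.0, 0.0, 0.0]], [2]))
```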
precision - Calculates the precision metric, i.e., (correct predictions / total predictions).

| Parameters | Description | Type | Default |
|---|---|---|---|
| round | Number of decimal places to round the result to | Integer | 7 |
| output_index | Index of the output to be used from the provided data. | Integer | 0 |
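The metric as defined above is simple enough to write out directly (illustrative sketch, not the plugin's code):

```python
# Illustrative sketch of the precision metric defined above:
# correct predictions / total predictions, rounded to `round_to` places.

def precision(predictions, labels, round_to=7):
    correct = sum(p == l for p, l in zip(predictions, labels))
    return round(correct / len(labels), round_to)

print(precision([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```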
SquadEvaluation - Calculates the F1 and exact-match scores for the SQuAD dataset based on predictions and ground truth.

| Parameters | Description | Type | Default |
|---|---|---|---|
| tokenizer_model_name_or_path |  | os.PathLike \| str | Mandatory |
| max_answer_length | The maximum length of an answer, after tokenization. In SQuAD v2 this was set to 30 tokens; in SQuAD v1 it was not specified, so a default value of 30 is used. | Integer | 30 |
| n_best_size | How many of the possible answers to return for a given question, along with the corresponding confidence scores. | Integer | 20 |
| do_lower_case | Whether or not to lowercase all text before processing. | Bool | False |
| squad_version | Indicates which version of SQuAD-style questions and answers is being used ("v1" or "v2"). | Integer | 1 |
| decimal_places | Number of decimal places to round the result to | Integer | 6 |
TopKMetric - Calculates the number of times the correct label is among the top k predicted labels.

| Parameters | Description | Type | Default |
|---|---|---|---|
| k | Top k values; e.g., 1,5 evaluates top-1 and top-5 | list[int] | [1, 5] |
| label_offset | Offset required in the labels' scores; e.g., if the shape is 1x1001, then label_offset=1 | Integer | 0 |
| decimal_places | Number of decimal places to round the result to | Integer | 7 |
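Top-k accuracy with a label offset can be sketched as follows (illustrative, not the plugin's code; a prediction counts as correct if the true label is among the k highest-scoring classes):

```python
# Illustrative top-k accuracy sketch with label_offset (not the plugin's code).

def topk_accuracy(scores_rows, labels, k=5, label_offset=0):
    hits = 0
    for scores, label in zip(scores_rows, labels):
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        hits += (label + label_offset) in ranked[:k]
    return hits / len(labels)

scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
print(topk_accuracy(scores, labels=[1, 2], k=1))  # 0.5
```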
WiderFaceAPMetric - Computes average precision for easy, medium, and hard cases.

| Parameters | Description | Type | Default |
|---|---|---|---|
| iou_threshold | IoU threshold to be used for evaluation. | Float | 0.4 |
SDK Compatibility Verification¶
A model generated by the converter should be run with the net-run tools from the same SDK release as the converter. You can quickly check the SDK information embedded in model.cpp or libqnn_model.so with these grep commands:

```shell
strings model.cpp | grep qaisw
strings libqnn_model.so | grep qaisw
```