Tools¶
This page describes the various SDK tools and features for Linux/Android and Windows developers. For the integration flow for different developers, refer to the Overview page.
[Table: SDK tool support matrix. Column groups: Linux/Android (Ubuntu, WSL x86, Device) and Windows (WSL x86, Windows x86_64, Windows on Snapdragon). Each row lists a tool and marks its supported platforms with YES; entries marked *, **, or *** are qualified by the notes below.]
Note
Library extension naming: Windows developers should replace all '.so' files with the analogous '.dll' files in the following sections. Refer to Platform Differences for more details.
For more detailed information on converters, refer to Converters.
[*] libQnnGpuProfilingReader.dll is not supported on the Windows platform for qnn-profile-viewer.
[**] Requires the Python scripts and the executables from the Windows x86_64 binary folder (bin\x86_64-windows-msvc).
[***] The accuracy debugger on Windows x86 systems is currently tested only for the CPU runtime.
Model Conversion¶
qnn-tensorflow-converter¶
The qnn-tensorflow-converter tool converts a model from the TensorFlow framework to a CPP file representing the model as a series of QNN API calls. Additionally, a binary file containing static weights of the model is produced.
usage: qnn-tensorflow-converter -d INPUT_NAME INPUT_DIM --out_node OUT_NAMES
[--input_type INPUT_NAME INPUT_TYPE]
[--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding ...]
[--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
[--show_unconsumed_nodes] [--saved_model_tag SAVED_MODEL_TAG]
[--saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY]
[--quantization_overrides QUANTIZATION_OVERRIDES]
[--keep_quant_nodes] [--disable_batchnorm_folding]
[--expand_lstm_op_structure]
[--keep_disconnected_nodes] [--input_list INPUT_LIST]
[--param_quantizer PARAM_QUANTIZER] [--act_quantizer ACT_QUANTIZER]
[--algorithms ALGORITHMS [ALGORITHMS ...]]
[--bias_bitwidth BIAS_BITWIDTH] [--bias_bw BIAS_BW]
[--act_bitwidth ACT_BITWIDTH] [--act_bw ACT_BW]
[--weights_bitwidth WEIGHTS_BITWIDTH] [--weight_bw WEIGHT_BW]
[--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_encodings]
[--use_per_channel_quantization] [--use_per_row_quantization]
[--float_fallback] [--use_native_input_files] [--use_native_dtype]
[--use_native_output_files] [--disable_relu_squashing]
[--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
--input_network INPUT_NETWORK [--debug [DEBUG]]
[-o OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
[--float_bitwidth FLOAT_BITWIDTH] [--float_bw FLOAT_BW]
[--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
[--exclude_named_tensors] [--op_package_lib OP_PACKAGE_LIB]
[--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
[-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
[-h] [--arch_checker]
Script to convert TF model into QNN
required arguments:
-d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
The names and dimensions of the network input layers specified in the format
[input_name comma-separated-dimensions], for example:
'data' 1,224,224,3
Note that the quotes should always be included in order to
handle special characters, spaces, etc.
For multiple inputs specify multiple --input_dim on the command line like:
--input_dim 'data1' 1,224,224,3 --input_dim 'data2' 1,50,100,3
--out_node OUT_NODE, --out_name OUT_NAMES
Names of the graph's output nodes. Multiple output nodes should be
provided separately like:
--out_node out_1 --out_node out_2
--input_network INPUT_NETWORK, -i INPUT_NETWORK
Path to the source framework model.
optional arguments:
--input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
Type of data expected by each input op/layer. Type for each input is
|default| if not specified. For example: "data" image. Note that the quotes
should always be included in order to handle special characters, spaces, etc.
For multiple inputs specify multiple --input_type on the command line.
Eg:
--input_type "data1" image --input_type "data2" opaque
These options are used by the DSP runtime, and the following descriptions
state how the input will be handled for each option.
Image:
Input is float between 0-255; the input's mean is 0.0f and the input's
max is 255.0f. The floats are cast to uint8 and the uint8 values are passed
to the DSP.
Default:
Pass the input as floats to the DSP directly and the DSP will quantize it.
Opaque:
Assumes the input is float because the consumer layer (i.e. the next layer)
requires it as float, therefore it won't be quantized.
Choices supported:
image
default
opaque
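The "image" handling described above (floats in 0-255 clamped and cast to uint8 before reaching the DSP) can be sketched as follows. This is an illustrative helper with a name of our choosing, not SDK code:

```python
def preprocess_image_input(values):
    """Sketch of the 'image' input_type behavior: floats in [0, 255]
    are clamped and cast to uint8 before being handed to the DSP.
    Illustrative only; the SDK performs this internally."""
    clamped = (max(0.0, min(255.0, v)) for v in values)
    return bytes(int(v) for v in clamped)

# e.g. [0.0, 127.9, 255.0] becomes the uint8 bytes 0, 127, 255
```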
--input_dtype INPUT_NAME INPUT_DTYPE
The names and datatype of the network input layers specified in the format
[input_name datatype], for example:
'data' 'float32'.
Default is float32 if not specified.
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For multiple inputs specify multiple --input_dtype on the command line like:
--input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
--input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
Usage: --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
[INPUT_ENCODING_OUT]
Input encoding of the network inputs. Default is bgr.
e.g.
--input_encoding "data" rgba
Quotes must wrap the input node name to handle special characters,
spaces, etc. To specify encodings for multiple inputs, invoke
--input_encoding for each one.
e.g.
--input_encoding "data1" rgba --input_encoding "data2" other
Optionally, an output encoding may be specified for an input node by
providing a second encoding. The default output encoding is bgr.
e.g.
--input_encoding "data3" rgba rgb
Input encoding types:
image color encodings: bgr, rgb, nv21, nv12, ...
time_series: for inputs of RNN models;
other: not listed above or unknown.
Supported encodings:
bgr
rgb
rgba
argb32
nv21
nv12
time_series
other
--input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
Layout of each input tensor. If not specified, it will use the default
based on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
NONTRIVIAL for everything else. For multiple inputs specify multiple
--input_layout on the command line.
Eg:
--input_layout "data1" NCHW --input_layout "data2" NCHW
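As a concrete illustration of the layouts above, reordering a tensor from NCHW to NHWC amounts to a transpose that moves channels innermost. A minimal nested-list sketch (the converter handles layout internally; this helper is purely illustrative):

```python
def nchw_to_nhwc(t):
    """Reorder a nested-list tensor from NCHW (batch, channel, height, width)
    to NHWC (batch, height, width, channel). Illustrative only."""
    n, c, h, w = len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])
    return [[[[t[b][ch][y][x] for ch in range(c)]  # channels become innermost
              for x in range(w)]
             for y in range(h)]
            for b in range(n)]
```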
--custom_io CUSTOM_IO
Use this option to specify a yaml file for custom IO
--show_unconsumed_nodes
Displays a list of unconsumed nodes, if any are found. Nodes which are
unconsumed do not violate the structural fidelity of the generated graph.
--saved_model_tag SAVED_MODEL_TAG
Specify the tag to select a MetaGraph from the SavedModel. ex:
--saved_model_tag serve. Default value will be 'serve' when it is not
assigned.
--saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY
Specify the signature key to select the input and output of the model. ex:
--saved_model_signature_key serving_default. Default value will be
'serving_default' when it is not assigned.
--disable_batchnorm_folding
--expand_lstm_op_structure
Enables optimization that breaks the LSTM op to equivalent math ops
--keep_disconnected_nodes
Disables the optimization that removes ops not connected to the main graph.
This optimization uses output names provided on the command line OR
inputs/outputs extracted from the source model to determine the main graph.
--debug [DEBUG] Run the converter in debug mode.
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path where the converted output model should be saved. If not specified, the
converted model will be written to a file with the same name as the input model.
--copyright_file COPYRIGHT_FILE
Path to copyright file. If provided, the content of the file will be added
to the output model.
--float_bitwidth FLOAT_BITWIDTH
Selects the bitwidth to use when using float for parameters (weights/bias)
and activations for all ops or a specific op (via encodings) selected
through encoding; 32 (default) or 16.
--float_bw FLOAT_BW Deprecated; use --float_bitwidth.
--float_bias_bw FLOAT_BIAS_BW
Deprecated; use --float_bias_bitwidth.
--overwrite_model_prefix
If this option is passed, the model generator will use the output path name
as the model prefix to name functions in <qnn_model_name>.cpp. (Useful for
running multiple models at once) eg: ModelName_composeGraphs. Default is the
generic "QnnModel_".
--exclude_named_tensors
Do not use source framework tensor names; instead use a counter for naming
tensors. Note: This can potentially help to reduce the size of the final
model library that will be generated (recommended for deploying the model).
Default is False.
-h, --help show this help message and exit
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters to use for
quantization. These will override any quantization data carried from
conversion (eg TF fake quantization) or calculated during the normal
quantization process. Format defined as per AIMET specification.
--keep_quant_nodes Use this option to keep activation quantization nodes in the graph rather
than stripping them.
--input_list INPUT_LIST
Path to a file specifying the input data. This file should be a plain text
file, containing one or more absolute file paths per line. Each path is
expected to point to a binary file containing one input in the "raw" format,
ready to be consumed by the quantizer without any further preprocessing.
Multiple files per line separated by spaces indicate multiple inputs to the
network. See documentation for more details. Must be specified for
quantization. All subsequent quantization options are ignored when this is
not provided.
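For example, an input list for a network with two inputs and two calibration samples might look like this (the paths below are placeholders):

```
/data/quant/input_a_0001.raw /data/quant/input_b_0001.raw
/data/quant/input_a_0002.raw /data/quant/input_b_0002.raw
```

Each line is one invocation of the network, and the space-separated files on a line map to the network's inputs in order.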
--param_quantizer PARAM_QUANTIZER
Optional parameter to indicate the weight/bias quantizer to use. Must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"adjusted": Deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
--act_quantizer ACT_QUANTIZER
Optional parameter to indicate the activation quantizer to use. Must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"adjusted": Deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
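The difference between the "tf" and "symmetric" quantizers above can be sketched numerically. These helpers are illustrative only (names are ours, not the SDK's implementation):

```python
def tf_encoding(data, bitwidth=8):
    """'tf' quantizer sketch: scale derived from the real min/max of the
    data at the given bitwidth (asymmetric, unsigned range)."""
    qmax = (1 << bitwidth) - 1
    lo, hi = min(data), max(data)
    scale = (hi - lo) / qmax
    offset = round(lo / scale) if scale else 0
    return scale, offset

def symmetric_encoding(data, bitwidth=8):
    """'symmetric' quantizer sketch: min/max mirrored about zero so the
    offset is always 0 and data can be stored as signed int#_t."""
    amax = max(abs(min(data)), abs(max(data)))
    qmax = (1 << (bitwidth - 1)) - 1
    return amax / qmax, 0
```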
--algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms. Usage is:
--algorithms <algo_name1> ... The available optimization algorithms are:
"cle" - Cross layer equalization includes a number of methods for equalizing
weights and biases across layers in order to rectify imbalances that cause
quantization errors.
--bias_bitwidth BIAS_BITWIDTH
Selects the bitwidth to use when quantizing the biases; 8 (default) or 32.
--bias_bw BIAS_BW Deprecated; use --bias_bitwidth.
--act_bitwidth ACT_BITWIDTH
Selects the bitwidth to use when quantizing the activations; 8 (default) or 16.
--act_bw ACT_BW Deprecated; use --act_bitwidth.
--weights_bitwidth WEIGHTS_BITWIDTH
Selects the bitwidth to use when quantizing the weights; 4 or 8 (default).
--weight_bw WEIGHT_BW
Deprecated; use --weights_bitwidth.
--float_bias_bitwidth FLOAT_BIAS_BITWIDTH
Selects the bitwidth to use when biases are in float; 32 or 16.
--ignore_encodings Use only quantizer generated encodings, ignoring any user or model provided
encodings.
Note: Cannot use --ignore_encodings with --quantization_overrides
--use_per_channel_quantization
Enables per-channel quantization for convolution-based op weights.
This replaces the built-in model QAT encodings when used for a given weight.
--use_per_row_quantization
Enables row wise quantization of Matmul and FullyConnected ops.
--float_fallback Enables fallback to floating point (FP) instead of fixed point.
This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
If this option is enabled, --input_list must not be provided and --ignore_encodings must not be provided.
The external quantization encodings (encoding file/FakeQuant encodings) might be missing
quantization parameters for some interim tensors. The converter first tries to fill the gaps
by propagating encodings across math-invariant functions. If the quantization params are
still missing, the affected nodes fall back to floating point.
--use_native_input_files
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the
model. For ex., uint8_t.
--use_native_dtype Note: This option is deprecated; use the --use_native_input_files
option instead.
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the
model. For ex., uint8_t.
--use_native_output_files
Use this option to indicate the data type of the output files
1. float (default): output the file as floats.
2. native: outputs the file that is native to the model. For ex.,
uint8_t.
--disable_relu_squashing
Disables squashing of ReLU against convolution-based ops for quantized models.
--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
Specifies the number of steps to use for computing quantization encodings
such that scale = (max - min) / number of quantization steps.
The option should be passed as a space separated pair of hexadecimal string
minimum and maximum values. i.e. --restrict_quantization_steps "MIN MAX".
Please note that this is a hexadecimal string literal and not a signed
integer, to supply a negative value an explicit minus sign is required.
E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8-bit range,
and --restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16-bit
range. This argument is required for 16-bit Matmul operations.
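The hex pair can be decoded as follows; this is an illustrative sketch of the arithmetic, with function names of our choosing:

```python
def quantization_steps(arg):
    """Parse a --restrict_quantization_steps style "MIN MAX" pair of
    hexadecimal string literals (sign allowed) into a step count."""
    lo_s, hi_s = arg.split()
    lo, hi = int(lo_s, 16), int(hi_s, 16)
    return hi - lo

def scale_for(fmin, fmax, steps):
    # scale = (max - min) / number of quantization steps
    return (fmax - fmin) / steps

# "-0x80 0x7F" spans -128..127, i.e. 255 steps; a float range of
# [-1.0, 1.0] over those steps gives a scale of 2/255.
```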
Custom Op Package Options:
--op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
Use this argument to pass an op package library for quantization. Must be in
the form <op_package_lib_path:interfaceProviderName>; separate multiple
package libs with a comma.
-p PACKAGE_NAME, --package_name PACKAGE_NAME
A global package name to be used for each node in the Model.cpp file.
Defaults to Qnn header defined package name
--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
Absolute path to converter op package library compiled by the OpPackage
generator. Must be separated by a comma for multiple package libraries.
Note: Libraries must follow the same order as the xml files.
E.g.1: --converter_op_package_lib absolute_path_to/libExample.so
E.g.2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
--op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...], -opc OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]
Path to a Qnn Op Package XML configuration file that contains user defined
custom operations.
Architecture Checker Options(Experimental):
--arch_checker Note: This option will soon be deprecated. Use the qnn-architecture-checker tool to achieve the same result.
Note: Only one of: {'package_name', 'op_package_config'} can be specified
Basic command line usage looks like:
$ qnn-tensorflow-converter -i <path>/frozen_graph.pb
-d <network_input_name> <dims>
--out_node <network_output_name>
-o <optional_output_path>
--allow_unconsumed_nodes # optional, but most likely will be needed for larger models
-p <optional_package_name> # Defaults to "qti.aisw"
qnn-tflite-converter¶
The qnn-tflite-converter tool converts a TFLite model to a CPP file representing the model as a series of QNN API calls. Additionally, a binary file containing static weights of the model is produced.
usage: qnn-tflite-converter [-d INPUT_NAME INPUT_DIM] [--signature_name SIGNATURE_NAME]
[--out_node OUT_NAMES] [--input_type INPUT_NAME INPUT_TYPE]
[--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding ...]
[--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
[--dump_relay DUMP_RELAY]
[--quantization_overrides QUANTIZATION_OVERRIDES] [--keep_quant_nodes]
[--disable_batchnorm_folding] [--expand_lstm_op_structure]
[--keep_disconnected_nodes]
[--input_list INPUT_LIST] [--param_quantizer PARAM_QUANTIZER]
[--act_quantizer ACT_QUANTIZER]
[--algorithms ALGORITHMS [ALGORITHMS ...]]
[--bias_bitwidth BIAS_BITWIDTH] [--bias_bw BIAS_BW]
[--act_bitwidth ACT_BITWIDTH] [--act_bw ACT_BW]
[--weights_bitwidth WEIGHTS_BITWIDTH] [--weight_bw WEIGHT_BW]
[--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_encodings]
[--use_per_channel_quantization] [--use_per_row_quantization]
[--float_fallback] [--use_native_input_files] [--use_native_dtype]
[--use_native_output_files] [--disable_relu_squashing]
[--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
--input_network INPUT_NETWORK [--debug [DEBUG]]
[-o OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
[--float_bitwidth FLOAT_BITWIDTH] [--float_bw FLOAT_BW]
[--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
[--exclude_named_tensors] [--op_package_lib OP_PACKAGE_LIB]
[--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
[-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
[-h] [--arch_checker]
Script to convert TFLite model into QNN
required arguments:
--input_network INPUT_NETWORK, -i INPUT_NETWORK
Path to the source framework model.
optional arguments:
-d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
The names and dimensions of the network input layers specified in the format
[input_name comma-separated-dimensions], for example:
'data' 1,224,224,3
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For multiple inputs specify multiple --input_dim on the command line like:
--input_dim 'data1' 1,224,224,3 --input_dim 'data2' 1,50,100,3
--signature_name SIGNATURE_NAME, -sn SIGNATURE_NAME
Specifies the subgraph signature to convert.
--out_node OUT_NAMES, --out_name OUT_NAMES
Names of the graph's output tensors. Multiple output names should be
provided separately like:
--out_name out_1 --out_name out_2
--input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
Type of data expected by each input op/layer. Type for each input is
|default| if not specified. For example: "data" image. Note that the quotes
should always be included in order to handle special characters, spaces, etc.
For multiple inputs specify multiple --input_type on the command line.
Eg:
--input_type "data1" image --input_type "data2" opaque
These options are used by the DSP runtime, and the following descriptions
state how the input will be handled for each option.
Image:
Input is float between 0-255; the input's mean is 0.0f and the input's
max is 255.0f. The floats are cast to uint8 and the uint8 values are passed
to the DSP.
Default:
Pass the input as floats to the DSP directly and the DSP will quantize it.
Opaque:
Assumes the input is float because the consumer layer (i.e. the next layer)
requires it as float, therefore it won't be quantized.
Choices supported:
image
default
opaque
--input_dtype INPUT_NAME INPUT_DTYPE
The names and datatype of the network input layers specified in the format
[input_name datatype], for example:
'data' 'float32'
Default is float32 if not specified
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For multiple inputs specify multiple --input_dtype on the command line like:
--input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
--input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
Usage: --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
[INPUT_ENCODING_OUT]
Input encoding of the network inputs. Default is bgr.
e.g.
--input_encoding "data" rgba
Quotes must wrap the input node name to handle special characters,
spaces, etc. To specify encodings for multiple inputs, invoke
--input_encoding for each one.
e.g.
--input_encoding "data1" rgba --input_encoding "data2" other
Optionally, an output encoding may be specified for an input node by
providing a second encoding. The default output encoding is bgr.
e.g.
--input_encoding "data3" rgba rgb
Input encoding types:
image color encodings: bgr, rgb, nv21, nv12, ...
time_series: for inputs of RNN models;
other: not listed above or unknown.
Supported encodings:
bgr
rgb
rgba
argb32
nv21
nv12
time_series
other
--input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
Layout of each input tensor. If not specified, it will use the default
based on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
NONTRIVIAL for everything else. For multiple inputs specify multiple
--input_layout on the command line.
Eg:
--input_layout "data1" NCHW --input_layout "data2" NCHW
--custom_io CUSTOM_IO
Use this option to specify a yaml file for custom IO.
--dump_relay DUMP_RELAY
Dump Relay ASM and Params at the path provided with the argument
Usage: --dump_relay <path_to_dump>
--show_unconsumed_nodes
Displays a list of unconsumed nodes, if any are found. Nodes which are
unconsumed do not violate the structural fidelity of the generated graph.
--disable_batchnorm_folding
--expand_lstm_op_structure
Enables optimization that breaks the LSTM op to equivalent math ops
--keep_disconnected_nodes
Disables the optimization that removes ops not connected to the main graph.
This optimization uses output names provided on the command line OR
inputs/outputs extracted from the source model to determine the main graph.
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path where the converted output model should be saved. If not specified, the
converted model will be written to a file with the same name as the input model.
--copyright_file COPYRIGHT_FILE
Path to copyright file. If provided, the content of the file will be added
to the output model.
--float_bitwidth FLOAT_BITWIDTH
Selects the bitwidth to use when using float for parameters (weights/bias)
and activations for all ops or a specific op (via encodings) selected
through encoding; 32 (default) or 16.
--float_bw FLOAT_BW Deprecated; use --float_bitwidth.
--float_bias_bw FLOAT_BIAS_BW
Deprecated; use --float_bias_bitwidth.
--overwrite_model_prefix
If this option is passed, the model generator will use the output path name
as the model prefix to name functions in <qnn_model_name>.cpp. (Useful for
running multiple models at once) eg: ModelName_composeGraphs. Default is the
generic "QnnModel_".
--exclude_named_tensors
Do not use source framework tensor names; instead use a counter for naming
tensors. Note: This can potentially help to reduce the size of the final
model library that will be generated (recommended for deploying the model).
Default is False.
-h, --help show this help message and exit
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters to use for
quantization. These will override any quantization data carried from
conversion (eg TF fake quantization) or calculated during the normal
quantization process. Format defined as per AIMET specification.
--keep_quant_nodes Use this option to keep activation quantization nodes in the graph rather
than stripping them.
--input_list INPUT_LIST
Path to a file specifying the input data. This file should be a plain text
file, containing one or more absolute file paths per line. Each path is
expected to point to a binary file containing one input in the "raw" format,
ready to be consumed by the quantizer without any further preprocessing.
Multiple files per line separated by spaces indicate multiple inputs to the
network. See documentation for more details. Must be specified for
quantization. All subsequent quantization options are ignored when this is
not provided.
--param_quantizer PARAM_QUANTIZER
Optional parameter to indicate the weight/bias quantizer to use. Must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"adjusted": Deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
--act_quantizer ACT_QUANTIZER
Optional parameter to indicate the activation quantizer to use. Must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"adjusted": Deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
--algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms. Usage is:
--algorithms <algo_name1> ... The available optimization algorithms are:
"cle" - Cross layer equalization includes a number of methods for equalizing
weights and biases across layers in order to rectify imbalances that cause
quantization errors.
--bias_bitwidth BIAS_BITWIDTH
Selects the bitwidth to use when quantizing the biases; 8 (default) or 32.
--bias_bw BIAS_BW Deprecated; use --bias_bitwidth.
--act_bitwidth ACT_BITWIDTH
Selects the bitwidth to use when quantizing the activations; 8 (default) or 16.
--act_bw ACT_BW Deprecated; use --act_bitwidth.
--weights_bitwidth WEIGHTS_BITWIDTH
Selects the bitwidth to use when quantizing the weights; 4 or 8 (default).
--weight_bw WEIGHT_BW
Deprecated; use --weights_bitwidth.
--float_bias_bitwidth FLOAT_BIAS_BITWIDTH
Selects the bitwidth to use when biases are in float; 32 or 16.
--ignore_encodings Use only quantizer generated encodings, ignoring any user or model provided
encodings.
Note: Cannot use --ignore_encodings with --quantization_overrides
--use_per_channel_quantization
Enables per-channel quantization for convolution-based op weights.
This replaces the built-in model QAT encodings when used for a given weight.
--use_per_row_quantization
Enables row wise quantization of Matmul and FullyConnected ops.
--float_fallback Enables fallback to floating point (FP) instead of fixed point.
This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
If this option is enabled, --input_list must not be provided and --ignore_encodings must not be provided.
The external quantization encodings (encoding file/FakeQuant encodings) might be missing
quantization parameters for some interim tensors. The converter first tries to fill the gaps
by propagating encodings across math-invariant functions. If the quantization params are
still missing, the affected nodes fall back to floating point.
--use_native_input_files
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the
model. For ex., uint8_t.
--use_native_dtype Note: This option is deprecated; use the --use_native_input_files
option instead.
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the
model. For ex., uint8_t.
--use_native_output_files
Use this option to indicate the data type of the output files
1. float (default): output the file as floats.
2. native: outputs the file that is native to the model. For ex.,
uint8_t.
--disable_relu_squashing
Disables squashing of ReLU against convolution-based ops for quantized models.
--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
Specifies the number of steps to use for computing quantization encodings
such that scale = (max - min) / number of quantization steps.
The option should be passed as a space separated pair of hexadecimal string
minimum and maximum values. i.e. --restrict_quantization_steps "MIN MAX".
Please note that this is a hexadecimal string literal and not a signed
integer, to supply a negative value an explicit minus sign is required.
E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8-bit range,
and --restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16-bit
range.
Custom Op Package Options:
--op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
Use this argument to pass an op package library for quantization. Must be in
the form <op_package_lib_path:interfaceProviderName>; separate multiple
package libs with a comma.
--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
Absolute path to converter op package library compiled by the OpPackage
generator. Must be separated by a comma for multiple package libraries.
Note: Libraries must follow the same order as the xml files.
E.g.1: --converter_op_package_lib absolute_path_to/libExample.so
E.g.2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
-p PACKAGE_NAME, --package_name PACKAGE_NAME
A global package name to be used for each node in the Model.cpp file.
Defaults to Qnn header defined package name
--op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...], -opc OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]
Path to a Qnn Op Package XML configuration file that contains user defined
custom operations.
Architecture Checker Options(Experimental):
--arch_checker Note: This option will soon be deprecated. Use the qnn-architecture-checker tool to achieve the same result.
Note: Only one of: {'package_name', 'op_package_config'} can be specified
Basic command line usage looks like:
$ qnn-tflite-converter -i <path>/model.tflite
-d <optional_network_input_name> <dims>
-o <optional_output_path>
-p <optional_package_name> # Defaults to "qti.aisw"
qnn-pytorch-converter¶
The qnn-pytorch-converter tool converts a PyTorch model to a CPP file representing the model as a series of QNN API calls. Additionally, a binary file containing static weights of the model is produced.
usage: qnn-pytorch-converter -d INPUT_NAME INPUT_DIM [--out_node OUT_NAMES]
[--input_type INPUT_NAME INPUT_TYPE]
[--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding ...]
[--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
[--preserve_io [PRESERVE_IO [PRESERVE_IO ...]]]
[--dump_relay DUMP_RELAY] [--dry_run] [--dump_out_names]
[--pytorch_custom_op_lib PYTORCH_CUSTOM_OP_LIB]
[--quantization_overrides QUANTIZATION_OVERRIDES] [--keep_quant_nodes]
[--disable_batchnorm_folding] [--expand_lstm_op_structure]
[--keep_disconnected_nodes]
[--input_list INPUT_LIST] [--param_quantizer PARAM_QUANTIZER]
[--act_quantizer ACT_QUANTIZER]
[--algorithms ALGORITHMS [ALGORITHMS ...]]
[--bias_bitwidth BIAS_BITWIDTH] [--bias_bw BIAS_BW]
[--act_bitwidth ACT_BITWIDTH] [--act_bw ACT_BW]
[--weights_bitwidth WEIGHTS_BITWIDTH] [--weight_bw WEIGHT_BW]
[--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_encodings]
[--use_per_channel_quantization] [--use_per_row_quantization]
[--float_fallback] [--use_native_input_files] [--use_native_dtype]
[--use_native_output_files] [--disable_relu_squashing]
[--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
--input_network INPUT_NETWORK [--debug [DEBUG]]
[-o OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
[--float_bitwidth FLOAT_BITWIDTH] [--float_bw FLOAT_BW]
[--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
[--exclude_named_tensors] [--op_package_lib OP_PACKAGE_LIB]
[--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
[-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
[-h] [--arch_checker]
Script to convert PyTorch model into QNN
required arguments:
-d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
The names and dimensions of the network input layers specified in the format
[input_name comma-separated-
dimensions], for example:
'data' 1,3,224,224
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For multiple inputs specify multiple --input_dim on the command line like:
--input_dim 'data1' 1,3,224,224 --input_dim 'data2' 1,50,100,3
--input_network INPUT_NETWORK, -i INPUT_NETWORK
Path to the source framework model.
optional arguments:
--out_node OUT_NAMES, --out_name OUT_NAMES
Names of the graph's output tensors. Multiple output names should be
provided separately, like:
--out_name out_1 --out_name out_2
--input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
Type of data expected by each input op/layer. Type for each input is
|default| if not specified. For example: "data" image. Note that the quotes
should always be included in order to handle special characters, spaces, etc.
For multiple inputs specify multiple --input_type on the command line.
Eg:
--input_type "data1" image --input_type "data2" opaque
These options are used by the DSP runtime, and the following descriptions
state how the input will be handled for each option.
Image:
Input is a float in the range 0-255, with a mean of 0.0f and a max of
255.0f. The floats are cast to uint8_t and passed to the DSP.
Default:
Pass the input as floats to the DSP directly and the DSP will quantize it.
Opaque:
Assumes the input is float because the consumer layer (i.e. the next layer)
requires it as float; therefore it won't be quantized.
Choices supported:
image
default
opaque
--input_dtype INPUT_NAME INPUT_DTYPE
The names and datatype of the network input layers specified in the format
[input_name datatype], for example:
'data' 'float32'
Default is float32 if not specified.
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For multiple inputs specify multiple --input_dtype on the command line like:
--input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
--input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
Usage: --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
[INPUT_ENCODING_OUT]
Input encoding of the network inputs. Default is bgr.
e.g.
--input_encoding "data" rgba
Quotes must wrap the input node name to handle special characters,
spaces, etc. To specify encodings for multiple inputs, invoke
--input_encoding for each one.
e.g.
--input_encoding "data1" rgba --input_encoding "data2" other
Optionally, an output encoding may be specified for an input node by
providing a second encoding. The default output encoding is bgr.
e.g.
--input_encoding "data3" rgba rgb
Input encoding types:
image color encodings: bgr, rgb, nv21, nv12, ...
time_series: for inputs of rnn models;
other: not listed above or unknown.
Supported encodings:
bgr
rgb
rgba
argb32
nv21
nv12
time_series
other
--input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
Layout of each input tensor. If not specified, it will use the default
based on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
NONTRIVIAL for everything else. For multiple inputs specify multiple
--input_layout on the command line.
Eg:
--input_layout "data1" NCHW --input_layout "data2" NCHW
--custom_io CUSTOM_IO
Use this option to specify a yaml file for custom IO.
--preserve_io [PRESERVE_IO [PRESERVE_IO ...]]
Use this option to preserve IO layout and datatype. The different ways of
using this option are as follows:
--preserve_io layout <space separated list of names of inputs and
outputs of the graph>
--preserve_io datatype <space separated list of names of inputs and
outputs of the graph>
In this case, the user should also specify the string layout or datatype in
the command to indicate which property the converter needs to
preserve. e.g.
--preserve_io layout input1 input2 output1
--preserve_io datatype input1 input2 output1
Optionally, the user may choose to preserve the layout and/or datatype for
all the inputs and outputs of the graph.
This can be done in the following two ways:
--preserve_io layout
--preserve_io datatype
Additionally, the user may choose to preserve both layout and datatypes for
all IO tensors by just passing the option as follows:
--preserve_io
Note: Only one of the above usages is allowed at a time.
Note: --custom_io gets higher precedence than --preserve_io.
--dump_relay DUMP_RELAY
Dump Relay ASM and Params at the path provided with the argument
Usage: --dump_relay <path_to_dump>
--dry_run Evaluates the model without actually converting any ops, and
returns unsupported ops if any.
--dump_out_names Dump output names mapped from QNN CPP stored names to converter used
names and save to file 'model_output_names.json'.
--pytorch_custom_op_lib PYTORCH_CUSTOM_OP_LIB, -pcl PYTORCH_CUSTOM_OP_LIB
Absolute path to the PyTorch library containing the custom op definition.
Multiple custom op libraries must be comma-separated.
For PyTorch custom op details, refer to:
https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html
For custom C++ extension details, refer to:
https://pytorch.org/tutorials/advanced/cpp_extension.html
Eg. 1: --pytorch_custom_op_lib absolute_path_to/Example.so
Eg. 2: -pcl absolute_path_to/Example1.so,absolute_path_to/Example2.so
--disable_batchnorm_folding
Disables the folding of batch normalization ops into the preceding convolution layer during conversion.
--expand_lstm_op_structure
Enables an optimization that breaks the LSTM op into equivalent math ops
--keep_disconnected_nodes
Disables the optimization that removes ops not connected to the main graph.
This optimization uses output names provided on the command line OR
inputs/outputs extracted from the source model to determine the main graph
--debug [DEBUG] Run the converter in debug mode.
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path where the converted output model should be saved. If not specified, the
converted model will be written to a file with the same name as the input model
--copyright_file COPYRIGHT_FILE
Path to copyright file. If provided, the content of the file will be added
to the output model.
--float_bitwidth FLOAT_BITWIDTH
Selects the bitwidth to use for float parameters (weights/bias) and
activations, either for all ops or for specific ops selected via encodings;
32 (default) or 16.
--float_bw FLOAT_BW Deprecated; use --float_bitwidth.
--float_bias_bw FLOAT_BIAS_BW
Deprecated; use --float_bias_bitwidth.
--overwrite_model_prefix
If this option is passed, the model generator will use the output path name
as the model prefix when naming functions in <qnn_model_name>.cpp (useful for
running multiple models at once), e.g. ModelName_composeGraphs. The default
is the generic "QnnModel_" prefix.
--exclude_named_tensors
Stop using source framework tensor names; instead, use a counter for naming
tensors. Note: This can potentially help reduce the size of the generated
model library (recommended when deploying a model). Default is False.
-h, --help show this help message and exit
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters to use for
quantization. These will override any quantization data carried from
conversion (eg TF fake quantization) or calculated during the normal
quantization process. Format defined as per AIMET specification.
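The snippet below is a rough, hypothetical sketch of such an overrides file in the AIMET-style layout; the tensor names, encoding fields, and values are illustrative assumptions only, so consult the AIMET specification for the authoritative schema:

```shell
# Hypothetical sketch of an AIMET-style quantization overrides file.
# Tensor names and encoding fields are illustrative assumptions only.
cat > quant_overrides.json <<'EOF'
{
  "activation_encodings": {
    "conv1_out": [
      {"bitwidth": 8, "min": -1.0, "max": 1.0, "scale": 0.0078431, "offset": -128}
    ]
  },
  "param_encodings": {
    "conv1_weight": [
      {"bitwidth": 8, "min": -0.5, "max": 0.5, "scale": 0.0039216, "offset": -128}
    ]
  }
}
EOF
# Then pass it to the converter:
# qnn-pytorch-converter -i model.pt -d 'data' 1,3,224,224 \
#     --quantization_overrides quant_overrides.json
```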
--keep_quant_nodes Use this option to keep activation quantization nodes in the graph rather
than stripping them.
--input_list INPUT_LIST
Path to a file specifying the input data. This file should be a plain text
file, containing one or more absolute file paths per line. Each path is
expected to point to a binary file containing one input in the "raw" format,
ready to be consumed by the quantizer without any further preprocessing.
Multiple files per line separated by spaces indicate multiple inputs to the
network. See documentation for more details. Must be specified for
quantization. All subsequent quantization options are ignored when this is
not provided.
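For example, an input list can be assembled as follows; the directory and file names are illustrative, and in practice the .raw files would be produced by your preprocessing pipeline:

```shell
# Sketch: build a quantization input list. Each line must hold absolute
# paths to raw binary input files; the placeholder files here are empty.
mkdir -p calib
touch calib/sample0.raw calib/sample1.raw
ls -d "$PWD"/calib/*.raw > input_list.txt
# For multi-input networks, put all inputs for one sample on a single
# line, separated by spaces.
# qnn-pytorch-converter -i model.pt -d 'data' 1,3,224,224 \
#     --input_list input_list.txt
```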
--param_quantizer PARAM_QUANTIZER
Optional parameter to indicate the weight/bias quantizer to use. Must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"adjusted": Deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
--act_quantizer ACT_QUANTIZER
Optional parameter to indicate the activation quantizer to use. Must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"adjusted": Deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
--algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms. Usage is:
--algorithms <algo_name1> ... The available optimization algorithms are:
"cle" - Cross layer equalization includes a number of methods for equalizing
weights and biases across layers in order to rectify imbalances that cause
quantization errors.
--bias_bitwidth BIAS_BITWIDTH
Selects the bitwidth to use when quantizing the biases; 8 (default) or 32.
--bias_bw BIAS_BW Deprecated; use --bias_bitwidth.
--act_bitwidth ACT_BITWIDTH
Selects the bitwidth to use when quantizing the activations; 8 (default) or 16.
--act_bw ACT_BW Deprecated; use --act_bitwidth.
--weights_bitwidth WEIGHTS_BITWIDTH
Selects the bitwidth to use when quantizing the weights; 4 or 8 (default).
--weight_bw WEIGHT_BW
Deprecated; use --weights_bitwidth.
--float_bias_bitwidth FLOAT_BIAS_BITWIDTH
Selects the bitwidth to use when biases are in float; 32 or 16.
--ignore_encodings Use only quantizer generated encodings, ignoring any user or model provided
encodings.
Note: Cannot use --ignore_encodings with --quantization_overrides
--use_per_channel_quantization
Enables per-channel quantization for convolution-based op weights.
This replaces the built-in model QAT encodings when used for a given weight.
--use_per_row_quantization
Enables row-wise quantization of Matmul and FullyConnected ops.
--float_fallback Enables fallback to floating point (FP) instead of fixed point.
This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
If this option is enabled, neither --input_list nor --ignore_encodings may be provided.
The external quantization encodings (encoding file/FakeQuant encodings) might be missing
quantization parameters for some interim tensors. The converter will first try to fill
the gaps by propagating encodings across math-invariant functions. If the quantization
params are still missing, those nodes fall back to floating point.
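A hypothetical invocation using float fallback at 16-bit precision (the model path is illustrative):

```shell
# Sketch: enable float fallback with 16-bit floats. Note that neither
# --input_list nor --ignore_encodings may be passed alongside it.
FALLBACK_ARGS="--float_fallback --float_bitwidth 16"
# qnn-pytorch-converter -i model.pt -d 'data' 1,3,224,224 $FALLBACK_ARGS
```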
--use_native_input_files
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the
model. For ex., uint8_t.
--use_native_dtype Note: This option is deprecated; use --use_native_input_files
instead.
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the
model. For ex., uint8_t.
--use_native_output_files
Use this option to indicate the data type of the output files
1. float (default): output the file as floats.
2. native: outputs the file that is native to the model. For ex.,
uint8_t.
--disable_relu_squashing
Disables squashing of ReLU against convolution-based ops for quantized models.
--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
Specifies the number of steps to use for computing quantization encodings
such that scale = (max - min) / number of quantization steps.
The option should be passed as a space separated pair of hexadecimal string
minimum and maximum values. i.e. --restrict_quantization_steps "MIN MAX".
Please note that these are hexadecimal string literals and not signed
integers; to supply a negative value an explicit minus sign is required.
E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8-bit range, and
--restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16-bit
range.
Custom Op Package Options:
--op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
Use this argument to pass an op package library for quantization. Must be in
the form <op_package_lib_path:interfaceProviderName> and be separated by a
comma for multiple package libs
--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
Absolute path to converter op package library compiled by the OpPackage
generator. Must be separated by a comma for multiple package libraries.
Note: Libraries must follow the same order as the xml files.
E.g.1: --converter_op_package_lib absolute_path_to/libExample.so
E.g.2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
-p PACKAGE_NAME, --package_name PACKAGE_NAME
A global package name to be used for each node in the Model.cpp file.
Defaults to the Qnn header-defined package name
--op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...], -opc CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]
Path to a Qnn Op Package XML configuration file that contains user defined
custom operations.
Architecture Checker Options (Experimental):
--arch_checker Note: This option will soon be deprecated. Use the qnn-architecture-checker tool to achieve the same result.
Note: Only one of: {'package_name', 'op_package_config'} can be specified
Basic command line usage looks like:
$ qnn-pytorch-converter -i <path>/model.pt
-d <network_input_name> <dims>
-o <optional_output_path>
-p <optional_package_name> # Defaults to "qti.aisw"
qnn-onnx-converter¶
The qnn-onnx-converter tool converts a model from the ONNX framework to a CPP file representing the model as a series of QNN API calls. Additionally, a binary file containing static weights of the model is produced.
usage: qnn-onnx-converter [--out_node OUT_NAMES] [--input_type INPUT_NAME INPUT_TYPE]
[--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding ...]
[--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
[--preserve_io [PRESERVE_IO [PRESERVE_IO ...]]] [--dry_run [DRY_RUN]]
[-d INPUT_NAME INPUT_DIM] [-n] [-b BATCH] [-s SYMBOL_NAME VALUE]
[--dump_custom_io_config_template DUMP_CUSTOM_IO_CONFIG_TEMPLATE]
[--quantization_overrides QUANTIZATION_OVERRIDES] [--keep_quant_nodes]
[--disable_batchnorm_folding] [--expand_lstm_op_structure]
[--keep_disconnected_nodes]
[--input_list INPUT_LIST] [--param_quantizer PARAM_QUANTIZER]
[--act_quantizer ACT_QUANTIZER] [--algorithms ALGORITHMS [ALGORITHMS ...]]
[--bias_bitwidth BIAS_BITWIDTH] [--bias_bw BIAS_BW]
[--act_bitwidth ACT_BITWIDTH] [--act_bw ACT_BW]
[--weights_bitwidth WEIGHTS_BITWIDTH] [--weight_bw WEIGHT_BW]
[--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_encodings]
[--use_per_channel_quantization] [--use_per_row_quantization]
[--float_fallback] [--use_native_input_files] [--use_native_dtype]
[--use_native_output_files] [--disable_relu_squashing]
[--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
--input_network INPUT_NETWORK [--debug [DEBUG]]
[-o OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
[--float_bitwidth FLOAT_BITWIDTH] [--float_bw FLOAT_BW]
[--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
[--exclude_named_tensors] [--op_package_lib OP_PACKAGE_LIB]
[--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
[-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
[-h] [--arch_checker]
Script to convert ONNX model into QNN
required arguments:
--input_network INPUT_NETWORK, -i INPUT_NETWORK
Path to the source framework model.
optional arguments:
--out_node OUT_NAMES, --out_name OUT_NAMES
Names of the graph's output tensors. Multiple output
names should be provided separately, like:
--out_name out_1 --out_name out_2
--input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
Type of data expected by each input op/layer. Type for
each input is |default| if not specified. For example:
"data" image.Note that the quotes should always be
included in order to handle special characters,
spaces,etc. For multiple inputs specify multiple
--input_type on the command line. Eg:
--input_type "data1" image --input_type "data2" opaque
These options are used by the DSP runtime, and the
following descriptions state how the input will be
handled for each option.
Image:
Input is a float in the range 0-255, with a mean of 0.0f and a max of
255.0f. The floats are cast to uint8_t and passed to the DSP.
Default:
Pass the input as floats to the DSP
directly and the DSP will quantize it.
Opaque:
Assumes the input is float because the consumer layer (i.e. the next
layer) requires it as float; therefore it won't be
quantized.
Choices supported:
image
default
opaque
--input_dtype INPUT_NAME INPUT_DTYPE
The names and datatype of the network input layers
specified in the format [input_name datatype], for
example:
'data' 'float32'.
Default is float32 if not specified.
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For multiple inputs specify multiple --input_dtype on the command line like:
--input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
--input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
Usage: --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
[INPUT_ENCODING_OUT]
e.g.
--input_encoding "data" rgba
Quotes must wrap the input node name to handle special characters,
spaces, etc. To specify encodings for multiple inputs, invoke
--input_encoding for each one.
e.g.
--input_encoding "data1" rgba --input_encoding "data2" other
Optionally, an output encoding may be specified for an input node by
providing a second encoding. The default output encoding is bgr.
e.g.
--input_encoding "data3" rgba rgb
Input encoding types:
image color encodings: bgr, rgb, nv21, nv12, ...
time_series: for inputs of rnn models;
other: not listed above or unknown.
Supported encodings:
bgr
rgb
rgba
argb32
nv21
nv12
time_series
other
--input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
Layout of each input tensor. If not specified, it will use the default
based on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
NONTRIVIAL for everything else. For multiple inputs specify multiple
--input_layout on the command line.
Eg:
--input_layout "data1" NCHW --input_layout "data2" NCHW
Note: This flag does not set the layout of the input tensor in the converted DLC.
Please use --custom_io for that.
--custom_io CUSTOM_IO
Use this option to specify a yaml file for custom IO.
--preserve_io PRESERVE_IO
Use this option to preserve IO layout and datatype. The different ways of using
this option are as follows:
--preserve_io layout <space separated list of names of inputs and outputs of the graph>
--preserve_io datatype <space separated list of names of inputs and outputs of the graph>
In this case, the user should also specify the string layout or datatype in the command
to indicate which property the converter needs to preserve. e.g.
--preserve_io layout input1 input2 output1
--preserve_io datatype input1 input2 output1
Optionally, the user may choose to preserve the layout and/or datatype for all
the inputs and outputs of the graph. This can be done in the following two ways:
--preserve_io layout
--preserve_io datatype
Additionally, the user may choose to preserve both layout and datatypes for all
IO tensors by just passing the option as follows:
--preserve_io
Note: Only one of the above usages is allowed at a time.
Note: --custom_io gets higher precedence than --preserve_io.
--dry_run [DRY_RUN] Evaluates the model without actually converting any ops, and returns
unsupported ops/attributes as well as unused inputs and/or outputs if any.
Leave empty or specify "info" to see the dry run as a table, or specify "debug"
to show more detailed messages only.
-d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
The name and dimension of all the input buffers to the network specified in
the format [input_name comma-separated-dimensions],
for example: 'data' 1,224,224,3.
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
NOTE: This feature works only with ONNX 1.6.0 and above
-n, --no_simplification
Do not attempt to simplify the model automatically. This may prevent some
models from properly converting
-b BATCH, --batch BATCH
The batch dimension override. This will take the first dimension of all
inputs and treat it as a batch dim, overriding it with the value provided
here. For example:
--batch 6
will result in a shape change from [1,3,224,224] to [6,3,224,224].
If there are inputs without batch dim this should not be used and each input
should be overridden independently using -d option for input dimension
overrides.
-s SYMBOL_NAME VALUE, --define_symbol SYMBOL_NAME VALUE
This option allows overriding specific input dimension symbols. For instance
you might see input shapes specified with variables such as :
data: [1,3,height,width]
To override these simply pass the option as:
--define_symbol height 224 --define_symbol width 448
which results in dimensions that look like:
data: [1,3,224,448]
--dump_custom_io_config_template
Dumps the yaml template for custom I/O configuration. This file can be
edited per the custom requirements and passed back using the --custom_io
option. Use this option to specify a yaml file to which the custom IO
config template is dumped.
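A hypothetical two-step workflow for this option (the file name is illustrative):

```shell
# Sketch: dump the custom IO yaml template, edit it, then feed it back.
TEMPLATE="custom_io_template.yaml"
# Step 1: dump the template:
# qnn-onnx-converter -i model.onnx --dump_custom_io_config_template "$TEMPLATE"
# Step 2: after editing the dumped yaml to your requirements:
# qnn-onnx-converter -i model.onnx --custom_io "$TEMPLATE"
```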
--disable_batchnorm_folding
Disables the folding of batch normalization ops into the preceding convolution layer during conversion.
--expand_lstm_op_structure
Enables an optimization that breaks the LSTM op into equivalent math ops
--keep_disconnected_nodes
Disables the optimization that removes ops not connected to the main graph.
This optimization uses output names provided on the command line OR
inputs/outputs extracted from the source model to determine the main graph
--debug [DEBUG] Run the converter in debug mode.
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path where the converted output model should be saved. If not specified, the
converted model will be written to a file with the same name as the input model
--copyright_file COPYRIGHT_FILE
Path to copyright file. If provided, the content of the file will be added
to the output model.
--float_bitwidth FLOAT_BITWIDTH
Selects the bitwidth to use for float parameters (weights/bias) and
activations, either for all ops or for specific ops selected via encodings;
32 (default) or 16.
--float_bw FLOAT_BW Deprecated; use --float_bitwidth.
--float_bias_bw FLOAT_BIAS_BW
Deprecated; use --float_bias_bitwidth.
--overwrite_model_prefix
If this option is passed, the model generator will use the output path name
as the model prefix when naming functions in <qnn_model_name>.cpp (useful for
running multiple models at once), e.g. ModelName_composeGraphs. The default
is the generic "QnnModel_" prefix.
--exclude_named_tensors
Stop using source framework tensor names; instead, use a counter for naming
tensors. Note: This can potentially help reduce the size of the generated
model library (recommended when deploying a model). Default is False.
-h, --help show this help message and exit
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters to use for
quantization. These will override any quantization data carried from
conversion (eg TF fake quantization) or calculated during the normal
quantization process. Format defined as per AIMET specification.
--keep_quant_nodes Use this option to keep activation quantization nodes in the graph rather
than stripping them.
--input_list INPUT_LIST
Path to a file specifying the input data. This file should be a plain text
file, containing one or more absolute file paths per line. Each path is
expected to point to a binary file containing one input in the "raw" format,
ready to be consumed by the quantizer without any further preprocessing.
Multiple files per line separated by spaces indicate multiple inputs to the
network. See documentation for more details. Must be specified for
quantization. All subsequent quantization options are ignored when this is
not provided.
--param_quantizer PARAM_QUANTIZER
Optional parameter to indicate the weight/bias quantizer to use. Must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"adjusted": Deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
--act_quantizer ACT_QUANTIZER
Optional parameter to indicate the activation quantizer to use. Must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"adjusted": Deprecated.
"symmetric": Ensures min and max have the same absolute values about zero.
Data will be stored as int#_t data such that the offset is always 0.
--algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms. Usage is:
--algorithms <algo_name1> ... The available optimization algorithms are:
"cle" - Cross layer equalization includes a number of methods for equalizing
weights and biases across layers in order to rectify imbalances that cause
quantization errors.
--bias_bitwidth BIAS_BITWIDTH
Selects the bitwidth to use when quantizing the biases; 8 (default) or 32.
--bias_bw BIAS_BW Deprecated; use --bias_bitwidth.
--act_bitwidth ACT_BITWIDTH
Selects the bitwidth to use when quantizing the activations; 8 (default) or 16.
--act_bw ACT_BW Deprecated; use --act_bitwidth.
--weights_bitwidth WEIGHTS_BITWIDTH
Selects the bitwidth to use when quantizing the weights; 4 or 8 (default).
--weight_bw WEIGHT_BW
Deprecated; use --weights_bitwidth.
--float_bias_bitwidth FLOAT_BIAS_BITWIDTH
Selects the bitwidth to use when biases are in float; 32 or 16.
--ignore_encodings Use only quantizer generated encodings, ignoring any user or model provided
encodings.
Note: Cannot use --ignore_encodings with --quantization_overrides
--use_per_channel_quantization
Enables per-channel quantization for convolution-based op weights.
This replaces the built-in model QAT encodings when used for a given weight.
--use_per_row_quantization
Enables row-wise quantization of Matmul and FullyConnected ops.
--float_fallback Enables fallback to floating point (FP) instead of fixed point.
This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
If this option is enabled, neither --input_list nor --ignore_encodings may be provided.
The external quantization encodings (encoding file/FakeQuant encodings) might be missing
quantization parameters for some interim tensors. The converter will first try to fill
the gaps by propagating encodings across math-invariant functions. If the quantization
params are still missing, those nodes fall back to floating point.
--use_native_input_files
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the
model. For ex., uint8_t.
--use_native_dtype Note: This option is deprecated; use --use_native_input_files
instead.
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the
model. For ex., uint8_t.
--use_native_output_files
Use this option to indicate the data type of the output files
1. float (default): output the file as floats.
2. native: outputs the file that is native to the model. For ex.,
uint8_t.
--disable_relu_squashing
Disables squashing of ReLU against convolution-based ops for quantized models.
--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
Specifies the number of steps to use for computing quantization encodings
such that scale = (max - min) / number of quantization steps.
The option should be passed as a space separated pair of hexadecimal string
minimum and maximum values. i.e. --restrict_quantization_steps "MIN MAX".
Please note that these are hexadecimal string literals and not signed
integers; to supply a negative value an explicit minus sign is required.
E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8-bit range, and
--restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16-bit
range. This argument is required for 16-bit Matmul operations.
Custom Op Package Options:
--op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
Use this argument to pass an op package library for quantization. Must be in
the form <op_package_lib_path:interfaceProviderName> and be separated by a
comma for multiple package libs
--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
Absolute path to converter op package library compiled by the OpPackage
generator. Must be separated by a comma for multiple package libraries.
Note: Libraries must follow the same order as the xml files.
E.g.1: --converter_op_package_lib absolute_path_to/libExample.so
E.g.2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
-p PACKAGE_NAME, --package_name PACKAGE_NAME
A global package name to be used for each node in the Model.cpp file.
Defaults to Qnn header defined package name
--op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...], -opc OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]
Path to a Qnn Op Package XML configuration file that contains user defined
custom operations.
Architecture Checker Options(Experimental):
--arch_checker Note: This option will soon be deprecated. Use the qnn-architecture-checker tool to achieve the same result.
Note: Only one of: {'package_name', 'op_package_config'} can be specified
qairt-converter¶
The qairt-converter tool converts a model from one of the ONNX, TensorFlow, TFLite, or PyTorch frameworks to a DLC file representing the QNN graph format, which can enable inference on Qualcomm AI IP/HW. The converter auto-detects the framework based on the source model extension.
Basic command line usage looks like:
usage: qairt-converter [--desired_input_shape INPUT_NAME INPUT_DIM] [--out_tensor_node OUT_NAMES]
[--source_model_input_datatype INPUT_NAME INPUT_DTYPE]
[--source_model_input_layout INPUT_NAME INPUT_LAYOUT]
[--desired_input_color_encoding [ ...]]
[--dump_io_config_template DUMP_IO_CONFIG_TEMPLATE] [--io_config IO_CONFIG]
[--dry_run [DRY_RUN]] [--quantization_overrides QUANTIZATION_OVERRIDES]
[--onnx_no_simplification] [--onnx_batch BATCH]
[--onnx_define_symbol SYMBOL_NAME VALUE] [--tf_no_optimization]
[--tf_show_unconsumed_nodes] [--tf_saved_model_tag SAVED_MODEL_TAG]
[--tf_saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY]
[--tf_validate_models] [--tflite_signature_name SIGNATURE_NAME]
--input_network INPUT_NETWORK [-h] [--debug [DEBUG]]
[--output_path OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
[--float_bitwidth FLOAT_BITWIDTH] [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH]
[--model_version MODEL_VERSION] [--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
[--package_name PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
required arguments:
--input_network INPUT_NETWORK, -i INPUT_NETWORK
Path to the source framework model.
optional arguments:
--desired_input_shape INPUT_NAME INPUT_DIM, -d INPUT_NAME INPUT_DIM
The name and dimension of all the input buffers to the network specified in
the format [input_name comma-separated-dimensions],
for example: 'data' 1,224,224,3.
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
NOTE: Required for TensorFlow and PyTorch. Optional for Onnx and Tflite
In case of Onnx, this feature works only with Onnx 1.6.0 and above
--out_tensor_node OUT_NAMES, --out_tensor_name OUT_NAMES
Names of the graph's output tensors. Multiple output names should be
provided separately, like:
--out_tensor_name out_1 --out_tensor_name out_2
NOTE: Required for TensorFlow. Optional for Onnx, Tflite and PyTorch
--source_model_input_datatype INPUT_NAME INPUT_DTYPE
The names and datatype of the network input layers specified in the format
[input_name datatype], for example:
'data' 'float32'
Default is float32 if not specified
Note that the quotes should always be included in order to handle special
characters, spaces, etc.
For multiple inputs, specify --source_model_input_datatype multiple times on the command line, like:
--source_model_input_datatype 'data1' 'float32' --source_model_input_datatype 'data2' 'float32'
--source_model_input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
Layout of each input tensor. If not specified, it will use the default
based on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T =
Time
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
NONTRIVIAL for everything else.
For multiple inputs, specify --source_model_input_layout multiple times on
the command line.
Eg:
--source_model_input_layout "data1" NCHW --source_model_input_layout "data2" NCHW
--desired_input_color_encoding [ ...], -e [ ...]
Usage: --desired_input_color_encoding "INPUT_NAME" INPUT_ENCODING_IN
[INPUT_ENCODING_OUT]
Input encoding of the network inputs. Default is bgr.
e.g.
--desired_input_color_encoding "data" rgba
Quotes must wrap the input node name to handle special characters,
spaces, etc. To specify encodings for multiple inputs, invoke
--desired_input_color_encoding for each one.
e.g.
--desired_input_color_encoding "data1" rgba --desired_input_color_encoding "data2" other
Optionally, an output encoding may be specified for an input node by
providing a second encoding. The default output encoding is bgr.
e.g.
--desired_input_color_encoding "data3" rgba rgb
Input encoding types:
image color encodings: bgr, rgb, nv21, nv12, ...
time_series: for inputs of RNN models
other: an encoding not listed above or unknown.
Supported encodings:
bgr
rgb
rgba
argb32
nv21
nv12
--dump_io_config_template DUMP_IO_CONFIG_TEMPLATE
Dumps the yaml template for I/O configuration. This file can be edited as
per the custom requirements and passed using the option --io_config. Use this
option to specify a yaml file to which the IO config template is dumped.
--io_config IO_CONFIG
Use this option to specify a yaml file for input and output options.
--dry_run [DRY_RUN] Evaluates the model without actually converting any ops, and returns
unsupported ops/attributes as well as unused inputs and/or outputs if any.
-h, --help show this help message and exit
--debug [DEBUG] Run the converter in debug mode.
--output_path OUTPUT_PATH, -o OUTPUT_PATH
Path where the converted output model should be saved. If not specified, the
converted model will be written to a file with the same name as the input model.
--copyright_file COPYRIGHT_FILE
Path to copyright file. If provided, the content of the file will be added
to the output model.
--float_bitwidth FLOAT_BITWIDTH
Use the --float_bitwidth option to convert the graph to the specified float
bitwidth, either 32 (default) or 16.
--float_bias_bitwidth FLOAT_BIAS_BITWIDTH
Use the --float_bias_bitwidth option to select the bitwidth to use for float
bias tensor
--model_version MODEL_VERSION
User-defined ASCII string to identify the model, only first 64 bytes will be
stored
Custom Op Package Options:
--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
Absolute path to the converter op package library compiled by the OpPackage
generator. Multiple package libraries must be comma separated.
Note: The converter op package library order must match the xml file order.
Ex1: --converter_op_package_lib absolute_path_to/libExample.so
Ex2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
--package_name PACKAGE_NAME, -p PACKAGE_NAME
A global package name to be used for each node in the Model.cpp file.
Defaults to the package name defined in the QNN header.
--op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...], -opc CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]
Absolute path to an XML configuration file for a QNN op package that
contains custom, user-defined operations.
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters to use for
quantization. These will override any quantization data carried from
conversion (eg TF fake quantization) or calculated during the normal
quantization process. Format defined as per AIMET specification.
Onnx Converter Options:
--onnx_no_simplification
Do not attempt to simplify the model automatically. This may prevent some
models from properly converting
when sequences of unsupported static operations are present.
--onnx_batch BATCH The batch dimension override. This will take the first dimension of all
inputs and treat it as a batch dim, overriding it with the value provided
here. For example:
--onnx_batch 6
will result in a shape change from [1,3,224,224] to [6,3,224,224].
If there are inputs without a batch dim, this option should not be used; each
input should instead be overridden independently using the -d option for input
dimension overrides.
--onnx_define_symbol SYMBOL_NAME VALUE
This option allows overriding specific input dimension symbols. For instance,
you might see input shapes specified with variables such as:
data: [1,3,height,width]
To override these, simply pass the option as:
--onnx_define_symbol height 224 --onnx_define_symbol width 448
which results in dimensions that look like:
data: [1,3,224,448]
TensorFlow Converter Options:
--tf_no_optimization Do not attempt to optimize the model automatically.
--tf_show_unconsumed_nodes
Displays a list of unconsumed nodes, if any are found. Nodes which are
unconsumed do not violate the structural fidelity of the generated graph.
--tf_saved_model_tag SAVED_MODEL_TAG
Specify the tag to select a MetaGraph from the SavedModel, e.g.:
--tf_saved_model_tag serve. The default value is 'serve' when it is not
assigned.
--tf_saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY
Specify the signature key to select the input and output of the model, e.g.:
--tf_saved_model_signature_key serving_default. The default value is
'serving_default' when it is not assigned.
--tf_validate_models Validate the original TF model against the optimized TF model.
Constant inputs with all values set to 1 will be generated and used
by both models, and their outputs are checked against each other.
The average % error and 90th percentile of output differences will be
calculated for this.
Note: Usage of this flag will incur extra time due to inference of the
models.
Tflite Converter Options:
--tflite_signature_name SIGNATURE_NAME
Use this option to specify a specific Subgraph signature to convert
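The layout codes accepted by --source_model_input_layout above describe how tensor axes are ordered. As an illustrative sketch (not part of the SDK, and the dimensions are hypothetical), converting an NHWC input buffer to NCHW is a single transpose:

```python
import numpy as np

# Hypothetical 4-D image-like input: N=1, H=224, W=224, C=3, i.e. NHWC.
nhwc = np.zeros((1, 224, 224, 3), dtype=np.float32)

# NHWC -> NCHW: move the channel axis ahead of the spatial axes.
nchw = nhwc.transpose(0, 3, 1, 2)
print(nchw.shape)
```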
Model Preparation¶
Quantization Support¶
Quantization is supported through the converter interface and is performed at conversion time. The only option required to enable quantization during conversion is --input_list, which provides the quantizer with the input data required for the given model. The following options are available in each converter listed above to enable and configure quantization:
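As a sketch of preparing calibration data for --input_list (the directory and file names below are hypothetical): each input is written as a raw binary file with no header, and input_list.txt lists one absolute path per line:

```python
import os
import numpy as np

# Hypothetical calibration set: 3 samples for a 1x224x224x3 float32 input.
os.makedirs("calib", exist_ok=True)
paths = []
for i in range(3):
    sample = np.random.rand(1, 224, 224, 3).astype(np.float32)
    path = os.path.abspath(os.path.join("calib", f"input_{i}.raw"))
    sample.tofile(path)  # "raw" format: bare tensor bytes, no header
    paths.append(path)

# One absolute path per line; space-separated paths on one line would
# denote multiple inputs to the network.
with open("input_list.txt", "w") as f:
    f.write("\n".join(paths) + "\n")
```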
Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters
to use for quantization. These will override any
quantization data carried from conversion (eg TF fake
quantization) or calculated during the normal
quantization process. Format defined as per AIMET
specification.
--input_list INPUT_LIST
Path to a file specifying the input data. This file
should be a plain text file, containing one or more
absolute file paths per line. Each path is expected to
point to a binary file containing one input in the
"raw" format, ready to be consumed by the quantizer
without any further preprocessing. Multiple files per
line separated by spaces indicate multiple inputs to
the network. See documentation for more details. Must
be specified for quantization. All subsequent
quantization options are ignored when this is not
provided.
--param_quantizer PARAM_QUANTIZER
Optional parameter to indicate the weight/bias
quantizer to use. Must be followed by one of the
following options: "tf": Uses the real min/max of the
data and specified bitwidth (default) "enhanced": Uses
an algorithm useful for quantizing models with long
tails present in the weight distribution "adjusted":
Uses an adjusted min/max for computing the range,
particularly good for denoise models "symmetric":
Ensures min and max have the same absolute values
about zero. Data will be stored as int#_t data such
that the offset is always 0.
--act_quantizer ACT_QUANTIZER
Optional parameter to indicate the activation
quantizer to use. Must be followed by one of the
following options: "tf": Uses the real min/max of the
data and specified bitwidth (default) "enhanced": Uses
an algorithm useful for quantizing models with long
tails present in the weight distribution "adjusted":
Uses an adjusted min/max for computing the range,
particularly good for denoise models "symmetric":
Ensures min and max have the same absolute values
about zero. Data will be stored as int#_t data such
that the offset is always 0.
--algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms.
Usage is: --algorithms <algo_name1> ... The
available optimization algorithms are: "cle" - Cross
layer equalization includes a number of methods for
equalizing weights and biases across layers in order
to rectify imbalances that cause quantization errors.
--bias_bw BIAS_BW Use the --bias_bw option to select the bitwidth to use
when quantizing the biases, either 8 (default) or 32.
--act_bw ACT_BW Use the --act_bw option to select the bitwidth to use
when quantizing the activations, either 8 (default) or
16.
--weight_bw WEIGHT_BW
Use the --weight_bw option to select the bitwidth to
use when quantizing the weights, currently only 8 bit
(default) supported.
--float_bias_bw FLOAT_BIAS_BW
Use the --float_bias_bw option to select the bitwidth to
use when biases are in float, either 32 or 16.
--ignore_encodings Use only quantizer generated encodings, ignoring any
user or model provided encodings. Note: Cannot use
--ignore_encodings with --quantization_overrides
--use_per_channel_quantization [USE_PER_CHANNEL_QUANTIZATION [USE_PER_CHANNEL_QUANTIZATION ...]]
Use per-channel quantization for
convolution-based op weights. Note: This will replace
built-in model QAT encodings when used for a given
weight. Usage: "--use_per_channel_quantization" to
enable or "--use_per_channel_quantization false"
(default) to disable
--use_per_row_quantization [USE_PER_ROW_QUANTIZATION [USE_PER_ROW_QUANTIZATION ...]]
Use this option to enable rowwise quantization of Matmul and
FullyConnected op. Usage "--use_per_row_quantization" to enable
or "--use_per_row_quantization false" (default) to
disable. This option may not be supported by all backends.
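As a rough illustration of the default "tf" scheme described above (asymmetric, using the real min/max of the data), here is a minimal quantize/dequantize sketch; the SDK's exact rounding and offset conventions may differ, and the sample data is hypothetical:

```python
import numpy as np

def quantize_tf(data, bw=8):
    # "tf"-style asymmetric encoding: use the real min/max of the data,
    # widened if necessary so the representable range contains zero.
    dmin = min(float(data.min()), 0.0)
    dmax = max(float(data.max()), 0.0)
    steps = (1 << bw) - 1                    # 255 steps for 8-bit
    scale = (dmax - dmin) / steps
    offset = round(dmin / scale)             # quantized value of real zero
    q = np.clip(np.round(data / scale - offset), 0, steps).astype(np.uint8)
    return q, scale, offset

def dequantize(q, scale, offset):
    return (q.astype(np.float32) + offset) * scale

data = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, offset = quantize_tf(data)
recon = dequantize(q, scale, offset)
```

Note that real zero round-trips exactly, which is why the data range is widened to include it.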
Basic command line usage to convert and quantize a model using the TF converter would look like:
$ qnn-tensorflow-converter -i <path>/frozen_graph.pb
-d <network_input_name> <dims>
--out_node <network_output_name>
-o <optional_output_path>
--allow_unconsumed_nodes # optional, but most likely will be needed for larger models
-p <optional_package_name> # Defaults to "qti.aisw"
--input_list input_list.txt
This will quantize the network using the default quantizer and bitwidths (8 bits for activations, weights, and biases).
For more detailed information on quantization, options, and algorithms please refer to Quantization.
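The --use_per_channel_quantization option above computes a separate encoding per output channel of a convolution weight tensor instead of one encoding for the whole tensor. A minimal numpy sketch of the idea, with hypothetical weights and a symmetric int8 scheme (the SDK's actual axis conventions and rounding may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical conv weights laid out as (out_channels, in_channels, kH, kW).
w = rng.standard_normal((4, 3, 3, 3)).astype(np.float32)

# Per-tensor: a single symmetric scale covering the whole tensor.
per_tensor_scale = np.abs(w).max() / 127.0

# Per-channel: one symmetric scale per output channel.
per_channel_scale = np.abs(w).reshape(4, -1).max(axis=1) / 127.0

s = per_channel_scale[:, None, None, None]
q = np.round(w / s).astype(np.int8)
recon = q.astype(np.float32) * s

# Per-channel scales can only be tighter than (or equal to) the per-tensor
# scale, reducing quantization error for channels with small magnitudes.
assert (per_channel_scale <= per_tensor_scale + 1e-12).all()
```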
qairt-quantizer¶
The qairt-quantizer tool converts non-quantized DLC models into quantized DLC models.
Basic command line usage looks like:
usage: qairt-quantizer --input_dlc INPUT_DLC [-h] [--output_dlc OUTPUT_DLC]
[--input_list INPUT_LIST] [--float_fallback]
[--algorithms ALGORITHMS [ALGORITHMS ...]] [--bias_bitwidth BIAS_BITWIDTH]
[--act_bitwidth ACT_BITWIDTH] [--weights_bitwidth WEIGHTS_BITWIDTH]
[--float_bitwidth FLOAT_BITWIDTH] [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH]
[--ignore_encodings] [--use_per_channel_quantization]
[--use_per_row_quantization] [--use_native_input_files]
[--use_native_output_files]
[--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
[--act_quantizer_calibration ACT_QUANTIZER_CALIBRATION]
[--param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION]
[--act_quantizer_schema ACT_QUANTIZER_SCHEMA]
[--param_quantizer_schema PARAM_QUANTIZER_SCHEMA]
[--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
[--use_aimet_quantizer]
[--config_file CONFIG_FILE]
[--op_package_lib OP_PACKAGE_LIB]
[--dump_encoding_json] [--debug [DEBUG]]
required arguments:
--input_dlc INPUT_DLC
Path to the dlc container containing the model for which fixed-point
encoding metadata should be generated. This argument is required
optional arguments:
-h, --help show this help message and exit
--output_dlc OUTPUT_DLC
Path at which the metadata-included quantized model container should be
written. If this argument is omitted, the quantized model will be written at
<unquantized_model_name>_quantized.dlc
--input_list INPUT_LIST
Path to a file specifying the input data. This file should be a plain text
file, containing one or more absolute file paths per line. Each path is
expected to point to a binary file containing one input in the "raw" format,
ready to be consumed by the quantizer without any further preprocessing.
Multiple files per line separated by spaces indicate multiple inputs to the
network. See documentation for more details. Must be specified for
quantization. All subsequent quantization options are ignored when this is
not provided.
--float_fallback Use this option to enable fallback to floating point (FP) instead of fixed
point.
This option can be paired with --float_bitwidth to indicate the bitwidth for
FP (by default 32).
If this option is enabled, then input list must not be provided and
--ignore_encodings must not be provided.
The external quantization encodings (encoding file/FakeQuant encodings)
might be missing quantization parameters for some interim tensors.
First it will try to fill the gaps by propagating across math-invariant
functions. If the quantization params are still missing,
the affected nodes will fall back to floating point.
--algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms. Usage is:
--algorithms <algo_name1> ... The available optimization algorithms are:
"cle" - Cross layer equalization includes a number of methods for equalizing
weights and biases across layers in order to rectify imbalances that cause
quantization errors.
--bias_bitwidth BIAS_BITWIDTH
Use the --bias_bitwidth option to select the bitwidth to use when quantizing
the biases, either 8 (default) or 32.
--act_bitwidth ACT_BITWIDTH
Use the --act_bitwidth option to select the bitwidth to use when quantizing
the activations, either 8 (default) or 16.
--weights_bitwidth WEIGHTS_BITWIDTH
Use the --weights_bitwidth option to select the bitwidth to use when
quantizing the weights, either 4 or 8 (default).
--float_bitwidth FLOAT_BITWIDTH
Use the --float_bitwidth option to select the bitwidth to use for float
tensors, either 32 (default) or 16.
--float_bias_bitwidth FLOAT_BIAS_BITWIDTH
Use the --float_bias_bitwidth option to select the bitwidth to use when
biases are in float, either 32 or 16.
--ignore_encodings Use only quantizer generated encodings, ignoring any user or model provided
encodings.
Note: Cannot use --ignore_encodings with --quantization_overrides
--use_per_channel_quantization
Use this option to enable per-channel quantization for convolution-based op
weights.
Note: This will replace built-in model QAT encodings when used for a given
weight.
--use_per_row_quantization
Use this option to enable rowwise quantization of Matmul and FullyConnected
ops.
--use_native_input_files
Boolean flag to indicate how to read input files:
1. float (default): reads inputs as floats and quantizes if necessary based
on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the
model. For ex., uint8_t.
--use_native_output_files
Use this option to indicate the data type of the output files:
1. float (default): outputs the files as floats.
2. native: outputs the files in the data type native to the model,
for example uint8_t.
--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
Specifies the number of steps to use for computing quantization encodings
such that scale = (max - min) / number of quantization steps.
The option should be passed as a space separated pair of hexadecimal string
minimum and maximum values, i.e. --restrict_quantization_steps "MIN MAX".
Please note that these are hexadecimal string literals and not signed
integers; to supply a negative value, an explicit minus sign is required.
E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8 bit
range,
--restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16
bit range.
This argument is required for 16-bit Matmul operations.
--act_quantizer_calibration ACT_QUANTIZER_CALIBRATION
Specify which quantization calibration method to use for activations.
Supported values: min-max (default), sqnr, entropy, mse, percentile.
This option can be paired with --act_quantizer_schema to override the
quantization schema to use for activations; otherwise the default
schema (asymmetric) will be used.
--param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION
Specify which quantization calibration method to use for parameters.
Supported values: min-max (default), sqnr, entropy, mse, percentile.
This option can be paired with --param_quantizer_schema to override the
quantization schema to use for parameters; otherwise the default
schema (asymmetric) will be used.
--act_quantizer_schema ACT_QUANTIZER_SCHEMA
Specify which quantization schema to use for activations
supported values: asymmetric (default), symmetric
--param_quantizer_schema PARAM_QUANTIZER_SCHEMA
Specify which quantization schema to use for parameters
supported values: asymmetric (default), symmetric
--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
Specify the percentile value to be used with the percentile calibration method.
The specified float value must lie between 90 and 100; default: 99.99
--use_aimet_quantizer
Use AIMET Quantizer in place of IR Quantizer. The following arguments are
not allowed together with this option, --restrict_quantization_steps,
--pack_4_bit_weights, --use_dynamic_16_bit_weights, --op_package_lib,
--keep_weights_quantized.
--config_file CONFIG_FILE
Path to a YAML quantizer config file. The config file is only required if you
need to run advanced AIMET quantization algorithms like AdaRound or AMP.
Currently, it is supported only together with the "--use_aimet_quantizer" flag
and the "--algorithms" command line option set to "adaround" or "amp". Please
refer to the SDK documentation for more details.
--op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
Use this argument to pass an op package library for quantization. Must be in
the form <op_package_lib_path:interfaceProviderName> and be separated by a
comma for multiple package libs
--dump_encoding_json Dumps an encoding of all tensors to the specified JSON file
--debug [DEBUG] Run the quantizer in debug mode.
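The hexadecimal pair passed to --restrict_quantization_steps above determines the step count, and therefore the scale, via scale = (max - min) / number of quantization steps. A small sketch of that arithmetic with the ranges from the help text (the data range values are hypothetical):

```python
def quantization_scale(hex_min, hex_max, data_min, data_max):
    # The option takes signed hexadecimal string literals, e.g. "-0x80" "0x7F",
    # which Python's int(..., 16) parses directly.
    steps = int(hex_max, 16) - int(hex_min, 16)   # number of quantization steps
    return (data_max - data_min) / steps

# Example 8-bit range from the help text: -0x80 .. 0x7F -> 255 steps.
print(quantization_scale("-0x80", "0x7F", -1.0, 1.0))

# Example 16-bit range: -0x8000 .. 0x7F7F -> 65407 steps.
print(quantization_scale("-0x8000", "0x7F7F", -1.0, 1.0))
```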
For more information on usage, please refer to SNPE documentation on the snpe-dlc-quant tool.
qnn-model-lib-generator¶
The qnn-model-lib-generator tool compiles QNN model source code into artifacts for a specific target.
usage: qnn-model-lib-generator [-h] [-c <QNN_MODEL>.cpp] [-b <QNN_MODEL>.bin]
[-t LIB_TARGETS ] [-l LIB_NAME] [-o OUTPUT_DIR]
Script compiles provided Qnn Model artifacts for specified targets.
Required argument(s):
-c <QNN_MODEL>.cpp Filepath for the qnn model .cpp file
optional argument(s):
-b <QNN_MODEL>.bin Filepath for the qnn model .bin file
(Note: if not passed, runtime will fail if .cpp needs any items from a .bin file.)
-t LIB_TARGETS Specifies the targets to build the models for. Default: aarch64-android x86_64-linux-clang
-l LIB_NAME Specifies the name to use for libraries. Default: uses name in <model.bin> if provided,
else generic qnn_model.so
-o OUTPUT_DIR Location for saving output libraries.
Note
For Windows users, please execute this tool with python3.
qnn-op-package-generator¶
The qnn-op-package-generator tool is used to generate skeleton code for a QNN op package using an XML config file that describes the attributes of the package. The tool creates the package as a directory containing skeleton source code and makefiles that can be compiled to create a shared library object.
usage: qnn-op-package-generator [-h] --config_path CONFIG_PATH [--debug]
[--output_path OUTPUT_PATH] [-f]
optional arguments:
-h, --help show this help message and exit
required arguments:
--config_path CONFIG_PATH, -p CONFIG_PATH
The path to a config file that defines one or more QNN op
packages.
optional arguments:
--debug Returns debugging information from generating the
package
--output_path OUTPUT_PATH, -o OUTPUT_PATH
Path where the package should be saved
-f, --force-generation
This option will delete the entire existing package.
Note: appropriate file permissions must be set to use
this option.
--converter_op_package, -cop
Generates Converter Op Package skeleton code needed
by the output shape inference for converters
qnn-context-binary-generator¶
The qnn-context-binary-generator tool is used to create a context binary by using a particular backend and consuming a model library created by the qnn-model-lib-generator.
usage: qnn-context-binary-generator --model QNN_MODEL.so --backend QNN_BACKEND.so
--binary_file BINARY_FILE_NAME
[--model_prefix MODEL_PREFIX]
[--output_dir OUTPUT_DIRECTORY]
[--op_packages ONE_OR_MORE_OP_PACKAGES]
[--config_file CONFIG_FILE.json]
[--profiling_level PROFILING_LEVEL]
[--verbose] [--version] [--help]
REQUIRED ARGUMENTS:
-------------------
--model <FILE> Path to the <qnn_model_name.so> file containing a QNN network.
To create a context binary with multiple graphs, use
comma-separated list of model.so files. The syntax is
<qnn_model_name_1.so>,<qnn_model_name_2.so>.
--backend <FILE> Path to a QNN backend .so library to create the context binary.
--binary_file <VAL> Name of the binary file to save the context binary to, with a
.bin file extension.
If an absolute path is provided, the binary is saved at that path.
Otherwise, the binary is saved under the path given by the --output_dir option.
OPTIONAL ARGUMENTS:
-------------------
--model_prefix Function prefix to use when loading <qnn_model_name.so> file
containing a QNN network. Default: QnnModel.
--output_dir <DIR> The directory to save output to. Defaults to ./output.
--op_packages <VAL> Provide a comma separated list of op packages
and interface providers to register. The syntax is:
op_package_path:interface_provider[,op_package_path:interface_provider...]
--profiling_level <VAL> Enable profiling. Valid Values:
1. basic: captures execution and init time.
2. detailed: in addition to basic, captures per Op timing
for execution.
3. backend: backend-specific profiling level specified
in the backend extension related JSON config file.
--profiling_option <VAL> Set profiling options:
1. optrace: Generates an optrace of the run.
--config_file <FILE> Path to a JSON config file. The config file currently
supports options related to backend extensions and
context priority. Please refer to SDK documentation
for more details.
--enable_intermediate_outputs Enable all intermediate nodes to be output along with
default outputs in the saved context.
Note that options --enable_intermediate_outputs and --set_output_tensors
are mutually exclusive. Only one of the options can be specified at a time.
--set_output_tensors <VAL> Provide a comma-separated list of intermediate output tensor names, for which the outputs
will be written in addition to final graph output tensors.
Note that options --enable_intermediate_outputs and --set_output_tensors
are mutually exclusive. Only one of the options can be specified at a time.
The syntax is: graphName0:tensorName0,tensorName1;graphName1:tensorName0,tensorName1.
In case of a single graph, its name is not necessary and a list of comma separated tensor
names can be provided, e.g.: tensorName0,tensorName1.
The same format can be provided in a .txt file.
--backend_binary <VAL> Name of the binary file to save a backend-specific context binary to with
.bin file extension. If not provided, no backend binary is created.
If absolute path is provided, binary is saved in this path.
Else binary is saved in the same path as --output_dir option.
--log_level Specifies max logging level to be set. Valid settings:
"error", "warn", "info" and "verbose"
--dlc_path <VAL> Paths to a comma separated list of Deep Learning Containers (DLC) from which to load the models.
Requires libQnnModelDlc.so to be passed as the --model argument.
To compose multiple graphs in the context, use comma-separated list of DLC files.
The syntax is <qnn_model_name_1.dlc>,<qnn_model_name_2.dlc>
Default: None
--input_output_tensor_mem_type <VAL> Specifies mem type to be used for input and output tensors during graph creation.
Valid settings: "raw" and "memhandle"
--platform_options <VAL> Specifies values to pass as platform options. Multiple platform options can be provided
using the syntax: key0:value0;key1:value1;key2:value2
--version Print the QNN SDK version.
--help Show this help message.
See qnn-net-run section for more details about --op_packages and --config_file options.
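The graph:tensor syntax accepted by --set_output_tensors (graphName0:tensorName0,tensorName1;graphName1:...) can be illustrated with a small, hypothetical parser sketch; it is not part of the SDK and only demonstrates how the separators compose:

```python
def parse_output_tensors(spec):
    # "graph0:t0,t1;graph1:t2" -> {"graph0": ["t0", "t1"], "graph1": ["t2"]}
    # The single-graph form "t0,t1" omits the graph name; it is keyed by None.
    result = {}
    for part in spec.split(";"):
        if ":" in part:
            graph, tensors = part.split(":", 1)
        else:
            graph, tensors = None, part
        result[graph] = tensors.split(",")
    return result

print(parse_output_tensors("graphName0:tensorName0,tensorName1;graphName1:tensorName0"))
print(parse_output_tensors("tensorName0,tensorName1"))
```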
Execution¶
qnn-net-run¶
The qnn-net-run tool is used to consume a model library compiled from the output of the QNN converter, and run it on a particular backend.
DESCRIPTION:
------------
Example application demonstrating how to load and execute a neural network
using QNN APIs.
REQUIRED ARGUMENTS:
-------------------
--model <FILE> Path to the model containing a QNN network.
To compose multiple graphs, use comma-separated list of
model.so files. The syntax is
<qnn_model_name_1.so>,<qnn_model_name_2.so>.
--backend <FILE> Path to a QNN backend to execute the model.
--input_list <FILE> Path to a file listing the inputs for the network.
If there are multiple graphs in model.so, this has
to be comma-separated list of input list files.
When multiple graphs are present, to skip execution of a graph use
"__" (double underscore, without quotes) as the file name in the
comma-separated list of input list files.
--retrieve_context <VAL> Path to a cached binary from which to load a saved
context and execute graphs. --retrieve_context and
--model are mutually exclusive. Only one of the options
can be specified at a time.
OPTIONAL ARGUMENTS:
-------------------
--model_prefix Function prefix to use when loading <qnn_model_name.so>.
Default: QnnModel
--debug Specifies that output from all layers of the network
will be saved. This option cannot be used when loading
a saved context through the --retrieve_context option.
--output_dir <DIR> The directory to save output to. Defaults to ./output.
--use_native_output_files Specifies that the output files will be generated in the data
type native to the graph. If not specified, output files will
be generated in floating point.
--use_native_input_files Specifies that the input files will be parsed in the data
type native to the graph. If not specified, input files will
be parsed in floating point. Note that options --use_native_input_files
and --native_input_tensor_names are mutually exclusive.
Only one of the options can be specified at a time.
--native_input_tensor_names <VAL> Provide a comma-separated list of input tensor names,
for which the input files would be read/parsed in native format.
Note that options --use_native_input_files and
--native_input_tensor_names are mutually exclusive.
Only one of the options can be specified at a time.
The syntax is: graphName0:tensorName0,tensorName1;graphName1:tensorName0,tensorName1
--op_packages <VAL> Provide a comma-separated list of op packages, interface
providers, and, optionally, targets to register. Valid values
for target are CPU and HTP. The syntax is:
op_package_path:interface_provider:target[,op_package_path:interface_provider:target...]
--profiling_level <VAL> Enable profiling. Valid Values:
1. basic: captures execution and init time.
2. detailed: in addition to basic, captures per Op timing
for execution, if a backend supports it.
3. client: captures only the performance metrics
measured by qnn-net-run.
--perf_profile <VAL> Specifies performance profile to be used. Valid settings are
low_balanced, balanced, default, high_performance,
sustained_high_performance, burst, low_power_saver,
power_saver, high_power_saver, extreme_power_saver
and system_settings.
Note: perf_profile argument is now deprecated for
HTP backend, user can specify performance profile
through backend config now. Please refer to config_file
backend extensions usage section below for more details.
--config_file <FILE> Path to a JSON config file. The config file currently
supports options related to backend extensions,
context priority and graph configs. Please refer to SDK
documentation for more details.
--log_level <VAL> Specifies max logging level to be set. Valid settings:
error, warn, info, debug, and verbose.
--shared_buffer Specifies creation of shared buffers for graph I/O between the application
and the device/coprocessor associated with a backend directly.
--synchronous Specifies that graphs should be executed synchronously rather than asynchronously.
If a backend does not support asynchronous execution, this flag is unnecessary.
--num_inferences <VAL> Specifies the number of inferences. Loops over the input_list until
the number of inferences has transpired.
--duration <VAL> Specifies the duration of the graph execution in seconds.
Loops over the input_list until this amount of time has transpired.
--keep_num_outputs <VAL> Specifies the number of outputs to be saved.
Once the number of outputs reaches the limit, subsequent outputs are discarded.
--batch_multiplier <VAL> Specifies the value with which the batch value in input and output tensors dimensions
will be multiplied. The modified input and output tensors will be used only during
the execute graphs. Composed graphs will still use the tensor dimensions from model.
--timeout <VAL>  Specifies the value of the timeout for execution of graph in microseconds. Please note
using this option with a backend that does not support timeout signals results in an error.
--retrieve_context_timeout <VAL>  Specifies the value of the timeout for initialization of graph in microseconds. Please note
using this option with a backend that does not support timeout signals results in an error.
Also note that this option can only be used when loading a saved context through
--retrieve_context option.
--max_input_cache_tensor_sets <VAL> Specifies the maximum number of input tensor sets that can be cached.
Use value "-1" to cache all the input tensors created.
Note that options --max_input_cache_tensor_sets and --max_input_cache_size_mb are mutually exclusive.
Only one of the options can be specified at a time.
--max_input_cache_size_mb <VAL>  Specifies the maximum cache size in megabytes (MB).
Note that options --max_input_cache_tensor_sets and --max_input_cache_size_mb are mutually exclusive.
Only one of the options can be specified at a time.
--set_output_tensors <VAL> Provide a comma-separated list of intermediate output tensor names, for which the outputs
will be written in addition to final graph output tensors. Note that options --debug and
--set_output_tensors are mutually exclusive. Only one of the options can be specified at a time.
Also note that this option can not be used when graph is retrieved from context binary,
since the graph is already finalized when retrieved from context binary.
The syntax is: graphName0:tensorName0,tensorName1;graphName1:tensorName0,tensorName1.
In case of a single graph, its name is not necessary and a list of comma separated tensor
names can be provided, e.g.: tensorName0,tensorName1.
The same format can be provided in a .txt file.
--use_mmap Specifies that the context binary that is being read should be loaded
using the Memory-mapped (MMAP) file I/O. Please note some platforms
may not support this due to OS limitations in which case an error
is thrown when this option is used.
--validate_binary Specifies that the context binary will be validated before creating a context.
This option can only be used with backends that support binary validation.
--platform_options <VAL> Specifies values to pass as platform options. Multiple platform options can be provided
using the syntax: key0:value0;key1:value1;key2:value2
--graph_profiling_start_delay <VAL> Specifies graph profiling start delay in seconds. Please Note that this option can only be used
in conjunction with graph-level profiling handles.
--dlc_path <VAL> Paths to a comma separated list of Deep Learning Containers (DLC) from which to load the models.
Necessitates libQnnModelDlc.so as the --model argument.
To compose multiple graphs in the context, use comma-separated list of DLC files.
The syntax is <qnn_model_name_1.dlc>,<qnn_model_name_2.dlc>
Default: None
--graph_profiling_num_executions <VAL> Specifies the maximum number of QnnGraph_execute/QnnGraph_executeAsync calls to be profiled.
Please Note that this option can only be used in conjunction with graph-level profiling handles.
--io_tensor_mem_handle_type <VAL> Specifies mem handle type to be used for Input and output tensors during graph execution.
Valid settings: "ion" and "dma_buf".
--version Print the QNN SDK version.
--help Show this help message.
EXIT CODES:
------------
List of exit codes used in qnn-net-run application.
Exit codes 1, 2, 126 - 165 and 255 should be avoided for user-defined exit codes since they have
special purposes, as below:
1, 2 : Abnormal termination of a program.
126 - 165 : Specifically used to indicate segmentation faults, bus errors, etc.
3 - Application failure reason unknown. See DSP logs (logcat).
4 - Application failure due to invalid application argument.
6 - Application failure during setting log level.
7 - Application failure due to null or invalid function pointer etc.
9 - Application failure during qnn_net_run_HtpVXXHexagon initialization.
10 - Application failure during backend creation.
11 - Application failure during device creation.
12 - Application failure during Op Package registration.
13 - Application failure during creating context.
14 - Application failure during graph prepare.
15 - Application failure during graph finalize.
16 - Application failure during create from binary.
17 - Application failure during graph execution.
18 - Application failure during context free.
19 - Application failure during device free.
20 - Application failure during backend termination.
21 - Application failure during graph execution abort.
22 - Application failure during graph execution timeout.
23 - Application failure during the create from binary with suboptimal cache.
24 - Application failure during backend termination.
25 - Application failure during processing binary section or updating binary section etc.
26 - Application failure during binary update/execution.
See <QNN_SDK_ROOT>/examples/QNN/NetRun folder for reference example on how to use qnn-net-run tool.
Typical arguments:
--backend - The appropriate argument depends on what target and backend you want to run on
Android (aarch64): <QNN_SDK_ROOT>/lib/aarch64-android/
    CPU - libQnnCpu.so
    GPU - libQnnGpu.so
    HTA - libQnnHta.so
    DSP (Hexagon v65) - libQnnDspV65Stub.so
    DSP (Hexagon v66) - libQnnDspV66Stub.so
    DSP - libQnnDsp.so
    HTP (Hexagon v68) - libQnnHtp.so
    [Deprecated] HTP Alternate Prepare (Hexagon v68) - libQnnHtpAltPrepStub.so
    Saver - libQnnSaver.so
Linux x86: <QNN_SDK_ROOT>/lib/x86_64-linux-clang/
    CPU - libQnnCpu.so
    HTP (Hexagon v68) - libQnnHtp.so
    Saver - libQnnSaver.so
Windows x86: <QNN_SDK_ROOT>/lib/x86_64-windows-msvc/
    CPU - QnnCpu.dll
    Saver - QnnSaver.dll
WoS: <QNN_SDK_ROOT>/lib/aarch64-windows-msvc/
    CPU - QnnCpu.dll
    DSP (Hexagon v66) - QnnDspV66Stub.dll
    DSP - QnnDsp.dll
    HTP (Hexagon v68) - QnnHtp.dll
    Saver - QnnSaver.dll
Note
Hexagon-based backend libraries are emulations on x86_64 platforms
--input_list - This argument provides a file containing paths to input files to be used for graph
execution. Input files can be specified with the below format:
<input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>] [<input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]] ...
Below is an example containing 3 sets of inputs with layer names “Input_1” and “Input_2”, and files located in the relative path “Placeholder_1/real_input_inputs_1/”:
Input_1:=Placeholder_1/real_input_inputs_1/0-0#e6fb51.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/0-1#8a171b.rawtensor
Input_1:=Placeholder_1/real_input_inputs_1/1-0#67c965.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/1-1#54f1ff.rawtensor
Input_1:=Placeholder_1/real_input_inputs_1/2-0#b42dc6.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/2-1#346a0e.rawtensor
Note: If the batch dimension of the model is greater than 1, the number of batch elements in the input file has to either match the batch dimension specified in the model or it has to be one. In the latter case, qnn-net-run will combine multiple lines into a single input tensor.
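Raw input files and an input list in the `name:=path` format above can be produced with a short script. The sketch below is illustrative only: the tensor names, element counts, and the float32 data type are placeholders that must match your model's actual inputs.

```python
import array
import os
import random

# Write three sets of random float32 inputs for a hypothetical two-input
# graph, then emit an input list in the "name:=path" format qnn-net-run expects.
os.makedirs("inputs", exist_ok=True)
lines = []
for i in range(3):
    entries = []
    # Tensor names and sizes are hypothetical; match them to your model.
    for name, num_elements in [("Input_1", 224 * 224 * 3), ("Input_2", 10)]:
        path = f"inputs/{i}-{name}.raw"
        data = array.array("f", (random.random() for _ in range(num_elements)))
        with open(path, "wb") as f:
            data.tofile(f)
        entries.append(f"{name}:={path}")
    # One line per input set; entries for the two tensors are space-separated.
    lines.append(" ".join(entries))

with open("input_list.txt", "w") as f:
    f.write("\n".join(lines) + "\n")
```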
--op_packages - This argument is only needed if you are using custom op packages. The native QNN
ops are already included as part of the backend libraries.
When using custom op packages, each provided op package requires a colon-separated command line argument containing the path to the op package shared library (.so) file, as well as the name of the interface provider, formatted as <op_package_path>:<interface_provider>. The interface_provider argument must be the name of the function in the op package library that satisfies the QnnOpPackage_InterfaceProvider_t interface. In the skeleton code created by
qnn-op-package-generator, this function will be named <package_name><backend>InterfaceProvider. See Generating Op Packages for more information.
--config_file - This argument is only needed if you need to specify context priority or provide backend extensions
related parameters. These parameters are specified through a JSON file. The template of the JSON file is shown below:
{
  "backend_extensions" : {
    "shared_library_path" : "path_to_shared_library",
    "config_file_path" : "path_to_config_file"
  },
  "context_configs" : {
    "context_priority" : "low | normal | normal_high | high",
    "async_execute_queue_depth" : uint32_value,
    "enable_graphs" : ["<graph_name_1>", "<graph_name_2>", ...],
    "memory_limit_hint" : uint64_value,
    "is_persistent_binary" : boolean_value,
    "cache_compatibility_mode" : "permissive | strict"
  },
  "graph_configs" : [
    {
      "graph_name" : "graph_name_1",
      "graph_priority" : "low | normal | normal_high | high",
      "graph_profiling_start_delay" : double_value,
      "graph_profiling_num_executions" : uint64_value
    }
  ],
  "profile_configs" : {
    "num_max_events" : uint64_value
  },
  "async_graph_execution_config" : {
    "input_tensors_creation_tasks_limit" : uint32_value,
    "execute_enqueue_tasks_limit" : uint32_value
  }
}
All the options in the JSON file are optional. context_priority is used to specify the priority of the context as a context config. async_execute_queue_depth is used to specify the number of executions that can be in the queue at a given time. While using a context binary, enable_graphs is used to implement the graph selection functionality. memory_limit_hint is used to set the peak memory limit hint of a deserialized context in MBs. is_persistent_binary indicates that the context binary pointer is available during QnnContext_createFromBinary and until QnnContext_free is called.
Set Cache Compatibility Mode : cache_compatibility_mode specifies the mode used to check whether the cache record is optimal for the device. The available modes are:
“permissive”: Binary cache is compatible if it could run on the device; default.
“strict”: Binary cache is compatible if it could run on the device and fully utilize hardware capability. If it cannot fully utilize hardware, selecting this option results in a recommendation to prepare the cache again. This option returns an error if it is not supported by the selected backend.
Graph Selection : Allows specifying a subset of graphs in a context to be loaded and executed. If enable_graphs is specified, only those graphs are loaded. If a selected graph name doesn't exist, an error is returned. If enable_graphs is not specified or is passed as an empty list, the default behaviour continues, where all graphs in a context are loaded.
graph_configs can be used to specify asynchronous execution order and depth, if a backend supports asynchronous execution. Every set of graph configs has to be specified along with a graph name. graph_profiling_start_delay is used to set the profiling start delay time in seconds. graph_profiling_num_executions is used to set the maximum number of QnnGraph_execute/QnnGraph_executeAsync calls that will be profiled.
profile_configs can be used to specify the max profile events per profiling handle.
async_graph_execution_config can be used to specify limits on the number of tasks that run in parallel when graphs are executed asynchronously using graphExecuteAsync. input_tensors_creation_tasks_limit specifies the maximum number of tasks in which input tensor sets are populated for graph execution. execute_enqueue_tasks_limit specifies the maximum number of tasks in which the backend graphExecuteAsync will be called using the pre-populated input tensors. If unspecified, these values are set to the specified async_execute_queue_depth, or to its default of 10.
backend_extensions is used to exercise custom options in a particular backend. This can be done by providing an extensions shared library (.so) and a config file, if necessary. This is also required to enable various performance modes, which can be exercised using backend config. Currently, HTP supports it through the libQnnHtpNetRunExtensions.so shared library, DSP through libQnnDspNetRunExtensions.so, and GPU through libQnnGpuNetRunExtensions.so. For the different custom options which can be enabled with HTP, see HTP Backend Extensions
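For example, a minimal --config_file that raises the context priority and attaches the HTP backend extensions might look like the following (the extensions config file name is a placeholder):

```json
{
  "backend_extensions": {
    "shared_library_path": "libQnnHtpNetRunExtensions.so",
    "config_file_path": "htp_config.json"
  },
  "context_configs": {
    "context_priority": "high"
  }
}
```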
--shared_buffer - This argument is only needed to instruct qnn-net-run to use shared buffers for the zero-copy use case with
a device/coprocessor associated with a particular backend (for ex., DSP with HTP backend) for graph input and output tensor data.
This option is supported on Android only. qnn-net-run implements this feature using rpcmem APIs, which further create shared
buffers using ION/DMA-BUF memory allocator on Android, available through the shared library libcdsprpc.so. In addition to
specifying this option, for qnn-net-run to be able to discover libcdsprpc.so, the path in which the shared library is present
needs to be appended to LD_LIBRARY_PATH variable.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/vendor/lib64
Running Quantized Model on HTP backend with qnn-net-run¶
The HTP backend currently allows a quantized QNN model to be finalized (an optimized version created)
offline, on a Linux development host (using the x86_64-linux-clang backend library), and then allows the
finalized model to be executed on device (using the hexagon-v68 backend libraries).
First, configure the environment by following instructions in Setup section. Next,
build QNN Model library from your network, using artifacts produced by one of QNN converters.
See Building Example Model for reference.
Lastly, use the qnn-context-binary-generator utility to generate a serialized representation of the
finalized graph to execute the serialized binary on device.
# Generate the optimized serialized representation of QNN Model on Linux development host.
# libQnnModel.so here is an x86_64-linux-clang built quantized QNN model.
$ qnn-context-binary-generator --binary_file qnngraph.serialized.bin \
      --model <path_to_model_library>/libQnnModel.so \
      --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
      --output_dir <output_dir_for_result_and_qnngraph_serialized_binary>
To use the produced serialized representation of the finalized graph (qnngraph.serialized.bin),
ensure the below binaries are available on the Android device:
- libQnnHtpV68Stub.so (ARM)
- libQnnHtpPrepare.so (ARM)
- libQnnModel.so (ARM)
- libQnnHtpV68Skel.so (cDSP v68)
- qnngraph.serialized.bin (serialized binary from run on Linux development host)
See <QNN_SDK_ROOT>/examples/QNN/NetRun/android/android-qnn-net-run.sh script for reference
on how to use qnn-net-run tool on android device.
# Run the optimized graph on HTP target
$ qnn-net-run --retrieve_context qnngraph.serialized.bin \
      --backend <path_to_model_library>/libQnnHtp.so \
      --output_dir <output_dir_for_result> \
      --input_list <path_to_input_list.txt>
Running Float Model on HTP backend with qnn-net-run¶
The QNN HTP backend can support running float32 models on select Qualcomm SoCs.
First, configure the environment by following instructions in Setup section. Next, build QNN Model library from your network, using artifacts produced by one of QNN converters. See Building Example Model for reference.
Lastly, configure the backend_extensions parameters through a JSON file to set custom options for the HTP backend.
Pass this file to qnn-net-run using the --config_file argument. backend_extensions takes two parameters: an extensions shared library (.so) (for HTP, use libQnnHtpNetRunExtensions.so) and
a config file for the backend.
Below is the template for the JSON file:
{
  "backend_extensions" : {
    "shared_library_path" : "path_to_shared_library",
    "config_file_path" : "path_to_config_file"
  }
}
For HTP backend extensions configurations, you can set "vtcm_mb", "fp16_relaxed_precision" and "graph_names" through a config file.
Here is an example of the config file:
{
    "graphs": [
        {
            "vtcm_mb": 8, // Provides performance infrastructure configuration options that are memory specific.
                          // Optional; if not set, QNN HTP defaults to 4.
            "fp16_relaxed_precision": 1, // Ensures that operations will run with relaxed precision math, i.e. float16 math

            "graph_names": [ "qnn_model" ] // Provide the list of names of the graphs for the inference, as specified when using the qnn converter tools.
                                           // "qnn_model" must be the name of the .cpp file generated during the model conversion (without the .cpp file extension)
            .....
        },
        {
            ..... // Other graph object
        }
    ]
}
Note
“fp16_relaxed_precision” is the key configuration to enable running QNN float models on HTP float runtime. HTP Graph Configurations such as fp16_relaxed_precision, vtcm_mb etc are only applied if at least one “graph_name” is provided in backend extensions config.
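The // comments in the listing above are for explanation only; an actual config file must be plain JSON. As a sketch, such a file could be generated programmatically, where the graph name qnn_model is an assumption that must match your converter output:

```python
import json

# HTP per-graph options; values mirror the example config shown above.
htp_config = {
    "graphs": [
        {
            "vtcm_mb": 8,                  # VTCM memory size in MB (QNN HTP defaults to 4)
            "fp16_relaxed_precision": 1,   # run float ops with relaxed (float16) math
            "graph_names": ["qnn_model"],  # must match the converter-generated graph name
        }
    ]
}

with open("htp_backend_config.json", "w") as f:
    json.dump(htp_config, f, indent=4)
```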
See <QNN_SDK_ROOT>/examples/QNN/NetRun/android/android-qnn-net-run.sh script for reference
on how to use qnn-net-run tool on android device.
# Run the optimized graph on HTP target
# libQnnModel.so here is an x86_64-linux-clang built float QNN model.
$ qnn-net-run --model <path_to_model_library>/libQnnModel.so \
      --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
      --config_file <path_to_JSON_file.json> \
      --output_dir <output_dir_for_result> \
      --input_list <path_to_input_list.txt>
qnn-throughput-net-run¶
The qnn-throughput-net-run tool is used to exercise the execution of multiple models on a QNN backend or on different backends in a multi-threaded fashion. It allows repeated execution of models on a specified backend for a specified duration or number of iterations.
Usage:
------
qnn-throughput-net-run [--config <config_file>.json]
[--output <results>.json]
REQUIRED argument(s):
--config <FILE>.json        Path to the json config file.
OPTIONAL argument(s):
--output <FILE>.json Specify the json file used to save the performance test results.
Configuration JSON File:
qnn-throughput-net-run uses a configuration file as input to run the models on the backends. The configuration json file comprises four required objects: backends, models, contexts and testCase.
Below is an example of a json configuration file. Please refer to the following section for detailed information on the four configuration objects backends, models, contexts and testCase.
{
"backends": [
{
"backendName": "cpu_backend",
"backendPath": "libQnnCpu.so",
"profilingLevel": "BASIC",
"backendExtensions": "libQnnHtpNetRunExtensions.so",
"perfProfile": "high_performance"
},
{
"backendName": "gpu_backend",
"backendPath": "libQnnGpu.so",
"profilingLevel": "OFF"
}
],
"models": [
{
"modelName": "model_1",
"modelPath": "libqnn_model_1.so",
"loadFromCachedBinary": false,
"inputPath": "model_1-input_list.txt",
"inputDataType": "FLOAT",
"postProcessor": "MSE",
"outputPath": "model_1-output",
"outputDataType": "FLOAT_ONLY",
"saveOutput": "NATIVE_ALL",
"groundTruthPath": "model_1-golden_list.txt"
},
{
"modelName": "model_2",
"modelPath": "libqnn_model_2.so",
"loadFromCachedBinary": false,
"inputPath": "model_2-input_list.txt",
"inputDataType": "FLOAT",
"postProcessor": "MSE",
"outputPath": "model_2-output",
"outputDataType": "FLOAT_ONLY",
"saveOutput": "NATIVE_LAST"
}
],
"contexts": [
{
"contextName": "cpu_context_1"
},
{
"contextName": "gpu_context_1"
}
],
"testCase": {
"iteration": 5,
"logLevel": "error",
"threads": [
{
"threadName": "cpu_thread_1",
"backend": "cpu_backend",
"context": "cpu_context_1",
"model": "model_1",
"interval": 10,
"loopUnit": "count",
"loop": 1
},
{
"threadName": "gpu_thread_1",
"backend": "gpu_backend",
"context": "gpu_context_1",
"model": "model_2",
"interval": 0,
"loopUnit": "count",
"loop": 10
}
]
}
}
backends : Property value is an array of json objects, where each object contains the information for a backend on which the models are executed. Each object of the array has the following properties as key/value pairs.
| Key | Description |
|---|---|
| backendName | A unique identifier used by the testcase to designate on which backend the model should be run. |
| backendPath | Specifies the on-device backend .so library file path. |
| profilingLevel | Sets the QNN profiling level for the backend. Possible values: OFF, BASIC, DETAILED. |
| backendExtensions | Enables backend-specific options through an optional backend extensions shared library and config file. This is required to enable the various performance modes which are exercised using the backend config. |
| perfProfile | Specifies the performance profile to set. |
| … | Comma-separated list of custom op packages and interface providers for registration. |
| … | Enables backend-specific platform options through QnnBackend_Config_t. |
models : Property value is an array of json objects, where each object contains details about a model and the corresponding input data and post-processing information. Each object of the array has the following properties as key/value pairs.
| Key | Description |
|---|---|
| modelName | A unique identifier used by the testcase to designate which model to run. |
| modelPath | Specifies the <model>.so / <serialized_context>.bin file path. |
| loadFromCachedBinary | Set to true when modelPath points to a serialized context binary rather than a model library. |
| inputPath | Path to a file listing the inputs for the model. If there are multiple graphs in the <model>.so / <serialized_context>.bin, this has to be a comma-separated list of the input paths of the individual graphs. Syntax: Graph1_input_path[,Graph2_input_path,…]. If not set, random input data is used. |
| inputDataType | Possible values: NATIVE, FLOAT. |
| postProcessor | Possible values: NONE, MSE, MSE_FLOAT32, MSE_INT8, MSE_INT16. If there are multiple graphs in the <model>.so / <serialized_context>.bin, this has to be a comma-separated list of postProcessor values. Syntax: MSE[,NONE,…]. MSE outputs a mean squared error result for each execution against the golden file specified by groundTruthPath. |
| outputPath | Specifies the path where the model outputs are saved. |
| outputDataType | Possible values: NATIVE_ONLY, FLOAT_ONLY, FLOAT_AND_NATIVE. |
| saveOutput | Possible values: NONE, NATIVE_LAST, NATIVE_ALL. |
| groundTruthPath | Specifies the golden file path for computing the MSE. If there are multiple graphs in the <model>.so / <serialized_context>.bin, this has to be a comma-separated list of the ground truth paths of the individual graphs. Syntax: Graph1_ground_truth_path[,Graph2_ground_truth_path,…] |
contexts : Property value is an array of json objects, where each object contains all the context information. Each object of the array has the following properties as key/value pairs.
| Key | Description |
|---|---|
| contextName | A unique identifier used by the testcase to designate the context in which a model should be created. |
| … | Specifies the priority of the context. Possible values: DEFAULT, LOW, NORMAL, HIGH. |
| … | Specifies the queue depth for async execution. |
| … | Specifies the cache compatibility check mode; valid values are: "permissive" (default) and "strict". |
testCase : Property value is a json object that specifies the testing configuration that controls multi-threaded execution.
| Key | Description |
|---|---|
| iteration | Number of times the entire use case is repeated. |
| logLevel | Specifies the max logging level to be set. |
| threads | An array of json objects, where each object contains the details of a thread to be executed by qnn-throughput-net-run. Each object of the array has the properties listed under threads below as key/value pairs. |
threads : Property value is an array containing all the threads and corresponding backend, context
and models information.
Each element of the array can have the following required/optional properties.
| Key | Description |
|---|---|
| threadName | A unique identifier used by the testcase to identify the thread and save the output results. |
| backend | Specifies the backend to be used when this thread executes the graph. The value must match one of the backendName values in the backends section. |
| context | Specifies the context to be used when this thread executes the graph. The value must match one of the contextName values in the contexts section. |
| model | Specifies the model to be used by the thread for execution. The value must match one of the modelName values in the models section. |
| interval | Represents the interval (in microseconds) between each graph execution in the thread. |
| loopUnit | Possible values: count, second. |
| loop | Value is taken either as seconds or as a count, based on the value of the loopUnit key. |
| … | Specifies the backend config file to enable backend-specific options through backend extensions. |
An example json file, sample_config.json, can be found at <QNN_SDK_ROOT>/examples/QNN/ThroughputNetRun.
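The four required top-level objects can also be produced programmatically. The sketch below builds a minimal single-thread configuration; the file names and model paths are placeholders:

```python
import json

# Minimal qnn-throughput-net-run configuration: one CPU backend, one model,
# one context, and a single thread executing the model 10 times.
config = {
    "backends": [
        {"backendName": "cpu_backend", "backendPath": "libQnnCpu.so",
         "profilingLevel": "OFF"}
    ],
    "models": [
        {"modelName": "model_1", "modelPath": "libqnn_model_1.so",
         "loadFromCachedBinary": False, "inputPath": "model_1-input_list.txt"}
    ],
    "contexts": [
        {"contextName": "cpu_context_1"}
    ],
    "testCase": {
        "iteration": 1,
        "logLevel": "error",
        "threads": [
            {"threadName": "cpu_thread_1", "backend": "cpu_backend",
             "context": "cpu_context_1", "model": "model_1",
             "interval": 0, "loopUnit": "count", "loop": 10}
        ]
    }
}

# All four top-level objects are required by the tool.
assert {"backends", "models", "contexts", "testCase"} <= config.keys()

with open("throughput_config.json", "w") as f:
    json.dump(config, f, indent=2)
```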
Analysis¶
qairt-accuracy-evaluator (Beta)¶
The qairt-accuracy-evaluator tool provides a framework to evaluate end-to-end accuracy metrics for a model on a given dataset. In addition, the tool can be used to identify the best quantization options for a model on a given set of inputs.
Dependencies
The QNN Accuracy Evaluator assumes that the platform dependencies and environment setup instructions have been followed as outlined in the Setup page. Certain additional Python packages are required by this tool; refer to Optional Python packages.
Note: The qairt-accuracy-evaluator currently supports only ONNX models.
Usage¶
Set the QNN_SDK_ROOT environment variable to the root directory of the QNN SDK. The following environment variables might also need to be set with appropriate values:
QNN_MODEL_ZOO : Path to the model zoo. If not set, the model zoo base directory path is assumed to be "/home/model_zoo". Note: this environment variable is required only if the model path supplied is not absolute but relative to the configured model zoo path.
ADB_PATH : Path to the ADB binary. If not set, it is queried and set from its executable path.
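For example, the environment can be prepared as follows (the SDK install path below is a placeholder for your actual install location):

```shell
# Point the tools at the SDK and (optionally) a model zoo; paths are illustrative.
export QNN_SDK_ROOT=/opt/qcom/qnn-sdk
export QNN_MODEL_ZOO=/home/model_zoo
# Only needed if adb is not discoverable from its executable path.
export ADB_PATH=/usr/bin/adb
```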
To conduct an accuracy analysis of a given model using a specific dataset, the user must create a configuration that specifies the backends, quantization options, and reference inference frameworks. Sample config files can be found at ${QNN_SDK_ROOT}/lib/python/qti/aisw/accuracy_evaluator/configs/samples/model_configs.
The high-level structure of a model config is shown below:
model
info
globals
dataset
preprocessing
postprocessing
inference-engine
verifier
metrics
The user must provide all dataset information under the dataset section in the model config file; otherwise, an error is thrown. An example is shown below:
dataset:
name: COCO2014
path: '/home/ml-datasets/COCO/2014/'
inputlist_file: inputlist.txt
calibration:
type: index
file: calibration-index.txt
Details of the dataset fields are as follows:
| Field | Description |
|---|---|
| name | Name of the dataset |
| path | Base directory of the dataset files |
| inputlist_file | Text file containing all the pre-processed input files, relative to the path field, one input per line. For models having multiple inputs, the inputs in each line have to be comma separated. |
| calibration | Specifies the calibration inputs used for quantization via the type and file sub-fields, as shown in the example above. |
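As an illustration, the two dataset files could be generated as below. The file names follow the sample config, but treating the calibration file (for type: index) as a list of line indices into the input list is an assumption made for this sketch:

```python
# Write an input list plus a calibration index file selecting the first
# two inputs for quantization calibration. Interpreting the calibration
# file as line indices into the input list is an assumption.
inputs = [
    "images/0001.raw",
    "images/0002.raw",
    "images/0003.raw",
]

with open("inputlist.txt", "w") as f:
    f.write("\n".join(inputs) + "\n")

with open("calibration-index.txt", "w") as f:
    for idx in range(2):  # calibrate on the first two inputs
        f.write(f"{idx}\n")
```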
The inference engine is used to run the model on multiple inference schemas. A sample inference engine section is shown below, followed by the description of the different configurable entries in the inference section.
inference-engine:
model_path: MLPerfModels/ResNetV1.5/modelFiles/ONNX/resnet50_v1.onnx
simplify_model : True
inference_schemas:
- inference_schema:
name: qnn
precision: quant
target_arch: x86_64-linux-clang
backend: htp
tag: qnn_int8_htp_x86
converter_params:
float_bias_bitwidth: 32
quantizer_params:
param_quantizer_schema: symmetric
act_quantizer_calibration: min-max
use_per_channel_quantization: True
backend_extensions:
vtcm_mb: 4
rpc_control_latency: 100
dsp_arch: v75 #mandatory
inputs_info:
- input_tensor_0:
type: float32
shape: ["*", 3, 224, 224]
outputs_info:
- ArgMax_0:
type: int64
shape: ["*"]
- softmax_tensor_0:
type: float32
shape: ["*", 1001]
Details of each configurable entry are given below:
| Field | Description |
|---|---|
| model_path | Absolute or relative path of the model. If the path is relative, it is taken relative to the model zoo path (QNN_MODEL_ZOO), if set, else the default /home/model_zoo. |
| simplify_model | Flag to enable or disable model simplification for ONNX models. By default, this flag is set to True and the model is simplified. Note: model simplification is skipped for models having custom operators or for inference schemas having the quantization_overrides parameter configured. |
| inference_schemas | List of inference schemas to run, each specifying the runtime name, precision, target architecture, backend and any converter, quantizer or backend-extension parameters, as in the example above. |
| inputs_info | Specifies the name, type and shape of each model input tensor. |
| outputs_info | Specifies the name, type and shape of each model output tensor. |
Note
Command line options available for config mode are as follows:
qairt-acc-evaluator options
options:
-config CONFIG path to model config yaml
-work_dir WORK_DIR working directory path. default is ./qacc_temp
-preproc_file PREPROC_FILE
Path to the text file containing list of preprocessed files.
If this file is provided, the evaluator will start at the infer stage
-calib_file CALIB_FILE
Path to the text file containing list of calibration files.
-onnx_symbol ONNX_SYMBOL [ONNX_SYMBOL ...]
Replace onnx symbols in input/output shapes. Can be passed as list of multiple items.
Default replaced by 1. Example: __unk_200:1
-device_id DEVICE_ID Target device id to be provided
-inference_schema_type INFERENCE_SCHEMA_TYPE
run only the inference schemas with this name. Example: qnn, onnxrt
-inference_schema_tag INFERENCE_SCHEMA_TAG
run only this inference schema tag
-cleanup CLEANUP end: deletes the files after all stages are completed.
intermediate: deletes after previous stage outputs are used. (default:'')
-use_memory_plugins Flag to enable memory plugins.
-silent Run in silent mode. Do not expect any CLI input from user.
-debug Enable debug logs on console and the file. (default: False)
-set_global SET_GLOBAL [SET_GLOBAL ...]
Option used to set a global variable. It can be repeated.
Example: -set_global count:10 -set_global calib:5 (default: None)
Note
Users can accelerate their evaluations using memory plugins to minimize unnecessary reading and writing of data during evaluation by passing the -use_memory_plugins flag to the evaluator command.
Config file options
- inference_schema:
name: qnn
target_arch: x86_64-linux-clang
backend: cpu
precision: fp32
tag: qnn_cpu_x86
- inference_schema:
name: qnn
target_arch: aarch64-android
backend: cpu
precision: fp32
tag: qnn_cpu_android
- inference_schema:
name: qnn
target_arch: aarch64-android
backend: gpu
precision: fp32
tag: qnn_gpu_android
- inference_schema:
name: qnn
target_arch: x86_64-linux-clang
backend: htp
precision: quant
tag: htp_int8
converter_params:
quantization_overrides: "path to the ext quant json"
quantizer_params:
param_quantizer_calibration: min-max | sqnr
param_quantizer_schema: asymmetric | symmetric
use_per_channel_quantization: True | False
use_per_row_quantization: True | False
act_bitwidth: 8 | 16
bias_bitwidth: 8 | 32
weights_bitwidth: 8 | 4
backend_extensions:
dsp_arch: v79 # mandatory
vtcm_mb: 4
rpc_control_latency: 100
- inference_schema:
name: qnn
target_arch: aarch64-android
backend: htp
precision: quant
tag: htp_int8
converter_params:
quantization_overrides: "path to the ext quant json"
quantizer_params:
param_quantizer_calibration: min-max | sqnr
param_quantizer_schema: asymmetric | symmetric
use_per_channel_quantization: True | False
use_per_row_quantization: True | False
act_bitwidth: 8 | 16
bias_bitwidth: 8 | 32
weights_bitwidth: 8 | 4
backend_extensions:
dsp_arch: v79 # mandatory
vtcm_mb: 4
rpc_control_latency: 100
Verifiers
The verifier section provides information about the verifier being used to compare the inference outputs, in case of multiple inference schemas. A sample verifier section is shown below, followed by the description of the different configurable entries in the section.
verifier:
enabled: True
fetch_top: 1
type: avg
tol: 0.01
Details of each configurable entry are given below:
| Field | Description |
|---|---|
| verifier | Verifier used to compare the outputs; the supported verifiers are listed below. |
Following are the verifiers that can be used to compare the outputs. Depending on the verifier selected, the output is either a percentage match between the two tensors or an absolute value.
abs - Percentage match between the two tensors based on the relative tolerance threshold value
cos - Percentage match between the two tensors based on the Cosine Similarity score
topk - Percentage match between the two tensors based on the topk match between the two tensors
avg - Percentage match between the two tensors based on the average difference between the two tensors
l1norm - Percentage match between the two tensors based on the L1 Norm of the diff
l2norm - Percentage match between the two tensors based on the L2 Norm of the diff
std - Percentage match between the two tensors based on the standard deviation difference
rme - Percentage match between the two tensors based on the RMSE between the tensors
snr - Signal to Noise Ratio between the two tensors
maxerror - max error value between the two tensors
kld - KL Divergence value between the two tensors
pixelbypixel - pixel by pixel plot difference between the two tensors. For each input i, plot is saved at {work_dir}/{schema}/Result_{i}
box - The box verifier requires the --box_input parameter, which accepts the filename of a JSON file with the following format:
{"box":"Result_0/detection_boxes_0.raw", "class":"Result_0/detection_classes_0.raw", "score":"Result_0/detection_scores_0.raw"}
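As an example, a box-verifier input file matching the format above could be generated as follows. The raw-file names are the placeholders from the example, not fixed values.

```python
import json

# Sketch: write the JSON file expected by the box verifier's --box_input
# option. The Result_0/*.raw names below are placeholders from the example.
box_input = {
    "box": "Result_0/detection_boxes_0.raw",
    "class": "Result_0/detection_classes_0.raw",
    "score": "Result_0/detection_scores_0.raw",
}
with open("box_input.json", "w") as f:
    json.dump(box_input, f)
```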
Plugins
Plugins are Python classes used to implement different stages of the inference pipeline, such as dataset handling, preprocessing, postprocessing, and metrics logic.
Dataset and pre-processing plugins perform transformations to the input before they are passed to inference.
Post-processing plugins transform inference outputs.
Metric plugins analyze inference outputs to assess their accuracy.
Sample plugins are provided in the SDK at ${QNN_SDK_ROOT}/lib/python/qti/aisw/accuracy_evaluator/plugins.
Users can implement their own plugins (custom plugins) to meet their specific requirements. To include custom plugins, export the CUSTOM_PLUGIN_PATH environment variable pointing to the location of the custom plugin(s), so that they are also included while registering the plugin(s).
export CUSTOM_PLUGIN_PATH=/path/to/custom/plugins/directory
In the model configuration file, plugins are defined as a transformation chain, as shown below:
transformations:
- plugin:
name: resize
params:
dims: 416,416
channel_order: RGB
type: letterbox
- plugin:
name: normalize
- plugin:
name: convert_nchw
Plugins required for dataset transformation are configured in the dataset section as shown below.
dataset:
name: ILSVRC2012
path: '/home/ml-datasets/imageNet/'
inputlist_file: inputlist.txt
annotation_file: ground_truth.txt
calibration:
type: dataset
file: calibration.txt
transformations:
- plugin:
name: filter_dataset
params:
random: False
max_inputs: -1
max_calib: -1
The preprocessing and postprocessing plugins that the user wishes to use are configured in the processing section as shown below:
preprocessing:
transformations:
- plugin:
name: resize
params:
dims: 416,416
channel_order: RGB
type: letterbox
- plugin:
name: normalize
postprocessing:
squash_results: True
transformations:
- plugin:
name: object_detection
params:
dims: 416,416
type: letterbox
dtypes: [float32, float32, float32, float32]
Metric calculation plugins are configured in the metrics section as shown below.
metrics:
transformations:
- plugin:
name: topk
params:
kval: 1,5
softmax_index: 1
round: 7
label_offset: 1
Plugins that need to be executed for a pipeline stage are listed under ‘transformations’ and preceded by the ‘plugin’ keyword. The following table lists details of each configurable entry for a plugin.
| Field | Description |
|---|---|
| name | Name of the plugin |
| params | Parameters expected and required by the plugin |
A complete list of all plugins and their parameters can be found at Accuracy Evaluator Plugins
Sample Command
qairt-accuracy-evaluator -config {path to configs}/qnn_resnet50_config.yaml
Results
The tool displays a table of quantization options ordered by output match based on the selected verifier, and also generates a CSV file with the same data. The comparator column shows the output match percentage/value based on the selected verifier. The quant params column displays the quantization parameters used for that run. Other columns show the backend and the runtime/compile parameters used. The information is also stored in a CSV file at {work_dir}/metrics-info.csv.
Artifacts associated with each of the configured quantization options are stored at `{work_dir}/infer/schema{i}_qnn_{backend}_{precision}_{j}`. Model outputs are stored at `{work_dir}/infer/schema{i}_qnn_{backend}_{precision}_{j}/Result_{k}`.
Note
Snapshot of console log has been added for clarity.
Note
Snapshot of csv file has been added for clarity.
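Since the results are also written to `{work_dir}/metrics-info.csv`, they can be post-processed with standard CSV tooling. The sketch below ranks runs by comparator score; the column names (`schema`, `comparator`) and values are assumptions for illustration, so check the header of your generated file first.

```python
import csv
import io

# Sketch: rank evaluator runs by comparator score from a CSV shaped like
# metrics-info.csv. Column names and values here are illustrative
# assumptions, not the tool's documented schema.
sample = io.StringIO(
    "schema,comparator\n"
    "htp_int8,0.91\n"
    "qnn_cpu_x86,0.99\n"
)
rows = sorted(csv.DictReader(sample),
              key=lambda r: float(r["comparator"]), reverse=True)
for row in rows:
    print(row["schema"], row["comparator"])
```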
qnn-architecture-checker (Beta)¶
Architecture Checker is a tool for models running on the HTP backend, including quantized 8-bit, quantized 16-bit, and FP16 models. It outputs a list of issues in the model that prevent it from achieving better performance on the HTP backend. The tool can also be invoked with the modifier feature, which applies the recommended modifications for these issues. This helps visualize the changes that can be applied to make the model a better fit for the HTP backend.
X86-Linux/ WSL Usage:
$ qnn-architecture-checker -i <path>/model.json
-b <optional_path>/model.bin
-o <optional_output_path>
-m <optional_modifier_argument>
X86-Windows/ Windows on Snapdragon Usage:
$ python qnn-architecture-checker -i <path>/model.json
-b <optional_path>/model.bin
-o <optional_output_path>
-m <optional_modifier_argument>
required arguments:
-i INPUT_JSON, --input_json INPUT_JSON
Path to json file
optional arguments:
-b BIN, --bin BIN
Path to a bin file
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path where the output csv should be saved. If not specified, the output csv will be written to the same path as the input file
-m MODIFY, --modify MODIFY
The query to select the modifications to apply.
--modify or --modify show - To see all the possible modifications. Display list of rule names and details of the modifications.
--modify all - To apply all the possible modifications found for the model.
--modify apply=rule_name1,rule_name2 - To apply modifications for specified rule names. The list of rules should be comma separated without spaces
The output is a csv file and will be saved as <optional_output_path>/<model_name>_architecture_checker.csv. An example output is shown below:
| | Graph/Node_name | Issue | Recommendation | Type | Input_tensor_name:[dims] | Output_tensor_name:[dims] | Parameters | Previous node | Next nodes | Modification | Modification_info |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Graph | This model uses 16-bit activation data, which takes twice as much memory as 8-bit activation data. | Try using a smaller datatype, e.g., 8-bit, to get better performance. | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| 2 | Node_name_1 | The number of channels in the input/output tensor of this convolution node is low (smaller than 32). | Try increasing the number of channels in the input/output tensor to 32 or greater to get better performance. | Conv2d | input_1:[1, 250, 250, 3], __param_1:[5, 5, 3, 32], convolution_0_bias:[32] | output_1:[1, 123, 123, 32] | {'package': 'qti.aisw', 'type': 'Conv2d', ...} | ['previous_node_name'] | ['next_node_name1', 'next_node_name2'] | N/A | N/A |
Sample Command
qnn-architecture-checker --input_json ./model_net.json
--bin ./model.bin
--output_path ./archCheckerOutput
Architecture Checker - Model Modifier
To apply modifications to the model, the Architecture Checker can be invoked with --modify or --modify show, which displays a list of possible modifications. In this case, the tool only shows the rule names and modification details; it runs without making any changes to the model and generates the CSV output. Using the rule names from that run, the Architecture Checker can then be invoked with --modify all or --modify apply=rule_name1,rule_name2. In this case, the rule-specific changes are applied to the model and can be viewed in the updated model JSON. Additionally, the output CSV will contain information related to the modifications.
Consider the CSV output below, generated after applying --modify apply=elwisediv to an example model.
| | Graph/Node_name | Issue | Recommendation | Type | Input_tensor_name:[dims] | Output_tensor_name:[dims] | Parameters | Previous node | Next nodes | Modification | Modification_info |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Node_name_1 | ElementWiseDivide usually has poor performance compared to ElementWiseMultiply. | Try replacing ElementWiseDivide with ElementWiseMultiply using the reciprocal value to get better performance. | Eltwise_Binary | input_1:[1, 52, 52, 6], input_2:[1] | output_1:[1, 52, 52, 6] | {'package': 'qti.aisw', 'eltwise_type': 'ElementWiseDivide', ...} | ['previous_node_name'] | ['next_node_name1', 'next_node_name2'] | Done | ElementWiseDivide has been replaced by ElementWiseMultiply using the reciprocal value |
| 2 | Node_name_2 | The number of channels in the input/output tensor of this convolution node is low (smaller than 32). | Try increasing the number of channels in the input/output tensor to 32 or greater to get better performance. | Conv2d | input_3:[1, 250, 250, 3], __param_1:[5, 5, 3, 32], convolution_1_bias:[32] | output_2:[1, 123, 123, 32] | {'package': 'qti.aisw', 'type': 'Conv2d', ...} | ['previous_node_name'] | ['next_node_name1', 'next_node_name2'] | N/A | N/A |
Following are the commands to invoke Architecture Checker with Modifier to display the list of modifications:
Sample Command
qnn-architecture-checker --input_json ./model_net.json
--bin ./model.bin
--output_path ./archCheckerOutput
--modify
Sample Command
qnn-architecture-checker --input_json ./model_net.json
--bin ./model.bin
--output_path ./archCheckerOutput
--modify show
Following are the commands to apply either all possible modifications or only those for specific rules:
Sample Command
qnn-architecture-checker --input_json ./model_net.json
--bin ./model.bin
--output_path ./archCheckerOutput
--modify all
Sample Command
qnn-architecture-checker --input_json ./model_net.json
--bin ./model.bin
--output_path ./archCheckerOutput
--modify apply=prelu,elwisediv
qnn-accuracy-debugger (Beta)¶
Dependencies
The Accuracy Debugger depends on the setup outlined in Setup. In particular, the following are required:
Platform dependencies need to be met as per Platform Dependencies
The desired ML frameworks need to be installed. Accuracy Debugger is verified to work with the ML framework versions listed at Environment Setup
The following environment variables are used throughout this guide (users may change these paths to suit their needs):
RESOURCESPATH = {Path to the directory where all models and input files reside}
PROJECTREPOPATH = {Path to your accuracy debugger project directory}
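For example, these could be set as follows (the directories shown are placeholders, not SDK defaults):

```shell
# Placeholder locations -- adjust both paths to match your own setup.
export RESOURCESPATH="$HOME/accuracy_debugger_resources"
export PROJECTREPOPATH="$HOME/accuracy_debugger_project"
```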
Supported models
The qnn-accuracy-debugger currently supports ONNX, TFLite, and TensorFlow 1.x models. PyTorch models are supported only in the oneshot-layerwise debugging algorithm of the tool.
Overview
The accuracy-debugger tool finds inaccuracies in a neural network at the layer level. The tool compares the golden outputs produced by running a model through a specific ML framework (i.e., TensorFlow, ONNX, TFLite) with the results produced by running the same model through Qualcomm's QNN Inference Engine. The inference engine can run on a variety of compute targets, including GPU, CPU, and DSP.
The following features are available in Accuracy Debugger. Each feature can be run with its corresponding option; for example, qnn-accuracy-debugger --{option}.
qnn-accuracy-debugger --framework_runner This feature uses an ML framework, e.g., TensorFlow, TFLite, or ONNX, to run the model and get intermediate outputs. Note: the argument --framework_diagnosis has been replaced by --framework_runner; --framework_diagnosis will be deprecated in a future release.
qnn-accuracy-debugger --inference_engine This feature uses the QNN engine to run a model to retrieve intermediate outputs.
qnn-accuracy-debugger --verification This feature compares the output generated by the framework runner and inference engine features using verifiers such as CosineSimilarity, RtolAtol, etc.
qnn-accuracy-debugger --compare_encodings This feature extracts encodings from a given QNN net JSON file, compares them with the given AIMET encodings, and outputs an Excel sheet highlighting mismatches.
qnn-accuracy-debugger --tensor_inspection This feature compares given target outputs with reference outputs.
qnn-accuracy-debugger --quant_checker This feature analyzes the activations, weights, and biases of all the possible quantization options available in the qnn-converters for each subsequent layer of a given model.
- Tip:
You can use --help after the bin commands to see what other options (required or optional) you can add.
If no option is provided, Accuracy Debugger runs framework_runner, inference_engine, and verification sequentially.
Below are the instructions for running the Accuracy Debugger:
Framework Runner¶
The Framework Runner feature is designed to run models with different machine learning frameworks (e.g. Tensorflow, etc). A selected model is run with a specific ML framework. Golden outputs are produced for future comparison with inference results from the Inference Engine step.
Usage¶
usage: qnn-accuracy-debugger --framework_runner [-h]
-f FRAMEWORK [FRAMEWORK ...]
-m MODEL_PATH
-i INPUT_TENSOR [INPUT_TENSOR ...]
-o OUTPUT_TENSOR
[-w WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[-v]
[--disable_graph_optimization]
[--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB]
[--add_layer_outputs ADD_LAYER_OUTPUTS]
[--add_layer_types ADD_LAYER_TYPES]
[--skip_layer_types SKIP_LAYER_TYPES]
[--skip_layer_outputs SKIP_LAYER_OUTPUTS]
[--start_layer START_LAYER]
[--end_layer END_LAYER]
Script to generate intermediate tensors from an ML Framework.
optional arguments:
-h, --help show this help message and exit
required arguments:
-f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
Framework type and version, version is optional. Currently
supported frameworks are ["tensorflow","onnx","tflite"] case
insensitive but spelling sensitive
-m MODEL_PATH, --model_path MODEL_PATH
Path to the model file(s).
-i INPUT_TENSOR [INPUT_TENSOR ...], --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
The name, dimensions, raw data, and optionally data
type of the network input tensor(s) specified in the
format "input_name" comma-separated-dimensions path-
to-raw-file, for example: "data" 1,224,224,3 data.raw
float32. Note that the quotes should always be
included in order to handle special characters,
spaces, etc. For multiple inputs specify multiple
--input_tensor on the command line like:
--input_tensor "data1" 1,224,224,3 data1.raw
--input_tensor "data2" 1,50,100,3 data2.raw float32.
-o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
Name of the graph's specified output tensor(s).
optional arguments:
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory for the framework_runner to store
temporary files. Creates a new directory if the
specified working directory does not exist
--output_dirname OUTPUT_DIRNAME
output directory name for the framework_runner to
store temporary files under
<working_dir>/framework_runner. Creates a new
directory if the specified working directory does not
exist
-v, --verbose Verbose printing
--disable_graph_optimization
Disables basic model optimization
--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB
path to onnx custom operator library
(below options are supported only for onnx and ignored for other frameworks)
--add_layer_outputs ADD_LAYER_OUTPUTS
Output layers to be dumped. example:1579,232
--add_layer_types ADD_LAYER_TYPES
outputs of layer types to be dumped. e.g
:Resize,Transpose. All enabled by default.
--skip_layer_types SKIP_LAYER_TYPES
comma delimited layer types to skip snooping. e.g
:Resize, Transpose
--skip_layer_outputs SKIP_LAYER_OUTPUTS
comma delimited layer output names to skip debugging.
e.g :1171, 1174
--start_layer START_LAYER
save all intermediate layer outputs from provided
start layer to bottom layer of model
--end_layer END_LAYER
save all intermediate layer outputs from top layer to
provided end layer of model
Please note: all command line arguments should be provided either on the command line or through the config file; command line arguments will not override those in the config file if there is overlap.
Sample Commands
qnn-accuracy-debugger \
--framework_runner \
--framework tensorflow \
--model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
--input_tensor "input:0" 1,299,299,3 $RESOURCESPATH/samples/InceptionV3Model/data/chairs.raw \
--output_tensor InceptionV3/Predictions/Reshape_1:0
qnn-accuracy-debugger \
--framework_runner \
--framework onnx \
--model_path $RESOURCESPATH/samples/dlv3onnx/dlv3plus_mbnet_513-513_op9_mod_basic.onnx \
--input_tensor Input 1,3,513,513 $RESOURCESPATH/samples/dlv3onnx/data/00000_1_3_513_513.raw \
--output_tensor Output
To run model with custom operator:
qnn-accuracy-debugger \
--framework_runner \
--framework onnx \
--input_tensor "image" 1,3,640,640 $RESOURCESPATH/models/yolov3/batched-inp-107-0.raw \
--model_path $RESOURCESPATH/models/yolov3/yolov3_640_640_with_abp_qnms.onnx \
--output_tensor detection_boxes \
--onnx_custom_op_lib $RESOURCESPATH/models/libCustomQnmsYoloOrt.so
- TIP:
A working directory, if not otherwise specified, is created relative to wherever you call the script from; it is recommended to call all scripts from the same directory so that all outputs and results are stored under one directory.
For TensorFlow it is sometimes necessary to add :0 after the input and output node names to signify the index of the node. Note that the :0 is dropped for ONNX models.
Output
The program also creates a directory named latest in working_directory/framework_runner which is symbolically linked to the most recently generated directory. In the example below, latest is symlinked to the data in the most recent YYYY-MM-DD_HH:mm:ss directory. Users may override the directory name by passing it to --output_dirname (e.g., --output_dirname myTest1Output).
The float data produced by the Framework Runner step offers precise reference material for the Verification component to diagnose the accuracy of the network generated by the Inference Engine. Unless a path is otherwise specified, the Accuracy Debugger will create directories within the working_directory/framework_runner directory found in the current working directory. The directories will be named with the date and time of the program’s execution, and contain tensor data. Depending on the tensor naming convention of the model, there may be numerous sub-directories within the new directory. This occurs when tensor names include a slash “/”. For example, for the tensor names ‘inception_3a/1x1/bn/sc’, ‘inception_3a/1x1/bn/sc_internal’ and ‘inception_3a/1x1/bn’, subdirectories will be generated.
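The mapping from slash-containing tensor names to nested subdirectories can be sketched as follows; the output root shown is a hypothetical example path, and the tensor name is taken from the example above.

```python
from pathlib import PurePosixPath

# Sketch: a tensor name containing '/' maps to nested subdirectories under
# the run's output directory. 'output_root' is a hypothetical example path.
output_root = PurePosixPath("working_directory/framework_runner/2024-01-01_00:00:00")
tensor_name = "inception_3a/1x1/bn/sc"
raw_path = output_root / (tensor_name + ".raw")
print(raw_path)  # ...2024-01-01_00:00:00/inception_3a/1x1/bn/sc.raw
```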
The figure above shows a sample output from a framework_runner run. InceptionV3 and Logits contain the outputs of each layer before the last layer. Each output directory contains the .raw files corresponding to each node. Every raw file that can be seen is the output of an operation. The outputs of the final layer are saved inside the Predictions directory. The file framework_runner_options.json contains all the options used to run this feature.
Inference Engine¶
The Inference Engine feature is designed to find the outputs for a QNN model. The output produced by this step can be compared with the golden outputs produced by the framework runner step.
Usage¶
usage: qnn-accuracy-debugger --inference_engine [-h]
-p ENGINE_PATH
-l INPUT_LIST
-r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,htp}
-a {aarch64-android,x86_64-linux-clang,x86_64-windows-msvc,wos}
[--stage {source,converted,compiled}]
[-i INPUT_TENSOR [INPUT_TENSOR ...]]
[-o OUTPUT_TENSOR] [-m MODEL_PATH]
[-f FRAMEWORK [FRAMEWORK ...]]
[-qmcpp QNN_MODEL_CPP_PATH]
[-qmbin QNN_MODEL_BIN_PATH]
[-qmb QNN_MODEL_BINARY_PATH]
[--deviceId DEVICEID] [-v]
[--host_device {x86,x86_64-windows-msvc,wos}] [-w WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[--engine_version ENGINE_VERSION]
[--debug_mode_off]
[--print_version PRINT_VERSION]
[--offline_prepare] [-bbw {8,32}]
[-abw {8,16}]
[--golden_dir_for_mapping GOLDEN_DIR_FOR_MAPPING]
[-wbw {8}] [--lib_name LIB_NAME]
[-bd BINARIES_DIR] [-qmn MODEL_NAME]
[-pq {tf,enhanced,adjusted,symmetric}]
[-qo QUANTIZATION_OVERRIDES]
[--act_quantizer {tf,enhanced,adjusted,symmetric}]
[--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--algorithms ALGORITHMS]
[--ignore_encodings]
[--per_channel_quantization]
[-idt {float,native}]
[-odt {float_only,native_only,float_and_native}]
[--profiling_level {basic,detailed}]
[--perf_profile {low_balanced,balanced,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
[--log_level {error,warn,info,debug,verbose}]
[--qnn_model_net_json QNN_MODEL_NET_JSON]
[--qnn_netrun_config_file QNN_NETRUN_CONFIG_FILE]
[--extra_converter_args EXTRA_CONVERTER_ARGS]
[--extra_runtime_args EXTRA_RUNTIME_ARGS]
[--compiler_config COMPILER_CONFIG]
[--context_config_params CONTEXT_CONFIG_PARAMS]
[--graph_config_params GRAPH_CONFIG_PARAMS]
[--precision {int8,fp16,fp32}]
[--add_layer_outputs ADD_LAYER_OUTPUTS]
[--add_layer_types ADD_LAYER_TYPES]
[--skip_layer_types SKIP_LAYER_TYPES]
[--skip_layer_outputs SKIP_LAYER_OUTPUTS]
Script to run QNN inference engine.
optional arguments:
-h, --help show this help message and exit
Core Arguments:
--stage {source,converted,compiled}
Specifies the starting stage in the Accuracy Debugger
pipeline.
Source: starting with source framework model [default].
Converted: starting with model.cpp and .bin files.
Compiled: starting with a model's .so binary.
-r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,htp}, --runtime {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,htp}
Runtime to be used.
Use HTP runtime for emulation on x86 host.
-a {aarch64-android,x86_64-linux-clang,x86_64-windows-msvc,wos}, --architecture {aarch64-android,x86_64-linux-clang,x86_64-windows-msvc,wos}
Name of the architecture to use for inference engine.
-l INPUT_LIST, --input_list INPUT_LIST
Path to the input list text.
Arguments required for SOURCE stage:
-i INPUT_TENSOR [INPUT_TENSOR ...], --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
The name, dimension, and raw data of the network input
tensor(s) specified in the format "input_name" comma-
separated-dimensions path-to-raw-file, for example:
"data" 1,224,224,3 data.raw. Note that the quotes
should always be included in order to handle special
characters, spaces, etc. For multiple inputs specify
multiple --input_tensor on the command line like:
--input_tensor "data1" 1,224,224,3 data1.raw
--input_tensor "data2" 1,50,100,3 data2.raw.
-o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
Name of the graph's output tensor(s).
-m MODEL_PATH, --model_path MODEL_PATH
Path to the model file(s).
-f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
Framework type to be used, followed optionally by
framework version.
Arguments required for CONVERTED stage:
-qmcpp QNN_MODEL_CPP_PATH, --qnn_model_cpp_path QNN_MODEL_CPP_PATH
Path to the qnn model .cpp file
-qmbin QNN_MODEL_BIN_PATH, --qnn_model_bin_path QNN_MODEL_BIN_PATH
Path to the qnn model .bin file
Arguments required for COMPILED stage:
-qmb QNN_MODEL_BINARY_PATH, --qnn_model_binary_path QNN_MODEL_BINARY_PATH
Path to the qnn model .so binary.
Optional Arguments:
--deviceId DEVICEID The serial number of the device to use. If not
available, the first in a list of queried devices will
be used for validation.
-v, --verbose Verbose printing
--host_device {x86,x86_64-windows-msvc,wos} The device that will be running conversion. Set to x86
by default.
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory for the inference_engine to store
temporary files. Creates a new directory if the
specified working directory does not exist
--output_dirname OUTPUT_DIRNAME
output directory name for the inference_engine to
store temporary files under
<working_dir>/inference_engine .Creates a new
directory if the specified working directory does not
exist
-p ENGINE_PATH, --engine_path ENGINE_PATH
Path to the inference engine.
--debug_mode_off Specifies whether to turn off debug mode.
--print_version PRINT_VERSION
Print the QNN SDK version alongside the output.
--offline_prepare Use offline prepare to run qnn model.
-bbw {8,32}, --bias_bitwidth {8,32}
option to select the bitwidth to use when quantizing
the bias. default 8
-abw {8,16}, --act_bitwidth {8,16}
option to select the bitwidth to use when quantizing
the activations. default 8
--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --golden_dir_for_mapping GOLDEN_DIR_FOR_MAPPING
Optional parameter to indicate the directory of the
goldens, it's used for tensor mapping without
framework.
-wbw {8}, --weights_bitwidth {8}
option to select the bitwidth to use when quantizing
the weights. Only 8 is supported at the moment.
-nif, --use_native_input_files
Specifies that the input files will be parsed in the
data type native to the graph. If not specified, input
files will be parsed in floating point.
-nof, --use_native_output_files
Specifies that the output files will be generated in
the data type native to the graph. If not specified,
output files will be generated in floating point.
--lib_name LIB_NAME Name to use for model library (.so file)
-bd BINARIES_DIR, --binaries_dir BINARIES_DIR
Directory to which to save model binaries, if they
don't yet exist.
-mn MODEL_NAME, --model_name MODEL_NAME
Name of the desired output qnn model
-pq {tf,enhanced,adjusted,symmetric}, --param_quantizer {tf,enhanced,adjusted,symmetric}
Param quantizer algorithm used.
-qo QUANTIZATION_OVERRIDES, --quantization_overrides QUANTIZATION_OVERRIDES
Path to quantization overrides json file.
--act_quantizer {tf,enhanced,adjusted,symmetric}
Optional parameter to indicate the activation
quantizer to use
--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use for activations.
This option has to be paired with --act_quantizer_schema.
--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use for parameters.
This option has to be paired with --param_quantizer_schema.
--act_quantizer_schema {asymmetric,symmetric}
Specify which quantization schema to use for
activations. Can not be used together with
act_quantizer. Note: This argument mandates --act_quantizer_calibration to be passed.
--param_quantizer_schema {asymmetric,symmetric}
Specify which quantization schema to use for
parameters. Can not be used together with
param_quantizer. Note: This argument mandates --param_quantizer_calibration to be passed.
-fbw {16,32}, --float_bias_bitwidth {16,32}
option to select the bitwidth to use when biases are in float; default is 32
-rqs RESTRICT_QUANTIZATION_STEPS, --restrict_quantization_steps RESTRICT_QUANTIZATION_STEPS
ENCODING_MIN, ENCODING_MAX
Specifies the number of steps to use to compute quantization encodings such that
scale = (max - min) / number of quantization steps.
The option should be passed as a space separated pair of hexadecimal string minimum and maximum values,
i.e. --restrict_quantization_steps 'MIN MAX'. Note that this is a hexadecimal string
literal and not a signed integer. To supply a negative value an explicit minus sign is required.
e.g.: 8-bit range: --restrict_quantization_steps '-0x80 0x7F'
16-bit range: --restrict_quantization_steps '-0x8000 0x7F7F'
--algorithms ALGORITHMS
Use this option to enable new optimization algorithms.
Usage is: --algorithms <algo_name1> ... The available
optimization algorithms are: 'cle ' - Cross layer
equalization includes a number of methods for
equalizing weights and biases across layers in order
to rectify imbalances that cause quantization errors.
--ignore_encodings Use only quantizer generated encodings, ignoring any
user or model provided encodings.
--per_channel_quantization
Use per-channel quantization for convolution-based op
weights.
-idt {float,native}, --input_data_type {float,native}
the input data type, must match with the supplied
inputs
-odt {float_only,native_only,float_and_native}, --output_data_type {float_only,native_only,float_and_native}
the desired output data type
--profiling_level {basic,detailed,backend}
Enables profiling and sets its level.
--perf_profile {low_balanced,balanced,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
--log_level {error,warn,info,debug,verbose}
Enable verbose logging.
--qnn_model_net_json QNN_MODEL_NET_JSON
Path to the qnn model net json. Only necessary if it is being run from the converted stage. It contains information about the structure of the data within the framework_runner and inference_engine steps.
This file is required to generate the model_graph_struct.json file, which is useful in the verification step.
--qnn_netrun_config_file QNN_NETRUN_CONFIG_FILE
allow backend_extention features to be applied during
qnn-net-run
--extra_converter_args EXTRA_CONVERTER_ARGS
additional converter arguments in a string. example:
--extra_converter_args input_dtype=data
float;input_layout=data1 NCHW
--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS
additional context binary generator arguments in a quoted string.
example: --extra_contextbin_args 'arg1=value1;arg2=value2'
--extra_runtime_args EXTRA_RUNTIME_ARGS
additional runtime arguments in a quoted string.
example: --extra_runtime_args
profiling_level=basic;log_level=debug
--compiler_config COMPILER_CONFIG
Path to the compiler config file.
--context_config_params CONTEXT_CONFIG_PARAMS
optional context config params in a quoted string.
example: --context_config_params 'context_priority=high; cache_compatibility_mode=strict'
--graph_config_params GRAPH_CONFIG_PARAMS
optional graph config params in a quoted string.
example: --graph_config_params 'graph_priority=low; graph_profiling_num_executions=10'
--precision {int8,fp16,fp32}
Choose the precision. Default is int8.
Note: This option is not applicable when --stage is set to converted or compiled.
--add_layer_outputs ADD_LAYER_OUTPUTS
Output layers to be dumped, e.g., 1579,232
--add_layer_types ADD_LAYER_TYPES
Outputs of layer types to be dumped, e.g., Resize, Transpose; all enabled by default
--skip_layer_types SKIP_LAYER_TYPES
Comma delimited layer types to skip snooping, e.g., Resize, Transpose
--skip_layer_outputs SKIP_LAYER_OUTPUTS
Comma delimited layer output names to skip debugging, e.g., 1171, 1174
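The layer-selection options above take comma-delimited lists such as `Resize, Transpose`. A minimal sketch of how such a value splits into individual names (a hypothetical helper, not part of the SDK):

```python
def split_comma_list(value):
    """Split a comma-delimited option value, trimming whitespace around items."""
    return [item.strip() for item in value.split(",") if item.strip()]

print(split_comma_list("Resize, Transpose"))
# ['Resize', 'Transpose']
```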
Please note: Each argument should be provided either on the command line or in the config file. Command-line values do not override config-file values if there is overlap.
The inference engine config file is a JSON file found in {accuracy_debugger tool root directory}/python/qti/aisw/accuracy_debugger/lib/inference_engine/configs/config_files. It stores information that helps the inference engine determine which tools and parameters to use.
Sample Command
qnn-accuracy-debugger \
--inference_engine \
--framework tensorflow \
--runtime dspv73 \
--model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
--input_tensor "input:0" 1,299,299,3 $RESOURCESPATH/samples/InceptionV3Model/data/chairs.raw \
--output_tensor InceptionV3/Predictions/Reshape_1 \
--architecture x86_64-linux-clang \
--input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
--verbose
Sample Command
qnn-accuracy-debugger \
--inference_engine \
--framework tensorflow \
--runtime dspv73 \
--host_device wos \
--model_path <RESOURCESPATH>\InceptionV3Model\inception_v3_2016_08_28_frozen.pb \
--input_tensor "input:0" 1,299,299,3 <RESOURCESPATH>\samples\InceptionV3Model\data\chairs.raw \
--output_tensor InceptionV3\Predictions\Reshape_1 \
--architecture wos \
--input_list <RESOURCESPATH>\samples\InceptionV3Model\data\image_list.txt \
--verbose
Sample Command
qnn-accuracy-debugger \
--inference_engine \
--framework tensorflow \
--runtime cpu \
--host_device x86_64-windows-msvc \
--model_path <RESOURCESPATH>\InceptionV3Model\inception_v3_2016_08_28_frozen.pb \
--input_tensor "input:0" 1,299,299,3 <RESOURCESPATH>\samples\InceptionV3Model\data\chairs.raw \
--output_tensor InceptionV3\Predictions\Reshape_1 \
--architecture x86_64-windows-msvc \
--input_list <RESOURCESPATH>\samples\InceptionV3Model\data\image_list.txt \
--verbose
Sample Command
qnn-accuracy-debugger \
--inference_engine \
--deviceId 357415c4 \
--framework tensorflow \
--runtime dspv73 \
--architecture aarch64-android \
--model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
--input_tensor "input:0" 1,299,299,3 $RESOURCESPATH/samples/InceptionV3Model/data/chairs.raw \
--output_tensor InceptionV3/Predictions/Reshape_1 \
--input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
--verbose
- Tip:
For --runtime, choose from 'cpu', 'gpu', 'dsp', 'dspv65', 'dspv66', 'dspv68', 'dspv69', 'dspv73', 'htp'. Make sure the runtime matches the target SoC, e.g., dspv73 for Kailua, dspv69 for Waipio. Choose the HTP runtime for emulation on an x86 host.
The input_tensor (-i) and output_tensor (-o) arguments do not need the :0 suffix, unlike when running the TensorFlow framework runner.
Two files, tensor_mapping.json and qnn_model_graph_struct.json, are generated for use in verification; they can be found in working_directory/inference_engine/latest.
Before running qnn-accuracy-debugger on a Windows x86 or Windows on Snapdragon system, ensure that you have configured the environment, and specify the host and target machine as x86_64-windows-msvc or wos, respectively.
Note that qnn-accuracy-debugger on a Windows x86 system is currently tested only for the CPU runtime.
More example commands running from different stages:
Sample Command
source file stage: same as example from above section (stage default is "source")
running from converted stage (x86):
qnn-accuracy-debugger \
--inference_engine \
--stage converted \
-qmcpp $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.cpp \
-qmbin $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.bin \
--runtime dspv73 \
--architecture x86_64-linux-clang \
--input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
--qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
--verbose \
--framework tensorflow \
--golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/
Android devices (e.g., MTP):
qnn-accuracy-debugger \
--inference_engine \
--stage converted \
-qmcpp $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.cpp \
-qmbin $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.bin \
--deviceId f366ce60 \
--runtime dspv73 \
--architecture aarch64-android \
--input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
--qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
--verbose \
--framework tensorflow \
--golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/
running in compiled stage (x86):
qnn-accuracy-debugger \
--inference_engine \
--stage compiled \
--qnn_model_binary $RESOURCESPATH/samples/InceptionV3Model/qnn_model_binaries/x86_64-linux-clang/libqnn_model.so \
--runtime dspv73 \
--architecture x86_64-linux-clang \
--input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
--verbose \
--qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
--golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/
running in compiled stage (wos):
qnn-accuracy-debugger \
--inference_engine \
--stage compiled \
--qnn_model_binary <RESOURCESPATH>\samples\InceptionV3Model\qnn_model_binaries\x86_64-linux-clang\libqnn_model.so \
--runtime dspv73 \
--architecture wos \
--input_list <RESOURCESPATH>\samples\InceptionV3Model\data\image_list.txt \
--verbose \
--qnn_model_net_json <RESOURCESPATH>\samples\InceptionV3Model\inception_v3_2016_08_28_frozen_qnn_model_net.json \
--golden_output_reference_directory <RESOURCESPATH>\samples\InceptionV3Model\golden_from_framework_runner\
Android devices (e.g., MTP):
qnn-accuracy-debugger \
--inference_engine \
--stage compiled \
--qnn_model_binary $RESOURCESPATH/samples/InceptionV3Model/qnn_model_binaries/aarch64-android/libqnn_model.so \
--runtime dspv73 \
--architecture aarch64-android \
--input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
--verbose \
--qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
--framework tensorflow \
--golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/
To run an ONNX model with a custom operator:
qnn-accuracy-debugger \
--inference_engine \
--framework onnx \
--runtime dspv75 \
--architecture aarch64-android \
--model_path $RESOURCESPATH/AISW-77095/model.onnx \
--input_tensor "image" 1,3,640,1794 $RESOURCESPATH/inputs/image.raw \
--output_tensor uncertainty_jacobian_bb \
--input_list $RESOURCESPATH/input_list.txt \
--default_verifier mse \
--engine QNN \
--engine_path $QNN_SDK_ROOT \
--extra_converter_args 'op_package_config=$RESOURCESPATH/CustomPreTopKOpPackageCPU_v2.xml;op_package_lib=$RESOURCESPATH/libCustomPreTopKOpPackageHtp.so:CustomPreTopKOpPackageHtpInterfaceProvider:' \
--extra_contextbin_args 'op_packages=$RESOURCESPATH/libQnnCustomPreTopKOpPackageHtp.so:CustomPreTopKOpPackageHtpInterfaceProvider:' \
--extra_runtime_args 'op_packages=$RESOURCESPATH/AISW-77095/libQnnCustomPreTopKOpPackageHtp_v75.so:CustomPreTopKOpPackageHtpInterfaceProvider' \
--debug_mode_off \
--offline_prepare \
--verbose
- Tip:
The qnn_model_net_json file is not required to run this step. However, it is needed to build qnn_model_graph_struct.json, which can be used in the Verification step. The model_net.json file is generated when the original model is converted, so if you are debugging from the converted stage, it is recommended to obtain this model_net.json file.
As an alternative to the original model, framework plus golden_dir_for_mapping, or golden_dir_for_mapping alone, can be provided to generate the tensor_mapping.json. However, when only golden_dir_for_mapping is provided, the get_tensor_mapping module maps on a best-effort basis, and the mapping is not guaranteed to be 100% accurate.
Output
Once the inference engine has finished running, it stores its output files in the specified directory; by default this is working_directory/inference_engine under the current working directory.
The figure above shows sample output from one run of the inference engine step. The following details what each file contains.
The output directory contains .raw files; each .raw file is the output of one operation in the network.
model.bin and model.cpp are created by the model converter.
qnn_model_binaries contains the .so file generated by the qnn-model-lib-generator utility.
image_list.txt contains the paths to the sample test images.
inference_engine_options.json contains all the options with which this run was launched.
In addition to the .raw files, the inference engine also generates the model's graph structure in a .json file whose name matches the name of the protobuf model file. model_graph_struct.json provides structural information about the converted model graph during the verification step; specifically, it helps order the nodes (i.e., beginning nodes come before ending nodes).
model_net.json records the data layout used within the framework_runner and inference_engine steps (data can be in different formats, e.g., channels-first vs. channels-last); the verification step uses this information so that data can be properly transposed before comparison. It is an optional parameter during the inference engine step, used for generating the model_graph_struct.json file (mandatory only when running the inference engine from the converted stage).
tensor_mapping.json contains a mapping between the intermediate output file names generated by the framework runner step and those generated by the inference engine step.
The created .raw files are organized in the same manner as framework_runner (see above).
Verification¶
The Verification step compares the output (from the intermediate tensors of a given model) produced by the framework runner step with the output produced by the inference engine step. Once the comparison is complete, the verification results are compiled and displayed visually in a format that can be easily interpreted by the user.
There are different types of verifiers, e.g., CosineSimilarity, RtolAtol, etc. To see the available verifiers, use the --help option (qnn-accuracy-debugger --verification --help). Each verifier compares the framework runner and inference engine outputs using an error metric. It also prepares reports and/or visualizations to help the user analyze the network's error data.
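As a rough illustration of the kind of error metric a verifier computes, the sketch below implements cosine similarity and an RtolAtol-style pass rate over flat float lists. These are generic textbook definitions, assumed for illustration; the SDK's exact formulas and hyperparameter handling may differ.

```python
import math

def cosine_similarity(golden, target):
    # Generic cosine similarity between two equal-length float sequences:
    # 1.0 means identical direction, 0.0 means orthogonal.
    dot = sum(g * t for g, t in zip(golden, target))
    norm_g = math.sqrt(sum(g * g for g in golden))
    norm_t = math.sqrt(sum(t * t for t in target))
    return dot / (norm_g * norm_t)

def rtol_atol_pass_rate(golden, target, rtolmargin=0.01, atolmargin=0.01):
    # Fraction of elements satisfying |target - golden| <= atol + rtol * |golden|,
    # mirroring the rtolmargin/atolmargin hyperparameters shown in the examples.
    passed = sum(
        abs(t - g) <= atolmargin + rtolmargin * abs(g)
        for g, t in zip(golden, target)
    )
    return passed / len(golden)
```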
Usage¶
usage: qnn-accuracy-debugger --verification [-h]
--default_verifier DEFAULT_VERIFIER
[DEFAULT_VERIFIER ...]
--golden_output_reference_directory
GOLDEN_OUTPUT_REFERENCE_DIRECTORY
--inference_results INFERENCE_RESULTS
[--tensor_mapping TENSOR_MAPPING]
[--qnn_model_json_path QNN_MODEL_JSON_PATH]
[--dlc_path DLC_PATH]
[--verifier_config VERIFIER_CONFIG]
[--graph_struct GRAPH_STRUCT] [-v]
[-w WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[--args_config ARGS_CONFIG]
[--target_encodings TARGET_ENCODINGS]
[-e ENGINE [ENGINE ...]]
Script to run verification.
required arguments:
--default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
Default verifier used for verification. The options
"RtolAtol", "AdjustedRtolAtol", "TopK", "L1Error",
"CosineSimilarity", "MSE", "MAE", "SQNR", "ScaledDiff"
are supported. An optional list of hyperparameters can
be appended. For example: --default_verifier
rtolatol,rtolmargin,0.01,atolmargin,0.01 An optional
list of placeholders can be appended. For example:
--default_verifier CosineSimilarity param1 1 param2 2.
to use multiple verifiers, add additional
--default_verifier CosineSimilarity
--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --framework_results GOLDEN_OUTPUT_REFERENCE_DIRECTORY
Path to root directory of golden output files. Paths
may be absolute, or relative to the working directory.
--inference_results INFERENCE_RESULTS
Path to root directory generated from inference engine
diagnosis. Paths may be absolute, or relative to the
working directory.
optional arguments:
--tensor_mapping TENSOR_MAPPING
Path to the file describing the tensor name mapping
between inference and golden tensors.
--qnn_model_json_path QNN_MODEL_JSON_PATH
Path to the qnn model net json, used for transforming
axis of golden outputs w.r.t to qnn outputs. Note:
Applicable only for QNN
--dlc_path DLC_PATH Path to the dlc file, used for transforming axis of
golden outputs w.r.t to target outputs. Note:
Applicable for QAIRT/SNPE
--verifier_config VERIFIER_CONFIG
Path to the verifiers' config file
--graph_struct GRAPH_STRUCT
Path to the inference graph structure .json file. This
file aids in providing structure related information
of the converted model graph during this stage.Note:
This file is mandatory when using ScaledDiff verifier
-v, --verbose Verbose printing
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory for the verification to store
temporary files. Creates a new directory if the
specified working directory does not exist
--output_dirname OUTPUT_DIRNAME
output directory name for the verification to store
temporary files under <working_dir>/verification.
Creates a new directory if the specified working
directory does not exist
--args_config ARGS_CONFIG
Path to a config file with arguments. This can be used
to feed arguments to the AccuracyDebugger as an
alternative to supplying them on the command line.
--target_encodings TARGET_ENCODINGS
Path to target encodings json file.
Arguments for generating Tensor mapping (required when --tensor_mapping is not specified):
-e ENGINE [ENGINE ...], --engine ENGINE [ENGINE ...]
Name of engine(qnn/snpe) that is used for running
inference.
Please note: Each argument should be provided either on the command line or in the config file. Command-line values do not override config-file values if there is overlap.
The main verification process, run using qnn-accuracy-debugger --verification, optionally uses --tensor_mapping and --graph_struct to find files to compare. These files are generated by the inference engine step and should be supplied to verification for best results. By default they are named tensor_mapping.json and {model name}_graph_struct.json, and can be found in the output directory of the inference engine results.
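The comma syntax for verifier hyperparameters (e.g., `rtolatol,rtolmargin,0.01,atolmargin,0.01`) can be read as a verifier name followed by alternating key/value pairs. A hypothetical sketch of such parsing (not the tool's actual code):

```python
def parse_verifier_spec(spec):
    """Parse 'name,key1,val1,key2,val2,...' into (name, {key: float})."""
    parts = [p.strip() for p in spec.split(",")]
    name, rest = parts[0], parts[1:]
    # Pair up alternating keys and values; values are assumed numeric.
    params = {rest[i]: float(rest[i + 1]) for i in range(0, len(rest) - 1, 2)}
    return name, params

print(parse_verifier_spec("rtolatol,rtolmargin,0.01,atolmargin,0.01"))
# ('rtolatol', {'rtolmargin': 0.01, 'atolmargin': 0.01})
```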
Sample Command
Compare output of framework runner with inference engine:
qnn-accuracy-debugger \
--verification \
--default_verifier CosineSimilarity param1 1 param2 2 \
--default_verifier SQNR param1 5 param2 1 \
--golden_output_reference_directory $PROJECTREPOPATH/working_directory/framework_runner/2022-10-31_17-07-58/ \
--inference_results $PROJECTREPOPATH/working_directory/inference_engine/latest/output/Result_0/ \
--tensor_mapping $PROJECTREPOPATH/working_directory/inference_engine/latest/tensor_mapping.json \
--graph_struct $PROJECTREPOPATH/working_directory/inference_engine/latest/qnn_model_graph_struct.json \
--qnn_model_json_path $PROJECTREPOPATH/working_directory/inference_engine/latest/qnn_model_net.json
- Tip:
If you passed multiple images in image_list.txt when running the inference engine diagnosis, you will receive multiple output/Result_x directories; choose the result that matches the input you used for the framework runner comparison (e.g., if chairs.raw was the first item in image_list.txt, choose output/Result_0; if it was the second item, choose output/Result_1).
It is recommended to always supply 'graph_struct' and 'tensor_mapping' to the command, as they are used to line up the report and find the corresponding files for comparison. If tensor_mapping was not generated by the previous steps, you can instead supply 'model_path', 'engine', and 'framework' to have the module generate 'tensor_mapping' at runtime.
You can also compare inference_engine outputs against other inference_engine outputs by passing the /output of one inference_engine run as 'framework_results'. If the output names match exactly, you do not need to provide a tensor_mapping file.
Note that to generate a tensor mapping instead of providing a path to a pre-existing tensor mapping file, you can provide the 'model_path' option.
Verifier uses two optional config files. The first file is used to set parameters for specific verifiers, as well as which tensors to use these verifiers on. The second file is used to map tensor names from framework_runner to the inference_engine, since certain tensors generated by framework_runner may have different names than tensors generated by inference_engine.
Verifier Config:
The verifier config file is a JSON file that tells verification which verifiers (aside from the default verifier) to use, with which parameters, and on which specific tensors. If no config file is provided, the tool uses only the default verifier specified on the command line, with its default parameters, on all tensors. The JSON file is keyed by verifier names, with each verifier as its own dictionary keyed by "parameters" and "tensors".
Config File
```json
{
"MeanIOU": {
"parameters": {
"background_classification": 1.0
},
"tensors": [["Postprocessor/BatchMultiClassNonMaxSuppression_boxes", "detection_classes:0"]]
},
"TopK": {
"parameters": {
"k": 5,
"ordered": false
},
"tensors": [["Reshape_1:0"], ["detection_classes:0"]]
}
}
```
Note that the "tensors" field is a list of lists. This is because some verifiers run on two tensors at a time, in which case the two tensor names are placed in one inner list. If a verifier runs on only one tensor, each inner list contains a single tensor name. Note that MeanIOU is not supported as a verifier in the Debugger.
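A sketch of how such a config might be walked, pairing each verifier with its parameters and tensor groups (illustrative only; the Debugger's internal representation may differ):

```python
import json

config_text = """
{
  "TopK": {
    "parameters": {"k": 5, "ordered": false},
    "tensors": [["Reshape_1:0"], ["detection_classes:0"]]
  }
}
"""

def iter_verifier_jobs(config):
    # Yield one (verifier_name, parameters, tensor_group) triple per inner
    # tensor list; a group with two names means a two-tensor verifier.
    for verifier, spec in config.items():
        params = spec.get("parameters", {})
        for tensor_group in spec.get("tensors", []):
            yield verifier, params, tensor_group

jobs = list(iter_verifier_jobs(json.loads(config_text)))
print(jobs)
```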
Tensor Mapping:
Tensor mapping is a JSON file keyed by inference tensor names, with framework tensor names as values. If the tensor mapping is not provided, the tool assumes inference and golden tensor names are identical.
Tensor Mapping File
```json
{
"Postprocessor/BatchMultiClassNonMaxSuppression_boxes": "detection_boxes:0",
"Postprocessor/BatchMultiClassNonMaxSuppression_scores": "detection_scores:0"
}
```
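A minimal sketch of how such a mapping resolves golden tensor names, falling back to the identity mapping when a tensor is absent (a hypothetical helper, mirroring the tool's documented behavior when no mapping is provided):

```python
import json

mapping_text = """
{
  "Postprocessor/BatchMultiClassNonMaxSuppression_boxes": "detection_boxes:0"
}
"""
mapping = json.loads(mapping_text)

def golden_name_for(inference_name, mapping):
    # Fall back to the inference name itself when no mapping entry exists,
    # matching the assumption that unmapped names are identical on both sides.
    return mapping.get(inference_name, inference_name)

print(golden_name_for("Postprocessor/BatchMultiClassNonMaxSuppression_boxes", mapping))
print(golden_name_for("unmapped_tensor", mapping))
```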
Output
Verification's output is divided by verifier. For example, if both the RtolAtol and TopK verifiers are used, there will be two sub-directories named "RtolAtol" and "TopK". Available verifiers can be listed with the --help option.
Under each sub-directory, the verification analysis for each tensor is organized similarly to the framework_runner (see above) and inference_engine outputs. For each tensor, a CSV and an HTML file are generated. In addition to the per-tensor analysis, the tool also generates a summary CSV and HTML file covering all verifiers and their tensors. The following figure shows what a sample summary generated in the verification step looks like. Each row in this summary corresponds to one tensor name identified by the framework runner and inference engine steps. The final column shows the CosineSimilarity score, which ranges from 0 to 1 (the range may differ for other verifiers). Higher scores denote similarity, while lower scores indicate divergence. The developer can then further investigate the details of those specific tensors. Inspect tensors in top-to-bottom order: if a tensor is broken at an earlier node, everything generated after that node is unreliable until that node is fixed.
Compare Encodings¶
The Compare Encodings feature is designed to compare QNN and AIMET encodings. This feature takes QNN model net and AIMET encoding JSON files as inputs. This feature executes in the following order.
Extracts encodings from the given QNN model net JSON.
Compares extracted QNN encodings with given AIMET encodings.
Writes results to an Excel file that highlights mismatches.
Throws warnings if some encodings are present in QNN but not in AIMET and vice-versa.
Writes the extracted QNN encodings JSON file (for reference).
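The comparison steps above can be sketched roughly as follows. The per-tensor `{scale, offset}` structure here is an assumption for illustration; the real QNN and AIMET encoding schemas are richer.

```python
def compare_encodings(qnn, aimet, precision=17):
    """Classify tensors into mismatched, QNN-only, and AIMET-only groups."""
    mismatched, qnn_only, aimet_only = [], [], []
    for name in sorted(set(qnn) | set(aimet)):
        if name not in aimet:
            qnn_only.append(name)        # warn: present in QNN but not AIMET
        elif name not in qnn:
            aimet_only.append(name)      # warn: present in AIMET but not QNN
        elif any(
            round(qnn[name][k], precision) != round(aimet[name][k], precision)
            for k in ("scale", "offset")
        ):
            mismatched.append(name)      # highlight in the results file
    return mismatched, qnn_only, aimet_only
```

The `precision` parameter plays the same role as the feature's --precision option: values are compared only up to that many decimal places.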
Usage¶
usage: qnn-accuracy-debugger --compare_encodings [-h]
--input INPUT
--aimet_encodings_json AIMET_ENCODINGS_JSON
[--precision PRECISION]
[--params_only]
[--activations_only]
[--specific_node SPECIFIC_NODE]
[--working_dir WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[-v]
Script to compare QNN encodings with AIMET encodings
optional arguments:
-h, --help Show this help message and exit
required arguments:
--input INPUT
Path to QNN model net JSON file
--aimet_encodings_json AIMET_ENCODINGS_JSON
Path to AIMET encodings JSON file
optional arguments:
--precision PRECISION
Number of decimal places up to which comparison will be done (default: 17)
--params_only Compare only parameters in the encodings
--activations_only Compare only activations in the encodings
--specific_node SPECIFIC_NODE
Display encoding differences for the given node
--working_dir WORKING_DIR
Working directory for the compare_encodings to store temporary files.
Creates a new directory if the specified working directory does not exist.
--output_dirname OUTPUT_DIRNAME
Output directory name for the compare_encodings to store temporary files
under <working_dir>/compare_encodings. Creates a new directory if the
specified working directory does not exist.
-v, --verbose Verbose printing
Sample Commands
# Compare both params and activations
qnn-accuracy-debugger \
--compare_encodings \
--input QNN_model_net.json \
--aimet_encodings_json aimet_encodings.json
# Compare only params
qnn-accuracy-debugger \
--compare_encodings \
--input QNN_model_net.json \
--aimet_encodings_json aimet_encodings.json \
--params_only
# Compare only activations
qnn-accuracy-debugger \
--compare_encodings \
--input QNN_model_net.json \
--aimet_encodings_json aimet_encodings.json \
--activations_only
# Compare only a specific encoding
qnn-accuracy-debugger \
--compare_encodings \
--input QNN_model_net.json \
--aimet_encodings_json aimet_encodings.json \
--specific_node _2_22_Conv_output_0
Tip
A working_directory is generated in the location the script is called from, unless otherwise specified.
Output
The program creates a directory named latest in working_directory/compare_encodings which is symbolically linked to the most recently generated directory. In the example below, latest is symlinked to the data in the most recent directory YYYY-MM-DD_HH:mm:ss. Users may choose to override the directory name by passing it to --output_dirname, e.g., --output_dirname myTest.
The figure above shows a sample output from a compare_encodings run. The following details what each file contains.
compare_encodings_options.json contains all the options used to run this feature
encodings_diff.xlsx contains comparison results with mismatches highlighted
log.txt contains log statements for the run
extracted_encodings.json contains extracted QNN encodings
Tensor inspection¶
Tensor inspection compares given reference output and target output tensors and dumps various statistics to represent differences between them.
The Tensor inspection feature can:
Plot histograms for golden and target tensors
Plot a graph indicating deviation between golden and target tensors
Plot a cumulative distribution graph (CDF) for golden vs target tensors
Plot a density (KDE) graph for target tensor highlighting target min/max and calibrated min/max values
Create a CSV file containing information about: target min/max; calibrated min/max; golden output min/max; target/calibrated min/max differences; and computed metrics (verifiers).
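The min/max and metric computation that feeds such a CSV can be sketched roughly as below (flat float lists are an assumption for illustration; the actual feature operates on tensor files and encodings, and its SQNR definition may differ):

```python
import math

def summarize(golden, target):
    """Compute min/max plus an SQNR-style metric for a golden/target pair."""
    noise_power = sum((t - g) ** 2 for g, t in zip(golden, target))
    signal_power = sum(g * g for g in golden)
    # SQNR in dB: ratio of golden signal power to golden-vs-target noise power.
    sqnr_db = (
        10 * math.log10(signal_power / noise_power)
        if noise_power > 0 else float("inf")
    )
    return {
        "golden_min": min(golden), "golden_max": max(golden),
        "target_min": min(target), "target_max": max(target),
        "sqnr_db": sqnr_db,
    }

print(summarize([1.0, 2.0, 3.0], [1.1, 1.9, 3.0]))
```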
Usage¶
usage: qnn-accuracy-debugger --tensor_inspection [-h]
--golden_data GOLDEN_DATA
--target_data TARGET_DATA
--verifier VERIFIER [VERIFIER ...]
[-w WORKING_DIR]
[--data_type {int8,uint8,int16,uint16,float32}]
[--target_encodings TARGET_ENCODINGS]
[-v]
Script to inspect tensors.
required arguments:
--golden_data GOLDEN_DATA
Path to golden/framework outputs folder. Paths may be absolute or
relative to the working directory.
--target_data TARGET_DATA
Path to target outputs folder. Paths may be absolute or relative to the
working directory.
--verifier VERIFIER [VERIFIER ...]
Verifier used for verification. The options "RtolAtol",
"AdjustedRtolAtol", "TopK", "L1Error", "CosineSimilarity", "MSE", "MAE",
"SQNR", "ScaledDiff" are supported.
An optional list of hyperparameters can be appended, for example:
--verifier rtolatol,rtolmargin,0.01,atolmargin,0.01.
To use multiple verifiers, add additional --verifier CosineSimilarity
optional arguments:
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory to save results. Creates a new directory if the
specified working directory does not exist
--data_type {int8,uint8,int16,uint16,float32}
DataType of the output tensor.
--target_encodings TARGET_ENCODINGS
Path to target encodings json file.
-v, --verbose Verbose printing
Sample Commands
# Basic run
qnn-accuracy-debugger --tensor_inspection \
--golden_data golden_tensors_dir \
--target_data target_tensors_dir \
--verifier sqnr
# Pass target encodings file and enable multiple verifiers
qnn-accuracy-debugger --tensor_inspection \
--golden_data golden_tensors_dir \
--target_data target_tensors_dir \
--verifier mse \
--verifier sqnr \
--verifier rtolatol,rtolmargin,0.01,atolmargin,0.01 \
--target_encodings qnn_encoding.json
Tip
A working_directory is generated in the location the script is called from, unless otherwise specified.
The figure above shows a sample output from a Tensor inspection run. The following details what each file contains.
Each tensor will have its own directory; the directory name matches the tensor name.
CDF_plots.html – Golden vs target CDF graph
Diff_plots.html – Golden and target deviation graph
Distribution_min-max.png – Density plot for target tensor highlighting target vs calibrated min/max values
Histograms.html – Golden and target histograms
golden_data.csv – Golden tensor data
target_data.csv – Target tensor data
log.txt – Log statements from the entire run
summary.csv – Target min/max, calibrated min/max, golden output min/max, target vs calibrated min/max differences, and verifier outputs
Histogram Plots
Comparison: We compare histograms for both the golden data and the target data.
Overlay: To enhance clarity, we overlay the histograms bin by bin.
Binned Ranges: Each bin represents a value range, showing the frequency of occurrence.
Visual Insight: Overlapping histograms reveal differences or similarities between the datasets.
Interactive: Hover over histograms to get tensor range and frequencies for the dataset.
Cumulative Distribution Function (CDF) Plots
Overview: CDF plots display the cumulative probability distribution.
Overlay: We superimpose CDF plots for golden and target data.
Percentiles: These plots illustrate data distribution across different percentiles.
Hover Details: Exact cumulative probabilities are available on hover.
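An empirical CDF of the kind these plots display can be sketched as follows (generic definition for illustration, not the tool's plotting code):

```python
def empirical_cdf(values):
    """Return (value, cumulative_probability) pairs for a sample."""
    ordered = sorted(values)
    n = len(ordered)
    # Each sorted value x_i accumulates probability (i + 1) / n.
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]

print(empirical_cdf([3.0, 1.0, 2.0]))
```

Overlaying the golden and target CDFs makes distribution shifts visible as horizontal gaps between the two curves at a given percentile.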
Tensor Difference Plots
Inspection: We generate plots highlighting differences between golden and target data tensors.
Scatter and Line: Scatter plots represent tensor values, while line plots show differences at each index.
Interactive: Hover over points to access precise values.
Run QNN Accuracy Debugger E2E¶
This feature is designed to run the framework runner, inference engine, and verification features sequentially with a single command to debug the model. The following debugging algorithms are available.
- Oneshot-layerwise (default):
- This algorithm debugs all layers of the model at once by performing the following steps:
Execute the framework runner to collect reference outputs in fp32.
Execute the inference engine to collect backend outputs in the specified target precision.
Execute verification to compare the intermediate outputs from the above two steps.
Execute tensor inspection (when --enable_tensor_inspection is passed) to dump various plots, e.g., scatter, line, and CDF plots, for the intermediate outputs.
It provides a quick analysis to identify the model layers causing accuracy deviation.
Users can choose cumulative-layerwise (below) for a deeper analysis of accuracy deviation.
- Cumulative-layerwise:
- This algorithm debugs one layer at a time by performing the following steps:
Execute the framework runner to collect reference outputs from all layers of the model in fp32.
- Execute the inference engine and verification iteratively to:
collect backend outputs in target precision for each layer, while removing the effect of its preceding layers on the final output;
compare the intermediate outputs from the framework runner and the inference engine.
It provides a deeper analysis to identify all model layers causing accuracy deviation.
Currently this option supports only ONNX models.
- Layerwise:
- This algorithm debugs one single-layer model at a time by performing the following steps:
Get golden per-layer reference outputs from an external tool or, if a golden reference is not given, run the framework runner to collect intermediate layer outputs.
- Iteratively execute the inference engine and verification to:
collect backend outputs in target precision for each single-layer model by removing the preceding and following layers;
compare the intermediate output from the golden reference with the single-layer model output from the inference engine.
Layerwise snooping provides a deeper analysis to identify all model layers causing accuracy deviation on hardware with respect to framework/simulation outputs.
Layerwise snooping only supports ONNX models.
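At a high level, the oneshot-layerwise pass reduces to scoring every shared intermediate tensor with the chosen verifier. A simplified sketch (dicts of flat outputs and the trivial verifier are assumptions for illustration; the real feature operates on dumped .raw files):

```python
def oneshot_layerwise(golden_outputs, target_outputs, verifier):
    """Score every tensor present in both the golden and target output sets."""
    return {
        name: verifier(golden_outputs[name], target_outputs[name])
        for name in golden_outputs
        if name in target_outputs
    }

# Example with a trivial max-absolute-difference "verifier" (hypothetical):
max_abs_diff = lambda g, t: max(abs(a - b) for a, b in zip(g, t))
scores = oneshot_layerwise(
    {"conv1": [1.0, 2.0], "relu1": [0.0, 1.0]},
    {"conv1": [1.0, 2.5], "extra": [9.9]},
    max_abs_diff,
)
print(scores)
# {'conv1': 0.5}
```

Tensors missing from either side are skipped; in the real flow the tensor mapping aligns differently named tensors before this comparison.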
Usage¶
usage: qnn-accuracy-debugger [--framework_runner] [--inference_engine] [--verification] [-h]
Script that runs Framework Runner, Inference Engine or Verification.
Arguments to select which component of the tool to run. Arguments are mutually exclusive (at
most 1 can be selected). If none are selected, then all components are run:
--framework_runner Run framework
--inference_engine Run inference engine
--verification Run verification
optional arguments:
-h, --help Show this help message. To show help for any of the components, run
script with --help and --<component>. For example, to show the help
for Framework Runner, run script with the following: --help
--framework_runner
usage: qnn-accuracy-debugger [-h] -f FRAMEWORK [FRAMEWORK ...] -m MODEL_PATH -i INPUT_TENSOR
[INPUT_TENSOR ...] -o OUTPUT_TENSOR -r RUNTIME -a
{aarch64-android,x86_64-linux-clang,aarch64-android-clang6.0}
-l INPUT_LIST --default_verifier DEFAULT_VERIFIER
[DEFAULT_VERIFIER ...] [-v] [-w WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[--deep_analyzer {modelDissectionAnalyzer}]
[--debugging_algorithm {layerwise,cumulative-layerwise,oneshot-layerwise}]
Options for running the Accuracy Debugger components
optional arguments:
-h, --help show this help message and exit
Arguments required by both Framework Runner and Inference Engine:
-f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
Framework type and version, version is optional. Currently supported
frameworks are [tensorflow, tflite, onnx]. For example, tensorflow
2.3.0
-m MODEL_PATH, --model_path MODEL_PATH
Path to the model file(s).
-i INPUT_TENSOR [INPUT_TENSOR ...], --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
The name, dimensions, raw data, and optionally data type of the
network input tensor(s) specified in the format "input_name" comma-
separated-dimensions path-to-raw-file, for example: "data"
1,224,224,3 data.raw float32. Note that the quotes should always be
included in order to handle special characters, spaces, etc. For
multiple inputs specify multiple --input_tensor on the command line
like: --input_tensor "data1" 1,224,224,3 data1.raw --input_tensor
"data2" 1,50,100,3 data2.raw float32.
-o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
Name of the graph's specified output tensor(s).
Arguments required by Inference Engine:
-r RUNTIME, --runtime RUNTIME
Runtime to be used for inference.
-a {aarch64-android,x86_64-linux-clang,aarch64-android-clang6.0}, --architecture {aarch64-android,x86_64-linux-clang,aarch64-android-clang6.0}
Name of the architecture to use for inference engine.
-l INPUT_LIST, --input_list INPUT_LIST
Path to the input list text.
Arguments required by Verification:
--default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
Default verifier used for verification. The options "RtolAtol",
"AdjustedRtolAtol", "TopK", "L1Error", "CosineSimilarity", "MSE",
"MAE", "SQNR", "ScaledDiff" are supported. An optional
list of hyperparameters can be appended. For example:
--default_verifier rtolatol,rtolmargin,0.01,atolmargin,0.01. An
optional list of placeholders can be appended. For example:
--default_verifier CosineSimilarity param1 1 param2 2. To use
multiple verifiers, add an additional --default_verifier, e.g.
--default_verifier CosineSimilarity
optional arguments:
-v, --verbose Verbose printing
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory for the wrapper to store temporary files. Creates
a new directory if the specified working directory does not exist.
--output_dirname OUTPUT_DIRNAME
output directory name for the wrapper to store temporary files under
<working_dir>/wrapper. Creates a new directory if the specified
working directory does not exist
--deep_analyzer {modelDissectionAnalyzer}
Deep Analyzer to perform deep analysis
--golden_output_reference_directory
Optional parameter to indicate the directory of the golden reference outputs.
When this option is provided, the framework runner stage is skipped.
In the inference stage, it is used for tensor mapping without a framework.
In the verification stage, it is used as a reference to compare against
outputs produced in the inference engine stage.
--enable_tensor_inspection
Plots graphs (line, scatter, CDF, etc.) for each
layer's output. Additionally, the summary sheet will
include more details such as golden min/max and target min/max.
--debugging_algorithm {layerwise,cumulative-layerwise,oneshot-layerwise}
Performs model debugging layerwise, cumulative-layerwise or in oneshot-
layerwise based on choice.
--step_size
Number of layers to skip in each iteration of debugging.
Applicable only for cumulative-layerwise algorithm.
--step_size (> 1) should not be used along with --add_layer_outputs,
--add_layer_types, --skip_layer_outputs, --skip_layer_types,
--start_layer, --end_layer
(the options below are ignored for the framework_runner component in case of layerwise and cumulative-layerwise runs)
--add_layer_outputs ADD_LAYER_OUTPUTS
Output layers to be dumped, e.g., 1579,232
--add_layer_types ADD_LAYER_TYPES
Outputs of layer types to be dumped, e.g., Resize, Transpose; all enabled by default
--skip_layer_types SKIP_LAYER_TYPES
Comma delimited layer types to skip snooping, e.g., Resize, Transpose
--skip_layer_outputs SKIP_LAYER_OUTPUTS
Comma delimited layer output names to skip debugging, e.g., 1171, 1174
--start_layer START_LAYER
Extracts the model starting from the given start layer
output name
--end_layer END_LAYER
Extracts the model up to the given end layer
output name
Note: The --start_layer and --end_layer options are allowed only for layerwise and cumulative-layerwise runs
Sample Command for oneshot-layerwise
Command for Oneshot-layerwise using DSP backend:
qnn-accuracy-debugger \
--framework tensorflow \
--runtime dspv73 \
--model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
--input_tensor "input:0" 1,299,299,3 $PATHTOGOLDENI/samples/InceptionV3Model/data/chairs.raw \
--output_tensor InceptionV3/Predictions/Reshape_1:0 \
--architecture x86_64-linux-clang \
--debugging_algorithm oneshot-layerwise \
--input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
--default_verifier CosineSimilarity \
--enable_tensor_inspection \
--verbose
Command for Oneshot-layerwise using HTP emulation on x86 host:
qnn-accuracy-debugger \
--framework onnx \
--runtime htp \
--model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
--input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
--output_tensor 1597 \
--architecture x86_64-linux-clang \
--input_list /local/mnt/workspace/models/vit/list.txt \
--default_verifier CosineSimilarity \
--offline_prepare \
--debugging_algorithm oneshot-layerwise \
--enable_tensor_inspection \
--verbose
Note
The --enable_tensor_inspection argument significantly increases overall execution time when used with large models. To speed up execution, omit this argument.
Output
The program creates framework_runner, inference_engine, verification, and wrapper output directories as below:
framework_runner – Contains a timestamped directory that contains the intermediate layer outputs (framework) stored in .raw format as described in the framework runner step.
inference_engine – Contains a timestamped directory that contains the intermediate layer outputs (inference engine) stored in .raw format as described in the inference engine step.
verification directory – Contains a timestamped directory that contains the following:
A directory for each verifier specified while running oneshot; it contains CSV and HTML files with metric details for each layer output
tensor_inspection – Individual directories for each layer’s output with the following contents:
CDF_plots.png – Golden vs target CDF graph
Diff_plots.png – Golden and target deviation graph
Histograms.png – Golden and target histograms
golden_data.csv – Golden tensor data
target_data.csv – Target tensor data
summary.csv – Report for verification results of each layer's output
Wrapper directory containing log.txt with the entire log for the run.
Note: Except for the wrapper directory, all other directories contain a folder called latest, which is a symlink to the most recent run's timestamped directory.
Snapshot of summary.csv file:
Understanding the oneshot-layerwise report:
Column |
Description |
|---|---|
Name |
Output name of the current layer |
Layer Type |
Type of the current layer |
Size |
Size of this layer’s output |
Tensor_dims |
Shape of this layer’s output |
<Verifier name> |
Verifier value of the current layer output compared to reference output |
golden_min |
minimum value in the reference output for current layer |
golden_max |
maximum value in the reference output for current layer |
target_min |
minimum value in the target output for current layer |
target_max |
maximum value in the target output for current layer |
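To quickly surface the layer outputs with the worst scores from summary.csv, a small post-processing script can help. The snippet below is an illustrative sketch, not part of the SDK; it assumes the column names shown in the table above (Name, plus a verifier column such as CosineSimilarity), and that for similarity-style verifiers a lower score means a larger deviation.

```python
import csv

def worst_layers(summary_csv, verifier_col, n=5, lower_is_worse=True):
    """Rank layer outputs in summary.csv by their verifier score.

    For similarity verifiers (e.g. CosineSimilarity) a lower score means
    a larger deviation; for error verifiers (e.g. MSE) pass
    lower_is_worse=False to sort descending instead.
    """
    with open(summary_csv, newline="") as f:
        # Skip rows with a missing verifier score (e.g. unsupported ops)
        rows = [r for r in csv.DictReader(f) if r.get(verifier_col)]
    rows.sort(key=lambda r: float(r[verifier_col]), reverse=not lower_is_worse)
    return [(r["Name"], float(r[verifier_col])) for r in rows[:n]]
```

For example, `worst_layers("summary.csv", "CosineSimilarity", n=10)` would list the ten layer outputs that deviate most from the golden reference.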
Sample Command for cumulative-layerwise
Command for Cumulative-layerwise using DSP backend:
qnn-accuracy-debugger \
--framework onnx \
--runtime dspv73 \
--model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
--input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
--output_tensor 1597 \
--architecture x86_64-linux-clang \
--input_list /local/mnt/workspace/models/vit/list.txt \
--default_verifier CosineSimilarity \
--offline_prepare \
--debugging_algorithm cumulative-layerwise \
--engine QNN \
--verbose
Command for Cumulative-layerwise using HTP emulation on x86 host:
qnn-accuracy-debugger \
--framework onnx \
--runtime htp \
--model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
--input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
--output_tensor 1597 \
--architecture x86_64-linux-clang \
--input_list /local/mnt/workspace/models/vit/list.txt \
--default_verifier CosineSimilarity \
--offline_prepare \
--debugging_algorithm cumulative-layerwise \
--engine QNN \
--verbose
Output
The program creates framework_runner, cumulative_layerwise_snooping, and wrapper output directories as below:
framework_runner – Contains a timestamped directory with the intermediate layer outputs stored in .raw format, as described in the framework runner step.
cumulative_layerwise_snooping – Contains intermediate outputs from the inference engine step stored in separate directories named after the respective layers. It also contains the final report, cumulative_layerwise.csv, with verifier scores for each layer. Layers with the most deviating scores can be identified as problematic nodes.
wrapper – Contains log.txt with the entire log for the run.
Understanding the cumulative-layerwise report
At the end of a cumulative-layerwise run, the tool generates a .csv file with the following information for each layer:
Column |
Description |
|---|---|
O/P Name |
Output name of the current layer. |
Status |
|
Layer Type |
Type of the current layer. |
Shape |
Shape of this layer’s output. |
Activations |
The Min, Max and Median of the outputs at this layer taken from reference execution. |
<Verifier name> |
Absolute verifier value of the current layer compared to reference platform. |
Orig outputs |
Displays the verifier score of the original model outputs observed when the model
was run with the current layer output enabled, starting from the last partitioned layer.
|
Info |
Displays information for the output verifiers, if the values are abnormal. |
Command for Layerwise:
qnn-accuracy-debugger \
--framework onnx \
--runtime dspv73 \
--model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
--input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
--output_tensor 1597 \
--architecture x86_64-linux-clang \
--input_list /local/mnt/workspace/models/vit/list.txt \
--default_verifier CosineSimilarity \
--offline_prepare \
--debugging_algorithm layerwise \
--quantization_overrides /local/mnt/workspace/layer_output_dump/vit_base_16_224.encodings \
--engine QNN \
--verbose
Output
The program creates layerwise_snooping and wrapper output directories, as well as framework_runner if a golden reference is not provided (as described for cumulative-layerwise).
layerwise_snooping directory – Contains each single layer model outputs obtained from the inference engine stage stored in separate directories and the final report named layerwise.csv which contains verifier scores for each layer model. Users can identify layers with the most deviating scores as problematic nodes.
wrapper directory – Contains log.txt which stores the full logs for the run.
The output .csv is similar to the cumulative-layerwise output, but the original outputs column will not be present in layerwise snooping, since we are not dealing with final outputs of the model.
Debugging Accuracy issue with Quantized model using Cumulative Layerwise Snooping
With quantized models, some mismatch is expected at the most data-intensive layers, arising from quantization error.
The debugger can be used to identify the most sensitive operators (those with high verifier scores) and run them at higher precision to improve overall accuracy.
Sensitivity is determined by the verifier score seen at that layer with respect to the reference platform (e.g., ONNX Runtime).
Note that cumulative-layerwise debugging takes considerable time, as the partitioned model must be quantized and compiled at every layer that does not have a 100% match with the reference.
Below is one strategy to debug larger models:
Run Oneshot-layerwise on the model which helps to identify the starting point of sensitivity in the model.
Run Cumulative-layerwise at different parts of the model using the start-layer and end-layer options (if the model has 100 nodes, use the starting node identified by the Oneshot-layerwise run as the start layer and the 25th node as the end layer for run 1, the 26th to the 50th node for run 2, the 51st to the 75th node for run 3, and so on). The final reports of all runs help identify the most sensitive layers in the model. Say nodes A, B, and C have high verifier scores, indicating high sensitivity.
Run the original model with those specific layers (A/B/C, one at a time or in combinations) in FP16 and observe the improvement in accuracy.
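The partitioning strategy above can be sketched as a small helper that splits an ordered layer list into (start_layer, end_layer) pairs, one pair per cumulative-layerwise run. The function and its names are illustrative, not part of the tool:

```python
def layer_chunks(layer_names, chunk_size):
    """Split an ordered list of layer output names into
    (start_layer, end_layer) pairs, one pair per
    cumulative-layerwise run."""
    pairs = []
    for i in range(0, len(layer_names), chunk_size):
        chunk = layer_names[i:i + chunk_size]
        # First and last names of the chunk become the
        # --start_layer / --end_layer values for that run.
        pairs.append((chunk[0], chunk[-1]))
    return pairs
```

For a 100-node model with a chunk size of 25, this yields four runs covering nodes 1-25, 26-50, 51-75, and 76-100.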
Debugging accuracy discrepancies between a golden reference (e.g., AIMET/framework runtime output) and target output using Layerwise Snooping
- One popular use case for layerwise snooping is debugging the accuracy difference between AIMET and target output.
Although tools like AIMET create an exact simulation of the hardware, a very small mismatch is still expected due to environment differences: the simulation executes on GPU FP32 kernels and simulates quantization noise, whereas the hardware executes on actual integer kernels.
If there is a higher deviation between simulation and hardware, layerwise snooping can point out the nodes with the highest deviations; the nodes showing higher deviation in layerwise.csv can be identified as the erroneous nodes.
Other use cases include debugging deviations between the framework runtime's FP32 output and the target's INT16 output.
Binary Snooping¶
The binary snooping tool debugs the given ONNX graph in a binary search fashion.
For the graph under analysis, it quantizes half of the graph and lets the other half run in fp16/32. The final model output is used to measure the subgraph's quantization effect. If the subgraph has a high effect on the final model output due to quantization (verifier score greater than 60% of the sum of the two subgraphs' scores), the process repeats until the subgraph size is less than min_graph_size or the subgraph cannot be divided again. If both subgraphs have similar scores (each greater than 40% of the sum of the two subgraphs' scores), both subgraphs are investigated further.
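The bisection logic described above can be sketched as follows. This is a simplified illustration, not the tool's implementation: the hypothetical callback score(a, b) stands in for the full quantize-and-verify step and returns the verifier score when only layers a..b are quantized (higher meaning worse).

```python
def bisect(start, end, score, min_graph_size=2):
    """Recursively locate quantization-sensitive subgraphs.

    score(a, b) returns the verifier score observed when only the
    layers in [a, b] are quantized and the rest run in fp16/32.
    """
    if end - start + 1 <= min_graph_size:
        return [(start, end)]
    mid = (start + end) // 2
    left, right = score(start, mid), score(mid + 1, end)
    total = left + right
    if total == 0:
        return [(start, end)]
    # A half with > 60% of the combined score dominates; otherwise the
    # scores are similar and both halves are investigated further.
    if left > 0.6 * total:
        return bisect(start, mid, score, min_graph_size)
    if right > 0.6 * total:
        return bisect(mid + 1, end, score, min_graph_size)
    return (bisect(start, mid, score, min_graph_size)
            + bisect(mid + 1, end, score, min_graph_size))
```

Each recursion level quantizes a smaller subgraph, homing in on the region that contributes most of the final-output deviation.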
usage
usage: qnn-accuracy-debugger --binary_snooping \
-m MODEL_PATH \
-l INPUT_LIST \
-i INPUT_TENSOR \
-f FRAMEWORK \
-o OUTPUT_TENSOR \
-e ENGINE_NAME \
-qo QUANTIZATION_OVERRIDES \
[--verifier VERIFIER] \
[-a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}] \
[--host_device {x86,x86_64-windows-msvc,wos}] \
[-r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic,htp}] \
[--deviceId DEVICEID] \
[--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY] \
[--bias_bitwidth BIAS_BITWIDTH] \
[--use_per_channel_quantization USE_PER_CHANNEL_QUANTIZATION] \
[--weights_bitwidth WEIGHTS_BITWIDTH] \
[--act_bitwidth {8,16}] [-fbw {16,32}] \
[-rqs RESTRICT_QUANTIZATION_STEPS] \
[-w WORKING_DIR] \
[--output_dirname OUTPUT_DIRNAME] \
[-p ENGINE_PATH] \
[--min_graph_size MIN_GRAPH_SIZE] \
[--extra_converter_args EXTRA_CONVERTER_ARGS] \
[--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}] \
[--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}] \
[--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}] \
[--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}] \
[--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE] \
[--param_quantizer {tf,enhanced,adjusted,symmetric}] \
[--act_quantizer {tf,enhanced,adjusted,symmetric}] \
[--per_channel_quantization] \
[--algorithms ALGORITHMS] \
[--verifier_config VERIFIER_CONFIG] \
[--start_layer START_LAYER] \
[--end_layer END_LAYER] [--precision {int8,fp16}] \
[--compiler_config COMPILER_CONFIG] \
[--ignore_encodings] \
[--extra_runtime_args EXTRA_RUNTIME_ARGS] \
[--add_layer_outputs ADD_LAYER_OUTPUTS] \
[--add_layer_types ADD_LAYER_TYPES] \
[--skip_layer_types SKIP_LAYER_TYPES] \
[--skip_layer_outputs SKIP_LAYER_OUTPUTS] \
[--remote_server REMOTE_SERVER] \
[--remote_username REMOTE_USERNAME] \
[--remote_password REMOTE_PASSWORD] [-nif] [-nof]
Sample Commands
Sample command to run binary snooping on mv2 large model
qnn-accuracy-debugger \
--binary_snooping \
--framework onnx \
--model_path models/mv2/mobilenet-v2.onnx \
--architecture aarch64-android \
--input_list models/mv2/inputs/input_list_1.txt \
--input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/harsraj/models/mv2/inputs/data1.raw \
--output_tensor "473" \
--engine_path $QNN_SDK_ROOT \
--working_dir working_directory/QNN/BINARY_MV2_DSP \
--runtime dspv75 \
--engine QNN \
--verifier mse \
--extra_converter_args "float_bitwidth=32;preserve_io=layout" \
--quantization_overrides /local/mnt/workspace/harsraj/models/mv2/quantized_encoding.json \
--min_graph_size 16
Outputs
The algorithm provides two JSON files:
graph_result.json (for each subgraph) - Contains verifier scores for two child subgraphs; for example 318_473 has child subgraphs 318_392 and 393_473.
subgraph_result.json (for each subgraph) - Contains the corresponding and sorted verifier scores.
Keys in both files look like "<subgraph_start_node_activation_name>_<subgraph_end_node_activation_name>".
For example, 318_473 means a subgraph starts at node activation 318 and ends at node activation 473. Only the subgraph from 318 to 473 is quantized while the rest of the model runs in fp16/32.
Debugging accuracy issues with binary snooping results
Subgraphs with the maximum verifier scores in subgraph_result.json are the culprit subgraphs.
One subgraph can be a subset of another; in this case, prioritize a subgraph size you are comfortable debugging. Subset details can be found in graph_result.json.
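To rank the culprit subgraphs programmatically, the result file can be loaded and sorted by score. A minimal sketch, assuming subgraph_result.json is a flat mapping of "start_end" keys to numeric verifier scores (the exact schema may differ across SDK versions, so adapt the parsing as needed):

```python
import json

def top_subgraphs(result_json, n=3):
    """Return the n subgraphs with the highest verifier scores,
    i.e. the most likely quantization-sensitive regions.

    Assumes the file maps "start_end" keys to numeric scores.
    """
    with open(result_json) as f:
        scores = json.load(f)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]
```

For example, `top_subgraphs("subgraph_result.json")` would return entries like ("318_473", 0.95), identifying the subgraph from node activation 318 to 473 as the most sensitive.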
Quantization Checker¶
The quantization checker analyzes activations, weights, and biases of a given model. It provides:
Comparison between quantized and unquantized weights and biases
Analysis of unquantized weights, biases, and activations
Results in CSV, HTML, or plot form
Identification of problematic weights and biases for a given quantization bitwidth
Usage
usage: qnn-accuracy-debugger --quant_checker [-h] \
--model_path \
--input_tensor \
--config_file \
--framework \
--input_list \
--output_tensor \
[--engine_path] \
[--working_dir] \
[--quantization_overrides] \
[--extra_converter_args] \
[--bias_width] \
[--weights_width] \
[--host_device] \
[--deviceId] \
[--generate_csv] \
[--generate_plots] \
[--per_channel_plots] \
[--golden_output_reference_directory] \
[--output_dirname]
[--verbose]
Sample quant_checker_config_file
{
    "WEIGHT_COMPARISON_ALGORITHMS": [
        {"algo_name": "minmax", "threshold": "10"},
        {"algo_name": "maxdiff", "threshold": "10"},
        {"algo_name": "sqnr", "threshold": "26"},
        {"algo_name": "stats", "threshold": "2"},
        {"algo_name": "data_range_analyzer"},
        {"algo_name": "data_distribution_analyzer", "threshold": "0.6"}
    ],
    "BIAS_COMPARISON_ALGORITHMS": [
        {"algo_name": "minmax", "threshold": "10"},
        {"algo_name": "maxdiff", "threshold": "10"},
        {"algo_name": "sqnr", "threshold": "26"},
        {"algo_name": "stats", "threshold": "2"},
        {"algo_name": "data_range_analyzer"},
        {"algo_name": "data_distribution_analyzer", "threshold": "0.6"}
    ],
    "ACT_COMPARISON_ALGORITHMS": [
        {"algo_name": "minmax", "threshold": "10"},
        {"algo_name": "data_range_analyzer"}
    ],
    "INPUT_DATA_ANALYSIS_ALGORITHMS": [
        {"algo_name": "stats", "threshold": "2"}
    ],
    "QUANTIZATION_ALGORITHMS": ["cle", "None"],
    "QUANTIZATION_VARIATIONS": ["tf", "enhanced", "symmetric", "asymmetric"]
}
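To sanity-check a config file before starting a long run, it can be loaded and summarized per section. This is an illustrative helper, not part of the tool, and it assumes the config is valid JSON in the shape shown above:

```python
import json

def summarize_config(path):
    """Return the algorithm names enabled in each comparison section
    of a quant_checker config file; non-algorithm sections (plain
    string lists) are passed through unchanged."""
    with open(path) as f:
        cfg = json.load(f)
    summary = {}
    for section, entries in cfg.items():
        if section.endswith("_ALGORITHMS") and entries and isinstance(entries[0], dict):
            summary[section] = [e["algo_name"] for e in entries]
        else:
            summary[section] = entries
    return summary
```

Running it on the sample config above would list, for instance, minmax and data_range_analyzer under ACT_COMPARISON_ALGORITHMS.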
Output
Outputs are available in the <working-directory>/results directory.
Results are provided in:
HTML
CSV
Histogram
A log is provided in the <working-directory>/quant_checker directory.
HTML
Each HTML file contains a summary of the results for each quantization option and for each input file provided.
The following example provides additional guidance on the contents of the HTML files.
CSV Results Files
Each CSV file contains detailed computation results for a specific node type (activation/weight/bias) and quantization option. Each row in the csv file displays the op name, node name, passes accuracy (True/False), computation result (accuracy differences), threshold used for each algorithm, and the algorithm name. The format of the computation results (accuracy differences) differs according to the algorithms/metrics used.
The following table provides additional notes about the different algorithms and the information in each CSV row.
Algorithm |
Description |
Computation result |
|---|---|---|
minmax |
Indicates the difference between the unquantized minimum and the dequantized minimum value and, correspondingly, the same difference for the maximum unquantized and dequantized values. |
"min: #VALUE max: #VALUE" |
maxdiff |
Calculates the absolute difference between the unquantized and dequantized data for all data points and displays the maximum value of the result. |
"#VALUE" |
sqnr |
Calculates the signal-to-quantization-noise ratio between the two tensors of unquantized and dequantized data. |
"#VALUE" |
data_range_analyzer |
Calculates the difference between the maximum and minimum values in a tensor and compares it to the maximum value supported by the bit width used, to determine whether the range of values can be reasonably represented by the selected quantization bit width. |
"unique dec places: #INT_VALUE data range: #VALUE". The computation result includes how many unique decimal places are needed to express the unquantized data in quantized format and the actual data range. |
data_distribution_analyzer |
Calculates the clustering of the data to find whether a large number of unique unquantized values are quantized to the same value. |
"Distribution of pixels above threshold: #VALUE" |
stats |
Calculates basic statistics on the received data such as min, max, median, variance, standard deviation, mode, and skew. The skew indicates how symmetric the data is. |
"skew: #VALUE min: #VALUE max: #VALUE median: #VALUE variance: #VALUE stdDev: #VALUE mode: #VALUE" |
The following CSV example shows weight data for one of the quantization options.
Separate .csv files are available for activations, weights and biases for each quantization option. The activation related results also include analysis for each input file provided.
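For reference, the SQNR metric used in the comparisons above can be computed as follows. This is an illustrative reimplementation of the standard formula, not the SDK's code:

```python
import math

def sqnr_db(unquantized, dequantized):
    """Signal-to-quantization-noise ratio in dB: the power of the
    original float data over the power of the quantization error
    (difference between original and quantize->dequantize values)."""
    signal = sum(x * x for x in unquantized)
    noise = sum((x - y) ** 2 for x, y in zip(unquantized, dequantized))
    if noise == 0.0:
        return float("inf")  # lossless round trip
    return 10.0 * math.log10(signal / noise)
```

A higher SQNR means less quantization noise; the checker flags tensors whose SQNR falls below the configured threshold (26 dB in the sample config).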
Histogram
A histogram is generated for each quantization variation and for each weight and bias tensor in the model. The following example illustrates the generated histograms.
Logs
The log files contain the following information.
The commands executed as part of the script’s run, including different runs of the snpe-converter tool with different quantization options
Analysis failures for activations, weights, and biases
The following example shows a sample log output.
<====ACTIVATIONS ANALYSIS FAILURES====>
Results for the enhanced quantization:
| Op Name | Activation Node | Passes Accuracy | Accuracy Difference | Threshold Used | Algorithm Used |
| conv_tanh_comp1_conv0 | ReLU_6919 | False | minabs_diff: 0.59 maxabs_diff: 17.16 | 0.05 | minmax |
where,
Op Name : Op name as expressed in corresponding qnn artifacts
Activation Node : Activation node name in the operation
Passes Accuracy : True if the quantized activation (or weight or bias) meets threshold when compared with values from float32 graph; false otherwise
Accuracy Difference : Details about the accuracy per the algorithm used
Threshold Used : The threshold used to influence the result of “Passes Accuracy” column
Algorithm Used : Metric used to compare actual quantized activations/weights/biases against unquantized float data or analyze the quality of unquantized float data. Metrics can be minmax, maxdiff, sqnr, stats, data_range_analyzer, data_distribution_analyzer.
qairt-accuracy-debugger (Beta)¶
Dependencies
The Accuracy Debugger depends on the setup outlined in Setup. In particular, the following are required:
Platform dependencies need to be met as per Platform Dependencies
The desired ML frameworks need to be installed. The Accuracy Debugger is verified to work with the ML framework versions mentioned in Environment Setup
Supported models
The qairt-accuracy-debugger currently supports ONNX, Tensorflow, and TFLite models. Note that PyTorch model support is limited to the oneshot snooping feature of this tool.
Overview
The Accuracy Debugger tool finds inaccuracies in a neural network at the layer level. The primary functionality of this tool is to compare the golden outputs produced by running a model through a specific ML framework (i.e., Tensorflow, Onnx, TFLite) with the results produced by running the same model on target devices (CPU, GPU, DSP, etc.).
The following features are available in Accuracy Debugger. Each feature can be run with its corresponding option; for example, qairt-accuracy-debugger --{option}.
qairt-accuracy-debugger --framework_runner uses an ML framework, e.g. Tensorflow, TFLite, or Onnx, to run the model and get intermediate outputs.
qairt-accuracy-debugger --inference_engine uses an inference engine to run a model on the target device and retrieve intermediate outputs.
qairt-accuracy-debugger --verification compares the outputs generated by the framework runner and inference engine features using verifiers such as CosineSimilarity, RtolAtol, etc.
qairt-accuracy-debugger --compare_encodings compares target encodings with the AIMET encodings and outputs an Excel sheet highlighting mismatches.
qairt-accuracy-debugger --tensor_inspection compares given target outputs with golden outputs.
qairt-accuracy-debugger --snooping runs the chosen snooping algorithm to investigate accuracy issues.
Tip: You can use --help with a feature name to see what other options (required or optional) you can add.
Below are the instructions for running various features available in Accuracy Debugger:
Framework Runner¶
The Framework Runner feature is designed to run models with different machine learning frameworks (e.g. Tensorflow, Onnx, TFLite). A given model is run with a specific ML framework, producing golden outputs for later comparison with inference results from the Inference Engine step.
Usage
usage: qairt-accuracy-debugger --framework_runner [-h]
-m MODEL_PATH -i INPUT_TENSOR
[INPUT_TENSOR ...] -o OUTPUT_TENSOR
[-w WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[--args_config ARGS_CONFIG] [-v]
[--disable_graph_optimization]
[--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB]
[--add_layer_outputs ADD_LAYER_OUTPUTS]
[--add_layer_types ADD_LAYER_TYPES]
[--skip_layer_types SKIP_LAYER_TYPES]
[--skip_layer_outputs SKIP_LAYER_OUTPUTS]
[--start_layer START_LAYER]
[--end_layer END_LAYER]
[-f FRAMEWORK [FRAMEWORK ...]]
Script to generate intermediate tensors from an ML Framework.
options:
-h, --help show this help message and exit
required arguments:
-m MODEL_PATH, --model_path MODEL_PATH
Path to the model file(s).
-i INPUT_TENSOR [INPUT_TENSOR ...], --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
The name, dimensions, raw data, and optionally data
type of the network input tensor(s) specified in the
format "input_name" comma-separated-dimensions path-
to-raw-file, for example: "data" 1,224,224,3 data.raw
float32. Note that the quotes should always be
included in order to handle special characters,
spaces, etc. For multiple inputs specify multiple
--input_tensor on the command line like:
--input_tensor "data1" 1,224,224,3 data1.raw
--input_tensor "data2" 1,50,100,3 data2.raw float32.
-o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
Name of the graph's specified output tensor(s).
optional arguments:
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory for the framework_runner to store
temporary files. Creates a new directory if the
specified working directory does not exist
--output_dirname OUTPUT_DIRNAME
output directory name for the framework_runner to
store temporary files under
<working_dir>/framework_runner. Creates a new
directory if the specified working directory does not
exist
--args_config ARGS_CONFIG
Path to a config file with arguments. This can be used
to feed arguments to the AccuracyDebugger as an
alternative to supplying them on the command line.
-v, --verbose Verbose printing
--disable_graph_optimization
Disables basic model optimization
--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB
path to onnx custom operator library
--add_layer_outputs ADD_LAYER_OUTPUTS
Output layers to be dumped. example:1579,232
--add_layer_types ADD_LAYER_TYPES
outputs of layer types to be dumped. e.g
:Resize,Transpose. All enabled by default.
--skip_layer_types SKIP_LAYER_TYPES
comma delimited layer types to skip snooping. e.g
:Resize, Transpose
--skip_layer_outputs SKIP_LAYER_OUTPUTS
comma delimited layer output names to skip debugging.
e.g :1171, 1174
--start_layer START_LAYER
save all intermediate layer outputs from provided
start layer to bottom layer of model
--end_layer END_LAYER
save all intermediate layer outputs from top layer to
provided end layer of model
-f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
Framework type and version, version is optional.
Currently supported frameworks are [tensorflow,
tflite, onnx, pytorch]. For example, tensorflow 2.10.1
Please note: All command-line arguments should be provided either on the command line or through the config file. If there is an overlap, command-line arguments will not override those in the config file.
Sample Commands
# Tensorflow model example:
qairt-accuracy-debugger \
--framework_runner \
--framework tensorflow \
--model_path InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
--input_tensor "input:0" 1,299,299,3 InceptionV3Model/data/chairs.raw \
--output_tensor InceptionV3/Predictions/Reshape_1:0
# Onnx model example:
qairt-accuracy-debugger \
--framework_runner \
--framework onnx \
--model_path dlv3onnx/dlv3plus_mbnet_513-513_op9_mod_basic.onnx \
--input_tensor Input 1,3,513,513 dlv3onnx/data/00000_1_3_513_513.raw \
--output_tensor Output
# Example to run model with custom operator:
qairt-accuracy-debugger \
--framework_runner \
--framework onnx \
--input_tensor "image" 1,3,640,640 yolov3/batched-inp-107-0.raw \
--model_path yolov3/yolov3_640_640_with_abp_qnms.onnx \
--output_tensor detection_boxes \
--onnx_custom_op_lib libCustomQnmsYoloOrt.so
- TIP:
A working_directory, if not otherwise specified, is generated in the directory from which you call the script.
For Tensorflow it is sometimes necessary to add ":0" after the input and output node names to signify the index of the node. Note that ":0" is not required for Onnx models.
Outputs
Once the Framework Runner has finished running, it stores the outputs in the specified working directory. By default, it stores the output in working_directory/framework_runner under the current working directory and creates a directory named latest in working_directory/framework_runner that is symbolically linked to the most recent run (YYYY-MM-DD_HH:mm:ss). Users may override the directory name by passing it to --output_dirname (e.g., --output_dirname myTest1). The following figure shows a sample output folder from a Framework Runner run using an Onnx model.
working_directory/framework_runner/latest contains the outputs of each layer in the model saved as .raw files; each .raw file is the output of one operation in the model. The file framework_runner_options.json contains all the options used to run this feature.
The intermediate outputs produced by the Framework Runner step offer precise reference (golden) data for the Verification component to diagnose the accuracy of the network outputs generated by the Inference Engine.
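The .raw files are flat binary tensor dumps. As a minimal sketch of loading one for offline inspection — assuming little-endian float32 layout, which is common for these dumps but should be confirmed for your model; the file name below is hypothetical:

```python
import struct

def load_raw_tensor(path, fmt="f", dtype_size=4):
    """Read a flat binary .raw dump into a list of Python floats.

    Assumes little-endian float32 values; adjust fmt/dtype_size for
    other dtypes (e.g. "h"/2 for int16).
    """
    with open(path, "rb") as f:
        data = f.read()
    count = len(data) // dtype_size
    return list(struct.unpack("<%d%s" % (count, fmt), data))

# Demo: write a small float32 tensor and read it back.
values = [0.5, -1.25, 3.0]
with open("demo_output.raw", "wb") as f:
    f.write(struct.pack("<%df" % len(values), *values))

loaded = load_raw_tensor("demo_output.raw")
```

The same pattern applies to any intermediate output dumped by the Framework Runner or the Inference Engine, as long as the dtype is known.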
Inference Engine¶
The Inference Engine feature dumps the intermediate outputs of the model when run on target runtimes such as CPU, GPU, and DSP. The output produced by this step can be compared with the golden outputs produced by the Framework Runner step.
Usage
usage: qairt-accuracy-debugger --inference_engine [-h]
-r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
-a
{x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
-l INPUT_LIST [--input_network INPUT_NETWORK]
[--desired_input_shape DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...]]
[--out_tensor_node OUT_TENSOR_NODE]
[--io_config IO_CONFIG]
[-qo QUANTIZATION_OVERRIDES]
[--converter_float_bitwidth {32,16}]
[--extra_converter_args EXTRA_CONVERTER_ARGS]
[--input_dlc INPUT_DLC]
[--calibration_input_list CALIBRATION_INPUT_LIST]
[-bbw {8,32}] [-abw {8,16}] [-wbw {8,4}]
[--quantizer_float_bitwidth {32,16}]
[--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
[--use_per_channel_quantization]
[--use_per_row_quantization] [--float_fallback]
[--extra_quantizer_args EXTRA_QUANTIZER_ARGS]
[--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
[--profiling_level PROFILING_LEVEL]
[--userlogs {warn,verbose,info,error,fatal}]
[--log_level {error,warn,info,debug,verbose}]
[--extra_runtime_args EXTRA_RUNTIME_ARGS]
[--executor_type {qnn,snpe}]
[--stage {source,converted,quantized}]
[-p ENGINE_PATH] [--deviceId DEVICEID] [-v]
[--host_device {x86,x86_64-windows-msvc,wos}]
[-w WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[--debug_mode_off] [--args_config ARGS_CONFIG]
[--remote_server REMOTE_SERVER]
[--remote_username REMOTE_USERNAME]
[--remote_password REMOTE_PASSWORD]
[--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY]
[--disable_offline_prepare]
[--backend_extension_config BACKEND_EXTENSION_CONFIG]
[--context_config_params CONTEXT_CONFIG_PARAMS]
[--graph_config_params GRAPH_CONFIG_PARAMS]
[--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS]
[--start_layer START_LAYER]
[--end_layer END_LAYER]
[--add_layer_outputs ADD_LAYER_OUTPUTS]
[--add_layer_types ADD_LAYER_TYPES]
[--skip_layer_types SKIP_LAYER_TYPES]
[--skip_layer_outputs SKIP_LAYER_OUTPUTS]
Script to run inference engine.
options:
-h, --help show this help message and exit
Required Arguments:
-r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}, --runtime {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
Runtime to be used. Note: In case of SNPE
execution(--executor_type snpe), aic runtime is not
supported.
-a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}, --architecture {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
Name of the architecture to use for inference engine.
Note: In case of SNPE execution(--executor_type snpe),
aarch64-qnx architecture is not supported.
-l INPUT_LIST, --input_list INPUT_LIST
Path to the input list text file to run inference(used
with net-run). Note: When having multiple entries in
text file, in order to save memory and time, you can
pass --debug_mode_off to skip intermediate outputs
dump.
QAIRT Converter Arguments:
--input_network INPUT_NETWORK, --model_path INPUT_NETWORK
Path to the model file(s). This argument is mandatory
when --stage is source(which is default).
--desired_input_shape DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...], --input_tensor DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...]
The name and dimension of all the input buffers to the
network specified in the format [input_name comma-
separated-dimensions sample-data data-type] Note:
sample-data and data-type are optional for example:
'data' 1,224,224,3. Note that the quotes should always
be included in order to handle special characters,
spaces, etc. For multiple inputs, specify multiple
--desired_input_shape on the command line like:
--desired_input_shape "data1" 1,224,224,3 sample1.raw
float32 --desired_input_shape "data2" 1,50,100,3
sample2.raw int64 NOTE: Required for TensorFlow and
PyTorch. Optional for Onnx and Tflite. In case of
Onnx, this feature works only with Onnx 1.6.0 and
above.
--out_tensor_node OUT_TENSOR_NODE, --output_tensor OUT_TENSOR_NODE
Name of the graph's output Tensor Names. Multiple
output names should be provided separately like:
--out_tensor_node out_1 --out_tensor_node out_2 NOTE:
Required for TensorFlow. Optional for Onnx, Tflite and
PyTorch
--io_config IO_CONFIG
Use this option to specify a yaml file for input and
output options.
-qo QUANTIZATION_OVERRIDES, --quantization_overrides QUANTIZATION_OVERRIDES
Path to quantization overrides json file.
--converter_float_bitwidth {32,16}
Use this option to convert the graph to the specified
float bitwidth, either 32 (default) or 16. Note:
Cannot be used with --calibration_input_list and
--quantization_overrides
--extra_converter_args EXTRA_CONVERTER_ARGS
additional converter arguments in a quoted string.
example: --extra_converter_args
'arg1=value1;arg2=value2'
QAIRT Quantizer Arguments:
--input_dlc INPUT_DLC
Path to the dlc container containing the model for
which fixed-point encoding metadata should be
generated. This argument is mandatory when --stage is
either converted or quantized.
--calibration_input_list CALIBRATION_INPUT_LIST
Path to the inputs list text file to run
quantization(used with qairt-quantizer)
-bbw {8,32}, --bias_bitwidth {8,32}
option to select the bitwidth to use when quantizing
the bias. default 8
-abw {8,16}, --act_bitwidth {8,16}
option to select the bitwidth to use when quantizing
the activations. default 8
-wbw {8,4}, --weights_bitwidth {8,4}
option to select the bitwidth to use when quantizing
the weights. default 8
--quantizer_float_bitwidth {32,16}
Use this option to select the bitwidth to use for
float tensors, either 32 (default) or 16.
--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use
for activations supported values: min-max (default),
sqnr, entropy, mse, percentile This option can be
paired with --act_quantizer_schema to override the
quantization schema to use for activations otherwise
default schema(asymmetric) will be used
--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use
for parameters supported values: min-max (default),
sqnr, entropy, mse, percentile This option can be
paired with --param_quantizer_schema to override the
quantization schema to use for parameters otherwise
default schema(asymmetric) will be used
--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
Specify which quantization schema to use for
activations. Note: Default is asymmetric.
--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
Specify which quantization schema to use for
parameters. Note: Default is asymmetric.
--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
Value must lie between 90 and 100. Default is 99.99
--use_per_channel_quantization
Use per-channel quantization for convolution-based op
weights. Note: This will replace built-in model QAT
encodings when used for a given weight.
--use_per_row_quantization
Use this option to enable rowwise quantization of
Matmul and FullyConnected ops.
--float_fallback Use this option to enable fallback to floating point
(FP) instead of fixed point. This option can be paired
with --quantizer_float_bitwidth to indicate the
bitwidth for FP (by default 32). If this option is
enabled, then input list must not be provided and
--ignore_encodings must not be provided. The external
quantization encodings (encoding file/FakeQuant
encodings) might be missing quantization parameters
for some interim tensors. First it will try to fill
the gaps by propagating across math-invariant
functions. If the quantization params are still
missing, then it will apply fallback to nodes to
floating point.
--extra_quantizer_args EXTRA_QUANTIZER_ARGS
additional quantizer arguments in a quoted string.
example: --extra_quantizer_args
'arg1=value1;arg2=value2'
Net-run Arguments:
--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
Specifies perf profile to set. Valid settings are
"low_balanced" , "balanced" , "default",
"high_performance" ,"sustained_high_performance",
"burst", "low_power_saver", "power_saver",
"high_power_saver", "extreme_power_saver", and
"system_settings". Note: perf_profile argument is now
deprecated for HTP backend, user can specify
performance profile through backend extension config
now.
--profiling_level PROFILING_LEVEL
Enables profiling and sets its level. For QNN
executor, valid settings are "basic", "detailed" and
"client" For SNPE executor, valid settings are "off",
"basic", "moderate", "detailed", and "linting".
Default is detailed.
--userlogs {warn,verbose,info,error,fatal}
Enable verbose logging. Note: This argument is
applicable only when --executor_type snpe
--log_level {error,warn,info,debug,verbose}
Enable verbose logging. Note: This argument is
applicable only when --executor_type qnn
--extra_runtime_args EXTRA_RUNTIME_ARGS
additional net runner arguments in a quoted string.
example: --extra_runtime_args
'arg1=value1;arg2=value2'
Other optional Arguments:
--executor_type {qnn,snpe}
Choose between qnn(qnn-net-run) and snpe(snpe-net-run)
execution. If not provided, qnn-net-run will be
executed for QAIRT or QNN SDK, or else snpe-net-run
will be executed for SNPE SDK.
--stage {source,converted,quantized}
Specifies the starting stage in the Accuracy Debugger
pipeline. source: starting with a source framework
model, converted: starting with a converted model,
quantized: starting with a quantized model. Default is
source.
-p ENGINE_PATH, --engine_path ENGINE_PATH
Path to SDK folder.
--deviceId DEVICEID The serial number of the device to use. If not passed,
the first in a list of queried devices will be used
for validation.
-v, --verbose Set verbose logging at debugger tool level
--host_device {x86,x86_64-windows-msvc,wos}
The device that will be running conversion. Set to x86
by default.
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory for the inference_engine to store
temporary files. Creates a new directory if the
specified working directory does not exist
--output_dirname OUTPUT_DIRNAME
output directory name for the inference_engine to
store temporary files under
<working_dir>/inference_engine .Creates a new
directory if the specified working directory does not
exist
--debug_mode_off This option can be used to avoid dumping intermediate
outputs.
--args_config ARGS_CONFIG
Path to a config file with arguments. This can be used
to feed arguments to the AccuracyDebugger as an
alternative to supplying them on the command line.
--remote_server REMOTE_SERVER
ip address of remote machine
--remote_username REMOTE_USERNAME
username of remote machine
--remote_password REMOTE_PASSWORD
password of remote machine
--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --golden_dir_for_mapping GOLDEN_OUTPUT_REFERENCE_DIRECTORY
Optional parameter to indicate the directory of the
goldens, it's used for tensor mapping without running
model with framework runtime.
--disable_offline_prepare
Use this option to disable offline preparation. Note:
By default offline preparation will be done for
DSP/HTP runtimes.
--backend_extension_config BACKEND_EXTENSION_CONFIG
Path to config to be used with qnn-context-binary-
generator. Note: This argument is applicable only when
--executor_type qnn
--context_config_params CONTEXT_CONFIG_PARAMS
optional context config params in a quoted string.
example: --context_config_params
'context_priority=high;
cache_compatibility_mode=strict' Note: This argument
is applicable only when --executor_type qnn
--graph_config_params GRAPH_CONFIG_PARAMS
optional graph config params in a quoted string.
example: --graph_config_params 'graph_priority=low;
graph_profiling_num_executions=10'
--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS
Additional context binary generator arguments in a
quoted string(applicable only when --executor_type
qnn). example: --extra_contextbin_args
'arg1=value1;arg2=value2'
--start_layer START_LAYER
save all intermediate layer outputs from provided
start layer to bottom layer of model. Can be used in
conjunction with --end_layer.
--end_layer END_LAYER
save all intermediate layer outputs from top layer to
provided end layer of model. Can be used in
conjunction with --start_layer.
--add_layer_outputs ADD_LAYER_OUTPUTS
Output layers to be dumped. e.g: node1,node2
--add_layer_types ADD_LAYER_TYPES
outputs of layer types to be dumped. e.g
:Resize,Transpose. All enabled by default.
--skip_layer_types SKIP_LAYER_TYPES
comma delimited layer types to skip dumping. e.g
:Resize,Transpose
--skip_layer_outputs SKIP_LAYER_OUTPUTS
comma delimited layer output names to skip dumping.
e.g: node1,node2
Please note: all command line arguments should be provided either on the command line or through the config file. Command-line arguments will not override those in the config file if there is an overlap.
Sample Commands
# Example for running on Linux host's CPU by passing quantization encodings
qairt-accuracy-debugger \
--inference_engine \
--runtime cpu \
--architecture x86_64-linux-clang \
--model_path model.onnx \
--input_list InceptionV3Model/data/image_list.txt \
--quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json
# Example for running on Linux host's CPU without quantization encodings
qairt-accuracy-debugger \
--inference_engine \
--runtime cpu \
--architecture x86_64-linux-clang \
--model_path model.onnx \
--input_list InceptionV3Model/data/image_list.txt \
--calibration_input_list InceptionV3Model/data/calibration_list.txt \
--param_quantizer_schema symmetric \
--act_quantizer_schema asymmetric \
--param_quantizer_calibration sqnr \
--act_quantizer_calibration percentile \
--percentile_calibration_value 99.995 \
--bias_bitwidth 32
# Example for running on Android DSP target
qairt-accuracy-debugger \
--inference_engine \
--runtime dspv75 \
--architecture aarch64-android \
--deviceId 357415c4 \
--model_path model.onnx \
--input_list InceptionV3Model/data/image_list.txt \
--quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json
# Example for running on Android GPU target with fp32 precision
qairt-accuracy-debugger \
--inference_engine \
--runtime gpu \
--architecture aarch64-android \
--framework tensorflow \
--model_path InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
--input_tensor "input:0" 1,299,299,3 InceptionV3Model/data/chairs.raw \
--output_tensor InceptionV3/Predictions/Reshape_1 \
--input_list InceptionV3Model/data/image_list.txt \
--converter_float_bitwidth 32
# Example for running on Android GPU target with fp16 precision
qairt-accuracy-debugger \
--inference_engine \
--runtime gpu \
--architecture aarch64-android \
--framework tensorflow \
--model_path InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
--input_tensor "input:0" 1,299,299,3 InceptionV3Model/data/chairs.raw \
--output_tensor InceptionV3/Predictions/Reshape_1 \
--input_list InceptionV3Model/data/image_list.txt \
--converter_float_bitwidth 16
# Example for running on DSP of "Windows on Snapdragon" machine
qairt-accuracy-debugger \
--inference_engine \
--runtime dspv75 \
--architecture wos \
--host_device wos \
--model_path model.onnx \
--input_list InceptionV3Model\data\image_list.txt \
--quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json
# Example for running on Windows native
qairt-accuracy-debugger \
--inference_engine \
--runtime cpu \
--architecture x86_64-windows-msvc \
--host_device x86_64-windows-msvc \
--model_path model.onnx \
--input_list InceptionV3Model\data\image_list.txt \
--quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json
- Tip:
Although the tool can quantize the given model using data provided through the --calibration_input_list argument, it is recommended to pass quantization encodings through the --quantization_overrides argument to speed up execution.
The --input_tensor and --output_tensor arguments are mandatory for Tensorflow and TFlite models, but they do not need the indexing suffix (":0"), unlike with the framework runner.
Before running qairt-accuracy-debugger on a Windows x86 system or a Windows on Snapdragon system, ensure that you have configured the environment, and specify the host and target machine as x86_64-windows-msvc or wos respectively.
Note that qairt-accuracy-debugger on a Windows x86 system is currently tested only for the CPU runtime.
More example commands with different configurations:
Sample Commands
# source stage: same as examples from above section (default for stage is "source")
# Running from converted stage (Android DSP):
qairt-accuracy-debugger \
--inference_engine \
--stage converted \
--input_dlc converted_model.dlc \
--runtime dspv75 \
--deviceId f366ce60 \
--architecture aarch64-android \
--input_list InceptionV3Model/data/image_list.txt \
--quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json
# Running from quantized stage (x86 CPU):
qairt-accuracy-debugger \
--inference_engine \
--stage quantized \
--input_dlc quantized_model.dlc \
--runtime cpu \
--architecture x86_64-linux-clang \
--input_list InceptionV3Model/data/image_list.txt \
--quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json
# Running with --extra_converter_args argument for enabling preserve_io and passing onnx symbols (Android DSP):
qairt-accuracy-debugger \
--inference_engine \
--runtime dspv75 \
--architecture aarch64-android \
--model_path model.onnx \
--input_list InceptionV3Model/data/image_list.txt \
--quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json \
--extra_converter_args 'onnx_define_symbol seq_length=384;onnx_define_symbol batch_size=1'
# Run onnx model with custom operator (Android DSP):
qairt-accuracy-debugger \
--inference_engine \
--runtime dspv75 \
--architecture aarch64-android \
--model_path model.onnx \
--input_list InceptionV3Model/data/image_list.txt \
--quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json \
--executor_type qnn \
--extra_converter_args 'op_package_config=CustomPreTopKOpPackageCPU_v2.xml;converter_op_package_lib=libCustomPreTopKOpPackageHtp.so:CustomPreTopKOpPackageHtpInterfaceProvider:' \
--extra_contextbin_args 'op_packages=libQnnCustomPreTopKOpPackageHtp.so:CustomPreTopKOpPackageHtpInterfaceProvider:' \
--extra_runtime_args 'op_packages=libQnnCustomPreTopKOpPackageHtp_v75.so:CustomPreTopKOpPackageHtpInterfaceProvider'
Outputs
Once the Inference Engine has finished running, it stores the outputs in the specified working directory; by default, this is working_directory/inference_engine under the current working directory. A directory named latest is created in working_directory/inference_engine as a symbolic link to the most recent run (YYYY-MM-DD_HH:mm:ss). Users may override the run directory name by passing --output_dirname (e.g. --output_dirname myTest1). The following figure shows a sample output folder from an Inference Engine run.
The "output" directory contains raw files; each raw file is the output of an operation in the network. In addition to generating the .raw files, the inference_engine also generates the model's graph structure in a .json file, named after the protobuf model file. The model_graph_struct.json provides structure-related information about the converted model graph during the verification step; specifically, it helps order the nodes so that earlier nodes come before later ones.
The inference_engine_options.json file contains all the options with which the run was launched. The base_quantized_encoding.json contains quantization encodings used by the model.
Finally, the tensor_mapping file contains a mapping of the various intermediate output file names generated from the framework runner step and the inference engine step.
Verification¶
The Verification step compares the output (from the intermediate tensors of a given model) produced by the framework runner step with the output produced by the inference engine step. Once the comparison is complete, the verification results are compiled and displayed visually in a format that can be easily interpreted by the user.
There are different types of verifiers, e.g. CosineSimilarity, RtolAtol, etc. To see the available verifiers, use the --help option (qairt-accuracy-debugger --verification --help). Each verifier compares the Framework Runner and Inference Engine outputs using an error metric, and prepares reports and/or visualizations to help the user analyze the network's error data.
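To build intuition for what such verifiers compute, here is a minimal, independent re-implementation of two common metrics — cosine similarity and SQNR — on flat tensors. This is a sketch for illustration, not the tool's actual code:

```python
import math

def cosine_similarity(golden, target):
    """Cosine of the angle between two flat tensors; 1.0 means identical direction."""
    dot = sum(g * t for g, t in zip(golden, target))
    norm_g = math.sqrt(sum(g * g for g in golden))
    norm_t = math.sqrt(sum(t * t for t in target))
    return dot / (norm_g * norm_t)

def sqnr_db(golden, target):
    """Signal-to-quantization-noise ratio in dB; higher means a closer match."""
    signal = sum(g * g for g in golden)
    noise = sum((g - t) ** 2 for g, t in zip(golden, target))
    return 10.0 * math.log10(signal / noise)

# A target tensor close to the golden one scores near 1.0 / high dB.
golden = [1.0, 2.0, 3.0]
target = [1.1, 1.9, 3.05]
cos = cosine_similarity(golden, target)
snr = sqnr_db(golden, target)
```

A badly quantized layer typically shows up as a sharp drop in both metrics relative to its neighbors.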
Usage
usage: qairt-accuracy-debugger --verification [-h]
--default_verifier DEFAULT_VERIFIER
[DEFAULT_VERIFIER ...]
--golden_output_reference_directory
GOLDEN_OUTPUT_REFERENCE_DIRECTORY
--inference_results INFERENCE_RESULTS
[--tensor_mapping TENSOR_MAPPING]
[--dlc_path DLC_PATH]
[--verifier_config VERIFIER_CONFIG]
[--graph_struct GRAPH_STRUCT] [-v]
[-w WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[--args_config ARGS_CONFIG]
[--target_encodings TARGET_ENCODINGS]
[-e ENGINE [ENGINE ...]]
Script to run verification.
required arguments:
--default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
Default verifier used for verification. The options
"RtolAtol", "AdjustedRtolAtol", "TopK", "L1Error",
"CosineSimilarity", "MSE", "MAE", "SQNR", "ScaledDiff"
are supported. An optional list of hyperparameters can
be appended. For example: --default_verifier
rtolatol,rtolmargin,0.01,atolmargin,0.01 An optional
list of placeholders can be appended. For example:
--default_verifier CosineSimilarity param1 1 param2 2.
to use multiple verifiers, add additional
--default_verifier CosineSimilarity
--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --framework_results GOLDEN_OUTPUT_REFERENCE_DIRECTORY
Path to root directory of golden output files. Paths
may be absolute, or relative to the working directory.
--inference_results INFERENCE_RESULTS
Path to root directory generated from inference engine
diagnosis. Paths may be absolute, or relative to the
working directory.
optional arguments:
--tensor_mapping TENSOR_MAPPING
Path to the file describing the tensor name mapping
between inference and golden tensors.
--dlc_path DLC_PATH Path to the dlc file, used for transforming axis of
golden outputs w.r.t to target outputs. Note:
Applicable for QAIRT/SNPE
--verifier_config VERIFIER_CONFIG
Path to the verifiers' config file
--graph_struct GRAPH_STRUCT
Path to the inference graph structure .json file. This
file aids in providing structure related information
of the converted model graph during this stage.Note:
This file is mandatory when using ScaledDiff verifier
-v, --verbose Verbose printing
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory for the verification to store
temporary files. Creates a new directory if the
specified working directory does not exist
--output_dirname OUTPUT_DIRNAME
output directory name for the verification to store
temporary files under <working_dir>/verification.
Creates a new directory if the specified working
directory does not exist
--args_config ARGS_CONFIG
Path to a config file with arguments. This can be used
to feed arguments to the AccuracyDebugger as an
alternative to supplying them on the command line.
--target_encodings TARGET_ENCODINGS
Path to target encodings json file.
Arguments for generating Tensor mapping (required when --tensor_mapping is not specified):
-e ENGINE [ENGINE ...], --engine ENGINE [ENGINE ...]
Name of engine(qnn/snpe) that is used for running
inference.
Please note: all command line arguments should be provided either on the command line or through the config file. Command-line arguments will not override those in the config file if there is an overlap.
Note
The standalone verification process run using qairt-accuracy-debugger --verification optionally uses --tensor_mapping and --graph_struct to find files to compare. These files are generated by the inference engine step, and should be supplied to verification for best results. By default they are named tensor_mapping.json and {model name}_graph_struct.json, and can be found in the output directory of the inference engine results.
Sample Commands
# Compare output of framework runner with inference engine
qairt-accuracy-debugger \
--verification \
--default_verifier CosineSimilarity param1 1 param2 2 \
--default_verifier SQNR param1 5 param2 1 \
--golden_output_reference_directory working_directory/framework_runner/latest/ \
--inference_results working_directory/inference_engine/latest/output/Result_0/ \
--tensor_mapping working_directory/inference_engine/latest/tensor_mapping.json \
--graph_struct working_directory/inference_engine/latest/qnn_model_graph_struct.json
- Tip:
If you passed multiple inputs in image_list.txt when running the inference engine step, you will get multiple output/Result_x directories. Choose the result that matches the input you used with the framework runner (e.g., if chair.raw was the first item in image_list.txt, choose output/Result_0; if it was the second item, choose output/Result_1).
It is recommended to always supply 'graph_struct' and 'tensor_mapping' to the command, as they are used to line up the report and find the corresponding files for comparison. If tensor_mapping was not generated by the previous steps, you can supply 'model_path', 'engine', and 'framework' to have the module generate 'tensor_mapping' at runtime.
If the target and golden outputs have exactly matching names, you do not need to provide a tensor_mapping file.
Verifier Config:
The verifier config file is a JSON file that tells verification which verifiers (aside from the default verifier) to use and with which parameters and on what specific tensors. If no config file is provided, the tool will only use the default verifier specified from the command line, with its default parameters, on all the tensors. The JSON file is keyed by verifier names, with each verifier as its own dictionary keyed by “parameters” and “tensors”.
Config File
```json
{
"MeanIOU": {
"parameters": {
"background_classification": 1.0
},
"tensors": [["Postprocessor/BatchMultiClassNonMaxSuppression_boxes", "detection_classes:0"]]
},
"TopK": {
"parameters": {
"k": 5,
"ordered": false
},
"tensors": [["Reshape_1:0"], ["detection_classes:0"]]
}
}
```
Note that the “tensors” field is a list of lists because certain verifiers run on two tensors at a time, and those two tensors are placed together in one inner list. If a verifier runs on a single tensor, each inner list contains only one tensor name. MeanIOU is not supported as a verifier in Debugger.
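Since the verifier config is plain JSON, it can also be generated programmatically. A small sketch that builds and writes a config shaped like the example above (the verifier name and parameters mirror the example; the tensor names are placeholders):

```python
import json

# Each verifier maps to a dict with "parameters" and "tensors".
# "tensors" is a list of lists: verifiers that compare two tensors
# at a time get both names in one inner list.
config = {
    "TopK": {
        "parameters": {"k": 5, "ordered": False},
        "tensors": [["Reshape_1:0"], ["detection_classes:0"]],
    }
}

with open("verifier_config.json", "w") as f:
    json.dump(config, f, indent=4)

# Round-trip to confirm the file is valid JSON.
with open("verifier_config.json") as f:
    loaded = json.load(f)
```

This is useful when sweeping verifier parameters across many runs instead of hand-editing the file.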
Tensor Mapping:
Tensor mapping is a JSON file keyed by inference tensor names, with framework tensor names as values. If the tensor mapping is not provided, the tool will assume inference and golden tensor names are identical.
Tensor Mapping File
```json
{
"Postprocessor/BatchMultiClassNonMaxSuppression_boxes": "detection_boxes:0",
"Postprocessor/BatchMultiClassNonMaxSuppression_scores": "detection_scores:0"
}
```
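Given such a mapping, finding the golden tensor for an inference tensor reduces to a dictionary lookup, with a fallback to the identical name when no entry exists — mirroring the tool's stated behavior when no mapping is provided. A sketch:

```python
import json

# The mapping from the example above, embedded for a self-contained demo.
mapping_json = """
{
    "Postprocessor/BatchMultiClassNonMaxSuppression_boxes": "detection_boxes:0",
    "Postprocessor/BatchMultiClassNonMaxSuppression_scores": "detection_scores:0"
}
"""
tensor_mapping = json.loads(mapping_json)

def golden_name(inference_name, mapping):
    # Fall back to the inference name itself when it is unmapped.
    return mapping.get(inference_name, inference_name)

boxes = golden_name("Postprocessor/BatchMultiClassNonMaxSuppression_boxes", tensor_mapping)
unmapped = golden_name("some_other_tensor", tensor_mapping)
```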
Outputs
Once Verification has finished running, it stores the outputs in the specified working directory; by default, this is working_directory/verification under the current working directory. A directory named latest is created in working_directory/verification as a symbolic link to the most recent run (YYYY-MM-DD_HH:mm:ss). Users may override the run directory name by passing --output_dirname (e.g. --output_dirname myTest1). The following figure shows a sample output folder from a Verification run.
Verification’s output is divided into different verifiers. For example, if both mse and sqnr verifiers are used, there will be two sub-directories named “mse” and “sqnr”. Under each sub-directory, for each tensor, a CSV and HTML file is generated.
In addition to the tensor-specific analysis, the tool also generates a summary CSV and HTML file that aggregates the data from all verifiers and their tensors. The following figure shows a sample summary generated in the verification step. Each row in this summary corresponds to one tensor name identified by the framework runner and inference engine steps. The final column shows the CosineSimilarity score, which can vary between 0 and 1 (this range may differ for other verifiers); higher scores denote similarity, while lower scores indicate divergence. The developer can then investigate those specific tensors further, inspecting them in top-to-bottom order: if a tensor is broken at an earlier node, anything generated after that node is unreliable until the node is properly fixed.
Compare Encodings¶
The Compare Encodings feature compares Target and AIMET encodings. It takes a target DLC and an AIMET encodings JSON file as inputs, and executes the following steps in order:
Extracts encodings from the given DLC file
Compares extracted DLC encodings with given AIMET encodings
Writes results to an Excel file that highlights mismatches
Throws warnings if some encodings are present in DLC but not in AIMET and vice-versa
Writes the extracted DLC encodings JSON file (for reference)
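Conceptually, the comparison rounds each encoding field to the requested precision and reports mismatches and one-sided entries. A minimal sketch over two hypothetical in-memory encoding dictionaries (the real tool extracts these from the DLC and the AIMET JSON; the tensor and field names here are illustrative):

```python
def compare_encodings(dlc_enc, aimet_enc, precision=17):
    """Return (mismatches, only_in_dlc, only_in_aimet).

    Encodings are dicts of {tensor_name: {field: float}}; fields are
    compared after rounding to `precision` decimal places, mirroring
    the tool's --precision option.
    """
    mismatches = []
    for name in dlc_enc.keys() & aimet_enc.keys():
        for field, value in dlc_enc[name].items():
            a = round(value, precision)
            b = round(aimet_enc[name].get(field, float("nan")), precision)
            if a != b:
                mismatches.append((name, field, a, b))
    only_dlc = sorted(dlc_enc.keys() - aimet_enc.keys())
    only_aimet = sorted(aimet_enc.keys() - dlc_enc.keys())
    return mismatches, only_dlc, only_aimet

dlc = {"conv1": {"scale": 0.0123456, "offset": -128.0}}
aimet = {"conv1": {"scale": 0.0123456, "offset": -127.0},
         "conv2": {"scale": 0.5, "offset": 0.0}}
mism, only_dlc, only_aimet = compare_encodings(dlc, aimet, precision=6)
```

The real feature additionally writes the results to an Excel file with mismatches highlighted, as listed above.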
Usage
usage: qairt-accuracy-debugger --compare_encodings [-h]
--input INPUT
--aimet_encodings_json AIMET_ENCODINGS_JSON
[--precision PRECISION]
[--params_only]
[--activations_only]
[--specific_node SPECIFIC_NODE]
[--working_dir WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[-v]
Script to compare DLC encodings with AIMET encodings
optional arguments:
-h, --help Show this help message and exit
required arguments:
--input INPUT
Path to DLC file
--aimet_encodings_json AIMET_ENCODINGS_JSON
Path to AIMET encodings JSON file
optional arguments:
--precision PRECISION
Number of decimal places up to which comparison will be done (default: 17)
--params_only Compare only parameters in the encodings
--activations_only Compare only activations in the encodings
--specific_node SPECIFIC_NODE
Display encoding differences for the given node
--working_dir WORKING_DIR
Working directory for the compare_encodings to store temporary files.
Creates a new directory if the specified working directory does not exist.
--output_dirname OUTPUT_DIRNAME
Output directory name for the compare_encodings to store temporary files
under <working_dir>/compare_encodings. Creates a new directory if the
specified working directory does not exist.
-v, --verbose Verbose printing
Sample Commands
# Compare both params and activations
qairt-accuracy-debugger \
--compare_encodings \
--input quantized_model.dlc \
--aimet_encodings_json aimet_encodings.json
# Compare only params
qairt-accuracy-debugger \
--compare_encodings \
--input quantized_model.dlc \
--aimet_encodings_json aimet_encodings.json \
--params_only
# Compare only activations
qairt-accuracy-debugger \
--compare_encodings \
--input quantized_model.dlc \
--aimet_encodings_json aimet_encodings.json \
--activations_only
# Compare only a specific encoding
qairt-accuracy-debugger \
--compare_encodings \
--input quantized_model.dlc \
--aimet_encodings_json aimet_encodings.json \
--specific_node _2_22_Conv_output_0
Tip
A working directory is created in the location from which this script is invoked unless otherwise specified.
Outputs
Once Compare Encodings has finished running, it stores the outputs in the specified working directory; by default, under working_directory/compare_encodings in the current working directory. A directory named latest is created in working_directory/compare_encodings as a symbolic link to the most recent run YYYY-MM-DD_HH:mm:ss. Users may override the directory name by passing it to --output_dirname (e.g. --output_dirname myTest1). The following figure shows a sample output folder from a Compare Encodings run.
The following details what each file contains.
compare_encodings_options.json contains all the options used to run this feature
encodings_diff.xlsx contains comparison results with mismatches highlighted
log.txt contains log statements for the run
extracted_encodings.json contains extracted DLC encodings
Tensor inspection¶
Tensor inspection compares given reference output and target output tensors and dumps various statistics to represent differences between them.
The Tensor inspection feature can:
Plot histograms for golden and target tensors
Plot a graph indicating deviation between golden and target tensors
Plot a cumulative distribution graph (CDF) for golden vs. target tensors
Plot a density (KDE) graph for target tensor highlighting target min/max and calibrated min/max values
Create a CSV file containing information about: target min/max; calibrated min/max; golden output min/max; target/calibrated min/max differences; and computed metrics (verifiers).
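The metrics (verifiers) written to the CSV can be sketched as below; these are standard definitions of MSE, MAE, and SQNR over flat float lists, offered as a rough guide, and the exact formulas used by the SDK's verifiers may differ.

```python
import math

# Illustrative verifier-style metrics; not the SDK's implementation.

def mse(golden, target):
    """Mean squared error between golden and target tensors."""
    return sum((g - t) ** 2 for g, t in zip(golden, target)) / len(golden)

def mae(golden, target):
    """Mean absolute error between golden and target tensors."""
    return sum(abs(g - t) for g, t in zip(golden, target)) / len(golden)

def sqnr_db(golden, target, eps=1e-12):
    """Signal power over quantization-noise power, in dB; higher is better."""
    noise = mse(golden, target)
    signal = sum(g * g for g in golden) / len(golden)
    return 10.0 * math.log10((signal + eps) / (noise + eps))
```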
Usage
usage: qairt-accuracy-debugger --tensor_inspection [-h]
--golden_data GOLDEN_DATA
--target_data TARGET_DATA
--verifier VERIFIER [VERIFIER ...]
[-w WORKING_DIR]
[--data_type {int8,uint8,int16,uint16,float32}]
[--target_encodings TARGET_ENCODINGS]
[-v]
Script to inspect tensors.
required arguments:
--golden_data GOLDEN_DATA
Path to golden/framework outputs folder. Paths may be absolute or
relative to the working directory.
--target_data TARGET_DATA
Path to target outputs folder. Paths may be absolute or relative to the
working directory.
--verifier VERIFIER [VERIFIER ...]
Verifier used for verification. The options "RtolAtol",
"AdjustedRtolAtol", "TopK", "L1Error", "CosineSimilarity", "MSE", "MAE",
"SQNR", "ScaledDiff" are supported.
An optional list of hyperparameters can be appended, for example:
--verifier rtolatol,rtolmargin,0.01,atolmargin,0.01.
To use multiple verifiers, add an additional --verifier, e.g. --verifier CosineSimilarity
optional arguments:
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory to save results. Creates a new directory if the
specified working directory does not exist
--data_type {int8,uint8,int16,uint16,float32}
DataType of the output tensor.
--target_encodings TARGET_ENCODINGS
Path to target encodings json file.
-v, --verbose Verbose printing
Sample Commands
# Basic run
qairt-accuracy-debugger --tensor_inspection \
--golden_data golden_tensors_dir \
--target_data target_tensors_dir \
--verifier sqnr
# Pass target encodings file and enable multiple verifiers
qairt-accuracy-debugger --tensor_inspection \
--golden_data golden_tensors_dir \
--target_data target_tensors_dir \
--verifier mse \
--verifier sqnr \
--verifier rtolatol,rtolmargin,0.01,atolmargin,0.01 \
--target_encodings qnn_encoding.json
Tip
A working directory is created in the location from which this script is invoked unless otherwise specified.
Outputs
Once Tensor Inspection has finished running, it stores the outputs in the specified working directory; by default, under working_directory/tensor_inspection in the current working directory. A directory named latest is created in working_directory/tensor_inspection as a symbolic link to the most recent run YYYY-MM-DD_HH:mm:ss. Users may override the directory name by passing it to --output_dirname (e.g. --output_dirname myTest1). The following figure shows a sample output folder from a Tensor Inspection run.
The following details what each file contains.
Each tensor will have its own directory; the directory name matches the tensor name.
CDF_plots.html – Golden vs. target CDF graph
Diff_plots.html – Golden and target deviation graph
Distribution_min-max.png – Density plot for target tensor highlighting target vs. calibrated min/max values
Histograms.html – Golden and target histograms
golden_data.csv – Golden tensor data
target_data.csv – Target tensor data
log.txt – Log statements from the entire run
summary.csv – Target min/max, calibrated min/max, golden output min/max, target vs. calibrated min/max differences, and verifier outputs
Histogram Plots
Comparison: We compare histograms for both the golden data and the target data.
Overlay: To enhance clarity, we overlay the histograms bin by bin.
Binned Ranges: Each bin represents a value range, showing the frequency of occurrence.
Visual Insight: Overlapping histograms reveal differences or similarities between the datasets.
Interactive: Hover over histograms to get tensor range and frequencies for the dataset.
Cumulative Distribution Function (CDF) Plots
Overview: CDF plots display the cumulative probability distribution.
Overlay: We superimpose CDF plots for golden and target data.
Percentiles: These plots illustrate data distribution across different percentiles.
Hover Details: Exact cumulative probabilities are available on hover.
Tensor Difference Plots
Inspection: We generate plots highlighting differences between golden and target data tensors.
Scatter and Line: Scatter plots represent tensor values, while line plots show differences at each index.
Interactive: Hover over points to access precise values.
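The data behind the CDF overlay described above can be sketched in a few lines; `empirical_cdf` is an illustrative helper, not part of the SDK, and the tool's plots are interactive HTML rather than raw arrays.

```python
# Minimal sketch of a CDF overlay's underlying data: for each dataset,
# sorted tensor values paired with their cumulative probabilities.
# Computing this for both golden and target tensors and plotting the two
# (x, y) series on one axis yields the overlay described above.

def empirical_cdf(values):
    """Return (sorted values, cumulative probabilities) for a flat list."""
    x = sorted(values)
    n = len(x)
    y = [(i + 1) / n for i in range(n)]
    return x, y
```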
Snooping¶
Snooping algorithms help find inaccuracies in a neural network at the layer level. The following snooping options are available:
oneshot-layerwise
cumulative-layerwise
layerwise
binary
oneshot-layerwise Snooping¶
This algorithm debugs all layers of the model at once by performing the following steps:
Execute framework runner to collect reference outputs from all intermediate tensors of a model in fp32 precision
Execute inference engine to collect target outputs from all intermediate tensors of a model in provided target precision
Execute verification for comparison of intermediate outputs from the above two steps
This algorithm provides a quick analysis of whether layers in the model are quantization-sensitive.
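The verification step above can be sketched as follows. The function name, the dict-of-flat-lists tensor layout, the SQNR metric, and the 20 dB threshold are all illustrative assumptions; the tool dumps tensors as .raw files and supports several verifiers, and loading those dumps is assumed to have happened already.

```python
import math

# Hypothetical sketch of step 3: compare each intermediate tensor dumped by
# the framework runner (golden) and the inference engine (target), and flag
# layers whose low SQNR suggests quantization sensitivity.

def flag_sensitive_layers(golden, target, sqnr_threshold_db=20.0):
    flagged = []
    for name in golden.keys() & target.keys():
        g, t = golden[name], target[name]
        noise = sum((a - b) ** 2 for a, b in zip(g, t)) / len(g)
        signal = sum(a * a for a in g) / len(g)
        sqnr = 10.0 * math.log10((signal + 1e-12) / (noise + 1e-12))
        if sqnr < sqnr_threshold_db:
            flagged.append((name, sqnr))
    # Worst (lowest-SQNR) layers first
    return sorted(flagged, key=lambda kv: kv[1])
```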
Usage
usage: qairt-accuracy-debugger --snooping oneshot-layerwise [-h]
--default_verifier DEFAULT_VERIFIER
[--result_csv RESULT_CSV]
[--verifier_config VERIFIER_CONFIG]
[--run_tensor_inspection] -r
{cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
-a
{x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
-l INPUT_LIST [--input_network MODEL_PATH]
[--input_tensor INPUT_TENSOR [INPUT_TENSOR ...]]
[--out_tensor_node OUTPUT_TENSOR]
[--io_config IO_CONFIG]
[--converter_float_bitwidth {32,16}]
[--extra_converter_args EXTRA_CONVERTER_ARGS]
[--calibration_input_list CALIBRATION_INPUT_LIST]
[-bbw {8,32}] [-abw {8,16}] [-wbw {8,4}]
[--quantizer_float_bitwidth {32,16}]
[--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
[--use_per_channel_quantization]
[--use_per_row_quantization] [--float_fallback]
[--extra_quantizer_args EXTRA_QUANTIZER_ARGS]
[--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
[--profiling_level PROFILING_LEVEL]
[--userlogs {warn,verbose,info,error,fatal}]
[--log_level {error,warn,info,debug,verbose}]
[--extra_runtime_args EXTRA_RUNTIME_ARGS]
[--executor_type {qnn,snpe}]
[--stage {source,converted,quantized}]
[-p ENGINE_PATH] [--deviceId DEVICEID] [-v]
[--host_device {x86,x86_64-windows-msvc,wos}]
[-w WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[--debug_mode_off] [--args_config ARGS_CONFIG]
[--remote_server REMOTE_SERVER]
[--remote_username REMOTE_USERNAME]
[--remote_password REMOTE_PASSWORD]
[--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY]
[--disable_offline_prepare]
[--backend_extension_config BACKEND_EXTENSION_CONFIG]
[--context_config_params CONTEXT_CONFIG_PARAMS]
[--graph_config_params GRAPH_CONFIG_PARAMS]
[--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS]
[--disable_graph_optimization]
[--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB]
[-f FRAMEWORK [FRAMEWORK ...]]
[-qo QUANTIZATION_OVERRIDES]
[--start_layer START_LAYER]
[--end_layer END_LAYER]
[--add_layer_outputs ADD_LAYER_OUTPUTS]
[--add_layer_types ADD_LAYER_TYPES]
[--skip_layer_types SKIP_LAYER_TYPES]
[--skip_layer_outputs SKIP_LAYER_OUTPUTS]
Script to run oneshot-layerwise snooping.
options:
-h, --help show this help message and exit
Verifier Arguments:
--default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
Default verifier used for verification. The options
"RtolAtol", "AdjustedRtolAtol", "TopK", "L1Error",
"CosineSimilarity", "MSE", "MAE", "SQNR", "ScaledDiff"
are supported. An optional list of hyperparameters can
be appended. For example: --default_verifier
rtolatol,rtolmargin,0.01,atolmargin,0.01 An optional
list of placeholders can be appended. For example:
--default_verifier CosineSimilarity param1 1 param2 2.
to use multiple verifiers, add additional
--default_verifier CosineSimilarity
--result_csv RESULT_CSV
Path to the CSV summary report comparing the inference
vs. framework outputs. Paths may be absolute or relative
to the working directory. If not specified, then
--problem_inference_tensor must be specified
--verifier_config VERIFIER_CONFIG
Path to the verifiers' config file
--run_tensor_inspection
To run tensor inspection, pass this argument
Required Arguments:
-r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}, --runtime {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
Runtime to be used. Note: In case of SNPE
execution(--executor_type snpe), aic runtime is not
supported.
-a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}, --architecture {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
Name of the architecture to use for inference engine.
Note: In case of SNPE execution(--executor_type snpe),
aarch64-qnx architecture is not supported.
-l INPUT_LIST, --input_list INPUT_LIST
Path to the input list text file to run inference(used
with net-run). Note: When having multiple entries in
text file, in order to save memory and time, you can
pass --debug_mode_off to skip intermediate outputs
dump.
QAIRT Converter Arguments:
--input_network MODEL_PATH, --model_path MODEL_PATH
Path to the model file(s).
--input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
The name and dimension of all the input buffers to the
network specified in the format [input_name comma-
separated-dimensions sample-data data-type] Note:
sample-data and data-type are optional for example:
'data' 1,224,224,3. Note that the quotes should always
be included in order to handle special characters,
spaces, etc. For multiple inputs, specify multiple
--input_tensor on the command line like:
--input_tensor "data1" 1,224,224,3 sample1.raw float32
--input_tensor "data2" 1,50,100,3 sample2.raw int64
NOTE: Required for TensorFlow and PyTorch. Optional
for Onnx and Tflite. In case of Onnx, this feature
works only with Onnx 1.6.0 and above.
--out_tensor_node OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
Names of the graph's output tensors. Multiple
output names should be provided separately like:
--out_tensor_node out_1 --out_tensor_node out_2 NOTE:
Required for TensorFlow. Optional for Onnx, Tflite and
PyTorch
--io_config IO_CONFIG
Use this option to specify a yaml file for input and
output options.
--converter_float_bitwidth {32,16}
Use this option to convert the graph to the specified
float bitwidth, either 32 (default) or 16. Note:
Cannot be used with --calibration_input_list and
--quantization_overrides
--extra_converter_args EXTRA_CONVERTER_ARGS
additional converter arguments in a quoted string.
example: --extra_converter_args
'arg1=value1;arg2=value2'
-qo QUANTIZATION_OVERRIDES, --quantization_overrides QUANTIZATION_OVERRIDES
Path to quantization overrides json file.
QAIRT Quantizer Arguments:
--calibration_input_list CALIBRATION_INPUT_LIST
Path to the inputs list text file to run
quantization(used with qairt-quantizer).
-bbw {8,32}, --bias_bitwidth {8,32}
option to select the bitwidth to use when quantizing
the bias. default 8
-abw {8,16}, --act_bitwidth {8,16}
option to select the bitwidth to use when quantizing
the activations. default 8
-wbw {8,4}, --weights_bitwidth {8,4}
option to select the bitwidth to use when quantizing
the weights. default 8
--quantizer_float_bitwidth {32,16}
Use this option to select the bitwidth to use for
float tensors, either 32 (default) or 16.
--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use
for activations. This option has to be paired with
--act_quantizer_schema.
--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use
for parameters. This option has to be paired with
--param_quantizer_schema.
--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
Specify which quantization schema to use for
activations. Note: Default is asymmetric.
--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
Specify which quantization schema to use for
parameters. Note: Default is asymmetric.
--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
Value must lie between 90 and 100. Default is 99.99
--use_per_channel_quantization
Use per-channel quantization for convolution-based op
weights. Note: This will replace built-in model QAT
encodings when used for a given weight.
--use_per_row_quantization
Use this option to enable rowwise quantization of
Matmul and FullyConnected ops.
--float_fallback Use this option to enable fallback to floating point
(FP) instead of fixed point. This option can be paired
with --quantizer_float_bitwidth to indicate the
bitwidth for FP (by default 32). If this option is
enabled, then input list must not be provided and
--ignore_encodings must not be provided. The external
quantization encodings (encoding file/FakeQuant
encodings) might be missing quantization parameters
for some interim tensors. First it will try to fill
the gaps by propagating across math-invariant
functions. If the quantization params are still
missing, then it will apply fallback to nodes to
floating point.
--extra_quantizer_args EXTRA_QUANTIZER_ARGS
additional quantizer arguments in a quoted string.
example: --extra_quantizer_args
'arg1=value1;arg2=value2'
Net-run Arguments:
--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
Specifies perf profile to set. Valid settings are
"low_balanced" , "balanced" , "default",
"high_performance" ,"sustained_high_performance",
"burst", "low_power_saver", "power_saver",
"high_power_saver", "extreme_power_saver", and
"system_settings". Note: perf_profile argument is now
deprecated for HTP backend, user can specify
performance profile through backend extension config
now.
--profiling_level PROFILING_LEVEL
Enables profiling and sets its level. For QNN
executor, valid settings are "basic", "detailed" and
"client" For SNPE executor, valid settings are "off",
"basic", "moderate", "detailed", and "linting".
Default is detailed.
--userlogs {warn,verbose,info,error,fatal}
Enable verbose logging. Note: This argument is
applicable only when --executor_type snpe
--log_level {error,warn,info,debug,verbose}
Enable verbose logging. Note: This argument is
applicable only when --executor_type qnn
--extra_runtime_args EXTRA_RUNTIME_ARGS
additional net runner arguments in a quoted string.
example: --extra_runtime_args
'arg1=value1;arg2=value2'
Other optional Arguments:
--executor_type {qnn,snpe}
Choose between qnn(qnn-net-run) and snpe(snpe-net-run)
execution. If not provided, qnn-net-run will be
executed for QAIRT or QNN SDK, or else snpe-net-run
will be executed for SNPE SDK.
--stage {source,converted,quantized}
Specifies the starting stage in the Accuracy Debugger
pipeline. source: starting with a source framework
model, converted: starting with a converted model,
quantized: starting with a quantized model. Default is
source.
-p ENGINE_PATH, --engine_path ENGINE_PATH
Path to SDK folder.
--deviceId DEVICEID The serial number of the device to use. If not passed,
the first in a list of queried devices will be used
for validation.
-v, --verbose Set verbose logging at debugger tool level
--host_device {x86,x86_64-windows-msvc,wos}
The device that will be running conversion. Set to x86
by default.
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory for the snooping to store temporary
files. Creates a new directory if the specified
working directory does not exist
--output_dirname OUTPUT_DIRNAME
output directory name for the snooping to store
temporary files under <working_dir>/snooping. Creates
a new directory if the specified working directory
does not exist
--debug_mode_off This option can be used to avoid dumping intermediate
outputs.
--args_config ARGS_CONFIG
Path to a config file with arguments. This can be used
to feed arguments to the AccuracyDebugger as an
alternative to supplying them on the command line.
--remote_server REMOTE_SERVER
ip address of remote machine
--remote_username REMOTE_USERNAME
username of remote machine
--remote_password REMOTE_PASSWORD
password of remote machine
--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --golden_dir_for_mapping GOLDEN_OUTPUT_REFERENCE_DIRECTORY
Optional parameter to indicate the directory of the
goldens, it's used for tensor mapping without running
model with framework runtime.
--disable_offline_prepare
Use this option to disable offline preparation. Note:
By default offline preparation will be done for
DSP/HTP runtimes.
--backend_extension_config BACKEND_EXTENSION_CONFIG
Path to config to be used with qnn-context-binary-
generator. Note: This argument is applicable only when
--executor_type qnn
--context_config_params CONTEXT_CONFIG_PARAMS
optional context config params in a quoted string.
example: --context_config_params
'context_priority=high;
cache_compatibility_mode=strict' Note: This argument
is applicable only when --executor_type qnn
--graph_config_params GRAPH_CONFIG_PARAMS
optional graph config params in a quoted string.
example: --graph_config_params 'graph_priority=low;
graph_profiling_num_executions=10'
--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS
Additional context binary generator arguments in a
quoted string(applicable only when --executor_type
qnn). example: --extra_contextbin_args
'arg1=value1;arg2=value2'
--disable_graph_optimization
Disables basic model optimization
--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB
path to onnx custom operator library
-f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
Framework type and version, version is optional.
Currently supported frameworks are [tensorflow,
tflite, onnx, pytorch]. For example, tensorflow 2.10.1
--start_layer START_LAYER
save all intermediate layer outputs from provided
start layer to bottom layer of model. Can be used in
conjunction with --end_layer.
--end_layer END_LAYER
save all intermediate layer outputs from top layer to
provided end layer of model. Can be used in
conjunction with --start_layer.
--add_layer_outputs ADD_LAYER_OUTPUTS
Output layers to be dumped. e.g: node1,node2
--add_layer_types ADD_LAYER_TYPES
outputs of layer types to be dumped. e.g
:Resize,Transpose. All enabled by default.
--skip_layer_types SKIP_LAYER_TYPES
comma delimited layer types to skip dumping. e.g
:Resize,Transpose
--skip_layer_outputs SKIP_LAYER_OUTPUTS
comma delimited layer output names to skip dumping.
e.g: node1,node2
Note
The --run_tensor_inspection argument significantly increases overall execution time when used with large models. To speed up execution, omit this argument.
Sample Commands
qairt-accuracy-debugger \
--snooping oneshot-layerwise \
--runtime dspv75 \
--architecture aarch64-android \
--framework onnx \
--model_path artifacts/mobilenet-v2.onnx \
--input_list artifacts/list.txt \
--input_tensor "input.1" 1,3,224,224 artifacts/inputFiles/dog.raw \
--output_tensor "473" \
--default_verifier mse \
--quantization_overrides artifacts/quantized_encoding.json \
--executor_type qnn \
--run_tensor_inspection
Tip
A working directory is created in the location from which this script is invoked unless otherwise specified.
Output
Below is the output directory structure:
working_directory
├── framework_runner
│ ├── 2024-08-07_15-34-08
│ └── latest
├── inputs_32
│ ├── dog.raw
│ └── input_list.txt
├── snooping
│ └── 2024-08-07_15-34-08
└── verification
├── 2024-08-07_15-34-23
└── latest
├── base.json
├── mse
├── tensor_inspection
├── summary.csv
├── summary.html
├── summary.json
└── verification_options.json
framework_runner directory contains a timestamped directory that contains the intermediate layer outputs (framework) stored in .raw format as described in the framework runner step.
snooping directory contains a timestamped directory that contains the intermediate layer outputs (inference engine) stored in .raw format as described in the inference engine step.
verification directory contains a timestamped directory that contains the following:
A directory with same name for each verifier specified while running oneshot; it contains CSV and HTML files with metric details for each layer output
tensor_inspection – Individual directories for each layer’s output with the following contents:
CDF_plots.html – Golden vs target CDF graph
Diff_plots.html – Golden and target deviation graph
Histograms.html – Golden and target histograms
golden_data.csv – Golden tensor data
target_data.csv – Target tensor data
summary.csv – Report for verification results of each layers output
Note: All directories will have a folder called latest which is a symlink to the latest run’s corresponding timestamped directory.
Snapshot of summary.csv file:
Understanding the oneshot-layerwise summary report:
Column |
Description |
|---|---|
Name |
Output name of the current layer |
Layer Type |
Type of the current layer |
Size |
Size of this layer’s output |
Tensor_dims |
Shape of this layer’s output |
<Verifier name> |
Verifier value of the current layer output compared to reference output |
golden_min |
minimum value in the reference output for current layer |
golden_max |
maximum value in the reference output for current layer |
target_min |
minimum value in the target output for current layer |
target_max |
maximum value in the target output for current layer |
cumulative-layerwise Snooping¶
This algorithm debugs one layer at a time by performing the following steps:
Execute framework runner to collect reference outputs from all intermediate tensors of a model in fp32 precision
Execute inference engine and verification steps in an iterative manner to perform the following operations:
Collect target outputs in target precision for each layer while removing the effect of its preceding layers on the final output
Compare intermediate outputs from framework runner and inference engine
It provides a deeper analysis that identifies which layers of the model cause accuracy deviation, and can be used to measure the quantization sensitivity of each layer/op with regard to the final output of the model.
Note
Debugging accuracy issues with a quantized model using cumulative-layerwise snooping
With quantized models, some mismatch is expected at the most data-intensive layers, arising from quantization error.
The debugger can be used to identify the most sensitive operators (those with high verifier scores) and run them at higher precision to improve overall accuracy.
Sensitivity is determined by the verifier score seen at that layer with respect to the reference platform (such as ONNX Runtime).
Note that cumulative-layerwise debugging takes considerable time, as the partitioned model must be quantized and compiled at every layer that does not have a 100% match with the reference.
Below is one strategy to debug larger models:
Run Oneshot-layerwise on the model which helps to identify the starting point of sensitivity in the model.
Run Cumulative-layerwise on different parts of the model using the start-layer and end-layer options (for a model with 100 nodes, use the starting node identified by the Oneshot-layerwise run as the start layer and the 25th node as the end layer for run 1, the 26th and 50th nodes for run 2, the 51st and 75th nodes for run 3, and so on). The final reports of all runs help identify the most sensitive layers in the model. Say nodes A, B, and C have high verifier scores, indicating high sensitivity.
Run the original model with those specific layers (A/B/C, one at a time or in combination) in FP16 and observe the improvement in accuracy.
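The chunking in step 2 can be sketched as a small helper; `layer_ranges`, the zero-based indexing, and the chunk size of 25 are illustrative assumptions, not part of the tool.

```python
# Hypothetical helper for splitting a large model into successive
# cumulative-layerwise runs, as suggested in the strategy above.

def layer_ranges(num_layers, chunk=25):
    """Return inclusive (start, end) layer index pairs covering the model."""
    ranges = []
    start = 0
    while start < num_layers:
        end = min(start + chunk, num_layers) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges
```

Each (start, end) pair would then map to one snooping run's --start_layer/--end_layer arguments.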
Usage
usage: qairt-accuracy-debugger --snooping cumulative-layerwise [-h]
--default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
[--result_csv RESULT_CSV]
[--verifier_threshold VERIFIER_THRESHOLD]
[--verifier_config VERIFIER_CONFIG] -r
{cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
-a
{x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
-l INPUT_LIST [--input_network MODEL_PATH]
[--input_tensor INPUT_TENSOR [INPUT_TENSOR ...]]
[--out_tensor_node OUTPUT_TENSOR]
[--io_config IO_CONFIG]
[--converter_float_bitwidth {32,16}]
[--extra_converter_args EXTRA_CONVERTER_ARGS]
[--calibration_input_list CALIBRATION_INPUT_LIST]
[-bbw {8,32}] [-abw {8,16}] [-wbw {8,4}]
[--quantizer_float_bitwidth {32,16}]
[--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
[--use_per_channel_quantization]
[--use_per_row_quantization] [--float_fallback]
[--extra_quantizer_args EXTRA_QUANTIZER_ARGS]
[--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
[--profiling_level PROFILING_LEVEL]
[--userlogs {warn,verbose,info,error,fatal}]
[--log_level {error,warn,info,debug,verbose}]
[--extra_runtime_args EXTRA_RUNTIME_ARGS]
[--executor_type {qnn,snpe}]
[--stage {source,converted,quantized}]
[-p ENGINE_PATH] [--deviceId DEVICEID] [-v]
[--host_device {x86,x86_64-windows-msvc,wos}]
[-w WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[--debug_mode_off] [--args_config ARGS_CONFIG]
[--remote_server REMOTE_SERVER]
[--remote_username REMOTE_USERNAME]
[--remote_password REMOTE_PASSWORD]
[--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY]
[--disable_offline_prepare]
[--backend_extension_config BACKEND_EXTENSION_CONFIG]
[--context_config_params CONTEXT_CONFIG_PARAMS]
[--graph_config_params GRAPH_CONFIG_PARAMS]
[--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS]
[--disable_graph_optimization]
[--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB]
[-f FRAMEWORK [FRAMEWORK ...]]
[-qo QUANTIZATION_OVERRIDES]
[--step_size STEP_SIZE]
[--start_layer START_LAYER]
[--end_layer END_LAYER]
Script to run cumulative-layerwise snooping.
options:
-h, --help show this help message and exit
Verifier Arguments:
--default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
Default verifier used for verification. The options
"RtolAtol", "AdjustedRtolAtol", "TopK", "L1Error",
"CosineSimilarity", "MSE", "MAE", "SQNR", "ScaledDiff"
are supported. An optional list of hyperparameters can
be appended. For example: --default_verifier
rtolatol,rtolmargin,0.01,atolmargin,0.01 An optional
list of placeholders can be appended. For example:
--default_verifier CosineSimilarity param1 1 param2 2.
to use multiple verifiers, add additional
--default_verifier CosineSimilarity
--result_csv RESULT_CSV
Path to the CSV summary report comparing the inference
vs. framework outputs. Paths may be absolute or relative
to the working directory. If not specified, then
--problem_inference_tensor must be specified
--verifier_threshold VERIFIER_THRESHOLD
Verifier threshold for problematic tensor to be
chosen.
--verifier_config VERIFIER_CONFIG
Path to the verifiers' config file
Required Arguments:
-r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}, --runtime {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
Runtime to be used. Note: In case of SNPE
execution(--executor_type snpe), aic runtime is not
supported.
-a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}, --architecture {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
Name of the architecture to use for inference engine.
Note: In case of SNPE execution(--executor_type snpe),
aarch64-qnx architecture is not supported.
-l INPUT_LIST, --input_list INPUT_LIST
Path to the input list text file to run inference(used
with net-run). Note: When having multiple entries in
text file, in order to save memory and time, you can
pass --debug_mode_off to skip intermediate outputs
dump.
QAIRT Converter Arguments:
--input_network MODEL_PATH, --model_path MODEL_PATH
Path to the model file(s).
--input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
The name and dimension of all the input buffers to the
network specified in the format [input_name comma-
separated-dimensions sample-data data-type] Note:
sample-data and data-type are optional for example:
'data' 1,224,224,3. Note that the quotes should always
be included in order to handle special characters,
spaces, etc. For multiple inputs, specify multiple
--input_tensor on the command line like:
--input_tensor "data1" 1,224,224,3 sample1.raw float32
--input_tensor "data2" 1,50,100,3 sample2.raw int64
NOTE: Required for TensorFlow and PyTorch. Optional
for Onnx and Tflite. In case of Onnx, this feature
works only with Onnx 1.6.0 and above.
--out_tensor_node OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
Names of the graph's output tensors. Multiple
output names should be provided separately like:
--out_tensor_node out_1 --out_tensor_node out_2 NOTE:
Required for TensorFlow. Optional for Onnx, Tflite and
PyTorch
--io_config IO_CONFIG
Use this option to specify a yaml file for input and
output options.
--converter_float_bitwidth {32,16}
Use this option to convert the graph to the specified
float bitwidth, either 32 (default) or 16. Note:
Cannot be used with --calibration_input_list and
--quantization_overrides
--extra_converter_args EXTRA_CONVERTER_ARGS
additional converter arguments in a quoted string.
example: --extra_converter_args
'arg1=value1;arg2=value2'
-qo QUANTIZATION_OVERRIDES, --quantization_overrides QUANTIZATION_OVERRIDES
Path to quantization overrides json file.
QAIRT Quantizer Arguments:
--calibration_input_list CALIBRATION_INPUT_LIST
Path to the inputs list text file to run
quantization(used with qairt-quantizer).
-bbw {8,32}, --bias_bitwidth {8,32}
option to select the bitwidth to use when quantizing
the bias. default 8
-abw {8,16}, --act_bitwidth {8,16}
option to select the bitwidth to use when quantizing
the activations. default 8
-wbw {8,4}, --weights_bitwidth {8,4}
option to select the bitwidth to use when quantizing
the weights. default 8
--quantizer_float_bitwidth {32,16}
Use this option to select the bitwidth to use for
float tensors, either 32 (default) or 16.
--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use
for activations. This option has to be paired with
--act_quantizer_schema.
--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use
for parameters. This option has to be paired with
--param_quantizer_schema.
--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
Specify which quantization schema to use for
activations. Note: Default is asymmetric.
--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
Specify which quantization schema to use for
parameters. Note: Default is asymmetric.
--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
Value must lie between 90 and 100. Default is 99.99
--use_per_channel_quantization
Use per-channel quantization for convolution-based op
weights. Note: This will replace built-in model QAT
encodings when used for a given weight.
--use_per_row_quantization
Use this option to enable rowwise quantization of
Matmul and FullyConnected ops.
--float_fallback Use this option to enable fallback to floating point
(FP) instead of fixed point. This option can be paired
with --quantizer_float_bitwidth to indicate the
bitwidth for FP (by default 32). If this option is
enabled, then input list must not be provided and
--ignore_encodings must not be provided. The external
quantization encodings (encoding file/FakeQuant
encodings) might be missing quantization parameters
for some interim tensors. First it will try to fill
the gaps by propagating across math-invariant
functions. If the quantization params are still
missing, then it will apply fallback to nodes to
floating point.
--extra_quantizer_args EXTRA_QUANTIZER_ARGS
additional quantizer arguments in a quoted string.
example: --extra_quantizer_args
'arg1=value1;arg2=value2'
Net-run Arguments:
--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
Specifies perf profile to set. Valid settings are
"low_balanced" , "balanced" , "default",
"high_performance" ,"sustained_high_performance",
"burst", "low_power_saver", "power_saver",
"high_power_saver", "extreme_power_saver", and
"system_settings". Note: perf_profile argument is now
deprecated for HTP backend, user can specify
performance profile through backend extension config
now.
--profiling_level PROFILING_LEVEL
Enables profiling and sets its level. For QNN
executor, valid settings are "basic", "detailed" and
"client" For SNPE executor, valid settings are "off",
"basic", "moderate", "detailed", and "linting".
Default is detailed.
--userlogs {warn,verbose,info,error,fatal}
Enable verbose logging. Note: This argument is
applicable only when --executor_type snpe
--log_level {error,warn,info,debug,verbose}
Enable verbose logging. Note: This argument is
applicable only when --executor_type qnn
--extra_runtime_args EXTRA_RUNTIME_ARGS
additional net runner arguments in a quoted string.
example: --extra_runtime_args
'arg1=value1;arg2=value2'
Other optional Arguments:
--executor_type {qnn,snpe}
Choose between qnn(qnn-net-run) and snpe(snpe-net-run)
execution. If not provided, qnn-net-run will be
executed for QAIRT or QNN SDK, or else snpe-net-run
will be executed for SNPE SDK.
--stage {source,converted,quantized}
Specifies the starting stage in the Accuracy Debugger
pipeline. source: starting with a source framework
model, converted: starting with a converted model,
quantized: starting with a quantized model. Default is
source.
-p ENGINE_PATH, --engine_path ENGINE_PATH
Path to SDK folder.
--deviceId DEVICEID The serial number of the device to use. If not passed,
the first in a list of queried devices will be used
for validation.
-v, --verbose Set verbose logging at debugger tool level
--host_device {x86,x86_64-windows-msvc,wos}
The device that will be running conversion. Set to x86
by default.
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory for the snooping to store temporary
files. Creates a new directory if the specified
working directory does not exist
--output_dirname OUTPUT_DIRNAME
output directory name for the snooping to store
temporary files under <working_dir>/snooping .Creates
a new directory if the specified working directory
does not exist
--debug_mode_off This option can be used to avoid dumping intermediate
outputs.
--args_config ARGS_CONFIG
Path to a config file with arguments. This can be used
to feed arguments to the AccuracyDebugger as an
alternative to supplying them on the command line.
--remote_server REMOTE_SERVER
ip address of remote machine
--remote_username REMOTE_USERNAME
username of remote machine
--remote_password REMOTE_PASSWORD
password of remote machine
--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --golden_dir_for_mapping GOLDEN_OUTPUT_REFERENCE_DIRECTORY
Optional parameter to indicate the directory of the
goldens, it's used for tensor mapping without running
model with framework runtime.
--disable_offline_prepare
Use this option to disable offline preparation. Note:
By default offline preparation will be done for
DSP/HTP runtimes.
--backend_extension_config BACKEND_EXTENSION_CONFIG
Path to config to be used with qnn-context-binary-
generator. Note: This argument is applicable only when
--executor_type qnn
--context_config_params CONTEXT_CONFIG_PARAMS
optional context config params in a quoted string.
example: --context_config_params
'context_priority=high;
cache_compatibility_mode=strict' Note: This argument
is applicable only when --executor_type qnn
--graph_config_params GRAPH_CONFIG_PARAMS
optional graph config params in a quoted string.
example: --graph_config_params 'graph_priority=low;
graph_profiling_num_executions=10'
--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS
Additional context binary generator arguments in a
quoted string(applicable only when --executor_type
qnn). example: --extra_contextbin_args
'arg1=value1;arg2=value2'
--disable_graph_optimization
Disables basic model optimization
--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB
path to onnx custom operator library
-f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
Framework type and version, version is optional.
Currently supported frameworks are [tensorflow,
tflite, onnx, pytorch]. For example, tensorflow 2.10.1
--step_size STEP_SIZE
number of layers to skip in each iteration of
debugging. Applicable only for cumulative-layerwise
algorithm. --step_size (> 1) should not be used along
with --add_layer_outputs, --add_layer_types,
--skip_layer_outputs, --skip_layer_types,
--start_layer, --end_layer
--start_layer START_LAYER
save all intermediate layer outputs from provided
start layer to bottom layer of model. Can be used in
conjunction with --end_layer.
--end_layer END_LAYER
save all intermediate layer outputs from top layer to
provided end layer of model. Can be used in
conjunction with --start_layer.
Sample Commands
qairt-accuracy-debugger \
--snooping cumulative-layerwise \
--runtime dspv75 \
--architecture aarch64-android \
--framework onnx \
--model_path artifacts/mobilenet-v2.onnx \
--input_list artifacts/list.txt \
--input_tensor "input.1" 1,3,224,224 artifacts/inputFiles/dog.raw \
--output_tensor "473" \
--default_verifier mse \
--quantization_overrides artifacts/quantized_encoding.json \
--executor_type qnn
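The `--input_list` file referenced above is a plain text file that lists the raw input files to run, one entry per line. A minimal illustrative `list.txt` is sketched below; the file names are examples, and the `name:=path` form for named inputs is an assumption based on the net-run input list convention, so please verify against your SDK version:

```text
# single-input model: one raw file per line
inputFiles/dog.raw
inputFiles/cat.raw

# multi-input model: space-separated name:=path pairs on one line
# data1:=inputFiles/sample1.raw data2:=inputFiles/sample2.raw
```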
Output
Below is the output directory structure:
working_directory
├── framework_runner
│ ├── 2024-08-07_16-23-50
│ └── latest
├── inputs_32
│ ├── dog.raw
│ └── input_list.txt
└── snooping
└── 2024-08-07_16-23-49
├── base_quantized.json
├── cumulative_layerwise.csv
├── extracted_model.onnx
├── inference_engine
├── log.txt
├── snooping_options.json
├── temp-list.txt
└── transformed.onnx
The framework_runner directory contains a timestamped directory holding the intermediate layer outputs stored in .raw format, as described in the Framework Runner step.
The snooping directory contains the intermediate outputs obtained from the inference engine step, stored in separate directories named after the respective layers. It also contains the final report, cumulative_layerwise.csv, which holds the verifier scores for each layer. Layers with the most deviating scores can be identified as problematic nodes.
Snapshot of cumulative_layerwise.csv:
Understanding the cumulative-layerwise report:
| Column | Description |
|---|---|
| O/P Name | Output name of the current layer. |
| Status | |
| Layer Type | Type of the current layer. |
| Shape | Shape of this layer's output. |
| Activations | The Min, Max and Median of the outputs at this layer, taken from the reference execution. |
| <Verifier name> | Absolute verifier value of the current layer compared to the reference platform. |
| Orig outputs | The original-outputs verifier score observed when the model was run with the current layer output enabled, starting from the last partitioned layer. |
| Info | Displays information for the output verifiers, if the values are abnormal. |
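The report can also be post-processed to surface the most deviating layers automatically. Below is a minimal sketch that assumes the CSV exposes an `O/P Name` column and a verifier column named `mse`; the actual column header depends on the verifier you selected, so adjust `verifier_col` accordingly:

```python
import csv

def worst_layers(report_path, verifier_col="mse", top_n=5):
    """Rank layers in a snooping report by verifier score.

    Assumes higher scores mean larger deviation (true for MSE/MAE-style
    verifiers); rows with an empty verifier cell are skipped.
    """
    with open(report_path, newline="") as f:
        rows = [r for r in csv.DictReader(f) if r.get(verifier_col, "").strip()]
    rows.sort(key=lambda r: float(r[verifier_col]), reverse=True)
    return [(r["O/P Name"], float(r[verifier_col])) for r in rows[:top_n]]
```

For similarity-style verifiers such as CosineSimilarity, lower values indicate larger deviation, so the sort direction would be reversed.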
layerwise Snooping¶
This algorithm debugs the model one single-layer model at a time by performing the following steps:

1. Get golden reference per-layer outputs from an external tool or, if a golden reference is not given, run the framework runner to collect reference outputs for all intermediate tensors of the model in fp32 precision.
2. Iteratively execute the inference engine and verification to:
   - Collect target outputs in target precision for each single-layer model by removing all of the preceding and subsequent layers.
   - Compare each intermediate output from the golden reference with the corresponding output of the single-layer partitioned model from the inference engine.

Layer-wise snooping provides a deeper analysis that identifies all model layers causing accuracy deviation on hardware with respect to framework/simulation outputs. This algorithm can be used to identify kernel issues for layers/ops present in the model.
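The per-layer comparison step can be sketched as follows. This assumes both the golden and target outputs for a tensor are dumped as float32 `.raw` files of identical size, and uses the MSE verifier as an illustrative choice (the debugger supports several others):

```python
import numpy as np

def mse_verifier(golden_path, target_path):
    """Compare a golden fp32 tensor dump against the target-precision dump
    of the same tensor, both stored as flat float32 .raw files."""
    golden = np.fromfile(golden_path, dtype=np.float32)
    target = np.fromfile(target_path, dtype=np.float32)
    assert golden.size == target.size, "tensor dumps must match in size"
    return float(np.mean((golden - target) ** 2))
```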
Note
Debugging accuracy issues for models that show a discrepancy between a golden reference (e.g. AIMET/framework runtime output) and the target output using layerwise snooping:
- A popular use case for layerwise snooping is debugging the accuracy difference between AIMET and the target.
Even though tools like AIMET create a close simulation of the hardware, a very small mismatch is still expected due to environment differences: the simulation executes on GPU FP32 kernels and injects quantization noise, whereas the hardware actually executes on integer kernels.
If the deviation between simulation and hardware is high, layerwise snooping can point to the nodes with the highest deviations; the nodes showing the highest deviation in layerwise.csv can be identified as the erroneous nodes.
Other use cases include debugging deviations between the framework runtime's FP32 output and the target's INT16 output.
Usage
usage: qairt-accuracy-debugger --snooping layerwise [-h]
--default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
[--result_csv RESULT_CSV]
[--verifier_threshold VERIFIER_THRESHOLD]
[--verifier_config VERIFIER_CONFIG] -r
{cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
-a
{x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
-l INPUT_LIST [--input_network MODEL_PATH]
[--input_tensor INPUT_TENSOR [INPUT_TENSOR ...]]
[--out_tensor_node OUTPUT_TENSOR]
[--io_config IO_CONFIG]
[--converter_float_bitwidth {32,16}]
[--extra_converter_args EXTRA_CONVERTER_ARGS]
[--calibration_input_list CALIBRATION_INPUT_LIST]
[-bbw {8,32}] [-abw {8,16}] [-wbw {8,4}]
[--quantizer_float_bitwidth {32,16}]
[--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
[--use_per_channel_quantization]
[--use_per_row_quantization] [--float_fallback]
[--extra_quantizer_args EXTRA_QUANTIZER_ARGS]
[--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
[--profiling_level PROFILING_LEVEL]
[--userlogs {warn,verbose,info,error,fatal}]
[--log_level {error,warn,info,debug,verbose}]
[--extra_runtime_args EXTRA_RUNTIME_ARGS]
[--executor_type {qnn,snpe}]
[--stage {source,converted,quantized}]
[-p ENGINE_PATH] [--deviceId DEVICEID] [-v]
[--host_device {x86,x86_64-windows-msvc,wos}]
[-w WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[--debug_mode_off] [--args_config ARGS_CONFIG]
[--remote_server REMOTE_SERVER]
[--remote_username REMOTE_USERNAME]
[--remote_password REMOTE_PASSWORD]
[--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY]
[--disable_offline_prepare]
[--backend_extension_config BACKEND_EXTENSION_CONFIG]
[--context_config_params CONTEXT_CONFIG_PARAMS]
[--graph_config_params GRAPH_CONFIG_PARAMS]
[--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS]
[--disable_graph_optimization]
[--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB]
[-f FRAMEWORK [FRAMEWORK ...]]
[-qo QUANTIZATION_OVERRIDES]
[--start_layer START_LAYER]
[--end_layer END_LAYER]
Script to run layerwise snooping.
options:
-h, --help show this help message and exit
Verifier Arguments:
--default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
Default verifier used for verification. The options
"RtolAtol", "AdjustedRtolAtol", "TopK", "L1Error",
"CosineSimilarity", "MSE", "MAE", "SQNR", "ScaledDiff"
are supported. An optional list of hyperparameters can
be appended. For example: --default_verifier
rtolatol,rtolmargin,0.01,atolmargin,0.01 An optional
list of placeholders can be appended. For example:
--default_verifier CosineSimilarity param1 1 param2 2.
to use multiple verifiers, add additional
--default_verifier CosineSimilarity
--result_csv RESULT_CSV
Path to the csv summary report comparing the inference
vs frameworkPaths may be absolute, or relative to the
working directory.if not specified, then a
--problem_inference_tensor must be specified
--verifier_threshold VERIFIER_THRESHOLD
Verifier threshold for problematic tensor to be
chosen.
--verifier_config VERIFIER_CONFIG
Path to the verifiers' config file
Required Arguments:
-r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}, --runtime {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
Runtime to be used. Note: In case of SNPE
execution(--executor_type snpe), aic runtime is not
supported.
-a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}, --architecture {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
Name of the architecture to use for inference engine.
Note: In case of SNPE execution(--executor_type snpe),
aarch64-qnx architecture is not supported.
-l INPUT_LIST, --input_list INPUT_LIST
Path to the input list text file to run inference(used
with net-run). Note: When having multiple entries in
text file, in order to save memory and time, you can
pass --debug_mode_off to skip intermediate outputs
dump.
QAIRT Converter Arguments:
--input_network MODEL_PATH, --model_path MODEL_PATH
Path to the model file(s).
--input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
The name and dimension of all the input buffers to the
network specified in the format [input_name comma-
separated-dimensions sample-data data-type] Note:
sample-data and data-type are optional for example:
'data' 1,224,224,3. Note that the quotes should always
be included in order to handle special characters,
spaces, etc. For multiple inputs, specify multiple
--input_tensor on the command line like:
--input_tensor "data1" 1,224,224,3 sample1.raw float32
--input_tensor "data2" 1,50,100,3 sample2.raw int64
NOTE: Required for TensorFlow and PyTorch. Optional
for Onnx and Tflite. In case of Onnx, this feature
works only with Onnx 1.6.0 and above.
--out_tensor_node OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
Name of the graph's output Tensor Names. Multiple
output names should be provided separately like:
--out_tensor_node out_1 --out_tensor_node out_2 NOTE:
Required for TensorFlow. Optional for Onnx, Tflite and
PyTorch
--io_config IO_CONFIG
Use this option to specify a yaml file for input and
output options.
--converter_float_bitwidth {32,16}
Use this option to convert the graph to the specified
float bitwidth, either 32 (default) or 16. Note:
Cannot be used with --calibration_input_list and
--quantization_overrides
--extra_converter_args EXTRA_CONVERTER_ARGS
additional converter arguments in a quoted string.
example: --extra_converter_args
'arg1=value1;arg2=value2'
-qo QUANTIZATION_OVERRIDES, --quantization_overrides QUANTIZATION_OVERRIDES
Path to quantization overrides json file.
QAIRT Quantizer Arguments:
--calibration_input_list CALIBRATION_INPUT_LIST
Path to the inputs list text file to run
quantization(used with qairt-quantizer).
-bbw {8,32}, --bias_bitwidth {8,32}
option to select the bitwidth to use when quantizing
the bias. default 8
-abw {8,16}, --act_bitwidth {8,16}
option to select the bitwidth to use when quantizing
the activations. default 8
-wbw {8,4}, --weights_bitwidth {8,4}
option to select the bitwidth to use when quantizing
the weights. default 8
--quantizer_float_bitwidth {32,16}
Use this option to select the bitwidth to use for
float tensors, either 32 (default) or 16.
--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use
for activations. This option has to be paired with
--act_quantizer_schema.
--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use
for parameters. This option has to be paired with
--param_quantizer_schema.
--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
Specify which quantization schema to use for
activations. Note: Default is asymmetric.
--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
Specify which quantization schema to use for
parameters. Note: Default is asymmetric.
--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
Value must lie between 90 and 100. Default is 99.99
--use_per_channel_quantization
Use per-channel quantization for convolution-based op
weights. Note: This will replace built-in model QAT
encodings when used for a given weight.
--use_per_row_quantization
Use this option to enable rowwise quantization of
Matmul and FullyConnected ops.
--float_fallback Use this option to enable fallback to floating point
(FP) instead of fixed point. This option can be paired
with --quantizer_float_bitwidth to indicate the
bitwidth for FP (by default 32). If this option is
enabled, then input list must not be provided and
--ignore_encodings must not be provided. The external
quantization encodings (encoding file/FakeQuant
encodings) might be missing quantization parameters
for some interim tensors. First it will try to fill
the gaps by propagating across math-invariant
functions. If the quantization params are still
missing, then it will apply fallback to nodes to
floating point.
--extra_quantizer_args EXTRA_QUANTIZER_ARGS
additional quantizer arguments in a quoted string.
example: --extra_quantizer_args
'arg1=value1;arg2=value2'
Net-run Arguments:
--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
Specifies perf profile to set. Valid settings are
"low_balanced" , "balanced" , "default",
"high_performance" ,"sustained_high_performance",
"burst", "low_power_saver", "power_saver",
"high_power_saver", "extreme_power_saver", and
"system_settings". Note: perf_profile argument is now
deprecated for HTP backend, user can specify
performance profile through backend extension config
now.
--profiling_level PROFILING_LEVEL
Enables profiling and sets its level. For QNN
executor, valid settings are "basic", "detailed" and
"client" For SNPE executor, valid settings are "off",
"basic", "moderate", "detailed", and "linting".
Default is detailed.
--userlogs {warn,verbose,info,error,fatal}
Enable verbose logging. Note: This argument is
applicable only when --executor_type snpe
--log_level {error,warn,info,debug,verbose}
Enable verbose logging. Note: This argument is
applicable only when --executor_type qnn
--extra_runtime_args EXTRA_RUNTIME_ARGS
additional net runner arguments in a quoted string.
example: --extra_runtime_args
'arg1=value1;arg2=value2'
Other optional Arguments:
--executor_type {qnn,snpe}
Choose between qnn(qnn-net-run) and snpe(snpe-net-run)
execution. If not provided, qnn-net-run will be
executed for QAIRT or QNN SDK, or else snpe-net-run
will be executed for SNPE SDK.
--stage {source,converted,quantized}
Specifies the starting stage in the Accuracy Debugger
pipeline. source: starting with a source framework
model, converted: starting with a converted model,
quantized: starting with a quantized model. Default is
source.
-p ENGINE_PATH, --engine_path ENGINE_PATH
Path to SDK folder.
--deviceId DEVICEID The serial number of the device to use. If not passed,
the first in a list of queried devices will be used
for validation.
-v, --verbose Set verbose logging at debugger tool level
--host_device {x86,x86_64-windows-msvc,wos}
The device that will be running conversion. Set to x86
by default.
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory for the snooping to store temporary
files. Creates a new directory if the specified
working directory does not exist
--output_dirname OUTPUT_DIRNAME
output directory name for the snooping to store
temporary files under <working_dir>/snooping .Creates
a new directory if the specified working directory
does not exist
--debug_mode_off This option can be used to avoid dumping intermediate
outputs.
--args_config ARGS_CONFIG
Path to a config file with arguments. This can be used
to feed arguments to the AccuracyDebugger as an
alternative to supplying them on the command line.
--remote_server REMOTE_SERVER
ip address of remote machine
--remote_username REMOTE_USERNAME
username of remote machine
--remote_password REMOTE_PASSWORD
password of remote machine
--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --golden_dir_for_mapping GOLDEN_OUTPUT_REFERENCE_DIRECTORY
Optional parameter to indicate the directory of the
goldens, it's used for tensor mapping without running
model with framework runtime.
--disable_offline_prepare
Use this option to disable offline preparation. Note:
By default offline preparation will be done for
DSP/HTP runtimes.
--backend_extension_config BACKEND_EXTENSION_CONFIG
Path to config to be used with qnn-context-binary-
generator. Note: This argument is applicable only when
--executor_type qnn
--context_config_params CONTEXT_CONFIG_PARAMS
optional context config params in a quoted string.
example: --context_config_params
'context_priority=high;
cache_compatibility_mode=strict' Note: This argument
is applicable only when --executor_type qnn
--graph_config_params GRAPH_CONFIG_PARAMS
optional graph config params in a quoted string.
example: --graph_config_params 'graph_priority=low;
graph_profiling_num_executions=10'
--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS
Additional context binary generator arguments in a
quoted string(applicable only when --executor_type
qnn). example: --extra_contextbin_args
'arg1=value1;arg2=value2'
--disable_graph_optimization
Disables basic model optimization
--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB
path to onnx custom operator library
-f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
Framework type and version, version is optional.
Currently supported frameworks are [tensorflow,
tflite, onnx, pytorch]. For example, tensorflow 2.10.1
--start_layer START_LAYER
save all intermediate layer outputs from provided
start layer to bottom layer of model. Can be used in
conjunction with --end_layer.
--end_layer END_LAYER
save all intermediate layer outputs from top layer to
provided end layer of model. Can be used in
conjunction with --start_layer.
Sample Commands
qairt-accuracy-debugger \
--snooping layerwise \
--runtime dspv75 \
--architecture aarch64-android \
--framework onnx \
--model_path artifacts/mobilenet-v2.onnx \
--input_list artifacts/list.txt \
--input_tensor "input.1" 1,3,224,224 artifacts/inputFiles/dog.raw \
--output_tensor "473" \
--default_verifier mse \
--quantization_overrides artifacts/quantized_encoding.json \
--executor_type qnn
Output
Below is the output directory structure:
working_directory
├── framework_runner
│ ├── 2024-08-07_15-58-09
│ └── latest
├── inputs_32
│ ├── dog.raw
│ └── input_list.txt
└── snooping
└── 2024-08-07_15-58-09
├── base_quantized.json
├── extracted_model.onnx
├── inference_engine
├── layerwise.csv
├── log.txt
├── snooping_options.json
└── temp-list.txt
The framework_runner directory contains a timestamped directory holding the intermediate layer outputs stored in .raw format, as described in the Framework Runner step.
The snooping directory contains the outputs of each single-layer model obtained from the inference engine stage, stored in separate directories, along with the final report, layerwise.csv, which holds the verifier scores for each single-layer model. Layers with the most deviating scores can be identified as problematic nodes.
layerwise.csv is similar to the cumulative-layerwise report (cumulative_layerwise.csv), except that the original-outputs column is not present in layerwise snooping. Please refer to the cumulative-layerwise report for more details.
Snapshot of layerwise.csv:
Understanding the layerwise report:
| Column | Description |
|---|---|
| O/P Name | Output name of the current layer. |
| Status | |
| Layer Type | Type of the current layer. |
| Shape | Shape of this layer's output. |
| Activations | The Min, Max and Median of the outputs at this layer, taken from the reference execution. |
| <Verifier name> | Absolute verifier value of the current layer compared to the reference platform. |
| Info | Displays information for the output verifiers, if the values are abnormal. |
binary Snooping¶
The binary snooping tool debugs a given ONNX graph in a binary-search fashion.
For the graph under analysis, it quantizes half of the graph and lets the other half run in fp16/fp32. The final model output is used to measure the quantization effect of each subgraph. If one subgraph has a high effect on the final model output (its verifier score is greater than 60% of the sum of the two subgraphs' scores), the process repeats on that subgraph until its size falls below min_graph_size or it cannot be divided again. If both subgraphs have similar scores (each greater than 40% of the sum), both subgraphs are investigated further.
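The recursive partitioning described above can be sketched as follows. The 60%/40% thresholds come from the description; plain node lists stand in for ONNX subgraphs, and `score` is a placeholder callback that would run the quantized subgraph and return its verifier score:

```python
def binary_snoop(nodes, score, min_graph_size, suspects=None):
    """Recursively bisect a node list, descending into halves whose
    quantization error dominates (>60% of the combined score) or is
    comparable (both >40%, i.e. neither exceeds 60%)."""
    if suspects is None:
        suspects = []
    if len(nodes) <= min_graph_size or len(nodes) < 2:
        suspects.append(nodes)  # cannot be divided further
        return suspects
    mid = len(nodes) // 2
    left, right = nodes[:mid], nodes[mid:]
    s_left, s_right = score(left), score(right)
    total = (s_left + s_right) or 1e-12  # guard against division by zero
    if s_left / total > 0.6:
        binary_snoop(left, score, min_graph_size, suspects)
    elif s_right / total > 0.6:
        binary_snoop(right, score, min_graph_size, suspects)
    else:
        # similar scores: investigate both halves further
        binary_snoop(left, score, min_graph_size, suspects)
        binary_snoop(right, score, min_graph_size, suspects)
    return suspects
```

In the real tool, `score` would involve quantizing one half, running inference, and applying the configured verifier to the final model output; the sketch only captures the search strategy.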
Usage
usage: qairt-accuracy-debugger --snooping binary [-h]
-r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
-a
{x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
-l INPUT_LIST [--input_network MODEL_PATH]
[--input_tensor INPUT_TENSOR [INPUT_TENSOR ...]]
[--out_tensor_node OUTPUT_TENSOR]
[--io_config IO_CONFIG]
[--converter_float_bitwidth {32,16}]
[--extra_converter_args EXTRA_CONVERTER_ARGS]
[--calibration_input_list CALIBRATION_INPUT_LIST]
[-bbw {8,32}] [-abw {8,16}] [-wbw {8,4}]
[--quantizer_float_bitwidth {32,16}]
[--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
[--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
[--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
[--use_per_channel_quantization]
[--use_per_row_quantization] [--float_fallback]
[--extra_quantizer_args EXTRA_QUANTIZER_ARGS]
[--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
[--profiling_level PROFILING_LEVEL]
[--userlogs {warn,verbose,info,error,fatal}]
[--log_level {error,warn,info,debug,verbose}]
[--extra_runtime_args EXTRA_RUNTIME_ARGS]
[--executor_type {qnn,snpe}]
[--stage {source,converted,quantized}]
[-p ENGINE_PATH] [--deviceId DEVICEID] [-v]
[--host_device {x86,x86_64-windows-msvc,wos}]
[-w WORKING_DIR]
[--output_dirname OUTPUT_DIRNAME]
[--debug_mode_off] [--args_config ARGS_CONFIG]
[--remote_server REMOTE_SERVER]
[--remote_username REMOTE_USERNAME]
[--remote_password REMOTE_PASSWORD]
[--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY]
[--disable_offline_prepare]
[--backend_extension_config BACKEND_EXTENSION_CONFIG]
[--context_config_params CONTEXT_CONFIG_PARAMS]
[--graph_config_params GRAPH_CONFIG_PARAMS]
[--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS]
[--disable_graph_optimization]
[--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB]
[-f FRAMEWORK [FRAMEWORK ...]] -qo
QUANTIZATION_OVERRIDES
[--min_graph_size MIN_GRAPH_SIZE]
[--subgraph_relative_weight SUBGRAPH_RELATIVE_WEIGHT]
[--verifier VERIFIER]
Script to run binary snooping.
options:
-h, --help show this help message and exit
Required Arguments:
-r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}, --runtime {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
Runtime to be used. Note: In case of SNPE
execution(--executor_type snpe), aic runtime is not
supported.
-a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}, --architecture {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
Name of the architecture to use for inference engine.
Note: In case of SNPE execution(--executor_type snpe),
aarch64-qnx architecture is not supported.
-l INPUT_LIST, --input_list INPUT_LIST
Path to the input list text file to run inference(used
with net-run). Note: When having multiple entries in
text file, in order to save memory and time, you can
pass --debug_mode_off to skip intermediate outputs
dump.
-qo QUANTIZATION_OVERRIDES, --quantization_overrides QUANTIZATION_OVERRIDES
Path to quantization overrides json file. Note: This
is used with converter as well.
QAIRT Converter Arguments:
--input_network MODEL_PATH, --model_path MODEL_PATH
Path to the model file(s).
--input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
The name and dimension of all the input buffers to the
network specified in the format [input_name comma-
separated-dimensions sample-data data-type] Note:
sample-data and data-type are optional for example:
'data' 1,224,224,3. Note that the quotes should always
be included in order to handle special characters,
spaces, etc. For multiple inputs, specify multiple
--input_tensor on the command line like:
--input_tensor "data1" 1,224,224,3 sample1.raw float32
--input_tensor "data2" 1,50,100,3 sample2.raw int64
NOTE: Required for TensorFlow and PyTorch. Optional
for Onnx and Tflite. In case of Onnx, this feature
works only with Onnx 1.6.0 and above.
--out_tensor_node OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
Name of the graph's output Tensor Names. Multiple
output names should be provided separately like:
--out_tensor_node out_1 --out_tensor_node out_2 NOTE:
Required for TensorFlow. Optional for Onnx, Tflite and
PyTorch
--io_config IO_CONFIG
Use this option to specify a yaml file for input and
output options.
--converter_float_bitwidth {32,16}
Use this option to convert the graph to the specified
float bitwidth, either 32 (default) or 16. Note:
Cannot be used with --calibration_input_list and
--quantization_overrides
--extra_converter_args EXTRA_CONVERTER_ARGS
additional converter arguments in a quoted string.
example: --extra_converter_args
'arg1=value1;arg2=value2'
QAIRT Quantizer Arguments:
--calibration_input_list CALIBRATION_INPUT_LIST
Path to the inputs list text file used to run
quantization (used with qairt-quantizer).
-bbw {8,32}, --bias_bitwidth {8,32}
option to select the bitwidth to use when quantizing
the bias. default 8
-abw {8,16}, --act_bitwidth {8,16}
option to select the bitwidth to use when quantizing
the activations. default 8
-wbw {8,4}, --weights_bitwidth {8,4}
option to select the bitwidth to use when quantizing
the weights. default 8
--quantizer_float_bitwidth {32,16}
Use this option to select the bitwidth to use for
float tensors, either 32 (default) or 16.
--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use
for activations. This option has to be paired with
--act_quantizer_schema.
--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
Specify which quantization calibration method to use
for parameters. This option has to be paired with
--param_quantizer_schema.
--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
Specify which quantization schema to use for
activations. Note: Default is asymmetric.
--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
Specify which quantization schema to use for
parameters. Note: Default is asymmetric.
--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
Value must lie between 90 and 100. Default is 99.99
--use_per_channel_quantization
Use per-channel quantization for convolution-based op
weights. Note: This will replace built-in model QAT
encodings when used for a given weight.
--use_per_row_quantization
Use this option to enable rowwise quantization of
Matmul and FullyConnected ops.
--float_fallback Use this option to enable fallback to floating point
(FP) instead of fixed point. This option can be paired
with --quantizer_float_bitwidth to indicate the
bitwidth for FP (by default 32). If this option is
enabled, then input list must not be provided and
--ignore_encodings must not be provided. The external
quantization encodings (encoding file/FakeQuant
encodings) might be missing quantization parameters
for some interim tensors. First it will try to fill
the gaps by propagating across math-invariant
functions. If the quantization params are still
missing, then it will apply fallback to nodes to
floating point.
--extra_quantizer_args EXTRA_QUANTIZER_ARGS
additional quantizer arguments in a quoted string.
example: --extra_quantizer_args
'arg1=value1;arg2=value2'
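To make the schema options above concrete, here is a toy min-max quantization sketch in NumPy. This is an illustration only, not the SDK's quantizer; the function name and all details are assumptions.

```python
import numpy as np

def quantize(x, bitwidth=8, schema="asymmetric"):
    """Toy min-max quantization sketch; not the SDK implementation."""
    if schema == "asymmetric":
        # Full unsigned range; zero-point shifts the float minimum to qmin.
        qmin, qmax = 0, 2 ** bitwidth - 1
        lo, hi = float(x.min()), float(x.max())
        scale = max(hi - lo, 1e-12) / (qmax - qmin)
        zero_point = int(round(qmin - lo / scale))
    else:
        # Symmetric: signed range centered on zero, zero-point fixed at 0.
        qmax = 2 ** (bitwidth - 1) - 1
        qmin = -qmax
        scale = max(float(np.abs(x).max()), 1e-12) / qmax
        zero_point = 0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int32)
    return q, scale, zero_point
```

The asymmetric schema uses the full unsigned range of the chosen bitwidth, while the symmetric schema pins the zero-point at 0, which is why the two can produce different accuracy for skewed activation distributions.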
Net-run Arguments:
--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
Specifies perf profile to set. Valid settings are
"low_balanced" , "balanced" , "default",
"high_performance" ,"sustained_high_performance",
"burst", "low_power_saver", "power_saver",
"high_power_saver", "extreme_power_saver", and
"system_settings". Note: The perf_profile argument is
deprecated for the HTP backend; specify the
performance profile through the backend extension
config instead.
--profiling_level PROFILING_LEVEL
Enables profiling and sets its level. For QNN
executor, valid settings are "basic", "detailed", and
"client". For SNPE executor, valid settings are "off",
"basic", "moderate", "detailed", and "linting".
Default is "detailed".
--userlogs {warn,verbose,info,error,fatal}
Enable verbose logging. Note: This argument is
applicable only when --executor_type snpe
--log_level {error,warn,info,debug,verbose}
Enable verbose logging. Note: This argument is
applicable only when --executor_type qnn
--extra_runtime_args EXTRA_RUNTIME_ARGS
additional net runner arguments in a quoted string.
example: --extra_runtime_args
'arg1=value1;arg2=value2'
Other optional Arguments:
--executor_type {qnn,snpe}
Choose between qnn(qnn-net-run) and snpe(snpe-net-run)
execution. If not provided, qnn-net-run will be
executed for QAIRT or QNN SDK, or else snpe-net-run
will be executed for SNPE SDK.
--stage {source,converted,quantized}
Specifies the starting stage in the Accuracy Debugger
pipeline. source: starting with a source framework
model, converted: starting with a converted model,
quantized: starting with a quantized model. Default is
source.
-p ENGINE_PATH, --engine_path ENGINE_PATH
Path to SDK folder.
--deviceId DEVICEID The serial number of the device to use. If not passed,
the first in a list of queried devices will be used
for validation.
-v, --verbose Set verbose logging at debugger tool level
--host_device {x86,x86_64-windows-msvc,wos}
The device that will be running conversion. Set to x86
by default.
-w WORKING_DIR, --working_dir WORKING_DIR
Working directory for the snooping to store temporary
files. Creates a new directory if the specified
working directory does not exist
--output_dirname OUTPUT_DIRNAME
Output directory name for the snooping to store
temporary files under <working_dir>/snooping. Creates
a new directory if the specified directory does not
exist
--debug_mode_off This option can be used to avoid dumping intermediate
outputs.
--args_config ARGS_CONFIG
Path to a config file with arguments. This can be used
to feed arguments to the AccuracyDebugger as an
alternative to supplying them on the command line.
--remote_server REMOTE_SERVER
ip address of remote machine
--remote_username REMOTE_USERNAME
username of remote machine
--remote_password REMOTE_PASSWORD
password of remote machine
--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --golden_dir_for_mapping GOLDEN_OUTPUT_REFERENCE_DIRECTORY
Optional parameter to indicate the directory of the
goldens, it's used for tensor mapping without running
model with framework runtime.
--disable_offline_prepare
Use this option to disable offline preparation. Note:
By default offline preparation will be done for
DSP/HTP runtimes.
--backend_extension_config BACKEND_EXTENSION_CONFIG
Path to config to be used with qnn-context-binary-
generator. Note: This argument is applicable only when
--executor_type qnn
--context_config_params CONTEXT_CONFIG_PARAMS
optional context config params in a quoted string.
example: --context_config_params
'context_priority=high;
cache_compatibility_mode=strict' Note: This argument
is applicable only when --executor_type qnn
--graph_config_params GRAPH_CONFIG_PARAMS
optional graph config params in a quoted string.
example: --graph_config_params 'graph_priority=low;
graph_profiling_num_executions=10'
--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS
Additional context binary generator arguments in a
quoted string(applicable only when --executor_type
qnn). example: --extra_contextbin_args
'arg1=value1;arg2=value2'
--disable_graph_optimization
Disables basic model optimization
--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB
path to onnx custom operator library
-f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
Framework type and version, version is optional.
Currently supported frameworks are [tensorflow,
tflite, onnx, pytorch]. For example, tensorflow 2.10.1
--min_graph_size MIN_GRAPH_SIZE
Provide the minimum subgraph size
--subgraph_relative_weight SUBGRAPH_RELATIVE_WEIGHT
Helps decide whether a subgraph is debugged
further. If a subgraph scores more than 40 percent of
the aggregate score of two subgraphs, we investigate
that subgraph further.
--verifier VERIFIER Choose a verifier among [sqnr, mse] for the comparison
Sample Commands
Sample command to run binary snooping on the mv2 large model:

qairt-accuracy-debugger \
    --snooping binary \
    --framework onnx \
    --model_path models/mv2/mobilenet-v2.onnx \
    --architecture aarch64-android \
    --input_list models/mv2/inputs/input_list_1.txt \
    --calibration_input_list models/mv2/inputs/input_list_1.txt \
    --input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/harsraj/models/mv2/inputs/data1.raw \
    --output_tensor "473" \
    --engine_path $QAIRT_SDK_ROOT \
    --working_dir tmp/QAIRT_BINARY \
    --runtime dspv75 \
    --verifier mse \
    --quantization_overrides /local/mnt/workspace/harsraj/models/mv2/quantized_encoding.json \
    --min_graph_size 16
Outputs

The algorithm produces two JSON files:
graph_result.json (for each subgraph) - Contains verifier scores for two child subgraphs; for example 318_473 has child subgraphs 318_392 and 393_473.
subgraph_result.json (for each subgraph) - Contains the corresponding and sorted verifier scores.
Keys in both files look like “subgraph_start_node_activation_name” + _ + “subgraph_end_node_activation_name”.
For example, 318_473 means a subgraph starts at node activation 318 and ends at node activation 473. Only the subgraph from 318 to 473 is quantized while the rest of the model runs in fp16/32.
Debugging accuracy issues with binary snooping results

Subgraphs with the maximum verifier scores in subgraph_result.json are the culprit subgraphs.
One subgraph can be a subset of another subgraph. In this case, prioritize a subgraph size you are comfortable debugging. The details of a subset can be found in graph_result.json.
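The "start_end" key format above can be handled programmatically. The sketch below (a hypothetical helper operating on a made-up score dictionary, not part of the SDK) picks the highest-scoring, i.e. most suspect, subgraph:

```python
def worst_subgraph(scores):
    """scores maps "start_end" keys (as in subgraph_result.json) to
    verifier scores; returns (start, end, score) of the subgraph with
    the maximum score. Assumes activation names contain no extra
    underscores, e.g. "318_473"."""
    key, score = max(scores.items(), key=lambda kv: kv[1])
    start, end = key.split("_", 1)
    return start, end, score
```

With a dictionary like {"318_392": 0.2, "393_473": 0.9, "318_473": 0.5}, this would flag the subgraph from activation 393 to 473 as the one to debug first.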
qnn-platform-validator¶
qnn-platform-validator checks the QNN compatibility/capability of a device. The results are saved as a CSV file in the "output" directory. Basic logs are also displayed on the console.
DESCRIPTION:
------------
Helper script to set up the environment for and launch the qnn-platform-
validator executable.
REQUIRED ARGUMENTS:
-------------------
--backend <BACKEND> Specify the backend to validate: <gpu>, <dsp>,
or <all>.
--directory <DIR> Path to the root of the unpacked SDK directory containing
the executable and library files
--dsp_type <DSP_VERSION> Specify DSP variant: v66 or v68
OPTIONAL ARGUMENTS:
--------------------
--buildVariant <TOOLCHAIN> Specify the build variant
aarch64-android or aarch64-windows-msvc to be validated.
Default: aarch64-android
--testBackend Runs a small program on the runtime and checks whether QNN is supported
for the backend.
--deviceId <DEVICE_ID> Uses this device for running the adb command.
Defaults to the first device in the adb devices list.
--coreVersion Outputs the version of the runtime that is present on the target.
--libVersion Outputs the library version of the runtime that is present on the target.
--targetPath <DIR> The path to be used on the device.
Defaults to /data/local/tmp/platformValidator
--remoteHost <REMOTEHOST> Run on remote host through remote adb server.
Defaults to localhost.
--debug Set to turn on Debug log
- The following files need to be pushed to the device for the DSP to pass the validator test. Note that the stub and skel libraries are specific to the DSP architecture version (e.g., v73):

// Android
bin/aarch64-android/qnn-platform-validator
lib/aarch64-android/libQnnHtpV73CalculatorStub.so
lib/hexagon-${DSP_ARCH}/unsigned/libCalculator_skel.so

// Windows
bin/aarch64-windows-msvc/qnn-platform-validator.exe
lib/aarch64-windows-msvc/QnnHtpV73CalculatorStub.dll
lib/hexagon-${DSP_ARCH}/unsigned/libCalculator_skel.so

The following example pushes the aarch64-android variant to /data/local/tmp/platformValidator:

adb push $SNPE_ROOT/bin/aarch64-android/snpe-platform-validator /data/local/tmp/platformValidator/bin/qnn-platform-validator
adb push $SNPE_ROOT/lib/aarch64-android/ /data/local/tmp/platformValidator/lib
adb push $SNPE_ROOT/lib/dsp /data/local/tmp/platformValidator/dsp
qnn-profile-viewer¶
The qnn-profile-viewer tool parses profiling data generated by qnn-net-run. The same data can also be saved to a CSV file.
usage: qnn-profile-viewer --input_log PROFILING_LOG [--help] [--output=CSV_FILE] [--extract_opaque_objects] [--reader=CUSTOM_READER_SHARED_LIB] [--schematic=SCHEMATIC_BINARY]
Reads profiling logs and outputs the contents to stdout
Note: The IPS calculation takes the following into account: graph execute time, tensor file IO time, and misc. time for quantization, callbacks, etc.
required arguments:
--input_log PROFILING_LOG1,PROFILING_LOG2
Provides a comma-separated list of Profiling log files
optional arguments:
--output PATH
Output file with processed profiling data. File formats vary depending upon the reader used
(see --reader). If not provided, no output file is created.
--help Displays this help message.
--reader CUSTOM_READER_SHARED_LIB
Path to a reader library. If not specified, the default reader outputs a CSV file.
--schematic SCHEMATIC_BINARY
Path to the schematic binary file.
Please note that this option is specific to the QnnHtpOptraceProfilingReader library.
--config CONFIG_JSON_FILE
Path to the config json file.
Please note that this option is specific to the QnnHtpOptraceProfilingReader library.
--dlc DLC_FILE
Path to the dlc file.
Please note that this option is specific to the QnnHtpOptraceProfilingReader library.
--zoom_start PROFILE_SUBMODULE_START_NODE
Name of starting node for a profile submodule optrace. If you specify this option you must also specify --zoom_end.
Please note that this option is specific to the QnnHtpOptraceProfilingReader library.
--zoom_end PROFILE_SUBMODULE_END_NODE
Name of ending node for a profile submodule optrace. If you specify this option you must also specify --zoom_start.
Please note that this option is specific to the QnnHtpOptraceProfilingReader library.
--version Displays version information.
--extract_opaque_objects Specifies that the opaque objects will be dumped to output files
qnn-netron (Beta)¶
Overview¶
The QNN Netron tool makes model debugging and visualization less daunting. qnn-netron is an extension of the Netron graph tool and provides easier graph debugging and convenient runtime information. The tool currently has two key functionalities:
The Visualize section allows customers to view their desired models after using the QNN Converter by importing the JSON representation of the model
The Diff section allows customers to run networks of their choosing on different runtimes in order to compare network accuracy and performance
Launching Tool¶
Dependencies
The QNN Netron tool uses the Electron JS framework for its GUI frontend and requires npm/node.js to be available on the system. Additionally, the tool's backend requires Python libraries for accuracy analysis. A convenience script is available in the QNN SDK to download the necessary dependencies for building and running the tool.
# Note: following command should be run as administrator/root to be able to install system libraries
$ sudo bash ${QNN_SDK_ROOT}/bin/check-linux-dependency.sh
$ ${QNN_SDK_ROOT}/bin/check-python-dependency
Launching Application
The qnn-netron script builds and launches the QNN Netron application. This script:
Clones the vanilla Netron git project
Applies custom patches enabling Netron for QNN
Builds the npm project
Launches the application
$ qnn-netron -h
usage: qnn-netron [-h] [-w <working_dir>]
Script to build and launch QNN Netron tool for visualizing and running analysis on Qnn Models.
Optional argument(s):
-w <working_dir> Location for building QNN Netron tool. Default: current_dir
# To build and run application use
$ qnn-netron -w <my_working_dir>
QNN Netron Visualize Deep Dive¶
First, the user is prompted to open a JSON file that represents their converted model. This JSON comes from the converter tool. Please refer to this Overview for more details.
Once the file is loaded into the tool, the graph should be displayed in the UI as shown below:
After loading in the model, the user can click on any of the nodes and a side pop-up section will display node information such as the type and name as well as vital parameter information such as inputs and outputs (datatypes, encodings, and shapes)
Netron Diff Customization Deep Dive¶
Limitations
Diff Tool comparison against source framework goldens only works for goldens in spatial-first axis order (NHWC).
For use cases where a source framework golden is used for comparison, the Diff Tool is tested only with TensorFlow and TensorFlow-variant frameworks.
To open the Diff Customization tool, either click File and then “Open Diff…”, or click “Diff…” on tool startup, as shown below:
Upon launch of the Diff Customization tool, at the top, the user is prompted to select a use case for the tool. There are 3 options to choose from:
For the purposes of this documentation, only inference vs inference will be detailed. The setup procedure for the other use cases is similar. The other two use cases are explained below:
Golden vs Inference: Used to test inference run using goldens from a particular ML framework and comparing against the output of a QNN backend
Output vs Output: Used to test existing inference results against ML framework goldens OR used to test differences between two existing inference results
Inference Vs Inference: Used to test inference between two converted QNN models or the same QNN model on different QNN backends
Inference vs Inference¶
If this use case is selected, the user is presented with various form fields for the purposes of running two jobs asynchronously with the option of choosing different runtimes for each QNN network being run.
A more detailed view of what the user is prompted for is shown below:
In order to execute the networks, the user has two options:
Running on Host machine
When the Target Device is selected as “host”, the user can only use the CPU as a runtime. In addition, the user can only select “x86_64-linux-clang” as the architecture in this use case.
Running On-Device
When the Target Device is selected as “on-device”, a Device ID is required to connect to the device via adb. Thereafter, the user can select any of the three QNN backend runtimes available (CPU, GPU, or DSPv[68, 69, 73]) and the user can select architecture “aarch64-android”
After choosing the desired target device and runtime configurations, the rest of the fields are explained in detail below:
Note
Users can click again to change the location in any of the path fields
| Setup Parameters | Configurations to Select |
|---|---|
| The verifier to run on the outputs of the model (see the Note below the table for custom verifier (accuracy + performance) thresholds, and the table below for custom accuracy verifier hyperparameters) | RtolAtol, AdjustedRtolAtol, TopK, MeanIOU, L1Error, CosineSimilarity, MSE, SQNR |
| Model JSON | Upload the <model>_net.json file output by the QNN converters |
| Model Cpp | Upload the <model>.cpp file output by the QNN converters |
| Model Bin | Upload the <model>.bin file output by the QNN converters |
| NDK Path | Upload the path to your Android NDK |
| Devices Engine Path | Upload the path to the top level of the unzipped qnn-sdk |
| Input List | Provide a path to the input file for the model |
| Save Run Configurations | Provide a location where the inference and runtime results from the Diff customization tool will be stored |
Note
Users have the option of providing a custom accuracy and performance verifier threshold when running diff. A custom accuracy verifier threshold can be provided for any of the accuracy verifiers. By default the verifier thresholds are 0.01. The custom thresholds can be provided in the text boxes labelled “Accuracy Threshold” and “Perf Threshold”.
Users now have the option to enter accuracy verifier specific hyperparameters inside textboxes. The Default Values are displayed inside the text-boxes and can be customized as per user needs. The table below highlights the hyperparameters for each verifier that can be customized.
| Verifier | Hyperparameters |
|---|---|
| AdjustedRtolAtol | Number of Levels |
| RtolAtol | Rtol Margin, Atol Margin |
| Topk | K, Ordered |
| MeanIOU | Background Classification |
| L1Error | Multiplier, Scale |
| CosineSimilarity | Multiplier, Scale |
| MSE (Mean Square Error) | N/A |
| SQNR (Signal-To-Noise Ratio) | N/A |
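For intuition, an RtolAtol-style verifier resembles an element-wise tolerance test with the Rtol/Atol margins as hyperparameters. The sketch below is a toy illustration under assumed default margins, not the tool's implementation:

```python
import numpy as np

def rtol_atol_match_fraction(golden, actual, rtol=1e-2, atol=1e-2):
    """Fraction of elements where |actual - golden| <= atol + rtol * |golden|."""
    ok = np.abs(actual - golden) <= atol + rtol * np.abs(golden)
    return float(ok.mean())
```

A match fraction below the configured accuracy threshold would then flag the node as having an accuracy difference.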
Below is an example of what the fields should look like once filled to completion:
After running the Diff Customization tool, the output directories/files should be present in the working directory file path provided in the last field
Results and Outputs:¶
After pressing the Run button as mentioned above, the visualization of the network should pop-up. Nodes will be highlighted if there are any accuracy and/or performance variations. Clicking on each node will show more information about the accuracy and performance diff information as shown below.
Performance and Accuracy Diff Visualizations:¶
As seen above, the performance and accuracy diff information is shown under the Diff section of any given node. The color of the node boundary in the viewer represents whether a performance or accuracy error (above the default verifier threshold of 0.01) was reported. For example, in the Conv2d node shown below, there are two boundaries of orange and red indicating that this node has both an accuracy and performance difference across the runs. The FullyConnected node shown only has a yellow boundary indicating that only a performance difference was found.
qnn-context-binary-utility¶
The qnn-context-binary-utility tool validates and serializes the metadata of a context binary into a JSON file. This JSON file can then be used to inspect the context binary, aiding debugging. A QNN context can be serialized to a binary using QNN APIs or the qnn-context-binary-generator tool.
usage: qnn-context-binary-utility --context_binary CONTEXT_BINARY_FILE --json_file JSON_FILE_NAME [--help] [--version]
Reads a serialized context binary and validates its metadata.
If --json_file is provided, it outputs the metadata to a json file
required arguments:
--context_binary CONTEXT_BINARY_FILE
Path to cached context binary from which the binary info will be extracted
and written to json.
--json_file JSON_FILE_NAME
Provide path along with the file name <DIR>/<FILE_NAME> to serialize
context binary info into json.
The directory path must exist. File with the FILE_NAME will be created at DIR.
optional arguments:
--help Displays this help message.
--version Displays version information.
Accuracy Evaluator plugins¶
File-based plugins¶
This section lists the built-in file-based plugins.
Dataset plugins¶
create_squad_examples - Extracts examples from a given SQuAD dataset file and saves them to a file.
| Parameters | Description | Type | Default |
|---|---|---|---|
| squad_version | SQuAD version 1 or 2 | Integer | 1 |
filter_dataset - Filters the dataset including the input list, calibration and annotation files.
| Parameters | Description | Type | Default |
|---|---|---|---|
| max_inputs | Maximum number of inputs in the input list to be considered for execution | Integer | Mandatory |
| max_calib | Maximum number of inputs in calibration to be considered for execution | Integer | Mandatory |
| random | Shuffles the input list and calibration files | Boolean | False |
gpt2_tokenizer - Tokenizes data from files using GPT2TokenizerFast.
| Parameters | Description | Type | Default |
|---|---|---|---|
| vocab_file | Path to the vocabulary file | String | Mandatory |
| merges_file | Path to the merges file | String | Mandatory |
| seq_length | Sequence length for the generated model inputs | Integer | Mandatory |
| past_seq_length | Sequence length for the ‘past’ inputs | Integer | Mandatory |
| past_shape | Shape of the ‘past’ inputs | List | |
| num_past | Number of ‘past’ inputs | Integer | 0 |
split_txt_data - Saves individual text files for each line present in the given input text file.
Preprocessing plugins¶
centernet_preproc - Performs preprocessing on CenterNet dataset examples.
| Parameters | Description | Type | Default |
|---|---|---|---|
| dims | Height and width; comma delimited, e.g., 416,416 | String | Mandatory |
| scale | Scale factor for image | Float | 1.0 |
| fix_res | Resolution of the image | Boolean | True |
| pad | Image padding | Integer | 0 |
convert_nchw - Transposes WHC to CHW or CHW to WHC and adds an extra N dimension.
| Parameters | Description | Type | Default |
|---|---|---|---|
| expand-dims | Add the Nth dimension | Boolean | True |
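The transpose-plus-expand step the plugin describes can be sketched in NumPy. This is a hypothetical helper for a channel-last array, not the plugin source:

```python
import numpy as np

def to_nchw(img):
    """Move the channel axis first (channel-last -> CHW) and add a
    leading batch dimension, mirroring expand-dims=True."""
    chw = np.transpose(img, (2, 0, 1))  # channel-last -> channel-first
    return np.expand_dims(chw, 0)       # add the N dimension
```

For a 224x224 3-channel image this yields shape (1, 3, 224, 224).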
create_batch - Concatenates raw input files into a single file using numpy.
| Parameters | Description | Type | Default |
|---|---|---|---|
| delete_prior | Delete prior unbatched data to save space | Boolean | True |
| truncate | If the number of inputs is not a multiple of the batch size, whether to truncate the leftover inputs in the last batch | Boolean | False |
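The batching behavior, including the truncate flag, can be sketched with NumPy. A toy illustration assuming float32 .raw files, not the plugin source:

```python
import numpy as np

def create_batches(files, batch_size, truncate=False):
    """Concatenate per-input .raw arrays into batches; drop the leftover
    partial batch when truncate=True (mirrors the 'truncate' flag)."""
    arrays = [np.fromfile(f, dtype=np.float32) for f in files]
    batches = [np.concatenate(arrays[i:i + batch_size])
               for i in range(0, len(arrays), batch_size)]
    if truncate and len(arrays) % batch_size:
        batches = batches[:-1]
    return batches
```

With five inputs and batch size 2, truncate=True yields two full batches, while truncate=False keeps a third, partial batch.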
crop - Center crops an image to the given dimensions using numpy or torchvision based on the library parameter.
| Parameters | Description | Type | Default |
|---|---|---|---|
| dims | Height and width; comma delimited, e.g., 640,640 | String | Mandatory |
| library | Python library used to crop the given input; valid values: numpy, torchvision | String | numpy |
| typecasting_required | Whether to convert the final output to numpy. Note: This option is specific to the torchvision library | Boolean | True |
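The numpy variant of a center crop can be sketched with plain slicing. A hypothetical helper, not the plugin source:

```python
import numpy as np

def center_crop(img, height, width):
    """Center-crop an HWC image to (height, width) using numpy slicing."""
    h, w = img.shape[:2]
    top = (h - height) // 2
    left = (w - width) // 2
    return img[top:top + height, left:left + width]
```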
expand_dims - Adds the N dimension for images, e.g., HWC to NHWC.
image_transformers_input - Creates input files with image and/or text for image transformer models like ViT and CLIP.
| Parameters | Description | Type | Default |
|---|---|---|---|
| dims | Expected processed output dimension in CHW format | String | Mandatory |
| num_base_class | Number of base classes in classification; used when text input is also provided | Integer | Total classes available |
| num_prompt | Number of prompts for text classes; used when text input is also provided | Integer | Total classes available |
| image_only | Data type of raw data | Boolean | False |
normalize - Normalizes input per the given scheme; data must be of NHWC format.
| Parameters | Description | Type | Default |
|---|---|---|---|
| library | Python library used to normalize the given input; valid values: numpy, torchvision | String | numpy |
| norm | Normalization factor; all values are divided by norm | float32 | 255 |
| means | Dictionary of means to be subtracted, e.g., {“R”:0.485, “G”:0.456, “B”:0.406} | RGB dictionary | {“R”:0, “G”:0, “B”:0} |
| std | Dictionary of std-dev for rescaling the values, e.g., {“R”:0.229, “G”:0.224, “B”:0.225} | RGB dictionary | {“R”:1, “G”:1, “B”:1} |
| channel_order | Channel order used to specify means and std values per channel: RGB or BGR | String | RGB |
| normalize_first | Whether to perform normalization before or after mean subtraction and standard deviation; normalize_first=True means normalization is performed first. Note: the torchvision library does not use this option | Boolean | True |
| typecasting_required | Whether to convert the final output to numpy. Note: This option is specific to the torchvision library | Boolean | True |
| pil_to_tensor_input | Whether to convert the input to a tensor before normalization. Note: This option is specific to the torchvision library | Boolean | True |
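The arithmetic implied by the normalize parameters can be sketched for NHWC float data. An illustration of the parameter semantics only, not the plugin source; per-channel tuples stand in for the RGB dictionaries:

```python
import numpy as np

def normalize(x, norm=255.0, means=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0),
              normalize_first=True):
    """NHWC normalization: divide by `norm`, subtract per-channel means,
    divide by per-channel std; `normalize_first` controls whether the
    division by `norm` happens before mean/std."""
    means = np.asarray(means, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    x = x.astype(np.float32)
    if normalize_first:
        x = x / norm
    x = (x - means) / std
    if not normalize_first:
        x = x / norm
    return x
```

For example, 8-bit data of value 255 with means and std of 0.5 maps to 1.0 under the default normalize_first=True ordering.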
onmt_preprocess - Performs preprocessing on WMT dataset for FasterTransformer OpenNMT model
| Parameters | Description | Type | Default |
|---|---|---|---|
| vocab_path | Path to the OpenNMT model vocabulary file (pickle file) | String | Mandatory |
| src_seq_len | The maximum total input sequence length | Integer | 128 |
| skip_sentencepiece | Skip sentencepiece encoding | Boolean | True |
| sentencepiece_model_path | Path to the sentencepiece model for the WMT dataset (mandatory when skip_sentencepiece is False) | String | None |
pad - Image padding with constant pad size or based on target dimensions
| Parameters | Description | Type | Default |
|---|---|---|---|
| type | Type of padding: ‘constant’ or ‘target-dims’ | String | Mandatory |
| dims | Height and width, comma delimited, e.g., 416,416, for the ‘target-dims’ type of padding | String | Mandatory |
| pad_size | Size of padding for the ‘constant’ type of padding | Integer | None |
| img_position | Position of the image, either ‘center’ or ‘corner’ (top-left); padding is added accordingly. Currently used for ‘target-dims’ type padding | String | center |
| color | Padding value for all planes | Integer | 114 |
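The target-dims padding behavior, including image position, can be sketched in NumPy. A hypothetical helper for HWC images, not the plugin source:

```python
import numpy as np

def pad_to_target(img, target_h, target_w, color=114, position="center"):
    """Pad an HWC image to target dims with a constant value; the image
    sits at the center or the top-left corner (mirrors img_position)."""
    h, w = img.shape[:2]
    if position == "center":
        top, left = (target_h - h) // 2, (target_w - w) // 2
    else:  # "corner" places the image at the top-left
        top, left = 0, 0
    out = np.full((target_h, target_w) + img.shape[2:], color, dtype=img.dtype)
    out[top:top + h, left:left + w] = img
    return out
```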
resize - Resizes an image using the specified library parameter: cv2(Default), pillow or torchvision
| Parameters | Description | Type | Default |
|---|---|---|---|
| dims | Height and width; comma delimited, e.g., 640,640 | String | Mandatory |
| library | Python library to be used for resizing a given input; valid values: opencv, pillow, torchvision | String | opencv |
| channel_order | Convert image to the specified channel order. At present this parameter only takes the ‘RGB’ value | String | RGB |
| interp | Interpolation method | String | For opencv and torchvision: bilinear; for pillow: bicubic |
| type | | String | auto-resize |
| resize_before_typecast | Whether to resize before or after conversion to the target datatype, e.g., fp32 | Boolean | True |
| typecasting_required | Whether to convert the final output to numpy. Note: This option is specific to the torchvision library | Boolean | True |
| mean | Dictionary of means to be subtracted, e.g., {“R”:0.485, “G”:0.456, “B”:0.406}. Note: This option is specific to the TensorFlow library | RGB dictionary | {“R”:0, “G”:0, “B”:0} |
| std | Dictionary of std-dev for rescaling the values, e.g., {“R”:0.229, “G”:0.224, “B”:0.225}. Note: This option is specific to the TensorFlow library | RGB dictionary | {“R”:0, “G”:0, “B”:0} |
| normalize_before_resize | Whether to perform normalization before or after mean subtraction and standard deviation. Note: This option is specific to the TensorFlow library | Boolean | False |
| crop_before_resize | Whether to perform cropping before resize. Note: This option is specific to the TensorFlow library | Boolean | False |
squad_read - Reads the SQuAD dataset JSON file. Preprocesses the question-context pairs into features for language models like BERT-Large
| Parameters | Description | Type | Default |
|---|---|---|---|
| vocab_path | Path to a local directory containing vocabulary files | String | Mandatory |
| max_seq_length | The maximum total input sequence length after WordPiece tokenization. Longer sequences are truncated; shorter sequences are padded | Integer | 384 |
| max_query_length | The maximum number of tokens for the question. Longer questions are truncated to this length | Integer | 64 |
| doc_stride | When splitting up a long document into chunks, how much stride to take between chunks | Integer | 128 |
| packing_strategy | Set this flag when using a packing strategy for BERT-based models | Boolean | False |
| max_sequence_per_pack | The maximum number of sequences that can be packed together | Integer | 3 |
| mask_type | One of ‘None’, ‘Boolean’, or ‘Compressed’, depending on the masking to be done on input_mask | String | None |
| compressed_mask_length | Set this value if mask_type is set to Compressed | Integer | None |
Postprocessing plugins¶
bert_predict - Predicts answers for a SQuAD dataset given start and end logits.
| Parameters | Description | Type | Default |
|---|---|---|---|
| vocab_path | Path to a local directory containing vocabulary files | String | Mandatory |
| max_seq_length | The maximum total input sequence length after WordPiece tokenization. Longer sequences are truncated; shorter sequences are padded (optional if preprocessing is run) | Integer | 384 |
| doc_stride | When splitting up a long document into chunks, how much stride to take between chunks (optional if preprocessing is run) | Integer | 128 |
| max_query_length | The maximum number of tokens for the question. Longer questions are truncated to this length (optional if preprocessing is run) | Integer | 64 |
| n_best_size | The total number of n-best predictions to generate in the post.json output file | Integer | 20 |
| max_answer_length | The maximum length of an answer that can be generated; needed because the start and end predictions are not conditioned on one another | Integer | 30 |
| packing_strategy | Set to True if using a packing strategy | Boolean | False |
centerface_postproc - Processes the inference outputs to parse detections and generates a detections file for the metric evaluator. Used for processing CenterFace face detector.
| Parameters | Description | Type | Default |
|---|---|---|---|
| dims | Height and width; comma delimited, e.g., 640,640 | String | Mandatory |
| dtypes | List of datatypes to be used for bounding boxes, scores, and labels (in order), e.g., [float32, float32, int64] | List | Datatypes from the outputs_info section of the model config.yaml |
| heatmap_threshold | Heatmap threshold | Float | 0.05 |
| nms_threshold | NMS threshold | Float | 0.3 |
centernet_postprocess - Processes the inference outputs to parse detections and generate a detections file for the metric evaluator. Used for processing CenterNet detector.
| Parameters | Description | Type | Default |
|---|---|---|---|
| dtypes | List of datatypes (at least 3) to be used to infer outputs | String | Mandatory |
| output_dims | Height and width; comma delimited, e.g., 640,640 | String | Mandatory |
| top_k | Top K proposals are given from the postprocess plugin | Integer | 100 |
| num_classes | Number of classes | Integer | 1 |
| score | Threshold to purify the detections | Integer | 1 |
lprnet_predict - Used for LPRNET license plate prediction.
object_detection - Processes the inference outputs to parse detections and generate a detections file for metric evaluator
| Parameters | Description | Type | Default |
|---|---|---|---|
| dims | Height and width; comma delimited, e.g., 640,640 | String | Mandatory |
| type | Type of post-processing (e.g., letterbox, stretch) | String | None |
| label_offset | Offset for the labels information | Integer | 0 |
| score_threshold | Threshold limit for the detection scores | Float | 0.001 |
| xywh_to_xyxy | Convert bounding box format from box center (xywh) to box corner (xyxy) format | Boolean | False |
| xy_swap | Swap the X and Y coordinates of bbox | Boolean | False |
| dtypes | List of datatypes used for bounding boxes, scores, and labels in order, e.g., [float32, float32, int64]. Defaults to the datatypes fetched from the ‘outputs_info’ for the model’s config.yaml | List | Datatypes from the outputs_info section of the model config.yaml |
| mask | Do postprocessing on mask | Boolean | False |
| mask_dims | Output dims of model. Provide this only if mask = True. E.g., 100,80,28,28 | String | None |
| padded_outputs | Pad the outputs | Boolean | False |
| scale | Comma separated scale values | String | ‘1’ |
| skip_padding | Skip padding while rescaling to original image shape | Boolean | False |
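As an illustration of what the xywh_to_xyxy option does, a box in center format (cx, cy, w, h) converts to corner format (x1, y1, x2, y2) as follows. This is a hypothetical helper sketching the conversion, not the plugin's actual implementation:

```python
def xywh_to_xyxy(box):
    """Convert a (cx, cy, w, h) center-format box to (x1, y1, x2, y2) corners."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```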
onmt_postprocess - Performs postprocessing for OpenNMT model outputs

| Parameters | Description | Type | Default |
|---|---|---|---|
| sentencepiece_model_path | Path to sentencepiece model for WMT dataset | String | Mandatory |
| unrolled_count | Upper limit on the unrolls required for the output (no. of output tokens to be considered for metric) | Integer | 26 |
| vocab_path | Path to OpenNMT model vocabulary file (pickle file), optional if preprocessing is run | String | None |
| skip_sentencepiece | Skip sentencepiece encoding, optional if preprocessing is run | Boolean | None |
Metric plugins¶
bleu - Evaluates the BLEU score using the sacrebleu library

| Parameters | Description | Type | Default |
|---|---|---|---|
| round | Number of decimal places to round the result to | Integer | 1 |
map_coco - Evaluates the mAP@50 and mAP@50:5:95 scores for the COCO dataset

| Parameters | Description | Type | Default |
|---|---|---|---|
| map_80_to_90 | Mapping of classes in range 0-80 to 0-90 | Boolean | False |
| segm | Flag to calculate mAP for mask | Boolean | False |
| keypoint_map | Flag to calculate mAP for keypoint | Boolean | False |
perplexity - Calculates the perplexity metric. Model outputs are expected to be logits of the proper shape. Ground truth data is expected to be in tokenized form, as token IDs. The ground truth is generated automatically when using the “gpt2_tokenizer” dataset plugin.

| Parameters | Description | Type | Default |
|---|---|---|---|
| logits_index | Index of the logits output if the model has multiple outputs | Integer | 0 |
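Conceptually, perplexity is the exponential of the mean negative log-likelihood the model assigns to the ground-truth tokens. A minimal sketch of that arithmetic (assuming the per-token probabilities have already been extracted from the logits; not the plugin's actual code):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood of the target tokens)."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))
```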
precision - Calculates the precision metric, i.e., (correct predictions / total predictions). Ground truth data is expected in the format “filename <space> correct_text”. The postprocessed model outputs are expected to be text files with just the “predicted_text”.
| Parameters | Description | Type | Default |
|---|---|---|---|
| round | Number of decimal places to round the result to | Integer | 7 |
| input_image_index | For multi input models, the index of image file in input file list csv | Integer | 0 |
squad_em - Calculates the exact match for SQuAD v1.1 dataset predictions and ground truth.
squad_f1 - Calculates F1 score for SQuAD v1.1 dataset predictions and ground truth.
topk - Evaluates topk value by comparing results and annotations.
| Parameters | Description | Type | Default |
|---|---|---|---|
| kval | Top k values, e.g., 1,5 evaluates top1 and top5 | String | 5 |
| softmax_index | Index of the softmax output in the results file list | Integer | 0 |
| label_offset | Offset required in the labels’ scores, e.g., if shape is 1x1001, then labels_offset=1 | Integer | 0 |
| round | Number of decimal places to round the result to | Integer | 3 |
| input_image_index | For multi input models, the index of image file in input file list csv | Integer | 0 |
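A minimal sketch of a top-k check with label_offset (a hypothetical helper illustrating the metric, not the plugin's code; label_offset shifts the label when the model output carries an extra class, e.g., a 1x1001 output scored against 1000-class labels):

```python
def topk_hit(scores, label, k, label_offset=0):
    """True if the ground-truth label is among the k highest-scoring classes."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return (label + label_offset) in ranked[:k]
```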
widerface_AP - Computes average precision for easy, medium, and hard cases.
| Parameters | Description | Type | Default |
|---|---|---|---|
| IoU_threshold | User input for IoU threshold | Float | 0.4 |
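The IoU_threshold parameter decides when a detection counts as matching a ground-truth box. The standard intersection-over-union computation for corner-format (x1, y1, x2, y2) boxes looks like this (an illustration, not the plugin's code):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```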
Memory-based plugins¶
This section lists the built-in memory-based plugins.
Dataset plugins¶
create_squad_examples - Extracts examples from a given SQuAD dataset file and saves them to a file.

| Parameters | Description | Type | Default |
|---|---|---|---|
| squad_version | SQuAD version (1 or 2) | Integer | 1 |
| max_inputs | Maximum number of inputs in inputlist to be considered for execution | Integer | -1 (Complete Dataset) |
| max_calib | Maximum number of inputs in calibration to be considered for execution | Integer | -1 (Complete Dataset) |
filter_dataset - Filters the dataset including the input list, calibration, and annotation files.
| Parameters | Description | Type | Default |
|---|---|---|---|
| max_inputs | Maximum number of inputs in inputlist to be considered for execution | Integer | Mandatory |
| max_calib | Maximum number of inputs in calibration to be considered for execution | Integer | Mandatory |
| random | Shuffles the inputlist and calibration files | Boolean | False |
tokenize_wikitext_2 - Tokenizes wikitext-2 dataset into model inputs.
| Parameters | Description | Type | Default |
|---|---|---|---|
| seq_length | Sequence length for the generated model inputs | Integer | Mandatory |
| tokenizer_name | Name of the tokenizer to be used for generating model inputs | String | Mandatory |
| past_shape | Shape of the ‘past’ inputs | List | 0 |
| num_past | Number of ‘past’ inputs | Integer | 0 |
| pos_id | Flag to configure whether position ids are required | Bool | True |
| mask_dtype | Data type of the mask used | String | ‘float32’ |
| cached_path | Path to cached tokenizer file (if available) | String | |
split_txt_data - Saves individual text files for each line present in the given input text file.
Preprocessing memory plugins¶
centernet_preproc - Performs preprocessing on CenterNet dataset examples.
| Parameters | Description | Type | Default |
|---|---|---|---|
| dims | Height and width; comma delimited, e.g., 416,416 | String | Mandatory |
| scale | Scale factor for image | Float | 1.0 |
| fix_res | Resolution of the image | Boolean | True |
| pad | Image padding | Integer | 0 |
convert_nchw - Transposes WHC to CHW or CHW to WHC and adds an extra N dimension.
| Parameters | Description | Type | Default |
|---|---|---|---|
| expand-dims | Add the Nth dimension | Boolean | True |
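A sketch of the layout change this plugin performs, shown for an HWC input for illustration (a numpy sketch, not the plugin's actual code):

```python
import numpy as np

# An HWC image (height 4, width 3, 3 channels) rearranged to NCHW layout.
hwc = np.zeros((4, 3, 3), dtype=np.float32)
chw = np.transpose(hwc, (2, 0, 1))   # move channels to the front
nchw = np.expand_dims(chw, axis=0)   # add the N (batch) dimension
```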
create_batch - Concatenates raw input files into a single file using numpy.
crop - Center crops an image to the given dimensions using numpy or torchvision based on the library parameter.
| Parameters | Description | Type | Default |
|---|---|---|---|
| library | Python library used to crop the given input; valid values are numpy or torchvision | String | numpy |
| dims | Height and width; comma delimited, e.g., 640,640 | String | Mandatory |
| typecasting_required | To convert final output to numpy or not. Note: This option is specific to torchvision library | Boolean | True |
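With the default numpy library, a center crop amounts to simple array slicing. A minimal sketch (hypothetical helper, not the plugin's code):

```python
import numpy as np

def center_crop(img, out_h, out_w):
    """Center-crop an HWC (or HW) image array to (out_h, out_w)."""
    h, w = img.shape[:2]
    top = (h - out_h) // 2
    left = (w - out_w) // 2
    return img[top:top + out_h, left:left + out_w]
```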
expand_dims - Adds the N dimension for images, e.g., HWC to NHWC.
image_transformers_input - Creates input files with image and/or text for image transformer models like ViT and CLIP. (Note: this plugin requires Pillow version 10.0.0)

| Parameters | Description | Type | Default |
|---|---|---|---|
| dims | Expected processed output dimension in CHW format | String | Mandatory |
| image_only | Data type of raw data | Boolean | True |
normalize - Normalizes input per the given scheme; data must be of NHWC format.
| Parameters | Description | Type | Default |
|---|---|---|---|
| library | Python library used to normalize the given input; valid values are numpy or torchvision | String | numpy |
| norm | Normalization factor; all values divided by norm | float32 | 255.0 |
| means | Dictionary of means to be subtracted, e.g., {“R”:0.485, “G”:0.456, “B”:0.406} | RGB dictionary | {“R”:0, “G”:0, “B”:0} |
| std | Dictionary of std-dev for rescaling the values, e.g., {“R”:0.229, “G”:0.224, “B”:0.225} | RGB dictionary | {“R”:1, “G”:1, “B”:1} |
| channel_order | Channel order to specify means and std values per channel, RGB or BGR | String | ‘RGB’ |
| normalize_first | Whether to perform normalization before or after mean subtraction and std-dev division; normalize_first=True performs normalization first. Note: the torchvision library does not use this option | Boolean | True |
| typecasting_required | To convert final output to numpy or not. Note: This option is specific to the torchvision library | Boolean | True |
| pil_to_tensor_input | To convert input to tensor before normalization. Note: This option is specific to the torchvision library | Boolean | True |
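The arithmetic implied by the norm, means, std, and normalize_first parameters can be sketched as follows (an illustration under those assumptions, not the plugin's actual code):

```python
import numpy as np

def normalize(img, norm=255.0, means=(0.0, 0.0, 0.0), stds=(1.0, 1.0, 1.0),
              normalize_first=True):
    """Scale by 1/norm, subtract per-channel means, divide by per-channel std.
    normalize_first=False applies the 1/norm scaling after mean/std instead."""
    img = img.astype(np.float32)
    if normalize_first:
        img = img / norm
    img = (img - np.asarray(means)) / np.asarray(stds)
    if not normalize_first:
        img = img / norm
    return img
```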
pad - Image padding with constant pad size or based on target dimensions
| Parameters | Description | Type | Default |
|---|---|---|---|
| dims | Height and width comma delimited, e.g., 416,416 for ‘target-dims’ type of padding | String | Mandatory |
| type | | String | Mandatory |
| pad_size | Size of padding for ‘constant’ type of padding | Integer | None |
| img_position | Parameter to specify position of image, either ‘center’ or ‘corner’ (top-left). Padding is added accordingly. Currently used for ‘target_dims’ type padding | String | ‘center’ |
| color | Padding value for all planes | Integer | 114 |
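For the ‘target_dims’ padding type, the behavior of img_position and color can be sketched like this (hypothetical helper, not the plugin's code; the color default mirrors the table above):

```python
import numpy as np

def pad_to_dims(img, target_h, target_w, color=114, position="center"):
    """Pad an HWC image to (target_h, target_w) with a constant value.
    'center' splits the padding evenly; 'corner' keeps the image top-left."""
    h, w, c = img.shape
    out = np.full((target_h, target_w, c), color, dtype=img.dtype)
    top = (target_h - h) // 2 if position == "center" else 0
    left = (target_w - w) // 2 if position == "center" else 0
    out[top:top + h, left:left + w] = img
    return out
```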
resize - Resizes an image using the library named by the library parameter: opencv (default), pillow, or torchvision
| Parameters | Description | Type | Default |
|---|---|---|---|
| dims | Height and width; comma delimited, e.g., 640,640 | String | Mandatory |
| library | Python library to be used for resizing a given input; valid values are opencv, pillow, or torchvision | String | opencv |
| channel_order | Convert image to specified channel order. At present this parameter only takes the ‘RGB’ value | String | RGB |
| interp | | String | For opencv and torchvision: bilinear; for pillow: bicubic |
| type | | String | auto-resize |
| resize_before_typecast | To resize before or after conversion to target datatype, e.g., fp32 | Boolean | True |
| typecasting_required | To convert final output to numpy or not. Note: This option is specific to the torchvision library | Boolean | True |
| mean | Dictionary of means to be subtracted, e.g., {“R”:0.485, “G”:0.456, “B”:0.406}. Note: This option is specific to the Tensorflow library | RGB dictionary | {“R”:0, “G”:0, “B”:0} |
| std | Dictionary of std-dev for rescaling the values, e.g., {“R”:0.229, “G”:0.224, “B”:0.225}. Note: This option is specific to the Tensorflow library | RGB dictionary | {“R”:0, “G”:0, “B”:0} |
| normalize_before_resize | To perform normalization before or after resize. Note: This option is specific to the Tensorflow library | Boolean | False |
| crop_before_resize | To perform cropping before resize. Note: This option is specific to the Tensorflow library | Boolean | False |
| norm | Normalization factor; all values divided by norm | float32 | 255.0 |
| normalize_first | Whether to perform normalization before or after mean subtraction and std-dev division; normalize_first=True performs normalization first. Note: the torchvision library does not use this option | Boolean | True |
squad_preprocess - Reads the processed files created by the create_squad_examples plugin.
| Parameters | Description | Type | Default |
|---|---|---|---|
| mask_type | The type of masking to apply. If ‘bool’, boolean masking is applied. If None, no masking is applied. | String | None |
Postprocessing memory plugins¶
squad_postprocess - Predicts answers for a SQuAD dataset for the given start and end scores.
| Parameters | Description | Type | Default |
|---|---|---|---|
| packing_strategy | This flag is set to True if using packing strategy | Boolean | False |
centerface_postproc - Processes the inference outputs to parse detections and generates a detections file for the metric evaluator. Used for processing CenterFace face detector.
| Parameters | Description | Type | Default |
|---|---|---|---|
| dims | Height and width; comma delimited, e.g., 640,640 | String | Mandatory |
| heatmap_threshold | User input for heatmap threshold | Float | 0.05 |
| nms_threshold | User input for nms threshold | Float | 0.3 |
centernet_postprocess - Processes the inference outputs to parse detections and generate a detections file for the metric evaluator. Used for processing CenterNet detector.
| Parameters | Description | Type | Default |
|---|---|---|---|
| output_dims | Height and width; comma delimited, e.g., 640,640 | String | Mandatory |
| top_k | Top K proposals are given from the postprocess plugin | Integer | 100 |
| num_classes | Number of classes | Integer | 1 |
| score | Threshold to purify the detections | Integer | 1 |
lprnet_predict - Used for LPRNET license plate prediction.
object_detection - Processes the inference outputs to parse detections and generate a detections file for metric evaluator
| Parameters | Description | Type | Default |
|---|---|---|---|
| dims | Height and width; comma delimited, e.g., 640,640 | String | Mandatory |
| type | Type of post-processing (e.g., letterbox, stretch) | String | None |
| label_offset | Offset for the labels information | Integer | 0 |
| score_threshold | Threshold limit for the detection scores | Float | 0.001 |
| xywh_to_xyxy | Convert bounding box format from box center (xywh) to box corner (xyxy) format | Boolean | False |
| xy_swap | Swap the X and Y coordinates of bbox | Boolean | False |
| dtypes | List of datatypes used for bounding boxes, scores, and labels in order, e.g., [float32, float32, int64]. Defaults to the datatypes fetched from the ‘outputs_info’ for the model’s config.yaml | List | Datatypes from the outputs_info section of the model config.yaml |
| mask | Do postprocessing on mask | Boolean | False |
| mask_dims | Output dims of model. Provide this only if mask = True. E.g., 100,80,28,28 | String | None |
| padded_outputs | Pad the outputs | Boolean | False |
| scale | Comma separated scale values | String | ‘1’ |
| skip_padding | Skip padding while rescaling to original image shape | Boolean | False |
Metric memory plugins¶
map_coco - Evaluates the mAP@50 and mAP@50:5:95 scores for the COCO dataset

| Parameters | Description | Type | Default |
|---|---|---|---|
| map_80_to_90 | Mapping of classes in range 0-80 to 0-90 | Boolean | False |
| segm | Flag to calculate mAP for mask | Boolean | False |
| keypoint_map | Flag to calculate mAP for keypoint | Boolean | False |
| data | Dataset used for evaluation; must be one of ‘openimages’ or ‘coco’ | String | ‘coco’ |
perplexity - Calculates the perplexity metric. Model outputs are expected to be logits of the proper shape. Ground truth data is expected to be in tokenized form, as token IDs. The ground truth is generated automatically when using the “gpt2_tokenizer” dataset plugin.

| Parameters | Description | Type | Default |
|---|---|---|---|
| logits_index | Index of the logits output if the model has multiple outputs | Integer | 0 |
precision - Calculates the precision metric, i.e., (correct predictions / total predictions). Ground truth data is expected in the format “filename <space> correct_text”. The postprocessed model outputs are expected to be text files with just the “predicted_text”.
| Parameters | Description | Type | Default |
|---|---|---|---|
| round | Number of decimal places to round the result to | Integer | 7 |
| input_image_index | For multi input models, the index of image file in input file list csv | Integer | 0 |
squad_eval - Calculates F1 score and exact match scores for SQuAD dataset based on predictions and ground truth.
| Parameters | Description | Type | Default |
|---|---|---|---|
| vocabulary | | String | Mandatory |
| max_answer_length | The maximum length of an answer, after tokenization. In SQuAD v2 this was set to 30 tokens; in SQuAD v1 it was not specified, so a default value of 30 was used. | Integer | 30 |
| n_best_size | Specifies how many of the possible answers to return for a given question, along with corresponding confidence scores. | Integer | 20 |
| do_lower_case | Whether or not to lowercase all text before processing. | Bool | False |
| squad_version | Indicates which version of SQuAD style questions and answers we’re dealing with (“v1” or “v2”). | Integer | 1 |
| round | Number of decimal places to round the result to | Integer | 6 |
| cached_vocab_path | Path to cached vocab_file to be used for creating tokenizer | String | None |
topk - Evaluates topk value by comparing results and annotations.
| Parameters | Description | Type | Default |
|---|---|---|---|
| kval | Top k values, e.g., 1,5 evaluates top1 and top5 | String | ‘1,5’ |
| softmax_index | Index of the softmax output in the results file list | Integer | 0 |
| label_offset | Offset required in the labels’ scores, e.g., if shape is 1x1001, then labels_offset=1 | Integer | 0 |
| round | Number of decimal places to round the result to | Integer | 3 |
| input_image_index | For multi input models, the index of image file in input file list csv | Integer | 0 |
widerface_AP - Computes average precision for easy, medium, and hard cases.
| Parameters | Description | Type | Default |
|---|---|---|---|
| IoU_threshold | User input for IoU threshold | Float | 0.4 |
SDK Compatibility Verification¶
A model generated by the converter must be executed with the net-run tools from the same SDK release as the converter. You can quickly check the SDK info embedded in model.cpp or the compiled model.so by running these grep commands:
strings model.cpp | grep qaisw
strings libqnn_model.so | grep qaisw