Tools

This page describes the various SDK tools and features available to Linux/Android and Windows developers. For the integration flow for each developer type, refer to the Overview page.

Developer columns are grouped as Linux/Android (Ubuntu, WSL x86, Device) and Windows (WSL x86, Windows x86_64, Windows on Snapdragon):

| Category | Tool | Ubuntu | WSL x86 | Device | WSL x86 | Windows x86_64 | Windows on Snapdragon |
|---|---|---|---|---|---|---|---|
| Model Conversion | qnn-tensorflow-converter | YES | YES | | YES | YES | YES** |
| | qnn-tflite-converter | YES | YES | | YES | | |
| | qnn-pytorch-converter | YES | YES | | YES | | |
| | qnn-onnx-converter | YES | YES | | YES | YES | YES** |
| Model Preparation | Quantization Support | YES | YES | | YES | YES | YES |
| | qnn-model-lib-generator | YES | YES | | YES | YES | |
| | qnn-op-package-generator | YES | YES | | YES | | |
| | qnn-context-binary-generator | YES | YES | YES | YES | YES | YES |
| Execution | qnn-net-run | YES | YES | YES | | YES | YES |
| | qnn-throughput-net-run | YES | YES | YES | | | YES |
| Analysis | qairt-accuracy-evaluator (Beta) | YES | | | | | |
| | qnn-architecture-checker (Beta) | YES | YES | | YES | YES | YES** |
| | qnn-accuracy-debugger (Beta) | YES | YES | | | YES*** | YES |
| | qairt-accuracy-debugger (Beta) | YES | YES | | | YES*** | YES |
| | qnn-platform-validator | YES | | YES | | | YES |
| | qnn-profile-viewer | YES | YES | | YES | YES* | YES* |
| | Benchmarking | YES | | | | | |
| | qnn-netron (Beta) | YES | | | | | |
| | qnn-context-binary-utility | YES | | | | | |

Note

The Beta designation indicates pre-production quality. This means that the component is currently undergoing more rigorous testing and may not fully satisfy compatibility requirements as expected in the production version. In other words, incompatible changes (such as alterations in behavior or interface) between releases are allowed without prior notice, although every effort is made to minimize such changes.

Note

* When using converter tools in Windows PowerShell, make sure a virtual environment with the required python packages (see Setup for more details) is activated and converters are executed via python, as shown in the following example.
(venv-3.10) > python qnn-onnx-converter <options>

Note

  • Extension naming of library: For Windows developers, please replace all ‘.so’ files with the analogous ‘.dll’ file in the following sections. Please refer to Platform Differences for more details.

  • For more detailed information on converters please refer to Converters.

  • [*] libQnnGpuProfilingReader.dll is not supported on Windows platform for qnn-profile-viewer.

  • [**] Requires the python scripts and the executables from the Windows x86_64 binary folder (bin\x86_64-windows-msvc).

  • [***] Accuracy debugger on Windows x86 system is tested only for CPU runtime currently.

Model Conversion

qnn-tensorflow-converter

The qnn-tensorflow-converter tool converts a model from the TensorFlow framework to a CPP file representing the model as a series of QNN API calls. Additionally, a binary file containing static weights of the model is produced.

usage: qnn-tensorflow-converter -d INPUT_NAME INPUT_DIM --out_node OUT_NAMES
                                [--input_type INPUT_NAME INPUT_TYPE]
                                [--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding  ...]
                                [--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
                                [--show_unconsumed_nodes] [--saved_model_tag SAVED_MODEL_TAG]
                                [--saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY]
                                [--quantization_overrides QUANTIZATION_OVERRIDES]
                                [--keep_quant_nodes] [--disable_batchnorm_folding]
                                [--expand_lstm_op_structure]
                                [--keep_disconnected_nodes] [--input_list INPUT_LIST]
                                [--param_quantizer PARAM_QUANTIZER] [--act_quantizer ACT_QUANTIZER]
                                [--algorithms ALGORITHMS [ALGORITHMS ...]]
                                [--bias_bitwidth BIAS_BITWIDTH] [--bias_bw BIAS_BW]
                                [--act_bitwidth ACT_BITWIDTH] [--act_bw ACT_BW]
                                [--weights_bitwidth WEIGHTS_BITWIDTH] [--weight_bw WEIGHT_BW]
                                [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_encodings]
                                [--use_per_channel_quantization] [--use_per_row_quantization]
                                [--float_fallback] [--use_native_input_files] [--use_native_dtype]
                                [--use_native_output_files] [--disable_relu_squashing]
                                [--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
                                --input_network INPUT_NETWORK [--debug [DEBUG]]
                                [-o OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
                                [--float_bitwidth FLOAT_BITWIDTH] [--float_bw FLOAT_BW]
                                [--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
                                [--exclude_named_tensors] [--op_package_lib OP_PACKAGE_LIB]
                                [--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
                                [-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
                                [-h] [--arch_checker]

Script to convert TF model into QNN

required arguments:
  -d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
                        The names and dimensions of the network input layers specified in the format
                        [input_name comma-separated-dimensions], for example:
                            'data' 1,224,224,3
                        Note that the quotes should always be included in order to
                        handle special characters, spaces, etc.
                        For multiple inputs specify multiple --input_dim on the command line like:
                            --input_dim 'data1' 1,224,224,3 --input_dim 'data2' 1,50,100,3
  --out_node OUT_NAMES, --out_name OUT_NAMES
                        Name of the graph's output nodes. Multiple output nodes should be
                        provided separately like:
                            --out_node out_1 --out_node out_2
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
                        Path to the source framework model.

optional arguments:
  --input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
                        Type of data expected by each input op/layer. Type for each input is
                        |default| if not specified. For example: "data" image. Note that the quotes
                        should always be included in order to handle special characters, spaces, etc.
                        For multiple inputs specify multiple --input_type on the command line.
                        Eg:
                            --input_type "data1" image --input_type "data2" opaque
                        These options are used by the DSP runtime, and the following descriptions
                        state how the input is handled for each option.
                        Image:
                        Input is float between 0-255 and the input's mean is 0.0f and the input's
                        max is 255.0f. The floats are cast to uint8_t values, which are then passed
                        to the DSP.
                        Default:
                        Pass the input as floats to the DSP directly and the DSP will quantize it.
                        Opaque:
                        Assumes the input is float because the consumer layer (i.e. the next layer)
                        requires it as float, therefore it won't be quantized.
                        Choices supported:
                            image
                            default
                            opaque
  --input_dtype INPUT_NAME INPUT_DTYPE
                        The names and datatype of the network input layers specified in the format
                        [input_name datatype], for example:
                            'data' 'float32'.
                        Default is float32 if not specified.
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For multiple inputs specify multiple --input_dtype on the command line like:
                            --input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
  --input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
                        Usage:     --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
                        [INPUT_ENCODING_OUT]
                        Input encoding of the network inputs. Default is bgr.
                        e.g.
                            --input_encoding "data" rgba
                        Quotes must wrap the input node name to handle special characters,
                        spaces, etc. To specify encodings for multiple inputs, invoke
                        --input_encoding for each one.
                        e.g.
                            --input_encoding "data1" rgba --input_encoding "data2" other
                        Optionally, an output encoding may be specified for an input node by
                        providing a second encoding. The default output encoding is bgr.
                        e.g.
                            --input_encoding "data3" rgba rgb
                        Input encoding types:
                            image color encodings: bgr, rgb, nv21, nv12, ...
                            time_series: for inputs of rnn models;
                            other: not available above or is unknown.
                        Supported encodings:
                            bgr
                            rgb
                            rgba
                            argb32
                            nv21
                            nv12
                            time_series
                            other
  --input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
                        Layout of each input tensor. If not specified, it will use the default
                        based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                            NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        NONTRIVIAL for everything else. For multiple inputs specify multiple
                        --input_layout on the command line.
                        Eg:
                           --input_layout "data1" NCHW --input_layout "data2" NCHW
  --custom_io CUSTOM_IO
                        Use this option to specify a yaml file for custom IO
  --show_unconsumed_nodes
                        Displays a list of unconsumed nodes, if any are found. Nodes which are
                        unconsumed do not violate the structural fidelity of the generated graph.
  --saved_model_tag SAVED_MODEL_TAG
                        Specify the tag to select a MetaGraph from the SavedModel. ex:
                        --saved_model_tag serve. Default value will be 'serve' when it is not
                        assigned.
  --saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY
                        Specify signature key to select input and output of the model. ex:
                        --saved_model_signature_key serving_default. Default value will be
                        'serving_default' when it is not assigned
  --disable_batchnorm_folding
                        If specified, disables the optimization that folds batchnorm operations
                        into the weights of the preceding convolutional or fully connected layer.
  --expand_lstm_op_structure
                        Enables optimization that breaks the LSTM op to equivalent math ops
  --keep_disconnected_nodes
                        Disable Optimization that removes Ops not connected to the main graph.
                        This optimization uses output names provided over commandline OR
                        inputs/outputs extracted from the Source model to determine the main graph
  --debug [DEBUG]       Run the converter in debug mode.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path where the converted output model should be saved. If not specified,
                        the converted model is written to a file with the same name as the input
                        model.
  --copyright_file COPYRIGHT_FILE
                        Path to copyright file. If provided, the content of the file will be added
                        to the output model.
  --float_bitwidth FLOAT_BITWIDTH
                        Selects the bitwidth to use for floating-point parameters (weights/bias)
                        and activations, either for all ops or for specific ops selected through
                        encodings; 32 (default) or 16.
  --float_bw FLOAT_BW   Deprecated; use --float_bitwidth.
  --float_bias_bw FLOAT_BIAS_BW
                        Deprecated; use --float_bias_bitwidth.
  --overwrite_model_prefix
                        If passed, the model generator uses the output path name as the model
                        prefix when naming functions in <qnn_model_name>.cpp, e.g.
                        ModelName_composeGraphs. (Useful for running multiple models at once.)
                        Default is to use the generic prefix "QnnModel_".
  --exclude_named_tensors
                        Do not use source framework tensor names; use a counter to name tensors
                        instead. Note: this can potentially help reduce the size of the generated
                        model library (recommended when deploying a model). Default is False.
  -h, --help            show this help message and exit
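
The "image" handling described under --input_type can be sketched in a couple of lines; this is an illustration of the documented cast, not SDK code:

```python
# Illustration (not SDK code): the "image" input type takes floats in
# [0, 255] and casts them to uint8 values before they reach the DSP.
def cast_image_input(values):
    # inputs are documented to lie in [0, 255], so a plain truncating
    # cast is sufficient here
    return [int(v) for v in values]

print(cast_image_input([0.0, 127.6, 255.0]))  # [0, 127, 255]
```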

Quantizer Options:
  --quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters to use for
                        quantization. These will override any quantization data carried from
                        conversion (eg TF fake quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET specification.
  --keep_quant_nodes    Use this option to keep activation quantization nodes in the graph rather
                        than stripping them.
  --input_list INPUT_LIST
                        Path to a file specifying the input data. This file should be a plain text
                        file, containing one or more absolute file paths per line. Each path is
                        expected to point to a binary file containing one input in the "raw" format,
                        ready to be consumed by the quantizer without any further preprocessing.
                        Multiple files per line separated by spaces indicate multiple inputs to the
                        network. See documentation for more details. Must be specified for
                        quantization. All subsequent quantization options are ignored when this is
                        not provided.
  --param_quantizer PARAM_QUANTIZER
                        Optional parameter to indicate the weight/bias quantizer to use. Must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "adjusted": Deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                                     Data will be stored as int#_t data such that the offset is always 0.
  --act_quantizer ACT_QUANTIZER
                        Optional parameter to indicate the activation quantizer to use. Must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "adjusted": Deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                                     Data will be stored as int#_t data such that the offset is always 0.
  --algorithms ALGORITHMS [ALGORITHMS ...]
                        Use this option to enable new optimization algorithms. Usage is:
                            --algorithms <algo_name1> ... The available optimization algorithms are:
                        "cle" - Cross layer equalization includes a number of methods for equalizing
                        weights and biases across layers in order to rectify imbalances that cause
                        quantization errors.
  --bias_bitwidth BIAS_BITWIDTH
                        Selects the bitwidth to use when quantizing the biases; 8 (default) or 32.
  --bias_bw BIAS_BW     Deprecated; use --bias_bitwidth.
  --act_bitwidth ACT_BITWIDTH
                        Selects the bitwidth to use when quantizing the activations; 8 (default) or 16.
  --act_bw ACT_BW       Deprecated; use --act_bitwidth.
  --weights_bitwidth WEIGHTS_BITWIDTH
                        Selects the bitwidth to use when quantizing the weights; 4 or 8 (default).
  --weight_bw WEIGHT_BW
                        Deprecated; use --weights_bitwidth.
  --float_bias_bitwidth FLOAT_BIAS_BITWIDTH
                        Selects the bitwidth to use when biases are in float; 32 or 16.
  --ignore_encodings    Use only quantizer generated encodings, ignoring any user or model provided
                        encodings.
                        Note: Cannot use --ignore_encodings with --quantization_overrides
  --use_per_channel_quantization
                        Enables per-channel quantization for convolution-based op weights.
                        This replaces the built-in model QAT encodings when used for a given weight.
  --use_per_row_quantization
                        Enables row wise quantization of Matmul and FullyConnected ops.
  --float_fallback      Enables fallback to floating point (FP) instead of fixed point.
                        This option can be paired with --float_bitwidth to indicate the bitwidth
                        for FP (by default 32).
                        If this option is enabled, --input_list and --ignore_encodings must not be
                        provided.
                        The external quantization encodings (encoding file/FakeQuant encodings)
                        might be missing quantization parameters for some interim tensors. The
                        converter first tries to fill the gaps by propagating encodings across
                        math-invariant functions; nodes whose quantization parameters are still
                        missing fall back to floating point.
  --use_native_input_files
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native: reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_dtype    Note: This option is deprecated, use --use_native_input_files option in
                        future.
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native: reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_output_files
                        Use this option to indicate the data type of the output files
                        1. float (default): output the file as floats.
                        2. native: outputs the file that is native to the model. For ex.,
                        uint8_t.
  --disable_relu_squashing
                        Disables squashing of ReLU against convolution-based ops for quantized models.
  --restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
                        Specifies the number of steps to use for computing quantization encodings
                        such that scale = (max - min) / number of quantization steps.
                        The option should be passed as a space separated pair of hexadecimal string
                        minimum and maximum values. i.e. --restrict_quantization_steps "MIN MAX".
                        Please note that this is a hexadecimal string literal and not a signed
                        integer, to supply a negative value an explicit minus sign is required.
                        E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8 bit range,
                        --restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16
                        bit range. This argument is required for 16-bit Matmul operations.
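
As a hypothetical illustration of the --input_list format (all file names invented), the following three-line list feeds two single-input inferences plus one inference for a two-input network:

```text
/data/calib/batch_000.raw
/data/calib/batch_001.raw
/data/calib/img_000.raw /data/calib/mask_000.raw
```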
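
The quantizer descriptions above ("tf" and "symmetric" parameter/activation quantizers, and the step count implied by --restrict_quantization_steps) can be sketched in a few lines. This is an illustrative reading of the documented arithmetic, not the SDK's implementation:

```python
# Illustrative sketch (not SDK code) of the quantizer arithmetic described above.

def tf_encoding(data_min, data_max, bitwidth=8):
    # "tf" quantizer: uses the real min/max of the data and the given bitwidth.
    steps = 2 ** bitwidth - 1
    scale = (data_max - data_min) / steps
    offset = round(data_min / scale)
    return scale, offset

def symmetric_encoding(data_min, data_max, bitwidth=8):
    # "symmetric" quantizer: min and max share the same absolute value,
    # so the integer offset is always 0 and data is stored as int#_t.
    bound = max(abs(data_min), abs(data_max))
    scale = bound / (2 ** (bitwidth - 1) - 1)
    return scale, 0

def restricted_steps(min_hex, max_hex):
    # --restrict_quantization_steps: scale = (max - min) / number of steps,
    # where the step count comes from the hexadecimal pair, e.g. "-0x80 0x7F".
    return int(max_hex, 16) - int(min_hex, 16)

print(tf_encoding(-0.5, 1.0))             # scale ~0.00588, offset -85
print(symmetric_encoding(-0.5, 1.0))      # scale ~0.00787, offset 0
print(restricted_steps("-0x80", "0x7F"))  # 255 steps for an 8-bit range
```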

Custom Op Package Options:
  --op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
                        Use this argument to pass an op package library for quantization. Must be in
                        the form
                        <op_package_lib_path:interfaceProviderName> and be separated by a
                        comma for multiple package libs
  -p PACKAGE_NAME, --package_name PACKAGE_NAME
                        A global package name to be used for each node in the Model.cpp file.
                        Defaults to Qnn header defined package name
  --converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
                        Absolute path to converter op package library compiled by the OpPackage
                        generator. Must be separated by a comma for multiple package libraries.
                        Note: Libraries must follow the same order as the xml files.
                        E.g.1: --converter_op_package_lib absolute_path_to/libExample.so
                        E.g.2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
  --op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...], -opc OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]
                        Path to a Qnn Op Package XML configuration file that contains user defined
                        custom operations.

Architecture Checker Options(Experimental):
  --arch_checker        Note: This option will be soon deprecated. Use the qnn-architecture-checker tool to achieve the same result.

Note: Only one of: {'package_name', 'op_package_config'} can be specified
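
The exclusivity rule above behaves like a mutually exclusive argument group; a minimal sketch using Python's argparse (illustrative only, not the converter's actual code):

```python
import argparse

# Sketch: -p/--package_name and --op_package_config cannot be combined,
# mirroring the converter's "only one of" constraint above.
parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
group.add_argument("-p", "--package_name")
group.add_argument("--op_package_config", nargs="+")

# Supplying only one of the two options parses fine...
args = parser.parse_args(["-p", "my.package"])
print(args.package_name)  # my.package
# ...while supplying both would make argparse exit with an error.
```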

Basic command line usage looks like:

$ qnn-tensorflow-converter -i <path>/frozen_graph.pb
                    -d <network_input_name> <dims>
                    --out_node <network_output_name>
                    -o <optional_output_path>
                    --allow_unconsumed_nodes  # optional, but will most likely be needed for larger models
                    -p <optional_package_name> # Defaults to "qti.aisw"

qnn-tflite-converter

The qnn-tflite-converter tool converts a TFLite model to a CPP file representing the model as a series of QNN API calls. Additionally, a binary file containing static weights of the model is produced.

usage: qnn-tflite-converter [-d INPUT_NAME INPUT_DIM] [--signature_name SIGNATURE_NAME]
                            [--out_node OUT_NAMES] [--input_type INPUT_NAME INPUT_TYPE]
                            [--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding  ...]
                            [--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
                            [--dump_relay DUMP_RELAY]
                            [--quantization_overrides QUANTIZATION_OVERRIDES] [--keep_quant_nodes]
                            [--disable_batchnorm_folding] [--expand_lstm_op_structure]
                            [--keep_disconnected_nodes]
                            [--input_list INPUT_LIST] [--param_quantizer PARAM_QUANTIZER]
                            [--act_quantizer ACT_QUANTIZER]
                            [--algorithms ALGORITHMS [ALGORITHMS ...]]
                            [--bias_bitwidth BIAS_BITWIDTH] [--bias_bw BIAS_BW]
                            [--act_bitwidth ACT_BITWIDTH] [--act_bw ACT_BW]
                            [--weights_bitwidth WEIGHTS_BITWIDTH] [--weight_bw WEIGHT_BW]
                            [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_encodings]
                            [--use_per_channel_quantization] [--use_per_row_quantization]
                            [--float_fallback] [--use_native_input_files] [--use_native_dtype]
                            [--use_native_output_files] [--disable_relu_squashing]
                            [--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
                            --input_network INPUT_NETWORK [--debug [DEBUG]]
                            [-o OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
                            [--float_bitwidth FLOAT_BITWIDTH] [--float_bw FLOAT_BW]
                            [--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
                            [--exclude_named_tensors] [--op_package_lib OP_PACKAGE_LIB]
                            [--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
                            [-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
                            [-h] [--arch_checker]

Script to convert TFLite model into QNN

required arguments:
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
                        Path to the source framework model.

optional arguments:
  -d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
                        The names and dimensions of the network input layers specified in the format
                        [input_name comma-separated-dimensions], for example:
                            'data' 1,224,224,3
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For multiple inputs specify multiple --input_dim on the command line like:
                            --input_dim 'data1' 1,224,224,3 --input_dim 'data2' 1,50,100,3
  --signature_name SIGNATURE_NAME, -sn SIGNATURE_NAME
                        Specifies a specific subgraph signature to convert.
  --out_node OUT_NAMES, --out_name OUT_NAMES
                        Names of the graph's output tensors. Multiple output names should be
                        provided separately like:
                            --out_name out_1 --out_name out_2
  --input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
                        Type of data expected by each input op/layer. Type for each input is
                        |default| if not specified. For example: "data" image. Note that the quotes
                        should always be included in order to handle special characters, spaces, etc.
                        For multiple inputs specify multiple --input_type on the command line.
                        Eg:
                            --input_type "data1" image --input_type "data2" opaque
                        These options are used by the DSP runtime, and the following descriptions
                        state how the input is handled for each option.
                        Image:
                        Input is float between 0-255 and the input's mean is 0.0f and the input's
                        max is 255.0f. The floats are cast to uint8_t values, which are then passed
                        to the DSP.
                        Default:
                        Pass the input as floats to the DSP directly and the DSP will quantize it.
                        Opaque:
                        Assumes the input is float because the consumer layer (i.e. the next layer)
                        requires it as float, therefore it won't be quantized.
                        Choices supported:
                            image
                            default
                            opaque
  --input_dtype INPUT_NAME INPUT_DTYPE
                        The names and datatype of the network input layers specified in the format
                        [input_name datatype], for example:
                            'data' 'float32'
                        Default is float32 if not specified.
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For multiple inputs specify multiple --input_dtype on the command line like:
                            --input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
  --input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
                        Usage:     --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
                        [INPUT_ENCODING_OUT]
                        Input encoding of the network inputs. Default is bgr.
                        e.g.
                            --input_encoding "data" rgba
                        Quotes must wrap the input node name to handle special characters,
                        spaces, etc. To specify encodings for multiple inputs, invoke
                        --input_encoding for each one.
                        e.g.
                            --input_encoding "data1" rgba --input_encoding "data2" other
                        Optionally, an output encoding may be specified for an input node by
                        providing a second encoding. The default output encoding is bgr.
                        e.g.
                            --input_encoding "data3" rgba rgb
                        Input encoding types:
                            image color encodings: bgr, rgb, nv21, nv12, ...
                            time_series: for inputs of rnn models;
                            other: not available above or is unknown.
                        Supported encodings:
                            bgr
                            rgb
                            rgba
                            argb32
                            nv21
                            nv12
                            time_series
                            other
  --input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
                        Layout of each input tensor. If not specified, it will use the default
                        based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                            NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        NONTRIVIAL for everything else. For multiple inputs, specify multiple
                        --input_layout on the command line.
                        Eg:
                           --input_layout "data1" NCHW --input_layout "data2" NCHW
  --custom_io CUSTOM_IO
                        Use this option to specify a yaml file for custom IO.
  --dump_relay DUMP_RELAY
                        Dump Relay ASM and Params at the path provided with the argument
                        Usage: --dump_relay <path_to_dump>
  --show_unconsumed_nodes
                        Displays a list of unconsumed nodes, if any are
                        found. Unconsumed nodes do not violate the
                        structural fidelity of the generated graph.
  --disable_batchnorm_folding
  --expand_lstm_op_structure
                        Enables optimization that breaks the LSTM op into equivalent math ops
  --keep_disconnected_nodes
                        Disables the optimization that removes ops not connected to the main graph.
                        This optimization uses output names provided on the command line, or
                        inputs/outputs extracted from the source model, to determine the main graph.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path where the converted output model should be saved. If not specified, the
                        converted model will be written to a file with the same name as the input model.
  --copyright_file COPYRIGHT_FILE
                        Path to copyright file. If provided, the content of the file will be added
                        to the output model.
  --float_bitwidth FLOAT_BITWIDTH
                        Selects the bitwidth to use when using float for parameters (weights/bias)
                        and activations for all ops or a specific op (via encodings) selected
                        through encoding; 32 (default) or 16.
  --float_bw FLOAT_BW   Deprecated; use --float_bitwidth.
  --float_bias_bw FLOAT_BIAS_BW
                        Deprecated; use --float_bias_bitwidth.
  --overwrite_model_prefix
                        If passed, the model generator will use the output path name as the model
                        prefix for naming functions in <qnn_model_name>.cpp (useful for running
                        multiple models at once), e.g. ModelName_composeGraphs. Default is the
                        generic "QnnModel_".
  --exclude_named_tensors
                        Do not use source framework tensor names; instead, use a counter for naming
                        tensors. Note: this can help reduce the size of the generated model library
                        (recommended when deploying a model). Default is False.
  -h, --help            show this help message and exit

Quantizer Options:
  --quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters to use for
                        quantization. These will override any quantization data carried from
                        conversion (eg TF fake quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET specification.
  --keep_quant_nodes    Use this option to keep activation quantization nodes in the graph rather
                        than stripping them.
  --input_list INPUT_LIST
                        Path to a file specifying the input data. This file should be a plain text
                        file, containing one or more absolute file paths per line. Each path is
                        expected to point to a binary file containing one input in the "raw" format,
                        ready to be consumed by the quantizer without any further preprocessing.
                        Multiple files per line separated by spaces indicate multiple inputs to the
                        network. See documentation for more details. Must be specified for
                        quantization. All subsequent quantization options are ignored when this is
                        not provided.
  --param_quantizer PARAM_QUANTIZER
                        Optional parameter to indicate the weight/bias quantizer to use. Must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "adjusted": Deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                                     Data will be stored as int#_t data such that the offset is always 0.
  --act_quantizer ACT_QUANTIZER
                        Optional parameter to indicate the activation quantizer to use. Must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "adjusted": Deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                                     Data will be stored as int#_t data such that the offset is always 0.
  --algorithms ALGORITHMS [ALGORITHMS ...]
                        Use this option to enable new optimization algorithms. Usage is:
                            --algorithms <algo_name1> ... The available optimization algorithms are:
                        "cle" - Cross layer equalization includes a number of methods for equalizing
                        weights and biases across layers in order to rectify imbalances that cause
                        quantization errors.
  --bias_bitwidth BIAS_BITWIDTH
                        Selects the bitwidth to use when quantizing the biases; 8 (default) or 32.
  --bias_bw BIAS_BW     Deprecated; use --bias_bitwidth.
  --act_bitwidth ACT_BITWIDTH
                        Selects the bitwidth to use when quantizing the activations; 8 (default) or 16.
  --act_bw ACT_BW       Deprecated; use --act_bitwidth.
  --weights_bitwidth WEIGHTS_BITWIDTH
                        Selects the bitwidth to use when quantizing the weights; 4 or 8 (default).
  --weight_bw WEIGHT_BW
                        Deprecated; use --weights_bitwidth.
  --float_bias_bitwidth FLOAT_BIAS_BITWIDTH
                        Selects the bitwidth to use when biases are in float; 32 or 16.
  --ignore_encodings    Use only quantizer generated encodings, ignoring any user or model provided
                        encodings.
                        Note: Cannot use --ignore_encodings with --quantization_overrides
  --use_per_channel_quantization
                        Enables per-channel quantization for convolution-based op weights.
                        This replaces the built-in model QAT encodings when used for a given weight.
  --use_per_row_quantization
                        Enables row-wise quantization of Matmul and FullyConnected ops.
  --float_fallback      Enables fallback to floating point (FP) instead of fixed point.
                        This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
                        If this option is enabled, neither --input_list nor --ignore_encodings may be provided.
                        The external quantization encodings (encoding file/FakeQuant encodings) might be missing
                        quantization parameters for some interim tensors. The converter first tries to fill the
                        gaps by propagating encodings across math-invariant functions; any nodes still missing
                        quantization parameters fall back to floating point.
  --use_native_input_files
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native: reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_dtype    Note: This option is deprecated, use --use_native_input_files option in
                        future.
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native: reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_output_files
                        Use this option to indicate the data type of the output files:
                        1. float (default): output the file as floats.
                        2. native: outputs the file that is native to the model. For ex.,
                        uint8_t.
  --disable_relu_squashing
                        Disables squashing of ReLU against convolution-based ops for quantized models.
  --restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
                        Specifies the number of steps to use for computing quantization encodings
                        such that scale = (max - min) / number of quantization steps.
                        The option should be passed as a space-separated pair of hexadecimal string
                        minimum and maximum values, i.e. --restrict_quantization_steps "MIN MAX".
                        Note that these are hexadecimal string literals, not signed integers;
                        to supply a negative value, an explicit minus sign is required.
                        E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8-bit
                        range, and --restrict_quantization_steps "-0x8000 0x7F7F" indicates an
                        example 16-bit range.
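As a worked illustration of the formula above, the step count implied by a hexadecimal MIN/MAX pair and the resulting scale can be computed as follows (the tensor calibration range used here is a hypothetical example, not something the converter produces):

```python
# The pair "-0x80 0x7F" is parsed as hexadecimal string literals, not signed ints.
step_min = int("-0x80", 16)        # -128
step_max = int("0x7F", 16)         # 127
num_steps = step_max - step_min    # 255 steps, an 8-bit range

# scale = (max - min) / number of quantization steps
tensor_min, tensor_max = -1.0, 1.0  # assumed calibration range (illustrative)
scale = (tensor_max - tensor_min) / num_steps
print(num_steps, round(scale, 6))   # prints: 255 0.007843
```

Restricting the step count this way tightens the representable range, which trades dynamic range for a finer scale.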

Custom Op Package Options:
  --op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
                        Use this argument to pass an op package library for quantization. Must be in
                        the form <op_package_lib_path:interfaceProviderName> and be separated by a
                        comma for multiple package libs
  --converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
                        Absolute path to converter op package library compiled by the OpPackage
                        generator. Must be separated by a comma for multiple package libraries.
                        Note: Libraries must follow the same order as the xml files.
                        E.g.1: --converter_op_package_lib absolute_path_to/libExample.so
                        E.g.2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
  -p PACKAGE_NAME, --package_name PACKAGE_NAME
                        A global package name to be used for each node in the Model.cpp file.
                        Defaults to the package name defined in the QNN header.
  --op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...], -opc OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]
                        Path to a Qnn Op Package XML configuration file that contains user defined
                        custom operations.

Architecture Checker Options(Experimental):
  --arch_checker        Note: This option will be soon deprecated. Use the qnn-architecture-checker tool to achieve the same result.

Note: Only one of: {'package_name', 'op_package_config'} can be specified

Basic command line usage looks like:

$ qnn-tflite-converter -i <path>/model.tflite
                       -d <optional_network_input_name> <dims>
                       -o <optional_output_path>
                       -p <optional_package_name> # Defaults to "qti.aisw"
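A quantizing conversion additionally needs an --input_list file: plain text, one inference per line, with one or more space-separated raw-file paths per line. A minimal sketch, with hypothetical file names and paths:

```shell
# Build an input list for quantization: one line per inference,
# each line pointing at raw-format calibration inputs (hypothetical paths).
cat > input_list.txt <<'EOF'
calib/img_0001.raw
calib/img_0002.raw
EOF

# A quantizing conversion would then look like (not executed here):
#   qnn-tflite-converter -i model.tflite --input_list input_list.txt -o model.cpp
wc -l < input_list.txt
```

Without --input_list, the converter emits a non-quantized model and ignores all subsequent quantization options.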

qnn-pytorch-converter

The qnn-pytorch-converter tool converts a PyTorch model to a CPP file representing the model as a series of QNN API calls. Additionally, a binary file containing static weights of the model is produced.

usage: qnn-pytorch-converter -d INPUT_NAME INPUT_DIM [--out_node OUT_NAMES]
                             [--input_type INPUT_NAME INPUT_TYPE]
                             [--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding  ...]
                             [--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
                             [--preserve_io [PRESERVE_IO [PRESERVE_IO ...]]]
                             [--dump_relay DUMP_RELAY] [--dry_run] [--dump_out_names]
                             [--pytorch_custom_op_lib PYTORCH_CUSTOM_OP_LIB]
                             [--quantization_overrides QUANTIZATION_OVERRIDES] [--keep_quant_nodes]
                             [--disable_batchnorm_folding] [--expand_lstm_op_structure]
                             [--keep_disconnected_nodes]
                             [--input_list INPUT_LIST] [--param_quantizer PARAM_QUANTIZER]
                             [--act_quantizer ACT_QUANTIZER]
                             [--algorithms ALGORITHMS [ALGORITHMS ...]]
                             [--bias_bitwidth BIAS_BITWIDTH] [--bias_bw BIAS_BW]
                             [--act_bitwidth ACT_BITWIDTH] [--act_bw ACT_BW]
                             [--weights_bitwidth WEIGHTS_BITWIDTH] [--weight_bw WEIGHT_BW]
                             [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_encodings]
                             [--use_per_channel_quantization] [--use_per_row_quantization]
                             [--float_fallback] [--use_native_input_files] [--use_native_dtype]
                             [--use_native_output_files] [--disable_relu_squashing]
                             [--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
                             --input_network INPUT_NETWORK [--debug [DEBUG]]
                             [-o OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
                             [--float_bitwidth FLOAT_BITWIDTH] [--float_bw FLOAT_BW]
                             [--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
                             [--exclude_named_tensors] [--op_package_lib OP_PACKAGE_LIB]
                             [--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
                             [-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
                             [-h] [--arch_checker]

Script to convert PyTorch model into QNN

required arguments:
  -d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
                        The names and dimensions of the network input layers specified in the format
                        [input_name comma-separated-dimensions], for example:
                            'data' 1,3,224,224
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For multiple inputs specify multiple --input_dim on the command line like:
                            --input_dim 'data1' 1,3,224,224 --input_dim 'data2' 1,50,100,3
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
                        Path to the source framework model.

optional arguments:
  --out_node OUT_NAMES, --out_name OUT_NAMES
                        Names of the graph's output tensors. Multiple output names should be
                        provided separately like:
                            --out_name out_1 --out_name out_2
  --input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
                        Type of data expected by each input op/layer. Type for each input is
                        |default| if not specified. For example: "data" image. Note that the quotes
                        should always be included in order to handle special characters, spaces, etc.
                        For multiple inputs specify multiple --input_type on the command line.
                        Eg:
                            --input_type "data1" image --input_type "data2" opaque
                        These options are used by the DSP runtime; the following descriptions state
                        how the input is handled for each option.
                        Image:
                        Input is float between 0-255, the input's mean is 0.0f, and the input's
                        max is 255.0f. The floats are cast to uint8_t and passed to the DSP.
                        Default:
                        Passes the input as floats directly to the DSP, which will quantize it.
                        Opaque:
                        Assumes the input is float because the consumer layer (i.e. the next
                        layer) requires it as float, therefore it won't be quantized.
                        Choices supported:
                            image
                            default
                            opaque
  --input_dtype INPUT_NAME INPUT_DTYPE
                        The names and datatype of the network input layers specified in the format
                        [input_name datatype], for example:
                            'data' 'float32'
                        Default is float32 if not specified
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For multiple inputs specify multiple --input_dtype on the command line like:
                            --input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
  --input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
                        Usage:     --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
                        [INPUT_ENCODING_OUT]
                        Input encoding of the network inputs. Default is bgr.
                        e.g.
                            --input_encoding "data" rgba
                        Quotes must wrap the input node name to handle special characters,
                        spaces, etc. To specify encodings for multiple inputs, invoke
                        --input_encoding for each one.
                        e.g.
                            --input_encoding "data1" rgba --input_encoding "data2" other
                        Optionally, an output encoding may be specified for an input node by
                        providing a second encoding. The default output encoding is bgr.
                        e.g.
                            --input_encoding "data3" rgba rgb
                        Input encoding types:
                            image color encodings: bgr, rgb, nv21, nv12, ...
                            time_series: for inputs of RNN models;
                            other: for encodings not listed above or unknown.
                        Supported encodings:
                            bgr
                            rgb
                            rgba
                            argb32
                            nv21
                            nv12
                            time_series
                            other
  --input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
                        Layout of each input tensor. If not specified, it will use the default
                        based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                            NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        NONTRIVIAL for everything else. For multiple inputs, specify multiple
                        --input_layout on the command line.
                        Eg:
                           --input_layout "data1" NCHW --input_layout "data2" NCHW
  --custom_io CUSTOM_IO
                        Use this option to specify a yaml file for custom IO.
  --preserve_io [PRESERVE_IO [PRESERVE_IO ...]]
                        Use this option to preserve IO layout and datatype. The different ways of
                        using this option are as follows:
                            --preserve_io layout <space separated list of names of inputs and
                        outputs of the graph>
                            --preserve_io datatype <space separated list of names of inputs and
                        outputs of the graph>
                        In this case, the user should also specify the string - layout or
                        datatype - in the command to indicate which one the converter needs
                        to preserve. e.g.
                        --preserve_io layout input1 input2 output1
                        --preserve_io datatype input1 input2 output1
                        Optionally, the user may choose to preserve the layout and/or datatype for
                        all the inputs and outputs of the graph.
                        This can be done in the following two ways:
                            --preserve_io layout
                            --preserve_io datatype
                        Additionally, the user may choose to preserve both layout and datatypes for
                        all IO tensors by just passing the option as follows:
                            --preserve_io
                        Note: Only one of the above usages is allowed at a time.
                        Note: --custom_io gets higher precedence than --preserve_io.
  --dump_relay DUMP_RELAY
                        Dump Relay ASM and Params at the path provided with the argument
                        Usage: --dump_relay <path_to_dump>
  --dry_run             Evaluates the model without actually converting any ops, and
                        reports unsupported ops, if any.
  --dump_out_names      Dumps the mapping from names stored in the QNN CPP to the names used by
                        the converter, saved to the file 'model_output_names.json'.
  --pytorch_custom_op_lib PYTORCH_CUSTOM_OP_LIB, -pcl PYTORCH_CUSTOM_OP_LIB
                        Absolute path to the PyTorch library containing the custom op definition.
                        Multiple custom op libraries must be comma-separated.
                        For PyTorch custom op details, refer to:
                             https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html
                        For custom C++ extension details, refer to:
                             https://pytorch.org/tutorials/advanced/cpp_extension.html
                        Eg. 1: --pytorch_custom_op_lib absolute_path_to/Example.so
                        Eg. 2: -pcl absolute_path_to/Example1.so,absolute_path_to/Example2.so
  --disable_batchnorm_folding
  --expand_lstm_op_structure
                        Enables optimization that breaks the LSTM op into equivalent math ops
  --keep_disconnected_nodes
                        Disables the optimization that removes ops not connected to the main graph.
                        This optimization uses output names provided on the command line, or
                        inputs/outputs extracted from the source model, to determine the main graph.
  --debug [DEBUG]       Run the converter in debug mode.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path where the converted output model should be saved. If not specified, the
                        converted model will be written to a file with the same name as the input model.
  --copyright_file COPYRIGHT_FILE
                        Path to copyright file. If provided, the content of the file will be added
                        to the output model.
  --float_bitwidth FLOAT_BITWIDTH
                        Selects the bitwidth to use when using float for parameters (weights/bias)
                        and activations for all ops or a specific op (via encodings) selected
                        through encoding; 32 (default) or 16.
  --float_bw FLOAT_BW   Deprecated; use --float_bitwidth.
  --float_bias_bw FLOAT_BIAS_BW
                        Deprecated; use --float_bias_bitwidth.
  --overwrite_model_prefix
                        If passed, the model generator will use the output path name as the model
                        prefix for naming functions in <qnn_model_name>.cpp (useful for running
                        multiple models at once), e.g. ModelName_composeGraphs. Default is the
                        generic "QnnModel_".
  --exclude_named_tensors
                        Do not use source framework tensor names; instead, use a counter for naming
                        tensors. Note: this can help reduce the size of the generated model library
                        (recommended when deploying a model). Default is False.
  -h, --help            show this help message and exit

Quantizer Options:
  --quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters to use for
                        quantization. These will override any quantization data carried from
                        conversion (eg TF fake quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET specification.
  --keep_quant_nodes    Use this option to keep activation quantization nodes in the graph rather
                        than stripping them.
  --input_list INPUT_LIST
                        Path to a file specifying the input data. This file should be a plain text
                        file, containing one or more absolute file paths per line. Each path is
                        expected to point to a binary file containing one input in the "raw" format,
                        ready to be consumed by the quantizer without any further preprocessing.
                        Multiple files per line separated by spaces indicate multiple inputs to the
                        network. See documentation for more details. Must be specified for
                        quantization. All subsequent quantization options are ignored when this is
                        not provided.
  --param_quantizer PARAM_QUANTIZER
                        Optional parameter to indicate the weight/bias quantizer to use. Must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "adjusted": Deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                                     Data will be stored as int#_t data such that the offset is always 0.
  --act_quantizer ACT_QUANTIZER
                        Optional parameter to indicate the activation quantizer to use. Must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "adjusted": Deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                                     Data will be stored as int#_t data such that the offset is always 0.
  --algorithms ALGORITHMS [ALGORITHMS ...]
                        Use this option to enable new optimization algorithms. Usage is:
                            --algorithms <algo_name1> ... The available optimization algorithms are:
                        "cle" - Cross layer equalization includes a number of methods for equalizing
                        weights and biases across layers in order to rectify imbalances that cause
                        quantization errors.
  --bias_bitwidth BIAS_BITWIDTH
                        Selects the bitwidth to use when quantizing the biases; 8 (default) or 32.
  --bias_bw BIAS_BW     Deprecated; use --bias_bitwidth.
  --act_bitwidth ACT_BITWIDTH
                        Selects the bitwidth to use when quantizing the activations; 8 (default) or 16.
  --act_bw ACT_BW       Deprecated; use --act_bitwidth.
  --weights_bitwidth WEIGHTS_BITWIDTH
                        Selects the bitwidth to use when quantizing the weights; 4 or 8 (default).
  --weight_bw WEIGHT_BW
                        Deprecated; use --weights_bitwidth.
  --float_bias_bitwidth FLOAT_BIAS_BITWIDTH
                        Selects the bitwidth to use when biases are in float; 32 or 16.
  --ignore_encodings    Use only quantizer generated encodings, ignoring any user or model provided
                        encodings.
                        Note: Cannot use --ignore_encodings with --quantization_overrides
  --use_per_channel_quantization
                        Enables per-channel quantization for convolution-based op weights.
                        This replaces the built-in model QAT encodings when used for a given weight.
  --use_per_row_quantization
                        Enables row-wise quantization of Matmul and FullyConnected ops.
  --float_fallback      Enables fallback to floating point (FP) instead of fixed point.
                        This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
                        If this option is enabled, neither --input_list nor --ignore_encodings may be provided.
                        The external quantization encodings (encoding file/FakeQuant encodings) might be missing
                        quantization parameters for some interim tensors. The converter first tries to fill the
                        gaps by propagating encodings across math-invariant functions; any nodes still missing
                        quantization parameters fall back to floating point.
  --use_native_input_files
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native: reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_dtype    Note: This option is deprecated, use --use_native_input_files option in
                        future.
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native: reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_output_files
                        Use this option to indicate the data type of the output files:
                        1. float (default): outputs the files as floats.
                        2. native: outputs the files in the data type native to the model. For ex.,
                        uint8_t.
  --disable_relu_squashing
                        Disables squashing of ReLU against convolution-based ops for quantized models.
  --restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
                        Specifies the number of steps to use for computing quantization encodings
                        such that scale = (max - min) / number of quantization steps.
                        The option should be passed as a space-separated pair of hexadecimal string
                        minimum and maximum values, i.e. --restrict_quantization_steps "MIN MAX".
                        Please note that these are hexadecimal string literals and not signed
                        integers; to supply a negative value an explicit minus sign is required.
                        E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8-bit range,
                        --restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16-bit
                        range.

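As an illustration of how a --restrict_quantization_steps pair maps onto a step count and a scale, the sketch below parses the hexadecimal string pair and applies the scale formula from the option description. The hex strings are the 8-bit example from the help text; the float tensor range is a hypothetical example value, not a tool default.

```python
# Sketch: how --restrict_quantization_steps "MIN MAX" maps to a step
# count and scale. The hex pair and the float range are illustrative.

def steps_from_hex(min_hex: str, max_hex: str) -> int:
    # int(s, 16) accepts an explicit minus sign, e.g. "-0x80"
    return int(max_hex, 16) - int(min_hex, 16)

steps = steps_from_hex("-0x80", "0x7F")   # the 8-bit example above
print(steps)                              # 255

# scale = (max - min) / number of quantization steps, for a
# hypothetical observed tensor range [-1.0, 3.0]:
scale = (3.0 - (-1.0)) / steps
```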
Custom Op Package Options:
  --op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
                        Use this argument to pass an op package library for quantization. Must be in
                        the form <op_package_lib_path:interfaceProviderName> and be separated by a
                        comma for multiple package libs
  --converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
                        Absolute path to converter op package library compiled by the OpPackage
                        generator. Must be separated by a comma for multiple package libraries.
                        Note: Libraries must follow the same order as the xml files.
                        E.g.1: --converter_op_package_lib absolute_path_to/libExample.so
                        E.g.2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
  -p PACKAGE_NAME, --package_name PACKAGE_NAME
                        A global package name to be used for each node in the Model.cpp file.
                        Defaults to Qnn header defined package name
  --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...], -opc CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]
                        Path to a Qnn Op Package XML configuration file that contains user defined
                        custom operations.

Architecture Checker Options(Experimental):
  --arch_checker        Note: This option will be soon deprecated. Use the qnn-architecture-checker tool to achieve the same result.

Note: Only one of: {'package_name', 'op_package_config'} can be specified

Basic command line usage looks like:

$ qnn-pytorch-converter -i <path>/model.pt
                       -d <network_input_name> <dims>
                       -o <optional_output_path>
                       -p <optional_package_name> # Defaults to "qti.aisw"

qnn-onnx-converter

The qnn-onnx-converter tool converts a model from the ONNX framework to a CPP file representing the model as a series of QNN API calls, along with a binary file containing the static weights of the model.

usage: qnn-onnx-converter [--out_node OUT_NAMES] [--input_type INPUT_NAME INPUT_TYPE]
                          [--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding  ...]
                          [--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
                          [--preserve_io [PRESERVE_IO [PRESERVE_IO ...]]] [--dry_run [DRY_RUN]]
                          [-d INPUT_NAME INPUT_DIM] [-n] [-b BATCH] [-s SYMBOL_NAME VALUE]
                          [--dump_custom_io_config_template DUMP_CUSTOM_IO_CONFIG_TEMPLATE]
                          [--quantization_overrides QUANTIZATION_OVERRIDES] [--keep_quant_nodes]
                          [--disable_batchnorm_folding] [--expand_lstm_op_structure]
                          [--keep_disconnected_nodes]
                          [--input_list INPUT_LIST] [--param_quantizer PARAM_QUANTIZER]
                          [--act_quantizer ACT_QUANTIZER] [--algorithms ALGORITHMS [ALGORITHMS ...]]
                          [--bias_bitwidth BIAS_BITWIDTH] [--bias_bw BIAS_BW]
                          [--act_bitwidth ACT_BITWIDTH] [--act_bw ACT_BW]
                          [--weights_bitwidth WEIGHTS_BITWIDTH] [--weight_bw WEIGHT_BW]
                          [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_encodings]
                          [--use_per_channel_quantization] [--use_per_row_quantization]
                          [--float_fallback] [--use_native_input_files] [--use_native_dtype]
                          [--use_native_output_files] [--disable_relu_squashing]
                          [--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
                          --input_network INPUT_NETWORK [--debug [DEBUG]]
                          [-o OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
                          [--float_bitwidth FLOAT_BITWIDTH] [--float_bw FLOAT_BW]
                          [--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
                          [--exclude_named_tensors] [--op_package_lib OP_PACKAGE_LIB]
                          [--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
                          [-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
                          [-h] [--arch_checker]

Script to convert ONNX model into QNN

required arguments:
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
                        Path to the source framework model.

optional arguments:
  --out_node OUT_NAMES, --out_name OUT_NAMES
                        Names of the graph's output tensors. Multiple output
                        names should be provided separately like:
                            --out_name out_1 --out_name out_2
  --input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
                        Type of data expected by each input op/layer. Type for
                        each input is |default| if not specified. For example:
                        "data" image. Note that the quotes should always be
                        included in order to handle special characters,
                        spaces, etc. For multiple inputs specify multiple
                        --input_type on the command line. Eg:
                            --input_type "data1" image --input_type "data2" opaque
                        These options are used by the DSP runtime; the following
                        descriptions state how the input will be handled for each
                        option.
                        Image:
                        Input is float in the range 0-255; the input's mean is 0.0f and the input's
                        max is 255.0f. The floats are cast to uint8_t and passed to
                        the DSP.
                        Default:
                        Passes the input as floats to the DSP
                        directly, and the DSP will quantize it.
                        Opaque:
                        Assumes input is float because the consumer layer(i.e next
                        layer) requires it as float, therefore it won't be
                        quantized.
                        Choices supported:
                            image
                            default
                            opaque
  --input_dtype INPUT_NAME INPUT_DTYPE
                        The names and datatype of the network input layers
                        specified in the format [input_name datatype], for
                        example:
                            'data' 'float32'.
                        Default is float32 if not specified.
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For multiple inputs specify multiple --input_dtype on the command line like:
                            --input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
  --input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
                        Usage:     --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
                        [INPUT_ENCODING_OUT]
                        e.g.
                            --input_encoding "data" rgba
                        Quotes must wrap the input node name to handle special characters,
                        spaces, etc. To specify encodings for multiple inputs, invoke
                        --input_encoding for each one.
                        e.g.
                            --input_encoding "data1" rgba --input_encoding "data2" other
                        Optionally, an output encoding may be specified for an input node by
                        providing a second encoding. The default output encoding is bgr.
                        e.g.
                            --input_encoding "data3" rgba rgb
                        Input encoding types:
                            image color encodings: bgr, rgb, nv21, nv12, ...
                            time_series: for inputs of rnn models;
                            other: not available above or is unknown.
                        Supported encodings:
                            bgr
                            rgb
                            rgba
                            argb32
                            nv21
                            nv12
                            time_series
                            other
  --input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
                        Layout of each input tensor. If not specified, it will use the default
                        based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                            NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        NONTRIVIAL for everything else. For multiple inputs specify multiple
                        --input_layout on the command line.
                        Eg:
                            --input_layout "data1" NCHW --input_layout "data2" NCHW
                        Note: This flag does not set the layout of the input tensor in the converted DLC.
                            Please use --custom_io for that.
  --custom_io CUSTOM_IO
                        Use this option to specify a yaml file for custom IO.
  --preserve_io PRESERVE_IO
                        Use this option to preserve IO layout and datatype. The different ways of using
                        this option are as follows:
                            --preserve_io layout <space separated list of names of inputs and outputs of the graph>
                            --preserve_io datatype <space separated list of names of inputs and outputs of the graph>
                        In this case, the user should also specify the string - layout or datatype - in
                        the command to indicate that the converter needs to preserve the layout or datatype. e.g.
                            --preserve_io layout input1 input2 output1
                            --preserve_io datatype input1 input2 output1
                        Optionally, the user may choose to preserve the layout and/or datatype for all
                        the inputs and outputs of the graph. This can be done in the following two ways:
                            --preserve_io layout
                            --preserve_io datatype
                        Additionally, the user may choose to preserve both layout and datatypes for all
                        IO tensors by just passing the option as follows:
                            --preserve_io
                        Note: Only one of the above usages is allowed at a time.
                        Note: --custom_io gets higher precedence than --preserve_io.
  --dry_run [DRY_RUN]   Evaluates the model without actually converting any ops, and returns
                        unsupported ops/attributes as well as unused inputs and/or outputs, if any.
                        Leave empty or specify "info" to see the dry run as a table, or specify "debug"
                        to show more detailed messages.
  -d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
                        The name and dimension of all the input buffers to the network specified in
                        the format [input_name comma-separated-dimensions],
                        for example: 'data' 1,224,224,3.
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        NOTE: This feature works only with Onnx 1.6.0 and above
  -n, --no_simplification
                        Do not attempt to simplify the model automatically. Note that this may
                        prevent some models from converting properly.
  -b BATCH, --batch BATCH
                        The batch dimension override. This will take the first dimension of all
                        inputs and treat it as a batch dim, overriding it with the value provided
                        here. For example:
                        --batch 6
                        will result in a shape change from [1,3,224,224] to [6,3,224,224].
                        If there are inputs without batch dim this should not be used and each input
                        should be overridden independently using -d option for input dimension
                        overrides.
  -s SYMBOL_NAME VALUE, --define_symbol SYMBOL_NAME VALUE
                        This option allows overriding specific input dimension symbols. For instance
                        you might see input shapes specified with variables such as :
                        data: [1,3,height,width]
                        To override these simply pass the option as:
                        --define_symbol height 224 --define_symbol width 448
                        which results in dimensions that look like:
                        data: [1,3,224,448]
  --dump_custom_io_config_template DUMP_CUSTOM_IO_CONFIG_TEMPLATE
                        Dumps the yaml template for Custom I/O configuration. This file can be edited
                        as per the custom requirements and passed in using the option --custom_io. Use
                        this option to specify the yaml file to which the custom IO config template is
                        dumped.
  --disable_batchnorm_folding
                        Disables the optimization that folds batchnorm ops into the preceding
                        convolution ops.
  --expand_lstm_op_structure
                        Enables optimization that breaks the LSTM op to equivalent math ops
  --keep_disconnected_nodes
                        Disable Optimization that removes Ops not connected to the main graph.
                        This optimization uses output names provided over commandline OR
                        inputs/outputs extracted from the Source model to determine the main graph
  --debug [DEBUG]       Run the converter in debug mode.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path where the converted output model should be saved. If not specified, the
                        converted model is written to a file with the same name as the input model.
  --copyright_file COPYRIGHT_FILE
                        Path to copyright file. If provided, the content of the file will be added
                        to the output model.
  --float_bitwidth FLOAT_BITWIDTH
                        Selects the bitwidth to use for floating-point parameters (weights/bias)
                        and activations, either for all ops or for specific ops selected via
                        encodings; 32 (default) or 16.
  --float_bw FLOAT_BW   Deprecated; use --float_bitwidth.
  --float_bias_bw FLOAT_BIAS_BW
                        Deprecated; use --float_bias_bitwidth.
  --overwrite_model_prefix
                        If this option is passed, the model generator uses the output path name as
                        the model prefix for naming functions in <qnn_model_name>.cpp, e.g.
                        ModelName_composeGraphs (useful for running multiple models at once).
                        Default is the generic "QnnModel_".
  --exclude_named_tensors
                        Do not use source framework tensor names; instead, name tensors with a
                        counter. Note: This can potentially help reduce the size of the final model
                        library that will be generated (recommended for deploying the model).
                        Default is False.
  -h, --help            show this help message and exit

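The symbolic-dimension override described for --define_symbol amounts to a simple substitution over the input shape. A minimal sketch, using the hypothetical height/width symbols from the option description:

```python
# Sketch: effect of --define_symbol on a symbolic input shape.
# The shape and symbol names below are hypothetical.
shape = [1, 3, "height", "width"]          # e.g. data: [1,3,height,width]
symbols = {"height": 224, "width": 448}    # from --define_symbol pairs
resolved = [symbols.get(dim, dim) for dim in shape]
print(resolved)                            # [1, 3, 224, 448]
```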
Quantizer Options:
  --quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters to use for
                        quantization. These will override any quantization data carried from
                        conversion (eg TF fake quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET specification.

  --keep_quant_nodes    Use this option to keep activation quantization nodes in the graph rather
                        than stripping them.
  --input_list INPUT_LIST
                        Path to a file specifying the input data. This file should be a plain text
                        file, containing one or more absolute file paths per line. Each path is
                        expected to point to a binary file containing one input in the "raw" format,
                        ready to be consumed by the quantizer without any further preprocessing.
                        Multiple files per line separated by spaces indicate multiple inputs to the
                        network. See documentation for more details. Must be specified for
                        quantization. All subsequent quantization options are ignored when this is
                        not provided.
  --param_quantizer PARAM_QUANTIZER
                        Optional parameter to indicate the weight/bias quantizer to use. Must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "adjusted": Deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                                     Data will be stored as int#_t data such that the offset is always 0.
  --act_quantizer ACT_QUANTIZER
                        Optional parameter to indicate the activation quantizer to use. Must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "adjusted": Deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                                     Data will be stored as int#_t data such that the offset is always 0.
  --algorithms ALGORITHMS [ALGORITHMS ...]
                        Use this option to enable new optimization algorithms. Usage is:
                            --algorithms <algo_name1> ...
                        The available optimization algorithms are:
                        "cle" - Cross layer equalization includes a number of methods for equalizing
                        weights and biases across layers in order to rectify imbalances that cause
                        quantization errors.
  --bias_bitwidth BIAS_BITWIDTH
                        Selects the bitwidth to use when quantizing the biases; 8 (default) or 32.
  --bias_bw BIAS_BW     Deprecated; use --bias_bitwidth.
  --act_bitwidth ACT_BITWIDTH
                        Selects the bitwidth to use when quantizing the activations; 8 (default) or 16.
  --act_bw ACT_BW       Deprecated; use --act_bitwidth.
  --weights_bitwidth WEIGHTS_BITWIDTH
                        Selects the bitwidth to use when quantizing the weights; 4 or 8 (default).
  --weight_bw WEIGHT_BW
                        Deprecated; use --weights_bitwidth.
  --float_bias_bitwidth FLOAT_BIAS_BITWIDTH
                        Selects the bitwidth to use when biases are in float; 32 or 16.
  --ignore_encodings    Use only quantizer generated encodings, ignoring any user or model provided
                        encodings.
                        Note: Cannot use --ignore_encodings with --quantization_overrides
  --use_per_channel_quantization
                        Enables per-channel quantization for convolution-based op weights.
                        This replaces the built-in model QAT encodings when used for a given weight.
  --use_per_row_quantization
                        Enables row-wise quantization of Matmul and FullyConnected ops.
  --float_fallback      Enables fallback to floating point (FP) instead of fixed point.
                        This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
                        If this option is enabled, neither --input_list nor --ignore_encodings may be provided.
                        The external quantization encodings (encoding file/FakeQuant encodings) might be missing
                        quantization parameters for some interim tensors. The converter first tries to fill the
                        gaps by propagating encodings across math-invariant ops; any nodes whose quantization
                        parameters are still missing fall back to floating point.
  --use_native_input_files
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native: reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_dtype    Note: This option is deprecated; use the --use_native_input_files option
                        instead.
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native: reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_output_files
                        Use this option to indicate the data type of the output files:
                        1. float (default): outputs the files as floats.
                        2. native: outputs the files in the data type native to the model. For ex.,
                        uint8_t.
  --disable_relu_squashing
                        Disables squashing of ReLU against convolution-based ops for quantized models.
  --restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
                        Specifies the number of steps to use for computing quantization encodings
                        such that scale = (max - min) / number of quantization steps.
                        The option should be passed as a space-separated pair of hexadecimal string
                        minimum and maximum values, i.e. --restrict_quantization_steps "MIN MAX".
                        Please note that these are hexadecimal string literals and not signed
                        integers; to supply a negative value an explicit minus sign is required.
                        E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8-bit range,
                        --restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16-bit
                        range. This argument is required for 16-bit Matmul operations.

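As a concrete illustration of the --input_list format described above, the sketch below writes one raw float32 input file and a list file pointing at it. The file names and the 1x224x224x3 input shape are hypothetical; only the Python standard library is used.

```python
# Sketch: preparing a raw input file and an --input_list file.
# File names and the 1x224x224x3 float32 shape are hypothetical.
import array
import os
import tempfile

workdir = tempfile.mkdtemp()

# One calibration sample in raw float32 form (the "float" default of
# --use_native_input_files), ready for the quantizer as-is.
raw_path = os.path.join(workdir, "calib_0.raw")
with open(raw_path, "wb") as f:
    array.array("f", [0.0] * (1 * 224 * 224 * 3)).tofile(f)

# The input list holds one or more absolute paths per line; several
# space-separated paths on one line would feed a multi-input network.
list_path = os.path.join(workdir, "input_list.txt")
with open(list_path, "w") as f:
    f.write(raw_path + "\n")
```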
Custom Op Package Options:
  --op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
                        Use this argument to pass an op package library for quantization. Must be in
                        the form <op_package_lib_path:interfaceProviderName> and be separated by a
                        comma for multiple package libs
  --converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
                        Absolute path to converter op package library compiled by the OpPackage
                        generator. Must be separated by a comma for multiple package libraries.
                        Note: Libraries must follow the same order as the xml files.
                        E.g.1: --converter_op_package_lib absolute_path_to/libExample.so
                        E.g.2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
  -p PACKAGE_NAME, --package_name PACKAGE_NAME
                        A global package name to be used for each node in the Model.cpp file.
                        Defaults to Qnn header defined package name
  --op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...], -opc OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]
                        Path to a Qnn Op Package XML configuration file that contains user defined
                        custom operations.

Architecture Checker Options(Experimental):
  --arch_checker        Note: This option will be soon deprecated. Use the qnn-architecture-checker tool to achieve the same result.

Note: Only one of: {'package_name', 'op_package_config'} can be specified
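
To make the "symmetric" choice described under --param_quantizer / --act_quantizer above concrete, here is a minimal sketch of one common convention: min and max share the same magnitude about zero, so the offset is always 0. The weight values and the int8 rounding convention are illustrative, and the tool's exact arithmetic may differ.

```python
# Sketch: symmetric encodings, where min and max have the same absolute
# value about zero and the offset is always 0. Values are hypothetical.

def symmetric_encoding(values, bitwidth=8):
    amax = max(abs(v) for v in values)
    qmax = 2 ** (bitwidth - 1) - 1        # e.g. 127 for int8_t storage
    return {"min": -amax, "max": amax, "scale": amax / qmax, "offset": 0}

enc = symmetric_encoding([-0.2, 0.05, 0.5])   # hypothetical weights
print(enc["offset"], enc["min"], enc["max"])  # 0 -0.5 0.5
```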

qairt-converter

The qairt-converter tool converts a model from one of the ONNX, TensorFlow, TFLite, or PyTorch frameworks to a DLC file representing the QNN graph format that can enable inference on Qualcomm AI IP/HW. The converter auto-detects the framework based on the source model extension.

Basic command line usage looks like:

usage: qairt-converter [--desired_input_shape INPUT_NAME INPUT_DIM] [--out_tensor_node OUT_NAMES]
                       [--source_model_input_datatype INPUT_NAME INPUT_DTYPE]
                       [--source_model_input_layout INPUT_NAME INPUT_LAYOUT]
                       [--desired_input_color_encoding  [ ...]]
                       [--dump_io_config_template DUMP_IO_CONFIG_TEMPLATE] [--io_config IO_CONFIG]
                       [--dry_run [DRY_RUN]] [--quantization_overrides QUANTIZATION_OVERRIDES]
                       [--onnx_no_simplification] [--onnx_batch BATCH]
                       [--onnx_define_symbol SYMBOL_NAME VALUE] [--tf_no_optimization]
                       [--tf_show_unconsumed_nodes] [--tf_saved_model_tag SAVED_MODEL_TAG]
                       [--tf_saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY]
                       [--tf_validate_models] [--tflite_signature_name SIGNATURE_NAME]
                       --input_network INPUT_NETWORK [-h] [--debug [DEBUG]]
                       [--output_path OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
                       [--float_bitwidth FLOAT_BITWIDTH] [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH]
                       [--model_version MODEL_VERSION] [--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
                       [--package_name PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]

required arguments:
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
                        Path to the source framework model.

optional arguments:
  --desired_input_shape INPUT_NAME INPUT_DIM, -d INPUT_NAME INPUT_DIM
                        The name and dimension of all the input buffers to the network specified in
                        the format [input_name comma-separated-dimensions],
                        for example: 'data' 1,224,224,3.
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        NOTE: Required for TensorFlow and PyTorch. Optional for Onnx and Tflite
                        In case of Onnx, this feature works only with Onnx 1.6.0 and above
  --out_tensor_node OUT_NAMES, --out_tensor_name OUT_NAMES
                        Names of the graph's output tensors. Multiple output names should be
                        provided separately like:
                            --out_tensor_name out_1 --out_tensor_name out_2
                        NOTE: Required for TensorFlow. Optional for Onnx, Tflite and PyTorch
  --source_model_input_datatype INPUT_NAME INPUT_DTYPE
                        The names and datatype of the network input layers specified in the format
                        [input_name datatype], for example:
                            'data' 'float32'
                        Default is float32 if not specified
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For multiple inputs specify --source_model_input_datatype multiple times on the command line like:
                            --source_model_input_datatype 'data1' 'float32' --source_model_input_datatype 'data2' 'float32'
  --source_model_input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
                        Layout of each input tensor. If not specified, it will use the default
                        based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                            NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T =
                        Time
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        NONTRIVIAL for everything else. For multiple inputs specify multiple
                        --source_model_input_layout on the command line.
                        Eg:
                            --source_model_input_layout "data1" NCHW --source_model_input_layout "data2" NCHW
  --desired_input_color_encoding  [ ...], -e  [ ...]
                        Usage:     --input_color_encoding "INPUT_NAME" INPUT_ENCODING_IN
                        [INPUT_ENCODING_OUT]
                        Input encoding of the network inputs. Default is bgr.
                        e.g.
                           --input_color_encoding "data" rgba
                        Quotes must wrap the input node name to handle special characters,
                        spaces, etc. To specify encodings for multiple inputs, invoke
                        --input_color_encoding for each one.
                        e.g.
                            --input_color_encoding "data1" rgba --input_color_encoding "data2" other
                        Optionally, an output encoding may be specified for an input node by
                        providing a second encoding. The default output encoding is bgr.
                        e.g.
                            --input_color_encoding "data3" rgba rgb
                        Input encoding types:
                            image color encodings: bgr, rgb, nv21, nv12, ...
                            time_series: for inputs of rnn models;
                            other: encodings not listed above or unknown.
                        Supported encodings:
                           bgr
                           rgb
                           rgba
                           argb32
                           nv21
                           nv12
  --dump_io_config_template DUMP_IO_CONFIG_TEMPLATE
                        Dumps the yaml template for I/O configuration. This file can be edited as
                        per the custom requirements and passed using the option --io_config.
                        Use this option to specify a yaml file to which the IO config template is dumped.
  --io_config IO_CONFIG
                        Use this option to specify a yaml file for input and output options.
  --dry_run [DRY_RUN]   Evaluates the model without actually converting any ops, and returns
                        unsupported ops/attributes as well as unused inputs and/or outputs if any.
  -h, --help            show this help message and exit
  --debug [DEBUG]       Run the converter in debug mode.
  --output_path OUTPUT_PATH, -o OUTPUT_PATH
                        Path where the converted output model should be saved. If not specified, the
                        converted model will be written to a file with the same name as the input model.
  --copyright_file COPYRIGHT_FILE
                        Path to copyright file. If provided, the content of the file will be added
                        to the output model.
  --float_bitwidth FLOAT_BITWIDTH
                        Use the --float_bitwidth option to convert the graph to the specified float
                        bitwidth, either 32 (default) or 16.
  --float_bias_bitwidth FLOAT_BIAS_BITWIDTH
                        Use the --float_bias_bitwidth option to select the bitwidth to use for float
                        bias tensor
  --model_version MODEL_VERSION
                        User-defined ASCII string to identify the model; only the first 64 bytes will
                        be stored

Custom Op Package Options:
  --converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
                        Absolute path to the converter op package library compiled by the OpPackage
                        generator. Multiple package libraries must be comma separated.
                        Note: The converter op package library order must match the xml file order.
                        Ex1: --converter_op_package_lib absolute_path_to/libExample.so
                        Ex2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
  --package_name PACKAGE_NAME, -p PACKAGE_NAME
                        A global package name to be used for each node in the Model.cpp file.
                        Defaults to the package name defined in the QNN header.
  --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...], -opc CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]
                        Absolute path to an XML configuration file for a QNN op package that
                        contains custom, user-defined operations.

Quantizer Options:
  --quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters to use for
                        quantization. These will override any quantization data carried from
                        conversion (eg TF fake quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET specification.
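The override file follows the AIMET encodings specification. A minimal illustrative sketch is shown below; the tensor names and numeric values are hypothetical, and the full schema should be taken from the AIMET specification:

```json
{
  "activation_encodings": {
    "conv1_out": [
      {"bitwidth": 8, "min": -2.0, "max": 6.0, "scale": 0.03137, "offset": -64, "is_symmetric": "False"}
    ]
  },
  "param_encodings": {
    "conv1.weight": [
      {"bitwidth": 8, "min": -0.5, "max": 0.5, "scale": 0.00394, "offset": -128, "is_symmetric": "True"}
    ]
  }
}
```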

Onnx Converter Options:
  --onnx_no_simplification
                        Do not attempt to simplify the model automatically. This may prevent some
                        models from properly converting
                        when sequences of unsupported static operations are present.
  --onnx_batch BATCH    The batch dimension override. This will take the first dimension of all
                        inputs and treat it as a batch dim, overriding it with the value provided
                        here. For example:
                        --batch 6
                        will result in a shape change from [1,3,224,224] to [6,3,224,224].
                        If there are inputs without batch dim this should not be used and each input
                        should be overridden independently using -d option for input dimension
                        overrides.
  --onnx_define_symbol SYMBOL_NAME VALUE
                        This option allows overriding specific input dimension symbols. For instance
                        you might see input shapes specified with variables such as :
                        data: [1,3,height,width]
                        To override these simply pass the option as:
                        --define_symbol height 224 --define_symbol width 448
                        which results in dimensions that look like:
                        data: [1,3,224,448]

TensorFlow Converter Options:
  --tf_no_optimization  Do not attempt to optimize the model automatically.
  --tf_show_unconsumed_nodes
                        Displays a list of unconsumed nodes, if any are found. Nodes which are
                        unconsumed do not violate the structural fidelity of the generated graph.
  --tf_saved_model_tag SAVED_MODEL_TAG
                        Specify the tag to select a MetaGraph from the SavedModel. ex:
                        --saved_model_tag serve. Default value will be 'serve' when it is not
                        assigned.
  --tf_saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY
                        Specify signature key to select input and output of the model. ex:
                        --saved_model_signature_key serving_default. Default value will be
                        'serving_default' when it is not assigned
  --tf_validate_models  Validate the original TF model against optimized TF model.
                        Constant inputs with all values set to 1 will be generated and used
                        by both models, and their outputs are checked against each other.
                        The % average error and the 90th percentile of output differences will be
                        calculated for this.
                        Note: Usage of this flag will incur extra time due to inference of the
                        models.

Tflite Converter Options:
  --tflite_signature_name SIGNATURE_NAME
                        Use this option to specify a specific Subgraph signature to convert

Model Preparation

Quantization Support

Quantization is supported through the converter interface and is performed at conversion time. The only required option to enable quantization along with conversion is the --input_list option, which provides the quantizer with the required input data for the given model. The following options are available in each converter listed above to enable and configure quantization:

Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters
                        to use for quantization. These will override any
                        quantization data carried from conversion (eg TF fake
                        quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET
                        specification.
--input_list INPUT_LIST
                      Path to a file specifying the input data. This file
                      should be a plain text file, containing one or more
                      absolute file paths per line. Each path is expected to
                      point to a binary file containing one input in the
                      "raw" format, ready to be consumed by the quantizer
                      without any further preprocessing. Multiple files per
                      line separated by spaces indicate multiple inputs to
                      the network. See documentation for more details. Must
                      be specified for quantization. All subsequent
                      quantization options are ignored when this is not
                      provided.
--param_quantizer PARAM_QUANTIZER
                      Optional parameter to indicate the weight/bias
                      quantizer to use. Must be followed by one of the
                      following options: "tf": Uses the real min/max of the
                      data and specified bitwidth (default) "enhanced": Uses
                      an algorithm useful for quantizing models with long
                      tails present in the weight distribution "adjusted":
                      Uses an adjusted min/max for computing the range,
                      particularly good for denoise models "symmetric":
                      Ensures min and max have the same absolute values
                      about zero. Data will be stored as int#_t data such
                      that the offset is always 0.
--act_quantizer ACT_QUANTIZER
                      Optional parameter to indicate the activation
                      quantizer to use. Must be followed by one of the
                      following options: "tf": Uses the real min/max of the
                      data and specified bitwidth (default) "enhanced": Uses
                      an algorithm useful for quantizing models with long
                      tails present in the weight distribution "adjusted":
                      Uses an adjusted min/max for computing the range,
                      particularly good for denoise models "symmetric":
                      Ensures min and max have the same absolute values
                      about zero. Data will be stored as int#_t data such
                      that the offset is always 0.
--algorithms ALGORITHMS [ALGORITHMS ...]
                      Use this option to enable new optimization algorithms.
                      Usage is: --algorithms <algo_name1> ... The
                      available optimization algorithms are: "cle" - Cross
                      layer equalization includes a number of methods for
                      equalizing weights and biases across layers in order
                      to rectify imbalances that cause quantization errors.
--bias_bw BIAS_BW     Use the --bias_bw option to select the bitwidth to use
                      when quantizing the biases, either 8 (default) or 32.
--act_bw ACT_BW       Use the --act_bw option to select the bitwidth to use
                      when quantizing the activations, either 8 (default) or
                      16.
--weight_bw WEIGHT_BW
                      Use the --weight_bw option to select the bitwidth to
                      use when quantizing the weights, currently only 8 bit
                      (default) supported.
--float_bias_bw FLOAT_BIAS_BW
                      Use the --float_bias_bw option to select the bitwidth to
                      use when biases are in float, either 32 or 16.
--ignore_encodings    Use only quantizer generated encodings, ignoring any
                      user or model provided encodings. Note: Cannot use
                      --ignore_encodings with --quantization_overrides
--use_per_channel_quantization [USE_PER_CHANNEL_QUANTIZATION [USE_PER_CHANNEL_QUANTIZATION ...]]
                      Use per-channel quantization for
                      convolution-based op weights. Note: This will replace
                      built-in model QAT encodings when used for a given
                      weight. Usage: "--use_per_channel_quantization" to
                      enable or "--use_per_channel_quantization false"
                      (default) to disable
--use_per_row_quantization [USE_PER_ROW_QUANTIZATION [USE_PER_ROW_QUANTIZATION ...]]
                      Use this option to enable rowwise quantization of Matmul and
                      FullyConnected op. Usage "--use_per_row_quantization" to enable
                      or "--use_per_row_quantization false" (default) to
                      disable. This option may not be supported by all backends.

Basic command line usage to convert and quantize a model using the TF converter would look like:

$ qnn-tensorflow-converter -i <path>/frozen_graph.pb
                    -d <network_input_name> <dims>
                    --out_node <network_output_name>
                    -o <optional_output_path>
                    --allow_unconsumed_nodes  # optional, but most likely will be needed for larger models
                    -p <optional_package_name> # Defaults to "qti.aisw"
                    --input_list input_list.txt

This will quantize the network using the default quantizer and bitwidths (8 bits for activations, weights, and biases).

For more detailed information on quantization, options, and algorithms please refer to Quantization.

qairt-quantizer

The qairt-quantizer tool converts non-quantized DLC models into quantized DLC models.

Basic command line usage looks like:

usage: qairt-quantizer --input_dlc INPUT_DLC [-h] [--output_dlc OUTPUT_DLC]
                   [--input_list INPUT_LIST] [--float_fallback]
                   [--algorithms ALGORITHMS [ALGORITHMS ...]] [--bias_bitwidth BIAS_BITWIDTH]
                   [--act_bitwidth ACT_BITWIDTH] [--weights_bitwidth WEIGHTS_BITWIDTH]
                   [--float_bitwidth FLOAT_BITWIDTH] [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH]
                   [--ignore_encodings] [--use_per_channel_quantization]
                   [--use_per_row_quantization] [--use_native_input_files]
                   [--use_native_output_files]
                   [--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
                   [--act_quantizer_calibration ACT_QUANTIZER_CALIBRATION]
                   [--param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION]
                   [--act_quantizer_schema ACT_QUANTIZER_SCHEMA]
                   [--param_quantizer_schema PARAM_QUANTIZER_SCHEMA]
                   [--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
                   [--use_aimet_quantizer]
                   [--config_file CONFIG_FILE]
                   [--op_package_lib OP_PACKAGE_LIB]
                   [--dump_encoding_json] [--debug [DEBUG]]

required arguments:
  --input_dlc INPUT_DLC
                        Path to the dlc container containing the model for which fixed-point
                        encoding metadata should be generated. This argument is required

optional arguments:
  -h, --help            show this help message and exit
  --output_dlc OUTPUT_DLC
                        Path at which the metadata-included quantized model container should be
                        written. If this argument is omitted, the quantized model will be written at
                        <unquantized_model_name>_quantized.dlc
  --input_list INPUT_LIST
                        Path to a file specifying the input data. This file should be a plain text
                        file, containing one or more absolute file paths per line. Each path is
                        expected to point to a binary file containing one input in the "raw" format,
                        ready to be consumed by the quantizer without any further preprocessing.
                        Multiple files per line separated by spaces indicate multiple inputs to the
                        network. See documentation for more details. Must be specified for
                        quantization. All subsequent quantization options are ignored when this is
                        not provided.
  --float_fallback      Use this option to enable fallback to floating point (FP) instead of fixed
                        point.
                        This option can be paired with --float_bitwidth to indicate the bitwidth for
                        FP (by default 32).
                        If this option is enabled, then input list must not be provided and
                        --ignore_encodings must not be provided.
                        The external quantization encodings (encoding file/FakeQuant encodings)
                        might be missing quantization parameters for some interim tensors.
                        First, the quantizer will try to fill the gaps by propagating encodings
                        across math-invariant functions. If the quantization params are still
                        missing, the affected nodes will fall back to floating point.
  --algorithms ALGORITHMS [ALGORITHMS ...]
                        Use this option to enable new optimization algorithms. Usage is:
                        --algorithms <algo_name1> ... The available optimization algorithms are:
                        "cle" - Cross layer equalization includes a number of methods for equalizing
                        weights and biases across layers in order to rectify imbalances that cause
                        quantization errors.
  --bias_bitwidth BIAS_BITWIDTH
                        Use the --bias_bitwidth option to select the bitwidth to use when quantizing
                        the biases, either 8 (default) or 32.
  --act_bitwidth ACT_BITWIDTH
                        Use the --act_bitwidth option to select the bitwidth to use when quantizing
                        the activations, either 8 (default) or 16.
  --weights_bitwidth WEIGHTS_BITWIDTH
                        Use the --weights_bitwidth option to select the bitwidth to use when
                        quantizing the weights, either 4 or 8 (default).
  --float_bitwidth FLOAT_BITWIDTH
                        Use the --float_bitwidth option to select the bitwidth to use for float
                        tensors, either 32 (default) or 16.
  --float_bias_bitwidth FLOAT_BIAS_BITWIDTH
                        Use the --float_bias_bitwidth option to select the bitwidth to use when
                        biases are in float, either 32 or 16.
  --ignore_encodings    Use only quantizer generated encodings, ignoring any user or model provided
                        encodings.
                        Note: Cannot use --ignore_encodings with --quantization_overrides
  --use_per_channel_quantization
                        Use this option to enable per-channel quantization for convolution-based op
                        weights.
                        Note: This will replace built-in model QAT encodings when used for a given
                        weight.
  --use_per_row_quantization
                        Use this option to enable rowwise quantization of Matmul and FullyConnected
                        ops.
  --use_native_input_files
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native:          reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_output_files
                        Use this option to indicate the data type of the output files
                        1. float (default): output the file as floats.
                        2. native:          outputs the file that is native to the model. For ex.,
                        uint8_t.
  --restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
                        Specifies the number of steps to use for computing quantization encodings
                        such that scale = (max - min) / number of quantization steps.
                        The option should be passed as a space separated pair of hexadecimal string
                        minimum and maximum values, i.e. --restrict_quantization_steps "MIN MAX".
                        Please note that these are hexadecimal string literals and not signed
                        integers; to supply a negative value, an explicit minus sign is required.
                        E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8 bit
                        range,
                            --restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16
                        bit range.
                        This argument is required for 16-bit Matmul operations.
  --act_quantizer_calibration ACT_QUANTIZER_CALIBRATION
                        Specify which quantization calibration method to use for activations
                        supported values: min-max (default), sqnr, entropy, mse, percentile
                        This option can be paired with --act_quantizer_schema to override the
                        quantization schema to use for activations; otherwise the default schema
                        (asymmetric) will be used
  --param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION
                        Specify which quantization calibration method to use for parameters
                        supported values: min-max (default), sqnr, entropy, mse, percentile
                        This option can be paired with --param_quantizer_schema to override the
                        quantization schema to use for parameters; otherwise the default schema
                        (asymmetric) will be used
  --act_quantizer_schema ACT_QUANTIZER_SCHEMA
                        Specify which quantization schema to use for activations
                        supported values: asymmetric (default), symmetric
  --param_quantizer_schema PARAM_QUANTIZER_SCHEMA
                        Specify which quantization schema to use for parameters
                        supported values: asymmetric (default), symmetric
  --percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
                        Specify the percentile value to be used with Percentile calibration method
                        The specified float value must lie between 90 and 100, default: 99.99
  --use_aimet_quantizer
                        Use AIMET Quantizer in place of IR Quantizer. The following arguments are
                        not allowed together with this option, --restrict_quantization_steps,
                        --pack_4_bit_weights, --use_dynamic_16_bit_weights, --op_package_lib,
                        --keep_weights_quantized.
  --config_file CONFIG_FILE
                        Path to a YAML quantizer config file. The config file is only required if you
                        need to run advanced AIMET quantization algorithms like AdaRound or AMP.
                        Currently, it is supported only along with the flag "--use_aimet_quantizer" and
                        the "--algorithms" command line option as "adaround" or "amp". Please refer to
                        SDK documentation for more details.
  --op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
                        Use this argument to pass an op package library for quantization. Must be in
                        the form <op_package_lib_path:interfaceProviderName> and be separated by a
                        comma for multiple package libs
  --dump_encoding_json  Dumps an encoding of all tensors to the specified JSON file
  --debug [DEBUG]       Run the quantizer in debug mode.
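The interplay of calibration and schema can be illustrated with the default min-max method. The following is a simplified sketch of how scale and offset fall out of the asymmetric and symmetric schemas, mirroring the option descriptions above rather than the SDK's actual implementation:

```python
def minmax_encoding(data, bitwidth=8, symmetric=False):
    """Derive (scale, offset) from observed data using min-max calibration.

    asymmetric: the quantization grid spans [min, max]; offset is the
                (negative) integer zero-point.
    symmetric:  min and max share one magnitude about zero; offset is 0.
    """
    lo = min(min(data), 0.0)  # extend the range so 0.0 is always representable
    hi = max(max(data), 0.0)
    if symmetric:
        abs_max = max(abs(lo), abs(hi))
        scale = abs_max / (2 ** (bitwidth - 1) - 1)
        return scale, 0
    scale = (hi - lo) / (2 ** bitwidth - 1)
    offset = round(lo / scale) if scale else 0
    return scale, offset
```

For example, for data observed in [-1.0, 2.0] at 8 bits, the asymmetric schema yields scale = 3/255 with offset -85, while the symmetric schema yields scale = 2/127 with offset 0.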

For more information on usage, please refer to SNPE documentation on the snpe-dlc-quant tool.

qnn-model-lib-generator

Note

For developers who want to run the model preparation tools on a Windows PC, or on a Qualcomm device running a Windows operating system:
qnn-model-lib-generator is located under /bin/x86_64-windows-msvc within the SDK for native Windows PC usage.
For developers who want to run qnn-model-lib-generator on a device with a Windows OS, it is located under /bin/aarch64-windows-msvc.
qnn-model-lib-generator will use the CMake command available on your platform to generate libraries.
Please make sure CMake works on your Windows installation by verifying that the Windows platform compile tools are installed.

The qnn-model-lib-generator tool compiles QNN model source code into artifacts for a specific target.

usage: qnn-model-lib-generator [-h] [-c <QNN_MODEL>.cpp] [-b <QNN_MODEL>.bin]
       [-t LIB_TARGETS ] [-l LIB_NAME] [-o OUTPUT_DIR]
Script compiles provided Qnn Model artifacts for specified targets.

Required argument(s):
 -c <QNN_MODEL>.cpp                    Filepath for the qnn model .cpp file

optional argument(s):
 -b <QNN_MODEL>.bin                    Filepath for the qnn model .bin file
                                       (Note: if not passed, runtime will fail if .cpp needs any items from a .bin file.)

 -t LIB_TARGETS                        Specifies the targets to build the models for. Default: aarch64-android x86_64-linux-clang
 -l LIB_NAME                           Specifies the name to use for libraries. Default: uses name in <model.bin> if provided,
                                       else generic qnn_model.so
 -o OUTPUT_DIR                         Location for saving output libraries.

Note

For Windows users, please execute this tool with python3.

qnn-op-package-generator

The qnn-op-package-generator tool is used to generate skeleton code for a QNN op package using an XML config file that describes the attributes of the package. The tool creates the package as a directory containing skeleton source code and makefiles that can be compiled to create a shared library object.

usage: qnn-op-package-generator [-h] --config_path CONFIG_PATH [--debug]
                                [--output_path OUTPUT_PATH] [-f]

optional arguments:
  -h, --help            show this help message and exit

required arguments:
  --config_path CONFIG_PATH, -p CONFIG_PATH
                        The path to a config file that defines one or more QNN
                        Op packages.

optional arguments:
  --debug               Returns debugging information from generating the
                        package
  --output_path OUTPUT_PATH, -o OUTPUT_PATH
                        Path where the package should be saved
  -f, --force-generation
                        This option will delete the entire existing package
                        Note appropriate file permissions must be set to use
                        this option.
  --converter_op_package, -cop
                        Generates Converter Op Package skeleton code needed
                        by the output shape inference for converters

qnn-context-binary-generator

The qnn-context-binary-generator tool is used to create a context binary by using a particular backend and consuming a model library created by the qnn-model-lib-generator.

usage: qnn-context-binary-generator --model QNN_MODEL.so --backend QNN_BACKEND.so
                                    --binary_file BINARY_FILE_NAME
                                    [--model_prefix MODEL_PREFIX]
                                    [--output_dir OUTPUT_DIRECTORY]
                                    [--op_packages ONE_OR_MORE_OP_PACKAGES]
                                    [--config_file CONFIG_FILE.json]
                                    [--profiling_level PROFILING_LEVEL]
                                    [--verbose] [--version] [--help]

REQUIRED ARGUMENTS:
-------------------
  --model                         <FILE>      Path to the <qnn_model_name.so> file containing a QNN network.
                                              To create a context binary with multiple graphs, use
                                              comma-separated list of model.so files. The syntax is
                                              <qnn_model_name_1.so>,<qnn_model_name_2.so>.

  --backend                       <FILE>      Path to a QNN backend .so library to create the context binary.

  --binary_file                   <VAL>       Name of the binary file to save the context binary to, with
                                              .bin file extension.
                                              If an absolute path is provided, the binary is saved at that path.
                                              Otherwise, the binary is saved under the directory given by --output_dir.


OPTIONAL ARGUMENTS:
-------------------
  --model_prefix                              Function prefix to use when loading <qnn_model_name.so> file
                                              containing a QNN network. Default: QnnModel.

  --output_dir                    <DIR>       The directory to save output to. Defaults to ./output.

  --op_packages                   <VAL>       Provide a comma separated list of op packages
                                              and interface providers to register. The syntax is:
                                              op_package_path:interface_provider[,op_package_path:interface_provider...]

  --profiling_level               <VAL>       Enable profiling. Valid Values:
                                              1. basic:    captures execution and init time.
                                              2. detailed: in addition to basic, captures per Op timing
                                                  for execution.
                                              3. backend:  backend-specific profiling level specified
                                                  in the backend extension related JSON config file.

  --profiling_option              <VAL>       Set profiling options:
                                              1. optrace:    Generates an optrace of the run.

  --config_file                   <FILE>      Path to a JSON config file. The config file currently
                                              supports options related to backend extensions and
                                              context priority. Please refer to SDK documentation
                                              for more details.

  --enable_intermediate_outputs               Enable all intermediate nodes to be output along with
                                              default outputs in the saved context.
                                              Note that options --enable_intermediate_outputs and --set_output_tensors
                                              are mutually exclusive. Only one of the options can be specified at a time.

  --set_output_tensors            <VAL>       Provide a comma-separated list of intermediate output tensor names, for which the outputs
                                              will be written in addition to final graph output tensors.
                                              Note that options --enable_intermediate_outputs and --set_output_tensors
                                              are mutually exclusive. Only one of the options can be specified at a time.
                                              The syntax is: graphName0:tensorName0,tensorName1;graphName1:tensorName0,tensorName1.
                                              In case of a single graph, its name is not necessary and a list of comma-separated tensor
                                              names can be provided, e.g.: tensorName0,tensorName1.
                                              The same format can be provided in a .txt file.

  --backend_binary                <VAL>       Name of the binary file, with a .bin file extension, to which
                                              a backend-specific context binary is saved. If not provided,
                                              no backend binary is created. If an absolute path is provided,
                                              the binary is saved at that path; otherwise it is saved in the
                                              directory given by the --output_dir option.

  --log_level                                 Specifies max logging level to be set. Valid settings:
                                              "error", "warn", "info" and "verbose"

  --dlc_path                     <VAL>        Path to a comma-separated list of Deep Learning Containers (DLC)
                                              from which to load the models.
                                              Necessitates libQnnModelDlc.so as the --model argument.
                                              To compose multiple graphs in the context, use a comma-separated
                                              list of DLC files.
                                              The syntax is <qnn_model_name_1.dlc>,<qnn_model_name_2.dlc>
                                              Default: None

  --input_output_tensor_mem_type  <VAL>       Specifies mem type to be used for input and output tensors during graph creation.
                                              Valid settings:"raw" and "memhandle"

  --platform_options              <VAL>       Specifies values to pass as platform options. Multiple platform options can be provided
                                              using the syntax: key0:value0;key1:value1;key2:value2

  --version                                   Print the QNN SDK version.

  --help                                      Show this help message.

See qnn-net-run section for more details about --op_packages and --config_file options.
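The graphName:tensorName list syntax shared by --set_output_tensors and --native_input_tensor_names can be built programmatically. Below is a minimal sketch; the helper name is hypothetical and not part of the SDK:

```python
def format_tensor_list(graph_tensors):
    """Build the graphName0:tensorName0,tensorName1;graphName1:... string.

    graph_tensors maps a graph name to a list of tensor names. For a single
    graph the name may be omitted, per the option's documented syntax.
    """
    if len(graph_tensors) == 1:
        (tensors,) = graph_tensors.values()
        return ",".join(tensors)
    return ";".join(f"{graph}:{','.join(ts)}" for graph, ts in graph_tensors.items())

# Single graph: plain comma-separated tensor names.
print(format_tensor_list({"qnn_model": ["conv1_out", "relu2_out"]}))
# -> conv1_out,relu2_out

# Multiple graphs: semicolon-separated graph groups.
print(format_tensor_list({"graph0": ["t0", "t1"], "graph1": ["t0"]}))
# -> graph0:t0,t1;graph1:t0
```

The tensor names above are placeholders; use the names assigned by the converter for your model.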

Execution

qnn-net-run

The qnn-net-run tool is used to consume a model library compiled from the output of the QNN converter, and run it on a particular backend.

DESCRIPTION:
------------
Example application demonstrating how to load and execute a neural network
using QNN APIs.


REQUIRED ARGUMENTS:
-------------------
  --model             <FILE>       Path to the model containing a QNN network.
                                   To compose multiple graphs, use comma-separated list of
                                   model.so files. The syntax is
                                   <qnn_model_name_1.so>,<qnn_model_name_2.so>.

  --backend           <FILE>       Path to a QNN backend to execute the model.

  --input_list        <FILE>       Path to a file listing the inputs for the network.
                                   If there are multiple graphs in model.so, this has
                                   to be comma-separated list of input list files.
                                   When multiple graphs are present, to skip execution of a graph use
                                   "__" (double underscore, without quotes) as the file name in the
                                   comma-separated list of input list files.

  --retrieve_context  <VAL>       Path to a cached binary from which to load a saved
                                  context and execute graphs. --retrieve_context and
                                  --model are mutually exclusive. Only one of the options
                                  can be specified at a time.


OPTIONAL ARGUMENTS:
-------------------
  --model_prefix                             Function prefix to use when loading <qnn_model_name.so>.
                                             Default: QnnModel

  --debug                                    Specifies that output from all layers of the network
                                             will be saved. This option cannot be used when loading
                                             a saved context through the --retrieve_context option.

  --output_dir                   <DIR>       The directory to save output to. Defaults to ./output.

  --use_native_output_files                  Specifies that the output files will be generated in the data
                                             type native to the graph. If not specified, output files will
                                             be generated in floating point.

  --use_native_input_files                   Specifies that the input files will be parsed in the data
                                             type native to the graph. If not specified, input files will
                                             be parsed in floating point. Note that options --use_native_input_files
                                             and --native_input_tensor_names are mutually exclusive.
                                             Only one of the options can be specified at a time.

  --native_input_tensor_names    <VAL>       Provide a comma-separated list of input tensor names,
                                             for which the input files would be read/parsed in native format.
                                             Note that options --use_native_input_files and
                                             --native_input_tensor_names are mutually exclusive.
                                             Only one of the options can be specified at a time.
                                             The syntax is: graphName0:tensorName0,tensorName1;graphName1:tensorName0,tensorName1

  --op_packages                  <VAL>       Provide a comma-separated list of op packages, interface
                                             providers, and, optionally, targets to register. Valid values
                                             for target are CPU and HTP. The syntax is:
                                             op_package_path:interface_provider:target[,op_package_path:interface_provider:target...]

  --profiling_level              <VAL>       Enable profiling. Valid Values:
                                               1. basic:    captures execution and init time.
                                               2. detailed: in addition to basic, captures per Op timing
                                                            for execution, if a backend supports it.
                                               3. client:   captures only the performance metrics
                                                            measured by qnn-net-run.

  --perf_profile                 <VAL>       Specifies performance profile to be used. Valid settings are
                                             low_balanced, balanced, default, high_performance,
                                             sustained_high_performance, burst, low_power_saver,
                                             power_saver, high_power_saver, extreme_power_saver
                                             and system_settings.
                                             Note: perf_profile argument is now deprecated for
                                             HTP backend, user can specify performance profile
                                             through backend config now. Please refer to config_file
                                             backend extensions usage section below for more details.


  --config_file                  <FILE>      Path to a JSON config file. The config file currently
                                             supports options related to backend extensions,
                                             context priority and graph configs. Please refer to SDK
                                             documentation for more details.

  --log_level                    <VAL>       Specifies max logging level to be set. Valid settings:
                                             error, warn, info, debug, and verbose.

  --shared_buffer                            Specifies creation of shared buffers for graph I/O between the application
                                             and the device/coprocessor associated with a backend directly.

  --synchronous                              Specifies that graphs should be executed synchronously rather than asynchronously.
                                             If a backend does not support asynchronous execution, this flag is unnecessary.

  --num_inferences               <VAL>       Specifies the number of inferences. Loops over the input_list until
                                             the specified number of inferences has been performed.

  --duration                     <VAL>       Specifies the duration of the graph execution in seconds.
                                             Loops over the input_list until this amount of time has transpired.

  --keep_num_outputs             <VAL>       Specifies the number of outputs to be saved.
                                             Once the number of outputs reaches the limit, subsequent outputs are discarded.

  --batch_multiplier             <VAL>       Specifies the value by which the batch value in input and output tensor dimensions
                                             will be multiplied. The modified input and output tensors are used only during
                                             graph execution. Composed graphs still use the tensor dimensions from the model.

  --timeout                      <VAL>       Specifies the timeout for graph execution in microseconds. Please note that
                                             using this option with a backend that does not support timeout signals results in an error.

  --retrieve_context_timeout     <VAL>       Specifies the timeout for graph initialization in microseconds. Please note that
                                             using this option with a backend that does not support timeout signals results in an error.
                                             Also note that this option can only be used when loading a saved context through
                                             --retrieve_context option.

  --max_input_cache_tensor_sets  <VAL>       Specifies the maximum number of input tensor sets that can be cached.
                                             Use value "-1" to cache all the input tensors created.
                                             Note that options --max_input_cache_tensor_sets and --max_input_cache_size_mb are mutually exclusive.
                                             Only one of the options can be specified at a time.

  --max_input_cache_size_mb      <VAL>       Specifies the maximum cache size in megabytes (MB).
                                             Note that options --max_input_cache_tensor_sets and --max_input_cache_size_mb are mutually exclusive.
                                             Only one of the options can be specified at a time.

  --set_output_tensors          <VAL>        Provide a comma-separated list of intermediate output tensor names, for which the outputs
                                             will be written in addition to final graph output tensors. Note that options --debug and
                                             --set_output_tensors are mutually exclusive. Only one of the options can be specified at a time.
                                             Also note that this option cannot be used when the graph is retrieved from a context binary,
                                             since the graph is already finalized when retrieved from a context binary.
                                             The syntax is: graphName0:tensorName0,tensorName1;graphName1:tensorName0,tensorName1.
                                             In case of a single graph, its name is not necessary and a list of comma-separated tensor
                                             names can be provided, e.g.: tensorName0,tensorName1.
                                             The same format can be provided in a .txt file.

 --use_mmap                                  Specifies that the context binary that is being read should be loaded
                                             using the Memory-mapped (MMAP) file I/O. Please note some platforms
                                             may not support this due to OS limitations in which case an error
                                             is thrown when this option is used.

 --validate_binary                           Specifies that the context binary will be validated before creating a context.
                                             This option can only be used with backends that support binary validation.

 --platform_options             <VAL>        Specifies values to pass as platform options. Multiple platform options can be provided
                                             using the syntax: key0:value0;key1:value1;key2:value2

 --graph_profiling_start_delay  <VAL>        Specifies the graph profiling start delay in seconds. Please note that this option can only
                                             be used in conjunction with graph-level profiling handles.

 --dlc_path                     <VAL>        Path to a comma-separated list of Deep Learning Containers (DLC)
                                             from which to load the models.
                                             Necessitates libQnnModelDlc.so as the --model argument.
                                             To compose multiple graphs in the context, use a comma-separated
                                             list of DLC files.
                                             The syntax is <qnn_model_name_1.dlc>,<qnn_model_name_2.dlc>
                                             Default: None

 --graph_profiling_num_executions  <VAL>     Specifies the maximum number of QnnGraph_execute/QnnGraph_executeAsync calls to be profiled.
                                             Please note that this option can only be used in conjunction with graph-level profiling handles.

 --io_tensor_mem_handle_type       <VAL>     Specifies the mem handle type to be used for input and output tensors during graph execution.
                                             Valid settings: "ion" and "dma_buf".

  --version                                  Print the QNN SDK version.

  --help                                     Show this help message.

EXIT CODES:
------------
List of exit codes used in the qnn-net-run application.

Exit codes 1, 2, 126 - 165, and 255 are reserved and should be avoided for user-defined exit
codes, since they have special purposes:
1, 2      : Abnormal termination of a program.
126 - 165 : Used specifically to indicate segmentation faults, bus errors, etc.

 3  - Application failure, reason unknown. See DSP logs (logcat).

 4  - Application failure due to an invalid application argument.

 6  - Application failure while setting the log level.

 7  - Application failure due to a null or invalid function pointer, etc.

 9  - Application failure during qnn_net_run_HtpVXXHexagon initialization.

 10 - Application failure during backend creation.

 11 - Application failure during device creation.

 12 - Application failure during Op Package registration.

 13 - Application failure during creating context.

 14 - Application failure during graph prepare.

 15 - Application failure during graph finalize.

 16 - Application failure during create from binary.

 17 - Application failure during graph execution.

 18 - Application failure during context free.

 19 - Application failure during device free.

 20 - Application failure during backend termination.

 21 - Application failure during graph execution abort.

 22 - Application failure during graph execution timeout.

 23 - Application failure during the create from binary with suboptimal cache.

 24 - Application failure during backend termination.

 25 - Application failure during processing binary section or updating binary section etc.

 26 - Application failure during binary update/execution.

See the <QNN_SDK_ROOT>/examples/QNN/NetRun folder for a reference example of how to use the qnn-net-run tool.
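When scripting around qnn-net-run, the exit code can be mapped back to a human-readable failure reason. A minimal sketch follows; the mapping covers only a subset of the codes listed above, and the commented invocation uses placeholder paths:

```python
import subprocess

# Subset of the qnn-net-run exit codes documented above.
EXIT_CODES = {
    0: "success",
    4: "invalid application argument",
    14: "graph prepare failed",
    17: "graph execution failed",
}

def describe(returncode):
    """Return a short description for a qnn-net-run exit code."""
    return EXIT_CODES.get(returncode, f"unknown exit code {returncode}")

# Hypothetical invocation; model, backend, and input paths are placeholders.
# proc = subprocess.run(["qnn-net-run", "--model", "libQnnModel.so",
#                        "--backend", "libQnnCpu.so", "--input_list", "input_list.txt"])
# print(describe(proc.returncode))

print(describe(17))
# -> graph execution failed
```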

Typical arguments:

--backend - The appropriate argument depends on what target and backend you want to run on

Android (aarch64): <QNN_SDK_ROOT>/lib/aarch64-android/

  • CPU - libQnnCpu.so

  • GPU - libQnnGpu.so

  • HTA - libQnnHta.so

  • DSP (Hexagon v65) - libQnnDspV65Stub.so

  • DSP (Hexagon v66) - libQnnDspV66Stub.so

  • DSP - libQnnDsp.so

  • HTP (Hexagon v68) - libQnnHtp.so

  • [Deprecated] HTP Alternate Prepare (Hexagon v68) - libQnnHtpAltPrepStub.so

  • Saver - libQnnSaver.so

Linux x86: <QNN_SDK_ROOT>/lib/x86_64-linux-clang/

  • CPU - libQnnCpu.so

  • HTP (Hexagon v68) - libQnnHtp.so

  • Saver - libQnnSaver.so

Windows x86: <QNN_SDK_ROOT>/lib/x86_64-windows-msvc/

  • CPU - QnnCpu.dll

  • Saver - QnnSaver.dll

WoS: <QNN_SDK_ROOT>/lib/aarch64-windows-msvc/

  • CPU - QnnCpu.dll

  • DSP (Hexagon v66) - QnnDspV66Stub.dll

  • DSP - QnnDsp.dll

  • HTP (Hexagon v68) - QnnHtp.dll

  • Saver - QnnSaver.dll

Note

Hexagon-based backend libraries are emulations on x86_64 platforms.

--input_list - This argument provides a file containing paths to input files to be used for graph execution. Input files can be specified with the below format:

<input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]
[<input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]]
...

Below is an example containing 3 sets of inputs with layer names “Input_1” and “Input_2”, and files located in the relative path “Placeholder_1/real_input_inputs_1/”:

Input_1:=Placeholder_1/real_input_inputs_1/0-0#e6fb51.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/0-1#8a171b.rawtensor
Input_1:=Placeholder_1/real_input_inputs_1/1-0#67c965.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/1-1#54f1ff.rawtensor
Input_1:=Placeholder_1/real_input_inputs_1/2-0#b42dc6.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/2-1#346a0e.rawtensor

Note: If the batch dimension of the model is greater than 1, the number of batch elements in the input file has to either match the batch dimension specified in the model or it has to be one. In the latter case, qnn-net-run will combine multiple lines into a single input tensor.
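An input list in the format above can be generated with a short script. The sketch below writes one line per inference set; the layer names and raw tensor file paths are placeholders following the earlier example:

```python
from pathlib import Path
import tempfile

# Hypothetical raw tensor files, one per input layer per inference set.
samples = [
    {"Input_1": "Placeholder_1/real_input_inputs_1/0-0.raw",
     "Input_2": "Placeholder_1/real_input_inputs_1/0-1.raw"},
    {"Input_1": "Placeholder_1/real_input_inputs_1/1-0.raw",
     "Input_2": "Placeholder_1/real_input_inputs_1/1-1.raw"},
]

# One line per inference set: <name>:=<path> pairs separated by spaces.
lines = [" ".join(f"{name}:={path}" for name, path in s.items()) for s in samples]
out = Path(tempfile.mkdtemp()) / "input_list.txt"
out.write_text("\n".join(lines) + "\n")
print(out.read_text())
```

The resulting file can be passed directly to --input_list.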

--op_packages - This argument is only needed if you are using custom op packages. The native QNN ops are already included as part of the backend libraries.

When using custom op packages, each provided op package requires a colon separated command line argument containing the path to the op package shared library (.so) file, as well as the name of the interface provider, formatted as <op_package_path>:<interface_provider>.

The interface_provider argument must be the name of the function in the op package library that satisfies the QnnOpPackage_InterfaceProvider_t interface. In the skeleton code created by qnn-op-package-generator, this function will be named <package_name><backend>InterfaceProvider.

See Generating Op Packages for more information.

--config_file - This argument is only needed if you need to specify context priority or provide backend extensions related parameters. These parameters are specified through a JSON file. The template of the JSON file is shown below:

{
  "backend_extensions" :
    {
      "shared_library_path" :  "path_to_shared_library",
      "config_file_path" :  "path_to_config_file"
    },
  "context_configs" :
    {
      "context_priority" :  "low | normal | normal_high | high",
      "async_execute_queue_depth" : uint32_value,
      "enable_graphs" :  ["<graph_name_1>", "<graph_name_2>", ...],
      "memory_limit_hint"  : uint64_value,
      "is_persistent_binary" : boolean_value,
      "cache_compatibility_mode" : "permissive | strict"
    },
  "graph_configs" : [
    {
      "graph_name" :  "graph_name_1",
      "graph_priority" :  "low | normal | normal_high | high"
      "graph_profiling_start_delay" : double_value
      "graph_profiling_num_executions" : uint64_value
    }
  ],
  "profile_configs" :
    {
      "num_max_events" : uint64_value
    },
  "async_graph_execution_config" :
    {
      "input_tensors_creation_tasks_limit" : uint32_value,
      "execute_enqueue_tasks_limit" : uint32_value
    }
}

All the options in the JSON file are optional. context_priority is used to specify priority of the context as a context config. async_execute_queue_depth is used to specify the number of executions that can be in the queue at a given time. While using a context binary, enable_graphs is used to implement the graph selection functionality. memory_limit_hint is used to set the peak memory limit hint of a deserialized context in MBs. is_persistent_binary indicates that the context binary pointer is available during QnnContext_createFromBinary and until QnnContext_free is called.

Set Cache Compatibility Mode: cache_compatibility_mode specifies the mode used to check whether a cache record is optimal for the device. The available modes are:

  • “permissive”: The binary cache is compatible if it can run on the device. This is the default.

  • “strict”: The binary cache is compatible if it can run on the device and fully utilize the hardware capability. If it cannot fully utilize the hardware, selecting this option results in a recommendation to prepare the cache again. This option returns an error if it is not supported by the selected backend.

Graph Selection: Allows specifying a subset of graphs in a context to be loaded and executed. If enable_graphs is specified, only those graphs are loaded; selecting a graph name that does not exist results in an error. If enable_graphs is not specified, or is passed as an empty list, the default behaviour applies and all graphs in the context are loaded.

graph_configs can be used to specify asynchronous execution order and depth, if a backend supports asynchronous execution. Every set of graph configs has to be specified along with a graph name. graph_profiling_start_delay is used to set the profiling start delay time in seconds. graph_profiling_num_executions is used to set the maximum number of QnnGraph_execute/QnnGraph_executeAsync calls that will be profiled.

profile_configs can be used to specify the max profile events per profiling handle.

async_graph_execution_config can be used to specify limits on the number of tasks that run in parallel when graphs are executed asynchronously using graphExecuteAsync. input_tensors_creation_tasks_limit specifies the maximum number of tasks in which input tensor sets are populated for graph execution. execute_enqueue_tasks_limit specifies the maximum number of tasks in which the backend graphExecuteAsync will be called using the pre-populated input tensors. If unspecified, these values are set to the specified async_execute_queue_depth, or to 10, the default for async_execute_queue_depth.

backend_extensions is used to exercise custom options in a particular backend. This can be done by providing an extensions shared library (.so) and, if necessary, a config file. This is also required to enable the various performance modes, which can be exercised using the backend config. Currently, HTP supports it through the libQnnHtpNetRunExtensions.so shared library, DSP through libQnnDspNetRunExtensions.so, and GPU through libQnnGpuNetRunExtensions.so. For the different custom options that can be enabled with HTP, see HTP Backend Extensions.

--shared_buffer - This argument is only needed to instruct qnn-net-run to use shared buffers for the zero-copy use case with a device/coprocessor associated with a particular backend (for example, the DSP with the HTP backend) for graph input and output tensor data. This option is supported on Android only. qnn-net-run implements this feature using rpcmem APIs, which create shared buffers using the ION/DMA-BUF memory allocator on Android, available through the shared library libcdsprpc.so. In addition to specifying this option, for qnn-net-run to be able to discover libcdsprpc.so, the path in which the shared library is present needs to be appended to the LD_LIBRARY_PATH variable.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/vendor/lib64

Running Quantized Model on HTP backend with qnn-net-run

The HTP backend currently allows finalizing / creating an optimized version of a quantized QNN model offline, on a Linux development host (using the x86_64-linux-clang backend library), and then executing the finalized model on device (using the hexagon-v68 backend libraries).

First, configure the environment by following the instructions in the Setup section. Next, build a QNN model library from your network, using artifacts produced by one of the QNN converters. See Building Example Model for reference. Lastly, use the qnn-context-binary-generator utility to generate a serialized representation of the finalized graph, then execute the serialized binary on device.

# Generate the optimized serialized representation of the QNN model on the Linux development host.
# libQnnModel.so here is an x86_64-linux-clang built quantized QNN model.
$ qnn-context-binary-generator --binary_file qnngraph.serialized.bin \
                               --model <path_to_model_library>/libQnnModel.so \
                               --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
                               --output_dir <output_dir_for_result_and_qnngraph_serialized_binary>

To use the produced serialized representation of the finalized graph (qnngraph.serialized.bin), ensure the binaries below are available on the Android device:

  • libQnnHtpV68Stub.so (ARM)

  • libQnnHtpPrepare.so (ARM)

  • libQnnModel.so (ARM)

  • libQnnHtpV68Skel.so (cDSP v68)

  • qnngraph.serialized.bin (serialized binary from run on Linux development host)

See the <QNN_SDK_ROOT>/examples/QNN/NetRun/android/android-qnn-net-run.sh script for reference on how to use the qnn-net-run tool on an Android device.

# Run the optimized graph on the HTP target
$ qnn-net-run --retrieve_context qnngraph.serialized.bin \
              --backend <path_to_model_library>/libQnnHtp.so \
              --output_dir <output_dir_for_result> \
              --input_list <path_to_input_list.txt>

Running Float Model on HTP backend with qnn-net-run

The QNN HTP backend can support running float32 models on select Qualcomm SoCs.

First, configure the environment by following the instructions in the Setup section. Next, build a QNN model library from your network, using artifacts produced by one of the QNN converters. See Building Example Model for reference.

Lastly, configure the backend_extensions parameters through a JSON file and set custom options for the HTP backend. Pass this file to qnn-net-run using the --config_file argument. backend_extensions takes two parameters: an extensions shared library (.so) (for HTP, use libQnnHtpNetRunExtensions.so) and a config file for the backend.

Below is the template for the JSON file:

{
  "backend_extensions" :
    {
      "shared_library_path" :  "path_to_shared_library",
      "config_file_path" :  "path_to_config_file"
    }
}

For HTP backend extensions configurations, you can set “vtcm_mb”, “fp16_relaxed_precision” and “graph_names” through a config file.

Here is an example of the config file:

{
   "graphs": [
      {
        "vtcm_mb": 8,  // Performance infrastructure configuration option that is memory specific.
                       // Optional; if not set, QNN HTP defaults to 4.
        "fp16_relaxed_precision": 1,  // Ensures that operations run with relaxed precision math, i.e. float16 math.

        "graph_names": [ "qnn_model" ]  // List of graph names for the inference, as specified when using the QNN converter tools.
                                        // "qnn_model" must be the name of the .cpp file generated during model conversion (without the .cpp file extension).
        .....
      },
      {
         .....  // Other graph object
      }
   ]
}

Note

“fp16_relaxed_precision” is the key configuration for enabling QNN float models on the HTP float runtime. HTP Graph Configurations such as fp16_relaxed_precision, vtcm_mb, etc. are only applied if at least one “graph_name” is provided in the backend extensions config.
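Note that the inline // comments in the example above are not valid JSON and must be removed before the file is parsed. A parser-ready version of the same config, assuming the graph name qnn_model, would be:

```json
{
   "graphs": [
      {
        "vtcm_mb": 8,
        "fp16_relaxed_precision": 1,
        "graph_names": [ "qnn_model" ]
      }
   ]
}
```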

See the <QNN_SDK_ROOT>/examples/QNN/NetRun/android/android-qnn-net-run.sh script for reference on how to use the qnn-net-run tool on an Android device.

# Run the float graph on the HTP backend
# libQnnModel.so here is an x86_64-linux-clang built float QNN model.
$ qnn-net-run --model <path_to_model_library>/libQnnModel.so \
              --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
              --config_file <path_to_JSON_file.json> \
              --output_dir <output_dir_for_result> \
              --input_list <path_to_input_list.txt>

qnn-throughput-net-run

The qnn-throughput-net-run tool exercises the execution of multiple models on a QNN backend, or on different backends, in a multi-threaded fashion. It allows repeated execution of models on a specified backend for a specified duration or number of iterations.

Usage:
------
qnn-throughput-net-run [--config <config_file>.json]
                       [--output <results>.json]

REQUIRED argument(s):
 --config        <FILE>.json       Path to the JSON config file.

OPTIONAL argument(s):
 --output        <FILE>.json       Specify the json file used to save the performance test results.

Configuration JSON File:

qnn-throughput-net-run uses a configuration file as input to run the models on the backends. The configuration JSON file comprises four required objects: backends, models, contexts and testCase.

Below is an example of a JSON configuration file. Please refer to the following sections for detailed information on the four configuration objects: backends, models, contexts and testCase.

{
  "backends": [
    {
      "backendName": "cpu_backend",
      "backendPath": "libQnnCpu.so",
      "profilingLevel": "BASIC",
      "backendExtensions": "libQnnHtpNetRunExtensions.so",
      "perfProfile": "high_performance"
    },
    {
      "backendName": "gpu_backend",
      "backendPath": "libQnnGpu.so",
      "profilingLevel": "OFF"
    }
  ],
  "models": [
    {
      "modelName": "model_1",
      "modelPath": "libqnn_model_1.so",
      "loadFromCachedBinary": false,
      "inputPath": "model_1-input_list.txt",
      "inputDataType": "FLOAT",
      "postProcessor": "MSE",
      "outputPath": "model_1-output",
      "outputDataType": "FLOAT_ONLY",
      "saveOutput": "NATIVE_ALL",
      "groundTruthPath": "model_1-golden_list.txt"
    },
    {
      "modelName": "model_2",
      "modelPath": "libqnn_model_2.so",
      "loadFromCachedBinary": false,
      "inputPath": "model_2-input_list.txt",
      "inputDataType": "FLOAT",
      "postProcessor": "MSE",
      "outputPath": "model_2-output",
      "outputDataType": "FLOAT_ONLY",
      "saveOutput": "NATIVE_LAST"
    }
  ],
  "contexts": [
    {
      "contextName": "cpu_context_1"
    },
    {
      "contextName": "gpu_context_1"
    }
  ],
  "testCase": {
    "iteration": 5,
    "logLevel": "error",
    "threads": [
      {
        "threadName": "cpu_thread_1",
        "backend": "cpu_backend",
        "context": "cpu_context_1",
        "model": "model_1",
        "interval": 10,
        "loopUnit": "count",
        "loop": 1
      },
      {
        "threadName": "gpu_thread_1",
        "backend": "gpu_backend",
        "context": "gpu_context_1",
        "model": "model_2",
        "interval": 0,
        "loopUnit": "count",
        "loop": 10
      }
    ]
  }
}
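Since all four top-level objects are required, a config can be sanity-checked before launching a run. Below is a minimal sketch; the helper name is illustrative and not part of the SDK:

```python
import json

# The four objects qnn-throughput-net-run requires at the top level.
REQUIRED_OBJECTS = {"backends", "models", "contexts", "testCase"}

def check_config(text):
    """Parse a qnn-throughput-net-run config and verify the required objects exist."""
    cfg = json.loads(text)
    missing = REQUIRED_OBJECTS - cfg.keys()
    if missing:
        raise ValueError(f"config is missing required objects: {sorted(missing)}")
    return cfg

cfg = check_config('{"backends": [], "models": [], "contexts": [], "testCase": {}}')
print(sorted(cfg))
# -> ['backends', 'contexts', 'models', 'testCase']
```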

Keys supported in each backends object:

  • backendName (string, required): A unique identifier used by the testcase to designate the backend on which the model should be run.

  • backendPath (string, required): Specifies the on-device backend .so library file path.

  • profilingLevel (string, optional, default OFF): Sets the QNN profiling level for the backend. Possible values: OFF, BASIC, DETAILED.

      - BASIC: Captures execution and init times.

      - DETAILED: In addition to BASIC, captures per-op timing for execution, if the backend supports it.

  • backendExtensions (string, optional): Enables backend-specific options through an optional backend extensions shared library and config file. Syntax: path_to_shared_library. This is required to enable the various performance modes exercised through the perfProfile option. Currently, HTP supports it through the libQnnHtpNetRunExtensions.so shared library.

  • perfProfile (string, optional, default "default"): Specifies the performance profile to set. Possible values: low_balanced, balanced, default, high_performance, sustained_high_performance, burst, low_power_saver, power_saver, high_power_saver, extreme_power_saver and system_settings.

  • opPackagePath (string, optional; defaults to the native QNN ops that are part of the backend libraries): Comma-separated list of custom op packages and interface providers for registration. Syntax: op_package_1_path:interface_provider_1[,op_package_2_path:interface_provider_2…]

  • platformOption (string, optional): Enables backend-specific platform options through QnnBackend_Config_t. Syntax: "key:value"

Keys for each entry in the models array:

modelName (string; required)
    Unique identifier used by the test case to designate which model to run.

modelPath (string; required)
    Path to the <model>.so / <serialized_context>.bin file.

loadFromCachedBinary (bool; optional; default: false)
    Set to true if a <serialized_context>.bin is used in modelPath.

inputPath (string; optional)
    Path to a file listing the inputs for the model.
    If there are multiple graphs in the <model>.so / <serialized_context>.bin, this has to be a comma-separated list of input paths, one per graph. Syntax: Graph1_input_path[,Graph2_input_path,…]
    If not set, random input data is used.

inputDataType (string; optional; default: NATIVE)
    Possible values: NATIVE, FLOAT.

postProcessor (string; optional)
    Possible values: NONE, MSE, MSE_FLOAT32, MSE_INT8, MSE_INT16. If there are multiple graphs in the <model>.so / <serialized_context>.bin, this has to be a comma-separated list of postProcessor values. Syntax: MSE[,NONE,…]
    MSE outputs a mean squared error result for each execution against the golden file specified by the groundTruthPath parameter. If groundTruthPath is not specified, the output of the first execution is used to compute the MSE. If the datatype of the file specified in groundTruthPath differs from the network's output type, specify the relevant datatype in the postProcessor parameter.

outputPath (string; optional)
    If postProcessor is not NONE, output files and profiling logs are saved to this directory.

outputDataType (string; optional; default: NATIVE_ONLY)
    Possible values: NATIVE_ONLY, FLOAT_ONLY, FLOAT_AND_NATIVE.

saveOutput (string; optional; default: NONE)
    Possible values: NONE, NATIVE_LAST, NATIVE_ALL.
      • NATIVE_LAST - Saves only the result of the last network execution to the outputPath.
      • NATIVE_ALL - Saves the results of all network executions to the outputPath.

groundTruthPath (string; optional; default: NONE)
    Path to the golden file used to compute the MSE. If there are multiple graphs in the <model>.so / <serialized_context>.bin, this has to be a comma-separated list of ground truth paths, one per graph. Syntax: Graph1_ground_truth_path[,Graph2_ground_truth_path,…]
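For intuition, the mean squared error that the MSE post-processor reports can be reproduced offline. The sketch below is illustrative only (it is not part of the SDK, and the file names are hypothetical); it assumes both files hold packed float32 tensors of equal length, as a FLOAT_ONLY output and its golden file would:

```python
from array import array

def load_float32_raw(path):
    """Read a raw file of packed float32 values into a flat array."""
    buf = array("f")
    with open(path, "rb") as f:
        buf.frombytes(f.read())
    return buf

def mse(output_path, golden_path):
    """Mean squared error between an output tensor and its golden file."""
    out = load_float32_raw(output_path)
    gold = load_float32_raw(golden_path)
    if len(out) != len(gold):
        raise ValueError("tensor element counts must match")
    return sum((o - g) ** 2 for o, g in zip(out, gold)) / len(out)
```

A helper like this can be handy for sanity-checking the per-execution MSE values printed by the tool against a known golden file.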

Keys for each entry in the contexts array:

contextName (string; required)
    Unique identifier used by the test case to designate the context in which a model should be created.

priority (string; optional; default: DEFAULT)
    Specifies the priority of the context. Possible values: DEFAULT, LOW, NORMAL, HIGH.

executeAsyncQueueDepth (int; optional)
    Specifies the queue depth for asynchronous execution.

cacheCompatibilityMode (string; optional; default: permissive)
    Specifies the cache compatibility check mode; valid values are "permissive" (default) and "strict".

Keys of the testCase object:

iteration (int; required)
    Number of times the entire use case is repeated. If the value is negative, the test runs until keyboard interrupt.

logLevel (string; optional)
    Maximum logging level to set. Valid settings: error, warn, info, debug, and verbose.

threads (array; required)
    An array of json objects, each containing the details of one thread to be executed by qnn-throughput-net-run. Each object of the array has the properties listed below as key/value pairs.

Keys for each entry in the threads array:

threadName (string; required)
    Unique identifier used by the test case to identify the thread and save the output results.

backend (string; required)
    Backend to be used when this thread executes the graph. The value must match one of the backendName entries in the backends property of the configuration json.

context (string; required)
    Context to be used when this thread executes the graph. The value must match one of the contextName entries in the contexts property of the configuration json.

model (string; required)
    Model to be used by the thread for execution. The value must match one of the modelName entries in the models property of the configuration json.

initModelInLoop (bool; optional; default: false)
    Set to true if the model needs to be re-initialized for every iteration. Cannot be set to true if loadFromCachedBinary in the models property is true.

loadInputDataInLoop (bool; optional; default: false)
    Set to true if the input needs to be reloaded for every loop of execution.

useRandomData (bool; optional; default: false)
    Set to true to use random data as input.

interval (int; optional; default: 0)
    Interval (in microseconds) between each graph execution in the thread.

loopUnit (string; optional; default: count)
    Possible values: count, second.

loop (int; optional; default: 1)
    Interpreted as seconds or a count depending on loopUnit. If loopUnit is second, the value specifies the number of seconds the thread repeats execution. If loopUnit is count, the value specifies the number of times the thread repeats execution.

executeAsynchronous (bool; optional; default: false)
    Set to true to execute the graphs asynchronously rather than synchronously. If the backend does not support asynchronous execution, this option results in an error.

backendConfig (string; optional)
    Backend config file to enable backend-specific options through the backendExtensions shared library. Syntax: path_to_backend_config_file.
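To make the interaction of interval, loopUnit and loop concrete, here is a minimal Python sketch of one thread's execution loop as described above. This is illustrative only, not qnn-throughput-net-run's actual implementation; execute_graph stands in for a single graph execution:

```python
import time

def run_thread(execute_graph, loop=1, loop_unit="count", interval_us=0):
    """Repeat graph execution according to the thread's loop settings."""
    if loop_unit == "count":
        for _ in range(loop):                 # fixed number of executions
            execute_graph()
            time.sleep(interval_us / 1e6)     # interval is in microseconds
    elif loop_unit == "second":
        deadline = time.monotonic() + loop    # run for a fixed duration
        while time.monotonic() < deadline:
            execute_graph()
            time.sleep(interval_us / 1e6)
    else:
        raise ValueError("loopUnit must be 'count' or 'second'")
```

In the sample configuration above, gpu_thread_1 (loopUnit "count", loop 10, interval 0) would correspond to ten back-to-back executions per iteration.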

An example configuration file, sample_config.json, can be found at <QNN_SDK_ROOT>/examples/QNN/ThroughputNetRun.

Analysis

qairt-accuracy-evaluator (Beta)

The qairt-accuracy-evaluator tool provides a framework to evaluate end-to-end accuracy metrics for a model on a given dataset. In addition, the tool can be used to identify the best quantization options for a model on a given set of inputs.

Dependencies

The QNN Accuracy Evaluator assumes that the platform dependencies and environment setup instructions outlined in the Setup page have been followed. Certain additional python packages are also required by this tool; refer to Optional Python packages.

Note: The qairt-accuracy-evaluator currently supports only ONNX models.

Usage

Set the QNN_SDK_ROOT environment variable to the root directory of the QNN SDK. The following environment variables may also need to be set with appropriate values:

QNN_MODEL_ZOO: Path to the model zoo. If not set, the model zoo base directory is assumed to be at "/home/model_zoo". Note: This environment variable is required only if the model path supplied is not absolute and is relative to the model zoo path.

ADB_PATH: Path to the ADB binary. If not set, it is queried and set from its executable path.
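As a sketch, the variables above can be exported as follows; the paths shown are placeholders and must be replaced with your actual install locations:

```shell
# Placeholder paths -- substitute your own SDK and model zoo locations.
export QNN_SDK_ROOT=/opt/qcom/qnn-sdk        # root of the unpacked QNN SDK
export QNN_MODEL_ZOO=/home/model_zoo         # optional; base for relative model paths
export ADB_PATH="$(command -v adb)"          # optional; auto-detected if adb is on PATH
```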

To conduct an accuracy analysis of a given model using a specific dataset, the user must create a configuration that specifies the backends, quantization options, and reference inference frameworks. Sample config files can be found at ${QNN_SDK_ROOT}/lib/python/qti/aisw/accuracy_evaluator/configs/samples/model_configs.

The high-level structure of a model config is shown below:

model
    info
    globals
    dataset
    preprocessing
    postprocessing
    inference-engine
    verifier
    metrics

The user must provide all dataset information under the dataset section in the model config file; otherwise, an error is thrown. An example is shown below:

dataset:
    name: COCO2014
    path: '/home/ml-datasets/COCO/2014/'
    inputlist_file: inputlist.txt
    calibration:
        type: index
        file: calibration-index.txt

Details of the dataset fields are as follows:

name
    Name of the dataset.

path
    Base directory of the dataset files.

inputlist_file
    Text file containing all the pre-processed input files relative to the path field, one input per line. For models having multiple inputs, the inputs in each line have to be comma-separated.

calibration (optional)
    Specifies the calibration file type to be used with quantization. It has the following params:
      • type: Can be 'index', 'raw' or 'dataset'
        • index - File provided contains the indexes to be picked from the input list for calibration
        • raw - File provided contains entries of pre-processed raw files for calibration
        • dataset - File provided contains images processed separately and passed to inference
      • file: Pre-processed calibration file name
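For illustration, a hypothetical inputlist.txt could look like the fragment below (the file names are made up); each line is resolved relative to the dataset path, and the last line shows the comma-separated form for a model with two inputs:

```
preprocessed/img_0001.raw
preprocessed/img_0002.raw
input_a/img_0003.raw,input_b/mask_0003.raw
```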

The inference engine is used to run the model on multiple inference schemas. A sample inference engine section is shown below, followed by the description of the different configurable entries in the inference section.

inference-engine:
    model_path: MLPerfModels/ResNetV1.5/modelFiles/ONNX/resnet50_v1.onnx
    simplify_model : True
    inference_schemas:
        - inference_schema:
            name: qnn
            precision: quant
            target_arch: x86_64-linux-clang
            backend: htp
            tag: qnn_int8_htp_x86
            converter_params:
               float_bias_bitwidth: 32
            quantizer_params:
               param_quantizer_schema: symmetric
               act_quantizer_calibration: min-max
               use_per_channel_quantization: True
            backend_extensions:
               vtcm_mb: 4
               rpc_control_latency: 100
               dsp_arch: v75 #mandatory
    inputs_info:
        - input_tensor_0:
              type: float32
              shape: ["*", 3, 224, 224]
    outputs_info:
        - ArgMax_0:
              type: int64
              shape: ["*"]
        - softmax_tensor_0:
              type: float32
              shape: ["*", 1001]

Details of each configurable entry are given below:

model_path
    Absolute or relative path of the model. A relative path is resolved against MODEL_ZOO_PATH, if set, else against the default /home/model_zoo.

simplify_model
    Flag to enable or disable model simplification for ONNX models. By default, this flag is set to True and the model is simplified. Note: Model simplification is skipped for models having custom operators or for inference schemas having the quantization_overrides parameter configured.

inference_schemas
    List of inference schemas to perform inference on. Each inference_schema has the following entries:
      • name - Name of the inference schema. Options: qnn, onnxrt, tensorflow, torchscript, tensorflow-session
      • precision - Precision to run inference at. Options: fp32, fp16, int8/quant
      • target_arch - Target architecture to run inference on. Options: x86_64-linux-clang, aarch64-android
      • backend - Backend to run inference on. Allowed backends for x86_64-linux-clang: {cpu,htp} and aarch64-android: {cpu,gpu,htp}.
      • tag - Tag unique to an inference schema
      • converter_params - Params to be passed as arguments to the converter
      • quantizer_params - Params to be passed as arguments to the quantizer
      • contextbin_params - Params to be passed as arguments to the context-binary-generator
      • netrun_params - Params to be passed as arguments to net-run
      • backend_extensions - Params to be passed via the backend extensions config file to the context-binary-generator and net-run

inputs_info
    Information about each model input. Requires the following params in the given order:
      • type - numpy type (float16, float32, float64, int8, int16, int32, int64)
      • shape - list of dimensions

outputs_info
    Information about each model output. Requires the following params in the given order:
      • type - numpy type (float16, float32, float64, int8, int16, int32, int64)
      • shape - list of dimensions

Note

For HTP backend emulation on host, set backend to “htp” and target_arch as “x86_64-linux-clang” in the config file.
For HTP backend execution on Android device, set backend to “htp” and target_arch as “aarch64-android” in the config file. Also, user must provide dsp_arch version such as “v69”, “v73”, “v75” under backend_extensions section.

Command line options available for config mode are as follows:

qairt-acc-evaluator options

options:
    -config CONFIG        path to model config yaml
    -work_dir WORK_DIR      working directory path. default is ./qacc_temp
    -preproc_file PREPROC_FILE
                            Path to the text file containing list of preprocessed files.
                            If this file is provided evaluator will start at infer stage
    -calib_file CALIB_FILE
                            Path to the text file containing list of calibration files.
    -onnx_symbol ONNX_SYMBOL [ONNX_SYMBOL ...]
                            Replace onnx symbols in input/output shapes. Can be passed as list of multiple items.
                            Default replaced by 1. Example: __unk_200:1
    -device_id DEVICE_ID    Target device id to be provided
    -inference_schema_type INFERENCE_SCHEMA_TYPE
                            run only the inference schemas with this name. Example: qnn, onnxrt
    -inference_schema_tag INFERENCE_SCHEMA_TAG
                            run only this inference schema tag
    -cleanup CLEANUP        end: deletes the files after all stages are completed.
                            intermediate: deletes after previous stage outputs are used. (default:'')
    -use_memory_plugins     Flag to enable memory plugins.
    -silent                 Run in silent mode. Do not expect any CLI input from user.
    -debug                  Enable debug logs on console and the file. (default: False)
    -set_global SET_GLOBAL [SET_GLOBAL ...]
                            Option used to set a global variable. It can be repeated.
                            Example: -set_global count:10 -set_global calib:5 (default: None)

Note

Users can accelerate their evaluations using memory plugins to minimize unnecessary reading and writing of data during evaluation by passing the -use_memory_plugins flag to the evaluator command.

Config file options

- inference_schema:
    name: qnn
    target_arch: x86_64-linux-clang
    backend: cpu
    precision: fp32
    tag: qnn_cpu_x86

- inference_schema:
    name: qnn
    target_arch: aarch64-android
    backend: cpu
    precision: fp32
    tag: qnn_cpu_android

- inference_schema:
    name: qnn
    target_arch: aarch64-android
    backend: gpu
    precision: fp32
    tag: qnn_gpu_android

- inference_schema:
    name: qnn
    target_arch: x86_64-linux-clang
    backend: htp
    precision: quant
    tag: htp_int8
    converter_params:
        quantization_overrides: "path to the ext quant json"
    quantizer_params:
        param_quantizer_calibration: min-max | sqnr
        param_quantizer_schema: asymmetric  | symmetric
        use_per_channel_quantization: True | False
        use_per_row_quantization: True | False
        act_bitwidth: 8 | 16
        bias_bitwidth: 8 | 32
        weights_bitwidth: 8 | 4
    backend_extensions:
        dsp_arch: v79 # mandatory
        vtcm_mb: 4
        rpc_control_latency: 100

- inference_schema:
    name: qnn
    target_arch: aarch64-android
    backend: htp
    precision: quant
    tag: htp_int8
    converter_params:
        quantization_overrides: "path to the ext quant json"
    quantizer_params:
        param_quantizer_calibration: min-max | sqnr
        param_quantizer_schema: asymmetric  | symmetric
        use_per_channel_quantization: True | False
        use_per_row_quantization: True | False
        act_bitwidth: 8 | 16
        bias_bitwidth: 8 | 32
        weights_bitwidth: 8 | 4
    backend_extensions:
        dsp_arch: v79 # mandatory
        vtcm_mb: 4
        rpc_control_latency: 100

Verifiers

The verifier section provides information about the verifier being used to compare the inference outputs, in case of multiple inference schemas. A sample verifier section is shown below, followed by the description of the different configurable entries in the section.

verifier:
    enabled: True
    fetch_top: 1
    type: avg
    tol: 0.01

Details of each configurable entry are given below:

verifier
    If multiple inference schemas are provided, compares the inference outputs against the reference inference schema. If a reference inference schema is not defined, the first inference schema is taken as the reference. If only one inference schema is defined, the verifier is not executed. The following params need to be provided:
      • enabled - Enabled by default (True)
      • fetch_top - Fetch the top 'n' highest mismatching outputs. Default 1
      • type - One of the built-in verifiers (abs, cos, topk, avg, l1norm, l2norm). Default avg
      • tol - Tolerance value. Default 0.001

The following verifiers can be used to compare the outputs. Depending on the verifier selected, some output a percentage match between the two tensors and some output an absolute value.

  1. abs - Percentage match between the two tensors based on the relative tolerance threshold value

  2. cos - Percentage match between the two tensors based on the Cosine Similarity score

  3. topk - Percentage match between the two tensors based on the topk match between the two tensors

  4. avg - Percentage match between the two tensors based on the average difference between the two tensors

  5. l1norm - Percentage match between the two tensors based on the L1 Norm of the diff

  6. l2norm - Percentage match between the two tensors based on the L2 Norm of the diff

  7. std - Percentage match between the two tensors based on the standard deviation difference

  8. rme - Percentage match between the two tensors based on the RMSE between the tensors

  9. snr - Signal to Noise Ratio between the two tensors

  10. maxerror - max error value between the two tensors

  11. kld - KL Divergence value between the two tensors

  12. pixelbypixel - pixel by pixel plot difference between the two tensors. For each input i, plot is saved at {work_dir}/{schema}/Result_{i}

  13. box - The box verifier requires the --box_input parameter which accepts the filename of a json file with the following format:

    {"box":"Result_0/detection_boxes_0.raw",
    "class":"Result_0/detection_classes_0.raw",
    "score":"Result_0/detection_scores_0.raw"}
    
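To illustrate how such a similarity-based verifier works, here is a minimal sketch of a "cos"-style comparison. This is not the evaluator's actual implementation; in particular, how the tolerance is applied to the cosine score is an assumption for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two flat tensors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cos_verifier(reference, candidate, tol=0.001):
    """Score two outputs; pass when the score is within tol of a perfect 1.0."""
    score = cosine_similarity(reference, candidate)
    return score, (1.0 - score) <= tol
```

Identical (or proportional) tensors score 1.0 and pass; orthogonal tensors score 0.0 and fail at any reasonable tolerance.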

Plugins

Plugins are Python classes used to implement different stages of the inference pipeline, such as dataset handling, preprocessing, postprocessing, and metrics logic.

Dataset and pre-processing plugins perform transformations to the input before they are passed to inference.

Post-processing plugins transform inference outputs.

Metric plugins analyze inference outputs to assess their accuracy

Sample plugins are provided in the SDK at ${QNN_SDK_ROOT}/lib/python/qti/aisw/accuracy_evaluator/plugins.

Users can implement their own plugins (custom plugins) to meet their specific requirements. To include custom plugins, export the CUSTOM_PLUGIN_PATH environment variable pointing to the location of the custom plugin(s), so that they are also included while registering the plugin(s).

export CUSTOM_PLUGIN_PATH=/path/to/custom/plugins/directory

In the model configuration file, plugins are defined as a transformation chain, as shown below:

transformations:
    - plugin:
          name: resize
          params:
              dims: 416,416
              channel_order: RGB
              type: letterbox

    - plugin:
          name: normalize
    - plugin:
          name: convert_nchw

Plugins required for dataset transformation are configured in the dataset section as shown below.

dataset:
    name: ILSVRC2012
    path: '/home/ml-datasets/imageNet/'
    inputlist_file: inputlist.txt
    annotation_file: ground_truth.txt
    calibration:
        type: dataset
        file: calibration.txt
    transformations:
        - plugin:
              name: filter_dataset
              params:
                  random: False
                  max_inputs: -1
                  max_calib: -1

The preprocessing and postprocessing plugins that the user wishes to use are configured in the processing section as shown below:

preprocessing:
    transformations:
        - plugin:
              name: resize
              params:
                  dims: 416,416
                  channel_order: RGB
                  type: letterbox

        - plugin:
              name: normalize

postprocessing:
    squash_results: True
    transformations:
        - plugin:
              name: object_detection
              params:
                  dims: 416,416
                  type: letterbox
                  dtypes: [float32, float32, float32, float32]

Metric calculation plugins are configured in the metrics section as shown below.

metrics:
    transformations:
        - plugin:
              name: topk
              params:
                  kval: 1,5
                  softmax_index: 1
                  round: 7
                  label_offset: 1

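For intuition, the top-k classification idea behind the topk metric configured above can be sketched as follows. This is illustrative only; the actual plugin takes its parameters (kval, softmax_index, label_offset, etc.) from the config:

```python
def topk_accuracy(logits_rows, labels, k=5):
    """Fraction of samples whose true label is among the k highest logits."""
    hits = 0
    for logits, label in zip(logits_rows, labels):
        # Indices of the logits ranked from highest to lowest score.
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)
```

A kval of "1,5" in the config corresponds to reporting this metric for both k=1 and k=5.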
Plugins that need to be executed for a pipeline stage are listed under ‘transformations’ and preceded by the ‘plugin’ keyword. The following table lists details of each configurable entry for a plugin.

name
    Name of the plugin.

params
    Parameters expected and required by the plugin.

A complete list of all plugins and their parameters can be found at Accuracy Evaluator Plugins

Sample Command

qairt-accuracy-evaluator -config {path to configs}/qnn_resnet50_config.yaml

Results

The tool displays a table with quantization options ordered by output match based on the selected verifier, and also generates a csv file with the same data. The comparator column shows the output match percentage/value based on the selected verifier. The quant params column displays the quantization params used for that run. Other columns show the backend and runtime/compile params used. The information is also stored in a csv file at {work_dir}/metrics-info.csv.

Artifacts associated with each configured quantization option are stored at {work_dir}/infer/schema{i}_qnn_{backend}_{precision}_{j}. Model outputs are stored at {work_dir}/infer/schema{i}_qnn_{backend}_{precision}_{j}/Result_{k}.

Note

Snapshot of console log has been added for clarity.

[Image: qnn_acc_eval_output.png]

Note

Snapshot of csv file has been added for clarity.

[Image: qnn_acc_eval_csv.png]

qnn-architecture-checker (Beta)

The Architecture Checker is a tool for models running on the HTP backend, including quantized 8-bit, quantized 16-bit and FP16 models. It outputs a list of issues that keep the model from achieving better performance on the HTP backend. The tool can be invoked with the modifier feature, which applies the recommended modifications for these issues. This helps visualize the changes that can be applied to the model to make it a better fit for the HTP backend.

X86-Linux/ WSL Usage:
$ qnn-architecture-checker -i <path>/model.json
                         -b <optional_path>/model.bin
                         -o <optional_output_path>
                         -m <optional_modifier_argument>

X86-Windows/ Windows on Snapdragon Usage:
$ python qnn-architecture-checker -i <path>/model.json
                         -b <optional_path>/model.bin
                         -o <optional_output_path>
                         -m <optional_modifier_argument>

 required arguments:
     -i INPUT_JSON, --input_json INPUT_JSON
                             Path to json file

 optional arguments:
     -b BIN, --bin BIN
                     Path to a bin file
     -o OUTPUT_PATH, --output_path OUTPUT_PATH
                     Path where the output csv should be saved. If not specified, the output csv will be written to the same path as the input file
     -m MODIFY, --modify MODIFY
                     The query to select the modifications to apply.
                         --modify or --modify show - To see all the possible modifications. Display list of rule names and details of the modifications.
                         --modify all - To apply all the possible modifications found for the model.
                         --modify apply=rule_name1,rule_name2 - To apply modifications for specified rule names. The list of rules should be comma separated without spaces
Note:
If running on a quantized model, a quantized model generated with one input image is sufficient to satisfy the quantization requirement for the tool to run properly.
The QNN_SDK_ROOT environment variable must be configured before running the tool.
Deprecation Note:
The option of enabling the architecture checker by passing '--arch_checker' to each converter listed above will be deprecated. E.g., running qnn-tflite-converter -i <path>/model.tflite -d <network_input_name> <dims> -o <optional_output_path> -p <optional_package_name> --arch_checker will be deprecated.
To enable the architecture checker, run the converter tool without passing the '--arch_checker' argument, then run the qnn-architecture-checker command to see the architecture checker output.
The usage of '--modify' is only supported with the qnn-architecture-checker command.

The output is a csv file and will be saved as <optional_output_path>/<model_name>_architecture_checker.csv. An example output is shown below:

The csv columns are: Graph/Node_name, Issue, Recommendation, Type, Input_tensor_name:[dims], Output_tensor_name:[dims], Parameters, Previous node, Next nodes, Modification, Modification_info.

Row 1
    Graph/Node_name: Graph
    Issue: This model uses 16-bit activation data. 16-bit activation data takes twice as much memory as 8-bit activation data does.
    Recommendation: Try to use a smaller datatype to get better performance. E.g., 8-bit
    Type, Input_tensor_name:[dims], Output_tensor_name:[dims], Parameters, Previous node, Next nodes, Modification, Modification_info: N/A

Row 2
    Graph/Node_name: Node_name_1
    Issue: The number of channels in the input/output tensor of this convolution node is low (smaller than 32).
    Recommendation: Try increasing the number of channels in the input/output tensor to 32 or greater to get better performance.
    Type: Conv2d
    Input_tensor_name:[dims]: input_1:[1, 250, 250, 3], __param_1:[5, 5, 3, 32], convolution_0_bias:[32]
    Output_tensor_name:[dims]: output_1:[1, 123, 123, 32]
    Parameters: {'package': 'qti.aisw', 'type': 'Conv2d', …}
    Previous node: ['previous_node_name']
    Next nodes: ['next_node_name1', 'next_node_name2']
    Modification: N/A
    Modification_info: N/A

How to read the example output csv?
Row 1: This issue is on the graph: the graph uses 16-bit activation data, and as stated in the recommendation, changing the activation from 16-bit to 8-bit gives better performance.
Row 2: The issue is on the node with QNN node name "Node_name_1". This node has three inputs, input_1, __param_1 and convolution_0_bias, with dimensions [1, 250, 250, 3], [5, 5, 3, 32] and [32] respectively. It has one output with QNN tensor name output_1 and dimension [1, 123, 123, 32]. The node type is Conv2d. The previous/next node names and the full set of additional node parameters in the Parameters column can be used to locate the node inside the original model. The issue for this node is that the input tensor's channel count is low; as it is smaller than 32, the recommendation is to increase it to at least 32 to get better performance on the HTP backend. The current input dimension is [1, 250, 250, 3]; ideally it would be [1, x, x, 32]. The Modification and Modification_info columns provide details about the modifications applied to the node. If the Architecture Checker is not invoked with the modifier, or if there are no applicable modifications, these values will be N/A.
Is the QNN node/tensor name the same as in the original model?
It is not the same, but it should be similar. The converter sanitizes names in order to meet the QNN naming standard. The input tensor, output tensor, previous node, next node and all the additional parameters are available in the output csv file to help locate the correct node inside the original model.

Sample Command

qnn-architecture-checker --input_json ./model_net.json
                       --bin ./model.bin
                       --output_path ./archCheckerOutput

Architecture Checker - Model Modifier

To apply modifications to the model, the Architecture Checker can be invoked with '--modify' or '--modify show', which displays a list of possible modifications. In this case, the Architecture Checker only shows the rule names and modification details; it runs without making any changes to the model and generates the csv output. Using the rule names from that run, the Architecture Checker can then be invoked with '--modify all' or '--modify apply=rule_name1,rule_name2'. In this case, the rule-specific changes are applied to the model and can be viewed in the updated model json. Additionally, the output csv will contain information related to the modifications.

Consider the csv output below, generated after applying the '--modify apply=elwisediv' modification on an example model.

The csv columns are: Graph/Node_name, Issue, Recommendation, Type, Input_tensor_name:[dims], Output_tensor_name:[dims], Parameters, Previous node, Next nodes, Modification, Modification_info.

Row 1
    Graph/Node_name: Node_name_1
    Issue: ElementWiseDivide usually has poor performance compared to ElementWiseMultiply.
    Recommendation: Try replacing ElementWiseDivide with ElementWiseMultiply using the reciprocal value to get better performance.
    Type: Eltwise_Binary
    Input_tensor_name:[dims]: input_1:[1, 52, 52, 6], input_2:[1]
    Output_tensor_name:[dims]: output_1:[1, 52, 52, 6]
    Parameters: {'package': 'qti.aisw', 'eltwise_type': 'ElementWiseDivide', …}
    Previous node: ['previous_node_name']
    Next nodes: ['next_node_name1', 'next_node_name2']
    Modification: Done
    Modification_info: ElementWiseDivide has been replaced by ElementWiseMultiply using the reciprocal value

Row 2
    Graph/Node_name: Node_name_2
    Issue: The number of channels in the input/output tensor of this convolution node is low (smaller than 32).
    Recommendation: Try increasing the number of channels in the input/output tensor to 32 or greater to get better performance.
    Type: Conv2d
    Input_tensor_name:[dims]: input_3:[1, 250, 250, 3], __param_1:[5, 5, 3, 32], convolution_1_bias:[32]
    Output_tensor_name:[dims]: output_2:[1, 123, 123, 32]
    Parameters: {'package': 'qti.aisw', 'type': 'Conv2d', …}
    Previous node: ['previous_node_name']
    Next nodes: ['next_node_name1', 'next_node_name2']
    Modification: N/A
    Modification_info: N/A

How to read the example output csv?
Row 1: The issue on the node with QNN node name "Node_name_1" is that it contains an element-wise divide, which gives poor performance compared to an element-wise multiply. After invoking the Architecture Checker with "--modify apply=elwisediv", the modification has been applied successfully, i.e. the element-wise divide is replaced by an element-wise multiply with the reciprocal value. This information is available in the Modification and Modification_info columns.
Row 2: The issue on the node with QNN node name "Node_name_2" is that the node has an input tensor with fewer than 32 channels. It is recommended to increase the number of channels to 32 or greater for better performance. For this issue, modification through the tool is not applicable, hence the Modification and Modification_info columns are N/A.
After modifying the model, the above run will generate an updated model.cpp, model_net.json and/or model.bin along with the csv output. Running the Architecture Checker on the updated model json will no longer show the element-wise divide issue on Node_name_1.
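The replacement the modifier applies is mathematically equivalent: dividing by a constant tensor is the same as multiplying by its reciprocal. A minimal numpy sketch, using hypothetical values that match the tensor shapes in row 1 above:

```python
import numpy as np

# Hypothetical tensors matching the shapes in row 1 of the example output.
x = np.random.rand(1, 52, 52, 6).astype(np.float32)
divisor = np.array([2.0], dtype=np.float32)

out_div = x / divisor          # original ElementWiseDivide
out_mul = x * (1.0 / divisor)  # modifier's ElementWiseMultiply form

# The two formulations are numerically equivalent (up to float rounding).
assert np.allclose(out_div, out_mul)
```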

Following are the commands to invoke the Architecture Checker with Modifier to display the list of possible modifications:

Sample Command

qnn-architecture-checker --input_json ./model_net.json
                       --bin ./model.bin
                       --output_path ./archCheckerOutput
                       --modify

Sample Command

qnn-architecture-checker --input_json ./model_net.json
                       --bin ./model.bin
                       --output_path ./archCheckerOutput
                       --modify show

Following are the commands to apply the modifications either on all possible modifications or specific rules:

Sample Command

qnn-architecture-checker --input_json ./model_net.json
                       --bin ./model.bin
                       --output_path ./archCheckerOutput
                       --modify all

Sample Command

qnn-architecture-checker --input_json ./model_net.json
                       --bin ./model.bin
                       --output_path ./archCheckerOutput
                       --modify apply=prelu,elwisediv
Note:
The Architecture Checker with modifier is an enhancement to help visualize the changes that can be applied to the model to better fit it on the HTP. To see actual performance improvements, the model may require retraining or redesigning.

qnn-accuracy-debugger (Beta)

Dependencies

The Accuracy Debugger depends on the setup outlined in Setup. In particular, the following are required:

  1. Platform dependencies need to be met as per Platform Dependencies

  2. The desired ML frameworks need to be installed. The Accuracy Debugger is verified to work with the ML framework versions mentioned in Environment Setup

The following environment variables are used in this guide (users may change these paths depending on their needs):

  1. RESOURCESPATH = {Path to the directory where all models and input files reside}

  2. PROJECTREPOPATH = {Path to your accuracy debugger project directory}

Supported models

The qnn-accuracy-debugger currently supports ONNX, TFLite, and Tensorflow 1.x models. Pytorch models are supported only by the oneshot-layerwise debugging algorithm of the tool.

Overview

The accuracy-debugger tool finds inaccuracies in a neural network at the layer level. The tool compares the golden outputs produced by running a model through a specific ML framework (i.e. Tensorflow, Onnx, TFlite) with the results produced by running the same model through Qualcomm's QNN Inference Engine. The inference engine can be run on a variety of compute targets, including GPU, CPU, and DSP.

The following features are available in Accuracy Debugger. Each feature can be run with its corresponding option; for example, qnn-accuracy-debugger --{option}.

  1. qnn-accuracy-debugger --framework_runner This feature uses an ML framework, e.g. tensorflow, tflite or onnx, to run the model and get intermediate outputs. Note: The argument --framework_diagnosis has been replaced by --framework_runner. --framework_diagnosis will be deprecated in a future release.

  2. qnn-accuracy-debugger --inference_engine This feature uses the QNN engine to run a model to retrieve intermediate outputs.

  3. qnn-accuracy-debugger --verification This feature compares the outputs generated by the framework runner and inference engine features using verifiers such as CosineSimilarity, RtolAtol, etc.

  4. qnn-accuracy-debugger --compare_encodings This feature extracts encodings from a given QNN net JSON file, compares them with the given AIMET encodings, and outputs an Excel sheet highlighting mismatches.

  5. qnn-accuracy-debugger --tensor_inspection This feature compares given target outputs with reference outputs.

  6. qnn-accuracy-debugger --quant_checker This feature analyzes the activations, weights, and biases of all the possible quantization options available in the qnn-converters for each subsequent layer of a given model.
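The verifiers used in the verification step (CosineSimilarity, RtolAtol, etc.) are internal to the tool, but a minimal sketch of what a cosine-similarity comparison between a golden and a target tensor computes, using made-up values, is:

```python
import numpy as np

def cosine_similarity(golden: np.ndarray, target: np.ndarray) -> float:
    # Flatten both tensors and compute the cosine of the angle between them.
    g, t = golden.ravel(), target.ravel()
    return float(np.dot(g, t) / (np.linalg.norm(g) * np.linalg.norm(t)))

# Illustrative per-layer outputs (not from a real run).
golden = np.array([0.1, 0.2, 0.7], dtype=np.float32)
target = np.array([0.11, 0.19, 0.70], dtype=np.float32)
score = cosine_similarity(golden, target)  # close to 1.0 for matching layers
```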

Tip:
  • You can use --help after the bin commands to see what other options (required or optional) you can add.

  • If no option is provided, Accuracy Debugger runs framework_runner, inference_engine, and verification sequentially.

Below are the instructions for running the Accuracy Debugger:

Framework Runner

The Framework Runner feature is designed to run models with different machine learning frameworks (e.g. Tensorflow, etc). A selected model is run with a specific ML framework. Golden outputs are produced for future comparison with inference results from the Inference Engine step.

Usage

usage: qnn-accuracy-debugger --framework_runner [-h]
                                   -f FRAMEWORK [FRAMEWORK ...]
                                   -m MODEL_PATH
                                   -i INPUT_TENSOR [INPUT_TENSOR ...]
                                   -o OUTPUT_TENSOR
                                   [-w WORKING_DIR]
                                   [--output_dirname OUTPUT_DIRNAME]
                                   [-v]
                                   [--disable_graph_optimization]
                                   [--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB]
                                   [--add_layer_outputs ADD_LAYER_OUTPUTS]
                                   [--add_layer_types ADD_LAYER_TYPES]
                                   [--skip_layer_types SKIP_LAYER_TYPES]
                                   [--skip_layer_outputs SKIP_LAYER_OUTPUTS]
                                   [--start_layer START_LAYER]
                                   [--end_layer END_LAYER]

Script to generate intermediate tensors from an ML Framework.

optional arguments:
     -h, --help            show this help message and exit

required arguments:
     -f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
                             Framework type and version, version is optional. Currently
                             supported frameworks are ["tensorflow","onnx","tflite"] case
                             insensitive but spelling sensitive
     -m MODEL_PATH, --model_path MODEL_PATH
                             Path to the model file(s).
     -i INPUT_TENSOR [INPUT_TENSOR ...], --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
                             The name, dimensions, raw data, and optionally data
                             type of the network input tensor(s) specified in the
                             format "input_name" comma-separated-dimensions path-
                             to-raw-file, for example: "data" 1,224,224,3 data.raw
                             float32. Note that the quotes should always be
                             included in order to handle special characters,
                             spaces, etc. For multiple inputs specify multiple
                             --input_tensor on the command line like:
                             --input_tensor "data1" 1,224,224,3 data1.raw
                             --input_tensor "data2" 1,50,100,3 data2.raw float32.
     -o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
                             Name of the graph's specified output tensor(s).

     optional arguments:
     -w WORKING_DIR, --working_dir WORKING_DIR
                             Working directory for the framework_runner to store
                             temporary files. Creates a new directory if the
                             specified working directory does not exist
     --output_dirname OUTPUT_DIRNAME
                             output directory name for the framework_runner to
                             store temporary files under
                             <working_dir>/framework_runner. Creates a new
                             directory if the specified working directory does not
                             exist
     -v, --verbose           Verbose printing
     --disable_graph_optimization
                             Disables basic model optimization
     --onnx_custom_op_lib ONNX_CUSTOM_OP_LIB
                             path to onnx custom operator library

     (below options are supported only for onnx and ignored for other frameworks)
     --add_layer_outputs ADD_LAYER_OUTPUTS
                     Output layers to be dumped. example:1579,232
     --add_layer_types ADD_LAYER_TYPES
                           outputs of layer types to be dumped. e.g
                           :Resize,Transpose. All enabled by default.
     --skip_layer_types SKIP_LAYER_TYPES
                           comma delimited layer types to skip snooping. e.g
                           :Resize, Transpose
     --skip_layer_outputs SKIP_LAYER_OUTPUTS
                           comma delimited layer output names to skip debugging.
                           e.g :1171, 1174
     --start_layer START_LAYER
                           save all intermediate layer outputs from provided
                           start layer to bottom layer of model
     --end_layer END_LAYER
                           save all intermediate layer outputs from top layer to
                           provided end layer of model

Please note: All command line arguments should be provided either on the command line or through the config file. Command-line values will not override those in the config file if there is overlap.
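The raw files referenced by --input_tensor are flat binary buffers of the tensor data with no header; the dimensions and dtype are supplied separately on the command line. A sketch of producing such a file with numpy (assuming a float32 1,299,299,3 input like the InceptionV3 sample below; the file name is illustrative):

```python
import os
import tempfile

import numpy as np

# Assumed example input: 1x299x299x3 float32, as in the InceptionV3 samples.
data = np.random.rand(1, 299, 299, 3).astype(np.float32)

# A .raw file is just the tensor's bytes; shape and dtype are not stored in it.
raw_path = os.path.join(tempfile.gettempdir(), "chairs.raw")
data.tofile(raw_path)

# Round trip: reading the flat buffer back and reshaping recovers the tensor.
loaded = np.fromfile(raw_path, dtype=np.float32).reshape(1, 299, 299, 3)
```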

Sample Commands

qnn-accuracy-debugger \
    --framework_runner \
    --framework tensorflow \
    --model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
    --input_tensor "input:0" 1,299,299,3 $RESOURCESPATH/samples/InceptionV3Model/data/chairs.raw \
    --output_tensor InceptionV3/Predictions/Reshape_1:0

qnn-accuracy-debugger \
    --framework_runner \
    --framework onnx \
    --model_path $RESOURCESPATH/samples/dlv3onnx/dlv3plus_mbnet_513-513_op9_mod_basic.onnx \
    --input_tensor Input 1,3,513,513 $RESOURCESPATH/samples/dlv3onnx/data/00000_1_3_513_513.raw \
    --output_tensor Output

To run a model with a custom operator:
qnn-accuracy-debugger \
    --framework_runner \
    --framework onnx \
    --input_tensor "image" 1,3,640,640 $RESOURCESPATH/models/yolov3/batched-inp-107-0.raw \
    --model_path $RESOURCESPATH/models/yolov3/yolov3_640_640_with_abp_qnms.onnx \
    --output_tensor detection_boxes \
    --onnx_custom_op_lib $RESOURCESPATH/models/libCustomQnmsYoloOrt.so
TIP:
  • A working directory, if not otherwise specified, is created wherever you call the script from; it is recommended to call all scripts from the same directory so that all outputs and results are stored under one directory.

  • For tensorflow, it is sometimes necessary to add :0 after the input and output node names to signify the index of the node. Notice the :0 is dropped for onnx models.

Output

The program also creates a directory named latest in working_directory/framework_runner which is symbolically linked to the most recently generated directory. In the example below, latest will contain data symlinked to the data in the most recent directory YYYY-MM-DD_HH:mm:ss. Users may override the directory name by passing it to --output_dirname (e.g. --output_dirname myTest1Output).

The float data produced by the Framework Runner step offers precise reference material for the Verification component to diagnose the accuracy of the network generated by the Inference Engine. Unless a path is otherwise specified, the Accuracy Debugger will create directories within the working_directory/framework_runner directory found in the current working directory. The directories will be named with the date and time of the program's execution, and contain tensor data. Depending on the tensor naming convention of the model, there may be numerous sub-directories within the new directory. This occurs when tensor names include a slash "/". For example, for the tensor names 'inception_3a/1x1/bn/sc', 'inception_3a/1x1/bn/sc_internal' and 'inception_3a/1x1/bn', subdirectories will be generated.
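The slash-to-subdirectory mapping described above can be sketched as follows (the directory names here are illustrative, not the tool's exact layout):

```python
import os

def tensor_output_path(output_dir: str, tensor_name: str) -> str:
    # Tensor names containing "/" map onto nested sub-directories, with the
    # final path component becoming the .raw file name.
    return os.path.join(output_dir, *tensor_name.split("/")) + ".raw"

# e.g. 'inception_3a/1x1/bn/sc' lands in nested folders under the output dir.
path = tensor_output_path("framework_runner/latest", "inception_3a/1x1/bn/sc")
```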

[Figure: framework_runner sample output (../_static/resources/framework_runner.png)]

The figure above shows a sample output from a framework_runner run. InceptionV3 and Logits contain the outputs of each layer before the last layer. Each output directory contains the .raw files corresponding to each node. Every raw file that can be seen is the output of an operation. The outputs of the final layer are saved inside the Predictions directory. The file framework_runner_options.json contains all the options used to run this feature.

Inference Engine

The Inference Engine feature is designed to find the outputs for a QNN model. The output produced by this step can be compared with the golden outputs produced by the framework runner step.

Usage

usage: qnn-accuracy-debugger --inference_engine [-h]
                                   -p ENGINE_PATH
                                   -l INPUT_LIST
                                   -r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,htp}
                                   -a {aarch64-android,x86_64-linux-clang,x86_64-windows-msvc,wos}
                                   [--stage {source,converted,compiled}]
                                   [-i INPUT_TENSOR [INPUT_TENSOR ...]]
                                   [-o OUTPUT_TENSOR] [-m MODEL_PATH]
                                   [-f FRAMEWORK [FRAMEWORK ...]]
                                   [-qmcpp QNN_MODEL_CPP_PATH]
                                   [-qmbin QNN_MODEL_BIN_PATH]
                                   [-qmb QNN_MODEL_BINARY_PATH]
                                   [--deviceId DEVICEID] [-v]
                                   [--host_device {x86,x86_64-windows-msvc,wos}] [-w WORKING_DIR]
                                   [--output_dirname OUTPUT_DIRNAME]
                                   [--engine_version ENGINE_VERSION]
                                   [--debug_mode_off]
                                   [--print_version PRINT_VERSION]
                                   [--offline_prepare] [-bbw {8,32}]
                                   [-abw {8,16}]
                                   [--golden_dir_for_mapping GOLDEN_DIR_FOR_MAPPING]
                                   [-wbw {8}] [--lib_name LIB_NAME]
                                   [-bd BINARIES_DIR] [-qmn MODEL_NAME]
                                   [-pq {tf,enhanced,adjusted,symmetric}]
                                   [-qo QUANTIZATION_OVERRIDES]
                                   [--act_quantizer {tf,enhanced,adjusted,symmetric}]
                                   [--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                                   [--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                                   [--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                                   [--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                                   [--algorithms ALGORITHMS]
                                   [--ignore_encodings]
                                   [--per_channel_quantization]
                                   [-idt {float,native}]
                                   [-odt {float_only,native_only,float_and_native}]
                                   [--profiling_level {basic,detailed}]
                                   [--perf_profile {low_balanced,balanced,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
                                   [--log_level {error,warn,info,debug,verbose}]
                                   [--qnn_model_net_json QNN_MODEL_NET_JSON]
                                   [--qnn_netrun_config_file QNN_NETRUN_CONFIG_FILE]
                                   [--extra_converter_args EXTRA_CONVERTER_ARGS]
                                   [--extra_runtime_args EXTRA_RUNTIME_ARGS]
                                   [--compiler_config COMPILER_CONFIG]
                                   [--context_config_params CONTEXT_CONFIG_PARAMS]
                                   [--graph_config_params GRAPH_CONFIG_PARAMS]
                                   [--precision {int8,fp16,fp32}]
                                   [--add_layer_outputs ADD_LAYER_OUTPUTS]
                                   [--add_layer_types ADD_LAYER_TYPES]
                                   [--skip_layer_types SKIP_LAYER_TYPES]
                                   [--skip_layer_outputs SKIP_LAYER_OUTPUTS]



Script to run QNN inference engine.

optional arguments:
     -h, --help            show this help message and exit

Core Arguments:
     --stage {source,converted,compiled}
                             Specifies the starting stage in the Accuracy Debugger
                             pipeline.
                             Source: starting with source framework model [default].
                             Converted: starting with model.cpp and .bin files.
                             Compiled: starting with a model's .so binary.
     -r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,htp}, --runtime {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,htp}
                             Runtime to be used.
                             Use HTP runtime for emulation on x86 host.
     -a {aarch64-android,x86_64-linux-clang,x86_64-windows-msvc,wos}, --architecture {aarch64-android,x86_64-linux-clang,x86_64-windows-msvc,wos}
                             Name of the architecture to use for inference engine.
     -l INPUT_LIST, --input_list INPUT_LIST
                             Path to the input list text.

Arguments required for SOURCE stage:
     -i INPUT_TENSOR [INPUT_TENSOR ...], --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
                             The name, dimension, and raw data of the network input
                             tensor(s) specified in the format "input_name" comma-
                             separated-dimensions path-to-raw-file, for example:
                             "data" 1,224,224,3 data.raw. Note that the quotes
                             should always be included in order to handle special
                             characters, spaces, etc. For multiple inputs specify
                             multiple --input_tensor on the command line like:
                             --input_tensor "data1" 1,224,224,3 data1.raw
                             --input_tensor "data2" 1,50,100,3 data2.raw.
     -o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
                             Name of the graph's output tensor(s).
     -m MODEL_PATH, --model_path MODEL_PATH
                             Path to the model file(s).
     -f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
                             Framework type to be used, followed optionally by
                             framework version.

Arguments required for CONVERTED stage:
     -qmcpp QNN_MODEL_CPP_PATH, --qnn_model_cpp_path QNN_MODEL_CPP_PATH
                             Path to the qnn model .cpp file
     -qmbin QNN_MODEL_BIN_PATH, --qnn_model_bin_path QNN_MODEL_BIN_PATH
                             Path to the qnn model .bin file

Arguments required for COMPILED stage:
     -qmb QNN_MODEL_BINARY_PATH, --qnn_model_binary_path QNN_MODEL_BINARY_PATH
                             Path to the qnn model .so binary.

Optional Arguments:
     --deviceId DEVICEID   The serial number of the device to use. If not
                             available, the first in a list of queried devices will
                             be used for validation.
     -v, --verbose         Verbose printing
     --host_device {x86,x86_64-windows-msvc,wos}   The device that will be running conversion. Set to x86
                             by default.
     -w WORKING_DIR, --working_dir WORKING_DIR
                             Working directory for the inference_engine to store
                             temporary files. Creates a new directory if the
                             specified working directory does not exist
     --output_dirname OUTPUT_DIRNAME
                             output directory name for the inference_engine to
                             store temporary files under
                             <working_dir>/inference_engine. Creates a new
                             directory if the specified working directory does not
                             exist
     -p ENGINE_PATH, --engine_path ENGINE_PATH
                             Path to the inference engine.
     --debug_mode_off      Turns off debug mode.
     --print_version PRINT_VERSION
                             Print the QNN SDK version alongside the output.
     --offline_prepare     Use offline prepare to run qnn model.
     -bbw {8,32}, --bias_bitwidth {8,32}
                             option to select the bitwidth to use when quantizing
                             the bias. default 8
     -abw {8,16}, --act_bitwidth {8,16}
                             option to select the bitwidth to use when quantizing
                             the activations. default 8
     --golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --golden_dir_for_mapping GOLDEN_DIR_FOR_MAPPING
                             Optional parameter to indicate the directory of the
                             goldens, it's used for tensor mapping without
                             framework.
     -wbw {8}, --weights_bitwidth {8}
                             option to select the bitwidth to use when quantizing
                             the weights. Only 8 is supported at the moment.
     -nif, --use_native_input_files
                             Specifies that the input files will be parsed in the
                             data type native to the graph. If not specified, input
                             files will be parsed in floating point.
     -nof, --use_native_output_files
                             Specifies that the output files will be generated in
                             the data type native to the graph. If not specified,
                             output files will be generated in floating point.
     --lib_name LIB_NAME   Name to use for model library (.so file)
     -bd BINARIES_DIR, --binaries_dir BINARIES_DIR
                             Directory to which to save model binaries, if they
                             don't yet exist.
     -mn MODEL_NAME, --model_name MODEL_NAME
                             Name of the desired output qnn model
     -pq {tf,enhanced,adjusted,symmetric}, --param_quantizer {tf,enhanced,adjusted,symmetric}
                             Param quantizer algorithm used.
     -qo QUANTIZATION_OVERRIDES, --quantization_overrides QUANTIZATION_OVERRIDES
                             Path to quantization overrides json file.
     --act_quantizer {tf,enhanced,adjusted,symmetric}
                             Optional parameter to indicate the activation
                             quantizer to use
     --act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                       Specify which quantization calibration method to use for activations.
                       This option has to be paired with --act_quantizer_schema.
     --param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                       Specify which quantization calibration method to use for parameters.
                       This option has to be paired with --param_quantizer_schema.
     --act_quantizer_schema {asymmetric,symmetric}
                         Specify which quantization schema to use for
                         activations. Can not be used together with
                         act_quantizer. Note: This argument mandates --act_quantizer_calibration to be passed.
     --param_quantizer_schema {asymmetric,symmetric}
                         Specify which quantization schema to use for
                         parameters. Can not be used together with
                         param_quantizer. Note: This argument mandates --param_quantizer_calibration to be passed.
     -fbw {16,32}, --float_bias_bitwidth {16,32}
                             option to select the bitwidth to use when biases are in float; default is 32
     -rqs RESTRICT_QUANTIZATION_STEPS, --restrict_quantization_steps RESTRICT_QUANTIZATION_STEPS
                             ENCODING_MIN, ENCODING_MAX
                             Specifies the number of steps to use to compute quantization encodings such that
                             scale = (max - min) / number of quantization steps.
                             The option should be passed as a space separated pair of hexadecimal string minimum and maximum values,
                             i.e. --restrict_quantization_steps 'MIN MAX'. Note that this is a hexadecimal string
                             literal and not a signed integer. To supply a negative value an explicit minus sign is required.
                             e.g.: 8-bit range: --restrict_quantization_steps '-0x80 0x7F'
                             16-bit range: --restrict_quantization_steps '-0x8000 0x7F7F'
     --algorithms ALGORITHMS
                             Use this option to enable new optimization algorithms.
                             Usage is: --algorithms <algo_name1> ... The available
                             optimization algorithms are: 'cle ' - Cross layer
                             equalization includes a number of methods for
                             equalizing weights and biases across layers in order
                             to rectify imbalances that cause quantization errors.
     --ignore_encodings    Use only quantizer generated encodings, ignoring any
                             user or model provided encodings.
     --per_channel_quantization
                             Use per-channel quantization for convolution-based op
                             weights.
     -idt {float,native}, --input_data_type {float,native}
                             the input data type, must match with the supplied
                             inputs
     -odt {float_only,native_only,float_and_native}, --output_data_type {float_only,native_only,float_and_native}
                             the desired output data type
     --profiling_level {basic,detailed,backend}
                             Enables profiling and sets its level.
     --perf_profile {low_balanced,balanced,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
     --log_level {error,warn,info,debug,verbose}
                             Enable verbose logging.
     --qnn_model_net_json QNN_MODEL_NET_JSON
                             Path to the qnn model net json. Only necessary if it is being run from the converted stage. It has information about the structure of the data within the framework_runner and inference_engine steps.
                             This file is required to generate the model_graph_struct.json file, which is useful in the verification step.
     --qnn_netrun_config_file QNN_NETRUN_CONFIG_FILE
                             allows backend_extension features to be applied during
                             qnn-net-run
     --extra_converter_args EXTRA_CONVERTER_ARGS
                             additional converter arguments in a string. example:
                             --extra_converter_args input_dtype=data
                             float;input_layout=data1 NCHW
     --extra_contextbin_args EXTRA_CONTEXTBIN_ARGS
                             additional context binary generator arguments in a quoted string.
                             example: --extra_contextbin_args 'arg1=value1;arg2=value2'
     --extra_runtime_args EXTRA_RUNTIME_ARGS
                             additional runtime arguments in a quoted string.
                             example: --extra_runtime_args
                             profiling_level=basic;log_level=debug
     --compiler_config COMPILER_CONFIG
                             Path to the compiler config file.
     --context_config_params CONTEXT_CONFIG_PARAMS
                             optional context config params in a quoted string.
                             example: --context_config_params 'context_priority=high; cache_compatibility_mode=strict'
     --graph_config_params GRAPH_CONFIG_PARAMS
                             optional graph config params in a quoted string.
                             example: --graph_config_params 'graph_priority=low; graph_profiling_num_executions=10'
     --precision {int8,fp16,fp32}
                             Choose the precision. Default is int8.
                             Note: This option is not applicable when --stage is set to converted or compiled.
     --add_layer_outputs ADD_LAYER_OUTPUTS
                             Output layers to be dumped, e.g., 1579,232
     --add_layer_types ADD_LAYER_TYPES
                             Outputs of layer types to be dumped, e.g., Resize, Transpose; all enabled by default
     --skip_layer_types SKIP_LAYER_TYPES
                             Comma delimited layer types to skip snooping, e.g., Resize, Transpose
     --skip_layer_outputs SKIP_LAYER_OUTPUTS
                             Comma delimited layer output names to skip debugging, e.g., 1171, 1174



Please note: All command line arguments should be provided either on the command line or through the config file. Command-line values will not override those in the config file if there is overlap.

The inference engine config file can be found in {accuracy_debugger tool root directory}/python/qti/aisw/accuracy_debugger/lib/inference_engine/configs/config_files and is a JSON file. This config file stores information that helps the inference engine determine which tool and parameters to read in.
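As an aside on the --restrict_quantization_steps option described above: the hexadecimal pair defines the step range, and scale = (max - min) / number of quantization steps. A small arithmetic sketch of the 8-bit example (the encoding range here is an assumed example value, not something the tool prescribes):

```python
# Hexadecimal step range from the 8-bit example: '-0x80 0x7F'.
step_min = int("-0x80", 16)        # -128
step_max = int("0x7F", 16)         #  127
num_steps = step_max - step_min    # 255 quantization steps

# Illustrative encoding range (assumed): a tensor spanning [-1.0, 1.0].
encoding_min, encoding_max = -1.0, 1.0
scale = (encoding_max - encoding_min) / num_steps
```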

Sample Command

qnn-accuracy-debugger \
    --inference_engine \
    --framework tensorflow \
    --runtime dspv73 \
    --model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
    --input_tensor "input:0" 1,299,299,3 $RESOURCESPATH/samples/InceptionV3Model/data/chairs.raw \
    --output_tensor InceptionV3/Predictions/Reshape_1 \
    --architecture x86_64-linux-clang \
    --input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
    --verbose

Sample Command

qnn-accuracy-debugger \
    --inference_engine \
    --framework tensorflow \
    --runtime dspv73 \
    --host_device wos \
    --model_path <RESOURCESPATH>\InceptionV3Model\inception_v3_2016_08_28_frozen.pb \
    --input_tensor "input:0" 1,299,299,3 <RESOURCESPATH>\samples\InceptionV3Model\data\chairs.raw \
    --output_tensor InceptionV3\Predictions\Reshape_1 \
    --architecture wos \
    --input_list <RESOURCESPATH>\samples\InceptionV3Model\data\image_list.txt \
    --verbose

Sample Command

qnn-accuracy-debugger \
    --inference_engine \
    --framework tensorflow \
    --runtime cpu \
    --host_device x86_64-windows-msvc \
    --model_path <RESOURCESPATH>\InceptionV3Model\inception_v3_2016_08_28_frozen.pb \
    --input_tensor "input:0" 1,299,299,3 <RESOURCESPATH>\samples\InceptionV3Model\data\chairs.raw \
    --output_tensor InceptionV3\Predictions\Reshape_1 \
    --architecture x86_64-windows-msvc \
    --input_list <RESOURCESPATH>\samples\InceptionV3Model\data\image_list.txt \
    --verbose

Sample Command

qnn-accuracy-debugger \
    --inference_engine \
    --deviceId 357415c4 \
    --framework tensorflow \
    --runtime dspv73 \
    --architecture aarch64-android \
    --model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
    --input_tensor "input:0" 1,299,299,3 $RESOURCESPATH/samples/InceptionV3Model/data/chairs.raw \
    --output_tensor InceptionV3/Predictions/Reshape_1 \
    --input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
    --verbose
Tip:
  • For --runtime, choose from 'cpu', 'gpu', 'dsp', 'dspv65', 'dspv66', 'dspv68', 'dspv69', 'dspv73', 'htp'. Make sure the runtime matches the target SoC (e.g., dspv73 for Kailua, dspv69 for Waipio). Choose the HTP runtime for emulation on an x86 host.

  • The input_tensor (-i) and output_tensor (-o) arguments do not need the :0 indexing required when running the TensorFlow framework runner.

  • Two files, tensor_mapping.json and qnn_model_graph_struct.json, are generated for use in verification; they can be found in working_directory/inference_engine/latest.

  • Before running qnn-accuracy-debugger on a Windows x86 or Windows on Snapdragon system, ensure that the environment is configured, and specify the host and target machine as x86_64-windows-msvc or wos respectively.

  • Note that qnn-accuracy-debugger on Windows x86 systems is currently tested only for the CPU runtime.

More example commands running from different stages:

Sample Command

source file stage: same as example from above section (stage default is "source")

running from converted stage (x86):
qnn-accuracy-debugger \
    --inference_engine \
    --stage converted \
    -qmcpp $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.cpp \
    -qmbin $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.bin \
    --runtime dspv73 \
    --architecture x86_64-linux-clang \
    --input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
    --qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
    --verbose \
    --framework tensorflow \
    --golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/

Android devices (e.g., MTP):
qnn-accuracy-debugger \
    --inference_engine \
    --stage converted \
    -qmcpp $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.cpp \
    -qmbin $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.bin \
    --deviceId f366ce60 \
    --runtime dspv73 \
    --architecture aarch64-android \
    --input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
    --qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
    --verbose \
    --framework tensorflow \
    --golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/


running in compiled stage (x86):

qnn-accuracy-debugger \
    --inference_engine \
    --stage compiled \
    --qnn_model_binary $RESOURCESPATH/samples/InceptionV3Model/qnn_model_binaries/x86_64-linux-clang/libqnn_model.so \
    --runtime dspv73 \
    --architecture x86_64-linux-clang \
    --input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
    --verbose \
    --qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
    --golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/

running in compiled stage (wos):

qnn-accuracy-debugger \
    --inference_engine \
    --stage compiled \
    --qnn_model_binary <RESOURCESPATH>\samples\InceptionV3Model\qnn_model_binaries\x86_64-linux-clang\libqnn_model.so \
    --runtime dspv73 \
    --architecture wos \
    --input_list <RESOURCESPATH>\samples\InceptionV3Model\data\image_list.txt \
    --verbose \
    --qnn_model_net_json <RESOURCESPATH>\samples\InceptionV3Model\inception_v3_2016_08_28_frozen_qnn_model_net.json \
    --golden_output_reference_directory <RESOURCESPATH>\samples\InceptionV3Model\golden_from_framework_runner\

Android devices (e.g., MTP):
qnn-accuracy-debugger \
    --inference_engine \
    --stage compiled \
    --qnn_model_binary $RESOURCESPATH/samples/InceptionV3Model/qnn_model_binaries/aarch64-android/libqnn_model.so \
    --runtime dspv73 \
    --architecture aarch64-android \
    --input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
    --verbose \
    --qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
    --framework tensorflow \
    --golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/

To run onnx model with custom operator:
qnn-accuracy-debugger \
    --inference_engine \
    --framework onnx \
    --runtime dspv75 \
    --architecture aarch64-android \
    --model_path $RESOURCESPATH/AISW-77095/model.onnx \
    --input_tensor "image" 1,3,640,1794 $RESOURCESPATH/inputs/image.raw \
    --output_tensor uncertainty_jacobian_bb \
    --input_list $RESOURCESPATH/input_list.txt \
    --default_verifier mse \
    --engine QNN \
    --engine_path $QNN_SDK_ROOT \
    --extra_converter_args 'op_package_config=$RESOURCESPATH/CustomPreTopKOpPackageCPU_v2.xml;op_package_lib=$RESOURCESPATH/libCustomPreTopKOpPackageHtp.so:CustomPreTopKOpPackageHtpInterfaceProvider:' \
    --extra_contextbin_args 'op_packages=$RESOURCESPATH/libQnnCustomPreTopKOpPackageHtp.so:CustomPreTopKOpPackageHtpInterfaceProvider:' \
    --extra_runtime_args 'op_packages=$RESOURCESPATH/AISW-77095/libQnnCustomPreTopKOpPackageHtp_v75.so:CustomPreTopKOpPackageHtpInterfaceProvider' \
    --debug_mode_off \
    --offline_prepare \
    --verbose
Tip:
  • The qnn_model_net_json file is not required to run this step. However, it is needed to build qnn_model_graph_struct.json, which can be used in the Verification step. The model_net.json file is generated when the original model is converted. Hence, if you are debugging from the converted model stage, it is recommended to obtain this model_net.json file as well.

  • Providing framework together with golden_dir_for_mapping, or golden_dir_for_mapping alone, is an alternative to providing the original model for generating tensor_mapping.json. However, when only golden_dir_for_mapping is provided, the get_tensor_mapping module maps on a best-effort basis, and the mapping is not guaranteed to be 100% accurate.

Output

Once the inference engine has finished running, it stores its output in the specified directory, or, by default, in working_directory/inference_engine under the current working directory.

../_static/resources/inference_engine.png

The figure above shows sample output from one run of the inference engine step. The output directory contains raw files; each raw file is the output of one operation in the network. The model.bin and model.cpp files are created by the model converter. The qnn_model_binaries directory contains the .so file generated by the qnn-model-lib-generator utility. The file image_list.txt contains the paths of the sample test images, and inference_engine_options.json contains all the options with which this run was launched. In addition to the .raw files, the inference engine also generates the model’s graph structure as a .json file whose name matches the name of the protobuf model file. The model_graph_struct.json provides structural information about the converted model graph during the verification step; specifically, it helps order the nodes (i.e., beginning nodes come before ending nodes). The model_net.json describes the data layout used in the framework_runner and inference_engine steps (data can be in different formats, e.g., channels-first vs. channels-last); the verification step uses this information so that data can be properly transposed before comparison. It is an optional parameter that can be provided during the inference engine step to generate the model_graph_struct.json file (mandatory only when running the inference engine from the converted stage). Finally, the tensor_mapping file contains a mapping between the intermediate output file names generated by the framework runner step and those generated by the inference engine step.

../_static/resources/inference_engine_2.png

The created .raw files are organized in the same manner as framework_runner (see above).
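Each .raw file is a flat binary dump of one tensor's values. A minimal sketch for loading one into NumPy is shown below; note that the dtype and shape are assumptions supplied by the caller, since the .raw format stores neither — they must match the tensor's actual output precision and dimensions.

```python
import numpy as np

def load_raw_tensor(path, shape, dtype=np.float32):
    """Read a flat binary .raw dump and reshape it to the tensor's dimensions.

    The dtype and shape are NOT stored in the file; the caller must supply
    values matching the tensor's actual precision and dimensions.
    """
    data = np.fromfile(path, dtype=dtype)
    return data.reshape(shape)
```

For example, an output produced with 1,299,299,3 input dimensions would typically be loaded with the shape reported for that tensor in the model's graph structure JSON.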

Verification

The Verification step compares the output (from the intermediate tensors of a given model) produced by the framework runner step with the output produced by the inference engine step. Once the comparison is complete, the verification results are compiled and displayed visually in a format that can be easily interpreted by the user.

There are different types of verifiers, e.g., CosineSimilarity, RtolAtol, etc. To see the available verifiers, use the --help option (qnn-accuracy-debugger --verification --help). Each verifier compares the Framework Runner and Inference Engine outputs using an error metric, and prepares reports and/or visualizations to help the user analyze the network’s error data.

Usage

usage: qnn-accuracy-debugger --verification [-h]
                              --default_verifier DEFAULT_VERIFIER
                              [DEFAULT_VERIFIER ...]
                              --golden_output_reference_directory
                              GOLDEN_OUTPUT_REFERENCE_DIRECTORY
                              --inference_results INFERENCE_RESULTS
                              [--tensor_mapping TENSOR_MAPPING]
                              [--qnn_model_json_path QNN_MODEL_JSON_PATH]
                              [--dlc_path DLC_PATH]
                              [--verifier_config VERIFIER_CONFIG]
                              [--graph_struct GRAPH_STRUCT] [-v]
                              [-w WORKING_DIR]
                              [--output_dirname OUTPUT_DIRNAME]
                              [--args_config ARGS_CONFIG]
                              [--target_encodings TARGET_ENCODINGS]
                              [-e ENGINE [ENGINE ...]]

Script to run verification.

required arguments:
  --default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
                        Default verifier used for verification. The options
                        "RtolAtol", "AdjustedRtolAtol", "TopK", "L1Error",
                        "CosineSimilarity", "MSE", "MAE", "SQNR", "ScaledDiff"
                        are supported. An optional list of hyperparameters can
                        be appended. For example: --default_verifier
                        rtolatol,rtolmargin,0.01,atolmargin,0.01 An optional
                        list of placeholders can be appended. For example:
                        --default_verifier CosineSimilarity param1 1 param2 2.
                        to use multiple verifiers, add additional
                        --default_verifier CosineSimilarity
  --golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --framework_results GOLDEN_OUTPUT_REFERENCE_DIRECTORY
                        Path to root directory of golden output files. Paths
                        may be absolute, or relative to the working directory.
  --inference_results INFERENCE_RESULTS
                        Path to root directory generated from inference engine
                        diagnosis. Paths may be absolute, or relative to the
                        working directory.

optional arguments:
  --tensor_mapping TENSOR_MAPPING
                        Path to the file describing the tensor name mapping
                        between inference and golden tensors.
  --qnn_model_json_path QNN_MODEL_JSON_PATH
                        Path to the qnn model net json, used for transforming
                        axis of golden outputs w.r.t to qnn outputs. Note:
                        Applicable only for QNN
  --dlc_path DLC_PATH   Path to the dlc file, used for transforming axis of
                        golden outputs w.r.t to target outputs. Note:
                        Applicable for QAIRT/SNPE
  --verifier_config VERIFIER_CONFIG
                        Path to the verifiers' config file
  --graph_struct GRAPH_STRUCT
                        Path to the inference graph structure .json file. This
                        file aids in providing structure related information
                        of the converted model graph during this stage.Note:
                        This file is mandatory when using ScaledDiff verifier
  -v, --verbose         Verbose printing
  -w WORKING_DIR, --working_dir WORKING_DIR
                        Working directory for the verification to store
                        temporary files. Creates a new directory if the
                        specified working directory does not exist
  --output_dirname OUTPUT_DIRNAME
                        output directory name for the verification to store
                        temporary files under <working_dir>/verification.
                        Creates a new directory if the specified working
                        directory does not exist
  --args_config ARGS_CONFIG
                        Path to a config file with arguments. This can be used
                        to feed arguments to the AccuracyDebugger as an
                        alternative to supplying them on the command line.
  --target_encodings TARGET_ENCODINGS
                        Path to target encodings json file.

Arguments for generating Tensor mapping (required when --tensor_mapping is not specified):
  -e ENGINE [ENGINE ...], --engine ENGINE [ENGINE ...]
                        Name of engine(qnn/snpe) that is used for running
                        inference.

 Please note: command line arguments may be provided either on the command line or through the config file. Arguments given on the command line will not override those in the config file if there is overlap.

The main verification process, run using qnn-accuracy-debugger --verification, optionally uses --tensor_mapping and --graph_struct to find files to compare. These files are generated by the inference engine step, and should be supplied to verification for best results. By default they are named tensor_mapping.json and {model name}_graph_struct.json, and can be found in the output directory of the inference engine results.

Sample Command

Compare output of framework runner with inference engine:

qnn-accuracy-debugger \
     --verification \
     --default_verifier CosineSimilarity param1 1 param2 2 \
     --default_verifier SQNR param1 5 param2 1 \
     --golden_output_reference_directory $PROJECTREPOPATH/working_directory/framework_runner/2022-10-31_17-07-58/ \
     --inference_results $PROJECTREPOPATH/working_directory/inference_engine/latest/output/Result_0/ \
     --tensor_mapping $PROJECTREPOPATH/working_directory/inference_engine/latest/tensor_mapping.json \
     --graph_struct $PROJECTREPOPATH/working_directory/inference_engine/latest/qnn_model_graph_struct.json \
     --qnn_model_json_path $PROJECTREPOPATH/working_directory/inference_engine/latest/qnn_model_net.json
Tip:
  • If you passed multiple images in image_list.txt when running the inference engine diagnosis, you will receive multiple output/Result_x directories. Choose the result that matches the input you used for the framework runner (e.g., if the framework runner used chairs.raw and chairs.raw was the first item in image_list.txt, choose output/Result_0; if chairs.raw was the second item, choose output/Result_1).

  • It is recommended to always supply --graph_struct and --tensor_mapping, as they are used to line up the report and find the corresponding files for comparison. If tensor_mapping was not generated in the previous steps, you can supply --model_path, --engine, and --framework to have the module generate the tensor mapping at runtime.

  • You can also compare inference_engine outputs with other inference_engine outputs by passing the /output of one inference_engine run as the framework_results. If the outputs are exact-name-matching, you do not need to provide a tensor_mapping file.

  • Note that if you need to generate a tensor mapping instead of providing a path to a preexisting tensor mapping file, you can provide the --model_path option.

Verifier uses two optional config files. The first file is used to set parameters for specific verifiers, as well as which tensors to use these verifiers on. The second file is used to map tensor names from framework_runner to the inference_engine, since certain tensors generated by framework_runner may have different names than tensors generated by inference_engine.

Verifier Config:

The verifier config file is a JSON file that tells verification which verifiers (aside from the default verifier) to use, with which parameters, and on which specific tensors. If no config file is provided, the tool will only use the default verifier specified on the command line, with its default parameters, on all the tensors. The JSON file is keyed by verifier names, with each verifier as its own dictionary keyed by “parameters” and “tensors”.

Config File

```json
{
    "MeanIOU": {
        "parameters": {
            "background_classification": 1.0
        },
        "tensors": [["Postprocessor/BatchMultiClassNonMaxSuppression_boxes", "detection_classes:0"]]
    },
    "TopK": {
        "parameters": {
            "k": 5,
            "ordered": false
        },
        "tensors": [["Reshape_1:0"], ["detection_classes:0"]]
    }
}
```

Note that the “tensors” field is a list of lists, because certain verifiers run on two tensors at a time; those two tensors are placed together in one inner list. If a verifier runs on only one tensor, the field is a list of lists with a single tensor name in each inner list. MeanIOU is not supported as a verifier in the Debugger.

Tensor Mapping:

Tensor mapping is a JSON file keyed by inference tensor names, with the corresponding framework tensor names as values. If the tensor mapping is not provided, the tool assumes that inference and golden tensor names are identical.

Tensor Mapping File

```json
{
    "Postprocessor/BatchMultiClassNonMaxSuppression_boxes": "detection_boxes:0",
    "Postprocessor/BatchMultiClassNonMaxSuppression_scores": "detection_scores:0"
}
```
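As an illustrative sketch of how such a mapping lines up files for comparison, the helper below pairs golden and inference output paths. The `paired_outputs` helper and its `<tensor_name>.raw` file-naming convention (with `:` and `/` sanitized) are assumptions for illustration, not the tool's documented behavior.

```python
import os

def paired_outputs(mapping, golden_dir, inference_dir):
    """Yield (golden_path, inference_path) pairs using a tensor mapping.

    ASSUMPTION: output files are named '<tensor_name>.raw', with ':' and '/'
    replaced by '_' on both sides. The real tool's naming may differ.
    """
    def fname(name):
        return name.replace(":", "_").replace("/", "_") + ".raw"

    for inference_name, framework_name in mapping.items():
        yield (os.path.join(golden_dir, fname(framework_name)),
               os.path.join(inference_dir, fname(inference_name)))
```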

Output

Verification’s output is divided by verifier. For example, if both the RtolAtol and TopK verifiers are used, there will be two sub-directories named “RtolAtol” and “TopK”. The available verifiers can be listed with the --help option.

../_static/resources/verification_2.png

Under each sub-directory, the verification analysis for each tensor is organized similarly to framework_runner (see above) and inference_engine. For each tensor, a CSV and an HTML file are generated. In addition to the tensor-specific analysis, the tool also generates a summary CSV and HTML file that aggregates the data from all verifiers and their tensors. The following figure shows a sample summary generated in the verification step. Each row in this summary corresponds to one tensor name identified by the framework runner and inference engine steps. The final column shows the CosineSimilarity score, which ranges from 0 to 1 (the range may differ for other verifiers). Higher scores denote similarity, while lower scores indicate divergence; the developer can then investigate those specific tensors in detail. Developers should inspect tensors in top-to-bottom order: if a tensor is broken at an earlier node, everything generated after that node is unreliable until the earlier node is fixed.

../_static/resources/verification_results.png
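As an illustration of how the CosineSimilarity score behaves, it can be computed over two flattened tensors as below. This is a sketch of the standard metric, not the tool's exact implementation:

```python
import numpy as np

def cosine_similarity(golden, target):
    """Cosine similarity of two flattened tensors; 1.0 means identical direction,
    values near 0 mean the outputs point in unrelated directions."""
    g = np.asarray(golden, dtype=np.float64).ravel()
    t = np.asarray(target, dtype=np.float64).ravel()
    denom = np.linalg.norm(g) * np.linalg.norm(t)
    # Guard against all-zero tensors, for which similarity is undefined.
    return float(np.dot(g, t) / denom) if denom else 0.0
```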

Compare Encodings

The Compare Encodings feature is designed to compare QNN and AIMET encodings. This feature takes QNN model net and AIMET encoding JSON files as inputs. This feature executes in the following order.

  1. Extracts encodings from the given QNN model net JSON.

  2. Compares extracted QNN encodings with given AIMET encodings.

  3. Writes results to an Excel file that highlights mismatches.

  4. Throws warnings if some encodings are present in QNN but not in AIMET and vice-versa.

  5. Writes the extracted QNN encodings JSON file (for reference).
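The comparison logic above can be sketched as follows. The flat `{tensor_name: {param: value}}` dict layout and the `compare_encodings` helper are assumptions for illustration; the actual QNN model net and AIMET encoding schemas are more detailed.

```python
def compare_encodings(qnn_enc, aimet_enc, precision=17):
    """Compare two encoding dicts up to `precision` decimal places.

    ASSUMPTION: encodings are flat {tensor_name: {param: float}} dicts,
    a simplification of the real JSON schemas.
    Returns (mismatches, only_in_qnn, only_in_aimet).
    """
    mismatches = {}
    for name in qnn_enc.keys() & aimet_enc.keys():
        diff = {}
        for param in qnn_enc[name].keys() | aimet_enc[name].keys():
            a, b = qnn_enc[name].get(param), aimet_enc[name].get(param)
            if a is None or b is None or round(a - b, precision) != 0:
                diff[param] = (a, b)
        if diff:
            mismatches[name] = diff
    return (mismatches,
            sorted(qnn_enc.keys() - aimet_enc.keys()),   # warn: QNN-only
            sorted(aimet_enc.keys() - qnn_enc.keys()))   # warn: AIMET-only
```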

Usage

usage: qnn-accuracy-debugger --compare_encodings [-h]
                             --input INPUT
                             --aimet_encodings_json AIMET_ENCODINGS_JSON
                             [--precision PRECISION]
                             [--params_only]
                             [--activations_only]
                             [--specific_node SPECIFIC_NODE]
                             [--working_dir WORKING_DIR]
                             [--output_dirname OUTPUT_DIRNAME]
                             [-v]

Script to compare QNN encodings with AIMET encodings

optional arguments:
  -h, --help            Show this help message and exit

required arguments:
  --input INPUT
                        Path to QNN model net JSON file
  --aimet_encodings_json AIMET_ENCODINGS_JSON
                        Path to AIMET encodings JSON file

optional arguments:
  --precision PRECISION
                        Number of decimal places up to which comparison will be done (default: 17)
  --params_only         Compare only parameters in the encodings
  --activations_only    Compare only activations in the encodings
  --specific_node SPECIFIC_NODE
                        Display encoding differences for the given node
  --working_dir WORKING_DIR
                        Working directory for the compare_encodings to store temporary files.
                        Creates a new directory if the specified working directory does not exist.
  --output_dirname OUTPUT_DIRNAME
                        Output directory name for the compare_encodings to store temporary files
                        under <working_dir>/compare_encodings. Creates a new directory if the
                        specified working directory does not exist.
  -v, --verbose         Verbose printing

Sample Commands

# Compare both params and activations
qnn-accuracy-debugger \
    --compare_encodings \
    --input QNN_model_net.json \
    --aimet_encodings_json aimet_encodings.json

# Compare only params
qnn-accuracy-debugger \
    --compare_encodings \
    --input QNN_model_net.json \
    --aimet_encodings_json aimet_encodings.json \
    --params_only

# Compare only activations
qnn-accuracy-debugger \
    --compare_encodings \
    --input QNN_model_net.json \
    --aimet_encodings_json aimet_encodings.json \
    --activations_only

# Compare only a specific encoding
qnn-accuracy-debugger \
    --compare_encodings \
    --input QNN_model_net.json \
    --aimet_encodings_json aimet_encodings.json \
    --specific_node _2_22_Conv_output_0

Tip

A working_directory is generated in the directory from which this script is called, unless otherwise specified.

Output

The program creates a directory named latest in working_directory/compare_encodings, which is symbolically linked to the most recently generated directory. In the example below, latest is symlinked to the data in the most recent YYYY-MM-DD_HH:mm:ss directory. Users may override the directory name by passing it to --output_dirname, e.g., --output_dirname myTest.

../_static/resources/compare_encodings.png

The figure above shows a sample output from a compare_encodings run. The following details what each file contains.

  • compare_encodings_options.json contains all the options used to run this feature

  • encodings_diff.xlsx contains comparison results with mismatches highlighted

  • log.txt contains log statements for the run

  • extracted_encodings.json contains extracted QNN encodings

Tensor inspection

Tensor inspection compares given reference output and target output tensors and dumps various statistics to represent differences between them.

The Tensor inspection feature can:

  1. Plot histograms for golden and target tensors

  2. Plot a graph indicating deviation between golden and target tensors

  3. Plot a cumulative distribution graph (CDF) for golden vs target tensors

  4. Plot a density (KDE) graph for target tensor highlighting target min/max and calibrated min/max values

  5. Create a CSV file containing information about: target min/max; calibrated min/max; golden output min/max; target/calibrated min/max differences; and computed metrics (verifiers).

Note

Only data with matching target/golden filenames is inspected; other data is ignored.
This feature expects the golden and target tensors to have the same dimensions, datatypes, and layouts.
Calibrated min/max values are extracted from a user-provided encodings file. If an encodings file is not provided, the density plot is skipped and the CSV summary does not include calibrated min/max information.
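The kind of per-tensor statistics reported in the summary can be sketched as below. This `inspection_summary` helper is illustrative: the exact columns, and the precise SQNR definition the tool uses, may differ.

```python
import numpy as np

def inspection_summary(golden, target):
    """Min/max of each tensor plus SQNR (dB) of target w.r.t. golden.

    ASSUMPTION: SQNR is taken as 10*log10(signal power / noise power),
    a common definition; the tool's own formula may vary in detail.
    """
    g = np.asarray(golden, dtype=np.float64).ravel()
    t = np.asarray(target, dtype=np.float64).ravel()
    noise = np.sum((g - t) ** 2)
    sqnr_db = float("inf") if noise == 0 else 10 * np.log10(np.sum(g ** 2) / noise)
    return {"golden_min": float(g.min()), "golden_max": float(g.max()),
            "target_min": float(t.min()), "target_max": float(t.max()),
            "sqnr_db": float(sqnr_db)}
```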

Usage

usage: qnn-accuracy-debugger --tensor_inspection [-h]
                        --golden_data GOLDEN_DATA
                        --target_data TARGET_DATA
                        --verifier VERIFIER [VERIFIER ...]
                        [-w WORKING_DIR]
                        [--data_type {int8,uint8,int16,uint16,float32}]
                        [--target_encodings TARGET_ENCODINGS]
                        [-v]

Script to inspect tensors.

required arguments:
  --golden_data GOLDEN_DATA
                        Path to golden/framework outputs folder. Paths may be absolute or
                        relative to the working directory.
  --target_data TARGET_DATA
                        Path to target outputs folder. Paths may be absolute or relative to the
                        working directory.
  --verifier VERIFIER [VERIFIER ...]
                        Verifier used for verification. The options "RtolAtol",
                        "AdjustedRtolAtol", "TopK", "L1Error", "CosineSimilarity", "MSE", "MAE",
                        "SQNR", "ScaledDiff" are supported.
                        An optional list of hyperparameters can be appended, for example:
                        --verifier rtolatol,rtolmargin,0.01,atolmargin,0.01.
                        To use multiple verifiers, add additional --verifier CosineSimilarity

optional arguments:
  -w WORKING_DIR, --working_dir WORKING_DIR
                        Working directory to save results. Creates a new directory if the
                        specified working directory does not exist
  --data_type {int8,uint8,int16,uint16,float32}
                        DataType of the output tensor.
  --target_encodings TARGET_ENCODINGS
                        Path to target encodings json file.
  -v, --verbose         Verbose printing

Sample Commands

# Basic run
qnn-accuracy-debugger --tensor_inspection \
    --golden_data golden_tensors_dir \
    --target_data target_tensors_dir \
    --verifier sqnr

# Pass target encodings file and enable multiple verifiers
qnn-accuracy-debugger --tensor_inspection \
    --golden_data golden_tensors_dir \
    --target_data target_tensors_dir \
    --verifier mse \
    --verifier sqnr \
    --verifier rtolatol,rtolmargin,0.01,atolmargin,0.01 \
    --target_encodings qnn_encoding.json

Tip

A working_directory is generated in the directory from which this script is called, unless otherwise specified.

../_static/resources/tensor_inspection.png

The figure above shows a sample output from a Tensor inspection run. The following details what each file contains.

  • Each tensor will have its own directory; the directory name matches the tensor name.

    • CDF_plots.html – Golden vs target CDF graph

    • Diff_plots.html – Golden and target deviation graph

    • Distribution_min-max.png – Density plot for target tensor highlighting target vs calibrated min/max values

    • Histograms.html – Golden and target histograms

    • golden_data.csv – Golden tensor data

    • target_data.csv – Target tensor data

  • log.txt – Log statements from the entire run

  • summary.csv – Target min/max, calibrated min/max, golden output min/max, target vs calibrated min/max differences, and verifier outputs

Histogram Plots

  1. Comparison: We compare histograms for both the golden data and the target data.

  2. Overlay: To enhance clarity, we overlay the histograms bin by bin.

  3. Binned Ranges: Each bin represents a value range, showing the frequency of occurrence.

  4. Visual Insight: Overlapping histograms reveal differences or similarities between the datasets.

  5. Interactive: Hover over histograms to get tensor range and frequencies for the dataset.
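The bin-by-bin overlay described above requires both tensors to be histogrammed over a shared bin range; a minimal sketch of that step (without the interactive plotting layer):

```python
import numpy as np

def overlaid_histograms(golden, target, bins=10):
    """Histogram both tensors over shared bin edges so the bars overlay bin by bin."""
    g = np.asarray(golden, dtype=np.float64).ravel()
    t = np.asarray(target, dtype=np.float64).ravel()
    # Shared edges spanning the combined value range of both tensors.
    edges = np.linspace(min(g.min(), t.min()), max(g.max(), t.max()), bins + 1)
    return np.histogram(g, bins=edges)[0], np.histogram(t, bins=edges)[0], edges
```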

Cumulative Distribution Function (CDF) Plots

  1. Overview: CDF plots display the cumulative probability distribution.

  2. Overlay: We superimpose CDF plots for golden and target data.

  3. Percentiles: These plots illustrate data distribution across different percentiles.

  4. Hover Details: Exact cumulative probabilities are available on hover.
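The cumulative curve plotted for each tensor can be sketched as an empirical CDF (a standard construction, shown here without the plotting layer):

```python
import numpy as np

def empirical_cdf(values):
    """Sorted values and their cumulative probabilities, as plotted in the CDF overlay."""
    x = np.sort(np.asarray(values, dtype=np.float64).ravel())
    p = np.arange(1, x.size + 1) / x.size  # P(X <= x) at each sorted value
    return x, p
```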

Tensor Difference Plots

  1. Inspection: We generate plots highlighting differences between golden and target data tensors.

  2. Scatter and Line: Scatter plots represent tensor values, while line plots show differences at each index.

  3. Interactive: Hover over points to access precise values.

Run QNN Accuracy Debugger E2E

This feature is designed to run the framework runner, inference engine, and verification features sequentially with a single command to debug the model. The following debugging algorithms are available.

  1. Oneshot-layerwise (default):
    • This algorithm debugs all layers of the model at once by performing the following steps:
      • Execute the framework runner to collect reference outputs in fp32.

      • Execute the inference engine to collect backend outputs in the specified target precision.

      • Execute verification to compare the intermediate outputs from the above two steps.

      • Execute tensor inspection (when --enable_tensor_inspection is passed) to dump various plots (e.g., scatter, line, CDF) for intermediate outputs.

    • It provides a quick analysis to identify the model layers causing accuracy deviation.

    • Users can choose cumulative-layerwise (below) for a deeper analysis of accuracy deviation.

  2. Cumulative-layerwise:
    • This algorithm debugs one layer at a time by performing the following steps:
      • Execute the framework runner to collect reference outputs from all layers of the model in fp32.

      • Execute the inference engine and verification iteratively to:
        • Collect backend outputs in target precision for each layer while removing the effect of its preceding layers on the final output.

        • Compare the intermediate outputs from the framework runner and the inference engine.

    • It provides a deeper analysis to identify all model layers causing accuracy deviation.

    • Currently this option supports only ONNX models.

  3. Layerwise:
    • This algorithm debugs one single-layer model at a time by performing the following steps:
      • Get golden reference per layer outputs from an external tool or, if a golden reference is not given, run framework runner to collect intermediate layer outputs.

      • Iteratively execute inference engine and verification to:
        • Collect backend outputs in target precision for each single layer model by removing the preceding and following layers

        • Compare intermediate output from golden reference with inference engine single layer model output

    • Layerwise snooping provides deeper analysis to identify all model layers causing accuracy deviation on hardware with respect to framework/simulation outputs.

    • Layerwise snooping only supports ONNX models.
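The iterative pattern shared by the cumulative-layerwise and layerwise algorithms can be outlined conceptually as below. This is a sketch only: `run_reference`, `run_target`, and `compare` are hypothetical stand-in callables, not tool APIs.

```python
def cumulative_layerwise(layers, run_reference, run_target, compare):
    """Conceptual outline of cumulative-layerwise debugging (NOT a tool API).

    The reference runs once for every layer; each target layer is then fed
    the *reference* output of the previous layer, so errors from earlier
    layers do not accumulate into the current comparison.
    """
    reference = {layer: run_reference(layer) for layer in layers}
    scores, prev_ref = {}, None
    for layer in layers:
        target_out = run_target(layer, prev_ref)
        scores[layer] = compare(reference[layer], target_out)
        prev_ref = reference[layer]  # next layer starts from clean reference data
    return scores
```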

Usage

usage: qnn-accuracy-debugger [--framework_runner] [--inference_engine] [--verification] [-h]

Script that runs Framework Runner, Inference Engine or Verification.

Arguments to select which component of the tool to run.  Arguments are mutually exclusive (at
most 1 can be selected).  If none are selected, then all components are run:
--framework_runner Run framework
--inference_engine    Run inference engine
--verification        Run verification

optional arguments:
-h, --help              Show this help message. To show help for any of the components, run
                        script with --help and --<component>. For example, to show the help
                        for Framework Runner, run script with the following: --help
                        --framework_runner

usage: qnn-accuracy-debugger [-h] -f FRAMEWORK [FRAMEWORK ...] -m MODEL_PATH -i INPUT_TENSOR
                            [INPUT_TENSOR ...] -o OUTPUT_TENSOR -r RUNTIME -a
                            {aarch64-android,x86_64-linux-clang,aarch64-android-clang6.0}
                            -l INPUT_LIST --default_verifier DEFAULT_VERIFIER
                            [DEFAULT_VERIFIER ...] [-v] [-w WORKING_DIR]
                            [--output_dirname OUTPUT_DIRNAME]
                            [--deep_analyzer {modelDissectionAnalyzer}]
                            [--debugging_algorithm {layerwise,cumulative-layerwise,oneshot-layerwise}]

Options for running the Accuracy Debugger components

optional arguments:
-h, --help            show this help message and exit

Arguments required by both Framework Runner and Inference Engine:
-f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
                        Framework type and version, version is optional. Currently supported
                        frameworks are [tensorflow, tflite, onnx]. For example, tensorflow
                        2.3.0
-m MODEL_PATH, --model_path MODEL_PATH
                        Path to the model file(s).
-i INPUT_TENSOR [INPUT_TENSOR ...], --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
                        The name, dimensions, raw data, and optionally data type of the
                        network input tensor(s) specified in the format "input_name" comma-
                        separated-dimensions path-to-raw-file, for example: "data"
                        1,224,224,3 data.raw float32. Note that the quotes should always be
                        included in order to handle special characters, spaces, etc. For
                        multiple inputs specify multiple --input_tensor on the command line
                        like: --input_tensor "data1" 1,224,224,3 data1.raw --input_tensor
                        "data2" 1,50,100,3 data2.raw float32.
-o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
                        Name of the graph's specified output tensor(s).

Arguments required by Inference Engine:
-r RUNTIME, --runtime RUNTIME
                        Runtime to be used for inference.
-a {aarch64-android,x86_64-linux-clang,aarch64-android-clang6.0}, --architecture {aarch64-android,x86_64-linux-clang,aarch64-android-clang6.0}
                        Name of the architecture to use for inference engine.
-l INPUT_LIST, --input_list INPUT_LIST
                        Path to the input list text.
Arguments required by Verification:
--default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
                        Default verifier used for verification. The options "RtolAtol",
                        "AdjustedRtolAtol", "TopK", "L1Error", "CosineSimilarity", "MSE",
                        "MAE", "SQNR", "ScaledDiff" are supported. An optional
                        list of hyperparameters can be appended. For example:
                        --default_verifier rtolatol,rtolmargin,0.01,atolmargin,0.01. An
                        optional list of placeholders can be appended. For example:
                        --default_verifier CosineSimilarity param1 1 param2 2. To use
                        multiple verifiers, pass an additional --default_verifier, e.g.
                        --default_verifier CosineSimilarity

optional arguments:
-v, --verbose           Verbose printing
-w WORKING_DIR, --working_dir WORKING_DIR
                        Working directory for the wrapper to store temporary files. Creates
                        a new directory if the specified working directory does not exist.
--output_dirname OUTPUT_DIRNAME
                        Output directory name for the wrapper to store temporary files under
                        <working_dir>/wrapper. Creates a new directory if the specified
                        directory does not exist.
--deep_analyzer {modelDissectionAnalyzer}
                        Deep Analyzer to perform deep analysis
--golden_output_reference_directory
                        Optional parameter to indicate the directory of the golden reference outputs.
                        When this option is provided, the framework runner stage is skipped.
                        In the inference stage, it is used for tensor mapping without a framework.
                        In the verification stage, it is used as a reference to compare
                        outputs produced in the inference engine stage.
--enable_tensor_inspection
                        Plots graphs (line, scatter, CDF, etc.) for each
                        layer's output. Additionally, the summary sheet will include
                        more details such as golden min/max and target min/max.
--debugging_algorithm {layerwise,cumulative-layerwise,oneshot-layerwise}
                        Performs model debugging in layerwise, cumulative-layerwise, or
                        oneshot-layerwise mode based on the choice.

--step_size
                        Number of layers to skip in each iteration of debugging.
                        Applicable only for cumulative-layerwise algorithm.
                        --step_size (> 1) should not be used along with --add_layer_outputs,
                        --add_layer_types, --skip_layer_outputs, --skip_layer_types,
                        --start_layer, --end_layer
(The options below are ignored for the framework_runner component in case of layerwise and cumulative-layerwise runs.)
--add_layer_outputs ADD_LAYER_OUTPUTS
                        Output layers to be dumped, e.g., 1579,232
--add_layer_types ADD_LAYER_TYPES
                        Outputs of layer types to be dumped, e.g., Resize, Transpose; all enabled by default
--skip_layer_types SKIP_LAYER_TYPES
                        Comma delimited layer types to skip snooping, e.g., Resize, Transpose
--skip_layer_outputs SKIP_LAYER_OUTPUTS
                        Comma delimited layer output names to skip debugging, e.g., 1171, 1174
--start_layer START_LAYER
                        Extracts the given model starting from the specified start
                        layer output name
--end_layer END_LAYER
                        Extracts the given model up to the specified end layer
                        output name
Note: The --start_layer and --end_layer options are allowed only for layerwise and cumulative-layerwise runs.

Sample Command for oneshot-layerwise

Command for Oneshot-layerwise using DSP backend:

qnn-accuracy-debugger \
    --framework tensorflow \
    --runtime dspv73 \
    --model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
    --input_tensor "input:0" 1,299,299,3 $PATHTOGOLDENI/samples/InceptionV3Model/data/chairs.raw \
    --output_tensor InceptionV3/Predictions/Reshape_1:0 \
    --architecture x86_64-linux-clang \
    --debugging_algorithm oneshot-layerwise \
    --input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
    --default_verifier CosineSimilarity \
    --enable_tensor_inspection \
    --verbose

Command for Oneshot-layerwise using HTP emulation on x86 host:


qnn-accuracy-debugger \
    --framework onnx \
    --runtime htp \
    --model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
    --input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
    --output_tensor 1597 \
    --architecture x86_64-linux-clang \
    --input_list /local/mnt/workspace/models/vit/list.txt \
    --default_verifier CosineSimilarity \
    --offline_prepare \
    --debugging_algorithm oneshot-layerwise \
    --enable_tensor_inspection \
    --verbose

Note

The --enable_tensor_inspection argument significantly increases overall execution time when used with large models. To speed up execution, omit this argument.

Output

The program creates framework_runner, inference_engine, verification, and wrapper output directories as below:

../_static/resources/oneshot-layerwise.png
  • framework_runner – Contains a timestamped directory that contains the intermediate layer outputs (framework) stored in .raw format as described in the framework runner step.

  • inference_engine – Contains a timestamped directory that contains the intermediate layer outputs (inference engine) stored in .raw format as described in the inference engine step.

  • verification directory – Contains a timestamped directory that contains the following:

    • A directory for each verifier specified while running oneshot; it contains CSV and HTML files with metric details for each layer output

    • tensor_inspection – Individual directories for each layer’s output with the following contents:

      • CDF_plots.png – Golden vs target CDF graph

      • Diff_plots.png – Golden and target deviation graph

      • Histograms.png – Golden and target histograms

      • golden_data.csv – Golden tensor data

      • target_data.csv – Target tensor data

    • summary.csv – Report of verification results for each layer's output

  • Wrapper directory containing log.txt with the entire log for the run.

Note: Except for the wrapper directory, all other directories contain a folder called latest, which is a symlink to the latest run's corresponding timestamped directory.

Snapshot of summary.csv file:

../_static/resources/oneshot_summary.png

Understanding the oneshot-layerwise report:

Column

Description

Name

Output name of the current layer

Layer Type

Type of the current layer

Size

Size of this layer’s output

Tensor_dims

Shape of this layer’s output

<Verifier name>

Verifier value of the current layer output compared to reference output

golden_min

minimum value in the reference output for current layer

golden_max

maximum value in the reference output for current layer

target_min

minimum value in the target output for current layer

target_max

maximum value in the target output for current layer
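
As an illustration of how the summary.csv report can be post-processed, the following sketch ranks layer outputs by their verifier score. This is not part of the tool; the `Name` column and a per-verifier column (here assumed to be `CosineSimilarity`, matching the verifier chosen on the command line) are taken from the table above.

```python
import csv

def worst_layers(summary_csv_path, verifier_column, n=10):
    """Return the n layer output names with the lowest verifier scores.

    Assumes summary.csv has a 'Name' column and a per-verifier score
    column (e.g. 'CosineSimilarity'), as in the oneshot-layerwise report.
    For similarity-style verifiers a lower score means a larger deviation;
    for error metrics such as MSE, reverse the sort instead.
    """
    with open(summary_csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    scored = []
    for row in rows:
        try:
            scored.append((float(row[verifier_column]), row["Name"]))
        except (KeyError, ValueError):
            continue  # skip rows without a numeric score
    scored.sort()  # ascending: largest deviation first
    return [name for _, name in scored[:n]]
```

A layer appearing near the top of this list is a candidate starting point for the deeper cumulative-layerwise analysis described below.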

Sample Command for cumulative-layerwise

Command for Cumulative-layerwise using DSP backend:

qnn-accuracy-debugger \
    --framework onnx \
    --runtime dspv73 \
    --model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
    --input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
    --output_tensor 1597 \
    --architecture x86_64-linux-clang \
    --input_list /local/mnt/workspace/models/vit/list.txt \
    --default_verifier CosineSimilarity \
    --offline_prepare \
    --debugging_algorithm cumulative-layerwise \
    --engine QNN \
    --verbose

Command for Cumulative-layerwise using HTP emulation on x86 host:


qnn-accuracy-debugger \
    --framework onnx \
    --runtime htp \
    --model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
    --input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
    --output_tensor 1597 \
    --architecture x86_64-linux-clang \
    --input_list /local/mnt/workspace/models/vit/list.txt \
    --default_verifier CosineSimilarity \
    --offline_prepare \
    --debugging_algorithm cumulative-layerwise \
    --engine QNN \
    --verbose

Output

The program creates framework_runner, cumulative_layerwise_snooping, and wrapper output directories as below:

../_static/resources/cumulative_layerwise_work_dir.png
  • framework_runner – Contains a timestamped directory with the intermediate layer outputs stored in .raw format, as described in the framework runner step.

  • cumulative_layerwise_snooping – Contains intermediate outputs obtained from the inference engine step, stored in separate directories named after the respective layers. It also contains a final report named cumulative_layerwise.csv with verifier scores for each layer. Layers with the most deviating scores can be identified as problematic nodes.

  • wrapper – Contains log.txt with the entire log for the run.

../_static/resources/cumulative_layerwise_report.png

Understanding the cumulative-layerwise report

At the end of a cumulative-layerwise run, the tool generates a .csv file with the following information for each layer

Column

Description

O/P Name

Output name of the current layer.

Status

If empty, indicates normal execution. Other possible values:
  • skip - This layer was not debugged, as requested by the user.

  • part - Due to the mismatch at this layer, the model was partitioned after this layer.

  • err_part - An error occurred while partitioning the model at this layer.

  • err_con - A converter error occurred at this layer.

  • err_lib - A lib-generator error occurred at this layer.

  • err_cntx - A context-bin-generator error occurred at this layer.

  • err-exec - Failed to execute the compiled model at this layer.

  • err-compare - Failed to compare the backend output of this layer with the reference.

Layer Type

Type of the current layer.

Shape

Shape of this layer’s output.

Activations

The min, max, and median of the outputs at this layer, taken from the reference execution.

<Verifier name>

Absolute verifier value of the current layer compared to the reference platform.

Orig outputs

Displays the verifier score of the original outputs observed when the model was run with the
current layer output enabled, starting from the last partitioned layer.

Info

Displays information for the output verifiers, if the values are abnormal.
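
As a post-processing sketch (not part of the tool), the cumulative_layerwise.csv report can be scanned for the error statuses described above. The `O/P Name` and `Status` column names are taken from the table; everything else is illustrative.

```python
import csv
from collections import Counter

# Status values from the cumulative-layerwise report; empty means normal.
ERROR_STATUSES = {"err_part", "err_con", "err_lib", "err_cntx",
                  "err-exec", "err-compare"}

def summarize_statuses(report_path):
    """Count report rows by Status and list layers that hit an error.

    Assumes the CSV has 'O/P Name' and 'Status' columns as described
    in the cumulative-layerwise report table.
    """
    counts, errors = Counter(), []
    with open(report_path, newline="") as f:
        for row in csv.DictReader(f):
            status = (row.get("Status") or "").strip()
            counts[status or "ok"] += 1
            if status in ERROR_STATUSES:
                errors.append((row["O/P Name"], status))
    return counts, errors
```

Rows flagged with an err_* status indicate tooling failures at that layer, which are worth resolving before interpreting the verifier scores.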

Command for Layerwise:

qnn-accuracy-debugger \
    --framework onnx \
    --runtime dspv73 \
    --model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
    --input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
    --output_tensor 1597 \
    --architecture x86_64-linux-clang \
    --input_list /local/mnt/workspace/models/vit/list.txt \
    --default_verifier CosineSimilarity \
    --offline_prepare \
    --debugging_algorithm layerwise \
    --quantization_overrides /local/mnt/workspace/layer_output_dump/vit_base_16_224.encodings \
    --engine QNN \
    --verbose

Output

The program creates layerwise_snooping and wrapper output directories, as well as framework_runner if a golden reference is not provided (as described for cumulative-layerwise).

  • layerwise_snooping directory – Contains each single layer model outputs obtained from the inference engine stage stored in separate directories and the final report named layerwise.csv which contains verifier scores for each layer model. Users can identify layers with the most deviating scores as problematic nodes.

  • wrapper directory – Contains log.txt which stores the full logs for the run.

  • The output .csv is similar to the cumulative-layerwise output, but the Orig outputs column is not present in layerwise snooping, since it does not deal with the final outputs of the model.

Debugging Accuracy issue with Quantized model using Cumulative Layerwise Snooping

  • With quantized models, some mismatch is expected at the most data-intensive layers, arising from quantization error.

  • The debugger can be used to identify the most sensitive operators (those with high verifier scores) and run them at higher precision to improve overall accuracy.

  • The sensitivity is determined by the verifier score seen at that layer relative to the reference platform (e.g., ONNX Runtime).

  • Note that Cumulative-layerwise debugging takes considerable time, as the partitioned model must be quantized and compiled at every layer that does not have a 100% match with the reference.

  • Below is one strategy to debug larger models:

    • Run Oneshot-layerwise on the model which helps to identify the starting point of sensitivity in the model.

    • Run Cumulative-layerwise on different parts of the model using the start-layer and end-layer options. For example, if the model has 100 nodes, use the starting node from the Oneshot-layerwise run as the start layer and the 25th node as the end layer for run 1; the 26th and 50th nodes for run 2; the 51st and 75th nodes for run 3; and so on. The final reports of all runs help identify the most sensitive layers in the model. Suppose nodes A, B, and C have high verifier scores, indicating high sensitivity.

      • Run the original model with those specific layers (A/B/C, one at a time or in combinations) in FP16 and observe the improvement in accuracy.
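
The chunking described above can be sketched as a small helper that turns an ordered list of layer output names into per-run (start_layer, end_layer) pairs. This is illustrative only; layer names and chunk size are placeholders.

```python
def layer_ranges(layer_names, chunk=25):
    """Split an ordered list of layer output names into (start, end)
    pairs, one pair per cumulative-layerwise run, as in the strategy
    described above (25 nodes per run for a 100-node model)."""
    ranges = []
    for i in range(0, len(layer_names), chunk):
        part = layer_names[i:i + chunk]
        ranges.append((part[0], part[-1]))
    return ranges
```

Each returned pair would then be passed to one cumulative-layerwise invocation via --start_layer and --end_layer.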

Debugging accuracy discrepancies between a golden reference (e.g., AIMET/framework runtime output) and target output using Layerwise Snooping

  • One of the popular use cases for layerwise snooping is debugging accuracy differences between AIMET and target
    • Although tools like AIMET create a close simulation of the hardware, a very small mismatch is still expected due to environment differences. This can be because the simulation executes on GPU FP32 kernels and simulates quantization noise, rather than actually executing on integer kernels as in hardware execution.

    • If there is a higher deviation between simulation and hardware, layerwise snooping can be used to point out the nodes with higher deviations. The nodes showing higher deviation per layerwise.csv can be identified as the erroneous nodes.

  • Other use cases include debugging deviations between the framework runtime's FP32 output and the target INT16 output.

Binary Snooping

The binary snooping tool debugs the given ONNX graph in a binary search fashion.

For the graph under analysis, it quantizes half of the graph and lets the other half run in fp16/32. The final model output is used to measure the subgraph's quantization effect. If the subgraph has a high effect on the final model output due to quantization (a verifier score greater than 60% of the sum of the two subgraphs' scores), the process repeats until the subgraph size is less than min_graph_size or the subgraph cannot be divided further. If both subgraphs have similar scores (each greater than 40% of the sum of the two subgraphs' scores), both subgraphs are investigated further.
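
The recursion described above can be sketched as follows. This is an illustrative model of the 60%/40% rules, not the tool's actual implementation: subgraphs are represented as (start, end) node-index ranges, and score_of stands in for a full quantize-and-run-and-verify cycle.

```python
def snoop(subgraph, score_of, min_graph_size=4):
    """Recursively narrow down quantization-sensitive subgraphs.

    subgraph is a (start, end) node-index range; score_of((start, end))
    returns the verifier score observed when only that range is
    quantized. Returns the list of culprit subgraphs."""
    start, end = subgraph
    size = end - start + 1
    # Stop when the range is too small to split further.
    if size < 2 or size <= min_graph_size:
        return [subgraph]
    mid = start + size // 2
    left, right = (start, mid - 1), (mid, end)
    ls, rs = score_of(left), score_of(right)
    total = ls + rs
    if total == 0:
        return [subgraph]
    if ls > 0.6 * total:   # left half dominates the quantization error
        return snoop(left, score_of, min_graph_size)
    if rs > 0.6 * total:   # right half dominates the quantization error
        return snoop(right, score_of, min_graph_size)
    # Both halves exceed 40% of the combined score: investigate both.
    return (snoop(left, score_of, min_graph_size)
            + snoop(right, score_of, min_graph_size))
```

In the real tool, each score_of evaluation corresponds to quantizing one subgraph, running the whole model, and comparing the final output against the reference.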

Usage

usage: qnn-accuracy-debugger --binary_snooping \
                           -m MODEL_PATH \
                           -l INPUT_LIST \
                           -i INPUT_TENSOR \
                           -f FRAMEWORK \
                           -o OUTPUT_TENSOR \
                           -e ENGINE_NAME \
                           -qo QUANTIZATION_OVERRIDES \
                           [--verifier VERIFIER] \
                           [-a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}] \
                           [--host_device {x86,x86_64-windows-msvc,wos}] \
                           [-r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic,htp}] \
                           [--deviceId DEVICEID] \
                           [--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY] \
                           [--bias_bitwidth BIAS_BITWIDTH] \
                           [--use_per_channel_quantization USE_PER_CHANNEL_QUANTIZATION] \
                           [--weights_bitwidth WEIGHTS_BITWIDTH] \
                           [--act_bitwidth {8,16}] [-fbw {16,32}] \
                           [-rqs RESTRICT_QUANTIZATION_STEPS] \
                           [-w WORKING_DIR] \
                           [--output_dirname OUTPUT_DIRNAME] \
                           [-p ENGINE_PATH] \
                           [--min_graph_size MIN_GRAPH_SIZE] \
                           [--extra_converter_args EXTRA_CONVERTER_ARGS] \
                           [--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}] \
                           [--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}] \
                           [--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}] \
                           [--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}] \
                           [--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE] \
                           [--param_quantizer {tf,enhanced,adjusted,symmetric}] \
                           [--act_quantizer {tf,enhanced,adjusted,symmetric}] \
                           [--per_channel_quantization] \
                           [--algorithms ALGORITHMS] \
                           [--verifier_config VERIFIER_CONFIG] \
                           [--start_layer START_LAYER] \
                           [--end_layer END_LAYER] [--precision {int8,fp16}] \
                           [--compiler_config COMPILER_CONFIG] \
                           [--ignore_encodings] \
                           [--extra_runtime_args EXTRA_RUNTIME_ARGS] \
                           [--add_layer_outputs ADD_LAYER_OUTPUTS] \
                           [--add_layer_types ADD_LAYER_TYPES] \
                           [--skip_layer_types SKIP_LAYER_TYPES] \
                           [--skip_layer_outputs SKIP_LAYER_OUTPUTS] \
                           [--remote_server REMOTE_SERVER] \
                           [--remote_username REMOTE_USERNAME] \
                           [--remote_password REMOTE_PASSWORD] [-nif] [-nof]

Sample Commands

Sample command to run binary snooping on mv2 large model

qnn-accuracy-debugger \
  --binary_snooping \
  --framework onnx \
  --model_path models/mv2/mobilenet-v2.onnx \
  --architecture aarch64-android \
  --input_list models/mv2/inputs/input_list_1.txt \
  --input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/harsraj/models/mv2/inputs/data1.raw \
  --output_tensor "473" \
  --engine_path $QNN_SDK_ROOT \
  --working_dir working_directory/QNN/BINARY_MV2_DSP \
  --runtime dspv75 \
  --engine QNN \
  --verifier mse \
  --extra_converter_args "float_bitwidth=32;preserve_io=layout" \
  --quantization_overrides /local/mnt/workspace/harsraj/models/mv2/quantized_encoding.json \
  --min_graph_size 16

Outputs

The algorithm provides two JSON files:

  1. graph_result.json (for each subgraph) - Contains verifier scores for two child subgraphs; for example 318_473 has child subgraphs 318_392 and 393_473.

  2. subgraph_result.json (for each subgraph) - Contains the corresponding and sorted verifier scores.

Keys in both files have the form "subgraph_start_node_activation_name" + "_" + "subgraph_end_node_activation_name".

For example, 318_473 means a subgraph starts at node activation 318 and ends at node activation 473. Only the subgraph from 318 to 473 is quantized while the rest of the model runs in fp16/32.

Debugging accuracy issues with binary snooping results

Subgraphs with the maximum verifier scores in subgraph_result.json are the culprit subgraphs.

One subgraph can be a subset of another subgraph. In this case prioritize a subgraph size you are comfortable debugging. The details of a subset can be found in graph_result.json.
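
A small sketch for post-processing subgraph_result.json, assuming it maps the "start_end" keys described above to numeric verifier scores, and that the activation names themselves contain no underscore (names with underscores would need a smarter split). This is illustrative, not part of the tool.

```python
def parse_subgraph_key(key):
    """Split a key like '318_473' into (start_activation, end_activation).

    Splits at the first underscore; assumes activation names contain
    no underscores themselves."""
    start, _, end = key.partition("_")
    return start, end

def culprit_subgraphs(subgraph_result, top=3):
    """Rank subgraphs (key -> verifier score) and return the
    highest-scoring ones as (start, end) activation pairs."""
    ranked = sorted(subgraph_result.items(),
                    key=lambda kv: kv[1], reverse=True)
    return [parse_subgraph_key(k) for k, _ in ranked[:top]]
```

For nested subgraphs that both rank highly, graph_result.json can then be consulted to pick the size that is practical to debug.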

Quantization Checker

The quantization checker analyzes activations, weights, and biases of a given model. It provides:

  1. Comparison between quantized and unquantized weights and biases

  2. Analysis on unquantized weights, biases, and activations

  3. Results in csv, html, or plots

  4. Problematic weights and biases for a given bitwidth quantization

Usage

usage: qnn-accuracy-debugger --quant_checker [-h] \
                            --model_path \
                            --input_tensor \
                            --config_file \
                            --framework \
                            --input_list \
                            --output_tensor \
                            [--engine_path] \
                            [--working_dir] \
                            [--quantization_overrides] \
                            [--extra_converter_args] \
                            [--bias_width] \
                            [--weights_width] \
                            [--host_device] \
                            [--deviceId] \
                            [--generate_csv] \
                            [--generate_plots] \
                            [--per_channel_plots] \
                            [--golden_output_reference_directory] \
                            [--output_dirname]
                            [--verbose]

Sample quant_checker_config_file

{
    "WEIGHT_COMPARISON_ALGORITHMS": [
        {"algo_name": "minmax", "threshold": "10"},
        {"algo_name": "maxdiff", "threshold": "10"},
        {"algo_name": "sqnr", "threshold": "26"},
        {"algo_name": "stats", "threshold": "2"},
        {"algo_name": "data_range_analyzer"},
        {"algo_name": "data_distribution_analyzer", "threshold": "0.6"}
    ],
    "BIAS_COMPARISON_ALGORITHMS": [
        {"algo_name": "minmax", "threshold": "10"},
        {"algo_name": "maxdiff", "threshold": "10"},
        {"algo_name": "sqnr", "threshold": "26"},
        {"algo_name": "stats", "threshold": "2"},
        {"algo_name": "data_range_analyzer"},
        {"algo_name": "data_distribution_analyzer", "threshold": "0.6"}
    ],
    "ACT_COMPARISON_ALGORITHMS": [
        {"algo_name": "minmax", "threshold": "10"},
        {"algo_name": "data_range_analyzer"}
    ],
    "INPUT_DATA_ANALYSIS_ALGORITHMS": [
        {"algo_name": "stats", "threshold": "2"}
    ],
    "QUANTIZATION_ALGORITHMS": ["cle", "None"],
    "QUANTIZATION_VARIATIONS": ["tf", "enhanced", "symmetric", "asymmetric"]
}
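
To sanity-check such a config before a long run, a small loader sketch like the following can be used. This is illustrative only; the section names are taken from the sample above, and thresholds are stored as strings in that sample, so they are converted to floats where present.

```python
import json

# Comparison sections expected in a quant_checker config, per the sample above.
SECTIONS = ("WEIGHT_COMPARISON_ALGORITHMS",
            "BIAS_COMPARISON_ALGORITHMS",
            "ACT_COMPARISON_ALGORITHMS",
            "INPUT_DATA_ANALYSIS_ALGORITHMS")

def load_quant_checker_config(path):
    """Load a quant_checker config file and return (algo_name, threshold)
    pairs per comparison section; threshold is None when absent
    (e.g. data_range_analyzer)."""
    with open(path) as f:
        cfg = json.load(f)
    parsed = {}
    for section in SECTIONS:
        parsed[section] = [
            (a["algo_name"],
             float(a["threshold"]) if "threshold" in a else None)
            for a in cfg.get(section, [])
        ]
    return parsed
```

Note that the sample config must use straight quotes to be valid JSON; smart quotes from copy-pasting rendered documentation will fail to parse.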

Output

Outputs are available in the <working-directory>/results directory, which looks like:

../_static/resources/quant_checker_acc_debug_output_dir_struct.png

Results are provided in:

  1. HTML

  2. CSV

  3. Histogram

A log is provided in the <working-directory>/quant_checker directory.

HTML

Each HTML file contains a summary of the results for each quantization option and for each input file provided.

The following example provides additional guidance on the contents of the HTML files.

../_static/resources/qnn_quantatization_checker_html_sample.png

CSV Results Files

Each CSV file contains detailed computation results for a specific node type (activation/weight/bias) and quantization option. Each row in the csv file displays the op name, node name, passes accuracy (True/False), computation result (accuracy differences), threshold used for each algorithm, and the algorithm name. The format of the computation results (accuracy differences) differs according to the algorithms/metrics used.

The following table provides additional notes about the different algorithms and the information in each CSV row.

Field

Description

Computation result format

minmax

Indicates the difference between the unquantized minimum and the dequantized minimum value, and correspondingly the same difference between the unquantized and dequantized maximum values.

"min: #VALUE max: #VALUE"

maxdiff

Calculates the absolute difference between the unquantized and dequantized data for all data points and displays the maximum value of the result.

"#VALUE"

sqnr

Calculates the signal to quantization noise ratio between the two tensors of unquantized and dequantized data.

"#VALUE"

data_range_analyzer

Calculates the difference between the maximum and minimum values in a tensor and compares that to the maximum value supported by the bit-width used, to determine whether the range of values can be reasonably represented by the selected quantization bit width.

"unique dec places: #INT_VALUE data range: #VALUE". The computation results field includes how many unique decimal places are needed to express the unquantized data in quantized format and what the actual data range is.

data_distribution_analyzer

Calculates the clustering of the data to find whether a large number of unique unquantized values are quantized to the same value.

"Distribution of pixels above threshold: #VALUE"

stats

Calculates basic statistics on the received data such as the min, max, median, variance, standard deviation, mode, and skew. The skew is used to indicate how symmetric the data is.

"skew: #VALUE min: #VALUE max: #VALUE median: #VALUE variance: #VALUE stdDev: #VALUE mode: #VALUE"
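
Some of these comparators are simple enough to sketch directly. The following illustrative Python (not the tool's implementation) shows plausible maxdiff and sqnr computations over unquantized versus dequantized data:

```python
import math

def maxdiff(unquantized, dequantized):
    """Maximum absolute difference across all data points
    (the 'maxdiff' comparator described above)."""
    return max(abs(u - d) for u, d in zip(unquantized, dequantized))

def sqnr_db(unquantized, dequantized):
    """Signal to quantization noise ratio in dB between unquantized
    and dequantized data (the 'sqnr' comparator described above)."""
    signal = sum(u * u for u in unquantized)
    noise = sum((u - d) ** 2 for u, d in zip(unquantized, dequantized))
    if noise == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / noise)
```

A higher SQNR means less quantization noise; the sample config above uses 26 dB as the sqnr threshold.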

The following CSV example shows weight data for one of the quantization options.

../_static/resources/qnn_quantatization_checker_csv_weights.png

Separate .csv files are available for activations, weights and biases for each quantization option. The activation related results also include analysis for each input file provided.

Histogram

A histogram is generated for each quantization variation and for each weight and bias tensor in the model. The following example illustrates the generated histograms.

../_static/resources/quant_checker_hist.png

Logs

The log files contain the following information.

  • The commands executed as part of the script’s run, including different runs of the snpe-converter tool with different quantization options

  • Analysis failures for activations, weights, and biases

The following example shows a sample log output.

<====ACTIVATIONS ANALYSIS FAILURES====>

Results for the enhanced quantization:

| Op Name               | Activation Node | Passes Accuracy | Accuracy Difference                  | Threshold Used | Algorithm Used |
| conv_tanh_comp1_conv0 | ReLU_6919       | False           | minabs_diff: 0.59 maxabs_diff: 17.16 | 0.05           | minmax         |

where,

  1. Op Name : Op name as expressed in corresponding qnn artifacts

  2. Activation Node : Activation node name in the operation

  3. Passes Accuracy : True if the quantized activation (or weight or bias) meets the threshold when compared with values from the float32 graph; False otherwise

  4. Accuracy Difference : Details about the accuracy per the algorithm used

  5. Threshold Used : The threshold used to determine the result of the "Passes Accuracy" column

  6. Algorithm Used : Metric used to compare actual quantized activations/weights/biases against unquantized float data or analyze the quality of unquantized float data. Metrics can be minmax, maxdiff, sqnr, stats, data_range_analyzer, data_distribution_analyzer.

qairt-accuracy-debugger (Beta)

Dependencies

The Accuracy Debugger depends on the setup outlined in Setup. In particular, the following are required:

  1. Platform dependencies need to be met as per Platform Dependencies

  2. The desired ML frameworks need to be installed. The Accuracy Debugger is verified to work with the ML framework versions mentioned in Environment Setup

Supported models

The qairt-accuracy-debugger currently supports ONNX, TensorFlow, and TFLite models. Note that PyTorch model support is limited to the oneshot snooping feature of this tool.

Overview

The Accuracy Debugger tool finds inaccuracies in a neural network at the layer level. The primary functionality of this tool is to compare the golden outputs produced by running a model through a specific ML framework (i.e., TensorFlow, ONNX, TFLite) with the results produced by running the same model on target devices (CPU, GPU, DSP, etc.).

The following features are available in Accuracy Debugger. Each feature can be run with its corresponding option; for example, qairt-accuracy-debugger --{option}.

  1. qairt-accuracy-debugger --framework_runner uses an ML framework, e.g., TensorFlow, TFLite, or ONNX, to run the model and get intermediate outputs.

  2. qairt-accuracy-debugger --inference_engine uses the inference engine to run a model on the target device and retrieve intermediate outputs.

  3. qairt-accuracy-debugger --verification compares the outputs generated by the framework runner and inference engine features using verifiers such as CosineSimilarity, RtolAtol, etc.

  4. qairt-accuracy-debugger --compare_encodings compares target encodings with the AIMET encodings and outputs an Excel sheet highlighting mismatches.

  5. qairt-accuracy-debugger --tensor_inspection compares given target outputs with golden outputs.

  6. qairt-accuracy-debugger --snooping runs the chosen snooping algorithm to investigate accuracy issues.

Tip:
  • You can use --help with a feature name to see what other options (required or optional) are available.
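For intuition, the verifiers named above (CosineSimilarity, RtolAtol) can be sketched roughly as follows. This is illustrative only; the tool's exact formulas, defaults, and pass thresholds may differ.

```python
import numpy as np

def cosine_similarity(golden: np.ndarray, target: np.ndarray) -> float:
    """1.0 means the flattened tensors point in exactly the same direction."""
    g = golden.ravel().astype(np.float64)
    t = target.ravel().astype(np.float64)
    denom = np.linalg.norm(g) * np.linalg.norm(t)
    # Treat two all-zero tensors as identical rather than dividing by zero.
    return float(np.dot(g, t) / denom) if denom else 1.0

def rtol_atol_pass(golden, target, rtol=1e-2, atol=1e-3) -> bool:
    """Pass if every element satisfies |g - t| <= atol + rtol * |g|."""
    return bool(np.all(np.abs(golden - target) <= atol + rtol * np.abs(golden)))
```

A layer whose cosine similarity drops well below 1.0, or which fails the rtol/atol check, is a candidate for closer inspection.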

Below are the instructions for running various features available in Accuracy Debugger:

Framework Runner

The Framework Runner feature runs models with different machine learning frameworks (e.g. TensorFlow, ONNX, TFLite). A given model is run with a specific ML framework, and golden outputs are produced for later comparison with inference results from the Inference Engine step.

Usage

 usage: qairt-accuracy-debugger --framework_runner [-h]
                             -m MODEL_PATH -i INPUT_TENSOR
                             [INPUT_TENSOR ...] -o OUTPUT_TENSOR
                             [-w WORKING_DIR]
                             [--output_dirname OUTPUT_DIRNAME]
                             [--args_config ARGS_CONFIG] [-v]
                             [--disable_graph_optimization]
                             [--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB]
                             [--add_layer_outputs ADD_LAYER_OUTPUTS]
                             [--add_layer_types ADD_LAYER_TYPES]
                             [--skip_layer_types SKIP_LAYER_TYPES]
                             [--skip_layer_outputs SKIP_LAYER_OUTPUTS]
                             [--start_layer START_LAYER]
                             [--end_layer END_LAYER]
                             [-f FRAMEWORK [FRAMEWORK ...]]

 Script to generate intermediate tensors from an ML Framework.

 options:
 -h, --help            show this help message and exit

 required arguments:
 -m MODEL_PATH, --model_path MODEL_PATH
                         Path to the model file(s).
 -i INPUT_TENSOR [INPUT_TENSOR ...], --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
                         The name, dimensions, raw data, and optionally data
                          type of the network input tensor(s) specified in the
                         format "input_name" comma-separated-dimensions path-
                         to-raw-file, for example: "data" 1,224,224,3 data.raw
                         float32. Note that the quotes should always be
                         included in order to handle special characters,
                         spaces, etc. For multiple inputs specify multiple
                         --input_tensor on the command line like:
                         --input_tensor "data1" 1,224,224,3 data1.raw
                         --input_tensor "data2" 1,50,100,3 data2.raw float32.
 -o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
                         Name of the graph's specified output tensor(s).

 optional arguments:
 -w WORKING_DIR, --working_dir WORKING_DIR
                         Working directory for the framework_runner to store
                         temporary files. Creates a new directory if the
                         specified working directory does not exist
 --output_dirname OUTPUT_DIRNAME
                         output directory name for the framework_runner to
                         store temporary files under
                         <working_dir>/framework_runner. Creates a new
                         directory if the specified working directory does not
                         exist
 --args_config ARGS_CONFIG
                         Path to a config file with arguments. This can be used
                         to feed arguments to the AccuracyDebugger as an
                         alternative to supplying them on the command line.
 -v, --verbose         Verbose printing
 --disable_graph_optimization
                         Disables basic model optimization
 --onnx_custom_op_lib ONNX_CUSTOM_OP_LIB
                         path to onnx custom operator library
 --add_layer_outputs ADD_LAYER_OUTPUTS
                         Output layers to be dumped. example:1579,232
 --add_layer_types ADD_LAYER_TYPES
                         outputs of layer types to be dumped. e.g
                         :Resize,Transpose. All enabled by default.
 --skip_layer_types SKIP_LAYER_TYPES
                         comma delimited layer types to skip snooping. e.g
                         :Resize, Transpose
 --skip_layer_outputs SKIP_LAYER_OUTPUTS
                         comma delimited layer output names to skip debugging.
                         e.g :1171, 1174
 --start_layer START_LAYER
                         save all intermediate layer outputs from provided
                         start layer to bottom layer of model
 --end_layer END_LAYER
                         save all intermediate layer outputs from top layer to
                         provided end layer of model
 -f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
                         Framework type and version, version is optional.
                         Currently supported frameworks are [tensorflow,
                         tflite, onnx, pytorch]. For example, tensorflow 2.10.1

Please note: All arguments can be provided either on the command line or through the config file. If there is an overlap, command-line arguments will not override those in the config file.

Sample Commands

# Tensorflow model example:
qairt-accuracy-debugger \
    --framework_runner \
    --framework tensorflow \
    --model_path InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
    --input_tensor "input:0" 1,299,299,3 InceptionV3Model/data/chairs.raw \
    --output_tensor InceptionV3/Predictions/Reshape_1:0

# Onnx model example:
qairt-accuracy-debugger \
    --framework_runner \
    --framework onnx \
    --model_path dlv3onnx/dlv3plus_mbnet_513-513_op9_mod_basic.onnx \
    --input_tensor Input 1,3,513,513 dlv3onnx/data/00000_1_3_513_513.raw \
    --output_tensor Output

# Example to run model with custom operator:
qairt-accuracy-debugger \
    --framework_runner \
    --framework onnx \
    --input_tensor "image" 1,3,640,640 yolov3/batched-inp-107-0.raw \
    --model_path yolov3/yolov3_640_640_with_abp_qnms.onnx \
    --output_tensor detection_boxes \
    --onnx_custom_op_lib libCustomQnmsYoloOrt.so
TIP:
  • If --working_dir is not specified, a working_directory is created in the directory from which the script is called

  • For TensorFlow it is sometimes necessary to append ":0" to the input and output node names to signify the index of the node. Note that ":0" is not required for ONNX models.

Outputs

Once the Framework Runner has finished running, it stores the outputs in the specified working directory; by default, under working_directory/framework_runner in the current working directory. A directory named latest is created in working_directory/framework_runner as a symbolic link to the most recent run (YYYY-MM-DD_HH:mm:ss). Users may override the run directory name by passing it to --output_dirname (e.g. --output_dirname myTest1). The following figure shows a sample output folder from a Framework Runner run using an ONNX model.

../_static/resources/qairt_framework_runner.png

working_directory/framework_runner/latest contains the outputs of each layer in the model saved as .raw files. Each .raw file is the output of one operation in the model. The file framework_runner_options.json records all the options used for this run.

The intermediate outputs produced by the Framework Runner step offer precise reference/golden material for the Verification component to diagnose the accuracy of the network outputs generated by the Inference Engine.

Inference Engine

The Inference Engine feature dumps the intermediate outputs of the model when it is run on target devices such as CPU, DSP, GPU, etc. The outputs produced by this step can be compared with the golden outputs produced by the Framework Runner step.

Usage

 usage: qairt-accuracy-debugger --inference_engine [-h]
                               -r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
                               -a
                               {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
                               -l INPUT_LIST [--input_network INPUT_NETWORK]
                               [--desired_input_shape DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...]]
                               [--out_tensor_node OUT_TENSOR_NODE]
                               [--io_config IO_CONFIG]
                               [-qo QUANTIZATION_OVERRIDES]
                               [--converter_float_bitwidth {32,16}]
                               [--extra_converter_args EXTRA_CONVERTER_ARGS]
                               [--input_dlc INPUT_DLC]
                               [--calibration_input_list CALIBRATION_INPUT_LIST]
                               [-bbw {8,32}] [-abw {8,16}] [-wbw {8,4}]
                               [--quantizer_float_bitwidth {32,16}]
                               [--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                               [--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                               [--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                               [--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                               [--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
                               [--use_per_channel_quantization]
                               [--use_per_row_quantization] [--float_fallback]
                               [--extra_quantizer_args EXTRA_QUANTIZER_ARGS]
                               [--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
                               [--profiling_level PROFILING_LEVEL]
                               [--userlogs {warn,verbose,info,error,fatal}]
                               [--log_level {error,warn,info,debug,verbose}]
                               [--extra_runtime_args EXTRA_RUNTIME_ARGS]
                               [--executor_type {qnn,snpe}]
                               [--stage {source,converted,quantized}]
                               [-p ENGINE_PATH] [--deviceId DEVICEID] [-v]
                               [--host_device {x86,x86_64-windows-msvc,wos}]
                               [-w WORKING_DIR]
                               [--output_dirname OUTPUT_DIRNAME]
                               [--debug_mode_off] [--args_config ARGS_CONFIG]
                               [--remote_server REMOTE_SERVER]
                               [--remote_username REMOTE_USERNAME]
                               [--remote_password REMOTE_PASSWORD]
                               [--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY]
                               [--disable_offline_prepare]
                               [--backend_extension_config BACKEND_EXTENSION_CONFIG]
                               [--context_config_params CONTEXT_CONFIG_PARAMS]
                               [--graph_config_params GRAPH_CONFIG_PARAMS]
                               [--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS]
                               [--start_layer START_LAYER]
                               [--end_layer END_LAYER]
                               [--add_layer_outputs ADD_LAYER_OUTPUTS]
                               [--add_layer_types ADD_LAYER_TYPES]
                               [--skip_layer_types SKIP_LAYER_TYPES]
                               [--skip_layer_outputs SKIP_LAYER_OUTPUTS]

 Script to run inference engine.

 options:
   -h, --help            show this help message and exit

 Required Arguments:
   -r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}, --runtime {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
                         Runtime to be used. Note: In case of SNPE
                         execution(--executor_type snpe), aic runtime is not
                         supported.
   -a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}, --architecture {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
                         Name of the architecture to use for inference engine.
                         Note: In case of SNPE execution(--executor_type snpe),
                         aarch64-qnx architecture is not supported.
   -l INPUT_LIST, --input_list INPUT_LIST
                         Path to the input list text file to run inference(used
                         with net-run). Note: When having multiple entries in
                         text file, in order to save memory and time, you can
                         pass --debug_mode_off to skip intermediate outputs
                         dump.

 QAIRT Converter Arguments:
   --input_network INPUT_NETWORK, --model_path INPUT_NETWORK
                         Path to the model file(s). This argument is mandatory
                         when --stage is source(which is default).
   --desired_input_shape DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...], --input_tensor DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...]
                         The name and dimension of all the input buffers to the
                         network specified in the format [input_name comma-
                         separated-dimensions sample-data data-type] Note:
                         sample-data and data-type are optional for example:
                         'data' 1,224,224,3. Note that the quotes should always
                         be included in order to handle special characters,
                         spaces, etc. For multiple inputs, specify multiple
                         --desired_input_shape on the command line like:
                         --desired_input_shape "data1" 1,224,224,3 sample1.raw
                         float32 --desired_input_shape "data2" 1,50,100,3
                         sample2.raw int64 NOTE: Required for TensorFlow and
                         PyTorch. Optional for Onnx and Tflite. In case of
                         Onnx, this feature works only with Onnx 1.6.0 and
                         above.
   --out_tensor_node OUT_TENSOR_NODE, --output_tensor OUT_TENSOR_NODE
                         Name of the graph's output Tensor Names. Multiple
                         output names should be provided separately like:
                         --out_tensor_node out_1 --out_tensor_node out_2 NOTE:
                         Required for TensorFlow. Optional for Onnx, Tflite and
                         PyTorch
   --io_config IO_CONFIG
                         Use this option to specify a yaml file for input and
                         output options.
   -qo QUANTIZATION_OVERRIDES, --quantization_overrides QUANTIZATION_OVERRIDES
                         Path to quantization overrides json file.
   --converter_float_bitwidth {32,16}
                         Use this option to convert the graph to the specified
                         float bitwidth, either 32 (default) or 16. Note:
                         Cannot be used with --calibration_input_list and
                         --quantization_overrides
   --extra_converter_args EXTRA_CONVERTER_ARGS
                         additional converter arguments in a quoted string.
                         example: --extra_converter_args
                         'arg1=value1;arg2=value2'

 QAIRT Quantizer Arguments:
   --input_dlc INPUT_DLC
                         Path to the dlc container containing the model for
                         which fixed-point encoding metadata should be
                         generated. This argument is mandatory when --stage is
                         either converted or quantized.
   --calibration_input_list CALIBRATION_INPUT_LIST
                         Path to the inputs list text file to run
                         quantization(used with qairt-quantizer)
   -bbw {8,32}, --bias_bitwidth {8,32}
                         option to select the bitwidth to use when quantizing
                         the bias. default 8
   -abw {8,16}, --act_bitwidth {8,16}
                         option to select the bitwidth to use when quantizing
                         the activations. default 8
   -wbw {8,4}, --weights_bitwidth {8,4}
                         option to select the bitwidth to use when quantizing
                         the weights. default 8
   --quantizer_float_bitwidth {32,16}
                         Use this option to select the bitwidth to use for
                         float tensors, either 32 (default) or 16.
   --act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                         Specify which quantization calibration method to use
                         for activations supported values: min-max (default),
                         sqnr, entropy, mse, percentile This option can be
                         paired with --act_quantizer_schema to override the
                         quantization schema to use for activations otherwise
                         default schema(asymmetric) will be used
   --param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                         Specify which quantization calibration method to use
                         for parameters supported values: min-max (default),
                         sqnr, entropy, mse, percentile This option can be
                          paired with --param_quantizer_schema to override the
                          quantization schema to use for parameters otherwise
                         default schema(asymmetric) will be used
   --act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                         Specify which quantization schema to use for
                         activations. Note: Default is asymmetric.
   --param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                         Specify which quantization schema to use for
                         parameters. Note: Default is asymmetric.
   --percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
                         Value must lie between 90 and 100. Default is 99.99
   --use_per_channel_quantization
                         Use per-channel quantization for convolution-based op
                         weights. Note: This will replace built-in model QAT
                         encodings when used for a given weight.
   --use_per_row_quantization
                         Use this option to enable rowwise quantization of
                         Matmul and FullyConnected ops.
   --float_fallback      Use this option to enable fallback to floating point
                         (FP) instead of fixed point. This option can be paired
                         with --quantizer_float_bitwidth to indicate the
                         bitwidth for FP (by default 32). If this option is
                         enabled, then input list must not be provided and
                         --ignore_encodings must not be provided. The external
                         quantization encodings (encoding file/FakeQuant
                         encodings) might be missing quantization parameters
                         for some interim tensors. First it will try to fill
                         the gaps by propagating across math-invariant
                         functions. If the quantization params are still
                         missing, then it will apply fallback to nodes to
                         floating point.
   --extra_quantizer_args EXTRA_QUANTIZER_ARGS
                         additional quantizer arguments in a quoted string.
                         example: --extra_quantizer_args
                         'arg1=value1;arg2=value2'

 Net-run Arguments:
   --perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
                         Specifies perf profile to set. Valid settings are
                         "low_balanced" , "balanced" , "default",
                         "high_performance" ,"sustained_high_performance",
                         "burst", "low_power_saver", "power_saver",
                         "high_power_saver", "extreme_power_saver", and
                         "system_settings". Note: perf_profile argument is now
                         deprecated for HTP backend, user can specify
                         performance profile through backend extension config
                         now.
   --profiling_level PROFILING_LEVEL
                         Enables profiling and sets its level. For QNN
                         executor, valid settings are "basic", "detailed" and
                         "client" For SNPE executor, valid settings are "off",
                         "basic", "moderate", "detailed", and "linting".
                         Default is detailed.
   --userlogs {warn,verbose,info,error,fatal}
                         Enable verbose logging. Note: This argument is
                         applicable only when --executor_type snpe
   --log_level {error,warn,info,debug,verbose}
                         Enable verbose logging. Note: This argument is
                         applicable only when --executor_type qnn
   --extra_runtime_args EXTRA_RUNTIME_ARGS
                         additional net runner arguments in a quoted string.
                         example: --extra_runtime_args
                         'arg1=value1;arg2=value2'

 Other optional Arguments:
   --executor_type {qnn,snpe}
                         Choose between qnn(qnn-net-run) and snpe(snpe-net-run)
                         execution. If not provided, qnn-net-run will be
                         executed for QAIRT or QNN SDK, or else snpe-net-run
                         will be executed for SNPE SDK.
   --stage {source,converted,quantized}
                         Specifies the starting stage in the Accuracy Debugger
                         pipeline. source: starting with a source framework
                         model, converted: starting with a converted model,
                         quantized: starting with a quantized model. Default is
                         source.
   -p ENGINE_PATH, --engine_path ENGINE_PATH
                         Path to SDK folder.
   --deviceId DEVICEID   The serial number of the device to use. If not passed,
                         the first in a list of queried devices will be used
                         for validation.
   -v, --verbose         Set verbose logging at debugger tool level
   --host_device {x86,x86_64-windows-msvc,wos}
                         The device that will be running conversion. Set to x86
                         by default.
   -w WORKING_DIR, --working_dir WORKING_DIR
                         Working directory for the inference_engine to store
                         temporary files. Creates a new directory if the
                         specified working directory does not exist
   --output_dirname OUTPUT_DIRNAME
                         output directory name for the inference_engine to
                         store temporary files under
                         <working_dir>/inference_engine .Creates a new
                         directory if the specified working directory does not
                         exist
   --debug_mode_off      This option can be used to avoid dumping intermediate
                         outputs.
   --args_config ARGS_CONFIG
                         Path to a config file with arguments. This can be used
                         to feed arguments to the AccuracyDebugger as an
                         alternative to supplying them on the command line.
   --remote_server REMOTE_SERVER
                         ip address of remote machine
   --remote_username REMOTE_USERNAME
                         username of remote machine
   --remote_password REMOTE_PASSWORD
                         password of remote machine
   --golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --golden_dir_for_mapping GOLDEN_OUTPUT_REFERENCE_DIRECTORY
                         Optional parameter to indicate the directory of the
                         goldens, it's used for tensor mapping without running
                         model with framework runtime.
   --disable_offline_prepare
                         Use this option to disable offline preparation. Note:
                         By default offline preparation will be done for
                         DSP/HTP runtimes.
   --backend_extension_config BACKEND_EXTENSION_CONFIG
                         Path to config to be used with qnn-context-binary-
                         generator. Note: This argument is applicable only when
                         --executor_type qnn
   --context_config_params CONTEXT_CONFIG_PARAMS
                         optional context config params in a quoted string.
                         example: --context_config_params
                         'context_priority=high;
                         cache_compatibility_mode=strict' Note: This argument
                         is applicable only when --executor_type qnn
   --graph_config_params GRAPH_CONFIG_PARAMS
                         optional graph config params in a quoted string.
                         example: --graph_config_params 'graph_priority=low;
                         graph_profiling_num_executions=10'
   --extra_contextbin_args EXTRA_CONTEXTBIN_ARGS
                         Additional context binary generator arguments in a
                         quoted string(applicable only when --executor_type
                         qnn). example: --extra_contextbin_args
                         'arg1=value1;arg2=value2'
   --start_layer START_LAYER
                         save all intermediate layer outputs from provided
                         start layer to bottom layer of model. Can be used in
                         conjunction with --end_layer.
   --end_layer END_LAYER
                         save all intermediate layer outputs from top layer to
                         provided end layer of model. Can be used in
                         conjunction with --start_layer.
   --add_layer_outputs ADD_LAYER_OUTPUTS
                         Output layers to be dumped. e.g: node1,node2
   --add_layer_types ADD_LAYER_TYPES
                         outputs of layer types to be dumped. e.g
                         :Resize,Transpose. All enabled by default.
   --skip_layer_types SKIP_LAYER_TYPES
                         comma delimited layer types to skip dumping. e.g
                         :Resize,Transpose
   --skip_layer_outputs SKIP_LAYER_OUTPUTS
                         comma delimited layer output names to skip dumping.
                         e.g: node1,node2

Please note: All arguments can be provided either on the command line or through the config file. If there is an overlap, command-line arguments will not override those in the config file.

Sample Commands

# Example for running on Linux host's CPU by passing quantization encodings
qairt-accuracy-debugger \
    --inference_engine \
    --runtime cpu \
    --architecture x86_64-linux-clang \
    --model_path model.onnx \
    --input_list InceptionV3Model/data/image_list.txt \
    --quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json

# Example for running on Linux host's CPU without quantization encodings
qairt-accuracy-debugger \
    --inference_engine \
    --runtime cpu \
    --architecture x86_64-linux-clang \
    --model_path model.onnx \
    --input_list InceptionV3Model/data/image_list.txt \
    --calibration_input_list InceptionV3Model/data/calibration_list.txt \
    --param_quantizer_schema symmetric \
    --act_quantizer_schema asymmetric \
    --param_quantizer_calibration sqnr \
    --act_quantizer_calibration percentile \
    --percentile_calibration_value 99.995 \
    --bias_bitwidth 32

# Example for running on Android DSP target
qairt-accuracy-debugger \
    --inference_engine \
    --runtime dspv75 \
    --architecture aarch64-android \
    --deviceId 357415c4 \
    --model_path model.onnx \
    --input_list InceptionV3Model/data/image_list.txt \
    --quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json

# Example for running on Android GPU target with fp32 precision
qairt-accuracy-debugger \
    --inference_engine \
    --runtime gpu \
    --architecture aarch64-android \
    --framework tensorflow \
    --model_path InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
    --input_tensor "input:0" 1,299,299,3 InceptionV3Model/data/chairs.raw \
    --output_tensor InceptionV3/Predictions/Reshape_1 \
    --input_list InceptionV3Model/data/image_list.txt \
    --converter_float_bitwidth 32

# Example for running on Android GPU target with fp16 precision
qairt-accuracy-debugger \
    --inference_engine \
    --runtime gpu \
    --architecture aarch64-android \
    --framework tensorflow \
    --model_path InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
    --input_tensor "input:0" 1,299,299,3 InceptionV3Model/data/chairs.raw \
    --output_tensor InceptionV3/Predictions/Reshape_1 \
    --input_list InceptionV3Model/data/image_list.txt \
    --converter_float_bitwidth 16

# Example for running on DSP of "Windows on Snapdragon" machine
qairt-accuracy-debugger \
    --inference_engine \
    --runtime dspv75 \
    --architecture wos \
    --host_device wos \
    --model_path model.onnx \
    --input_list InceptionV3Model\data\image_list.txt \
    --quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json

# Example for running on Windows native
qairt-accuracy-debugger \
    --inference_engine \
    --runtime cpu \
    --architecture x86_64-windows-msvc \
    --host_device x86_64-windows-msvc \
    --model_path model.onnx \
    --input_list InceptionV3Model\data\image_list.txt \
    --quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json
Tip:
  • Although the tool can quantize the given model using data provided through the --calibration_input_list argument, it is recommended to pass quantization encodings through the --quantization_overrides argument to speed up execution

  • The --input_tensor and --output_tensor arguments are mandatory for TensorFlow and TFLite models, but they do not need to carry indexing information (":0"), unlike in the Framework Runner

  • Before running qairt-accuracy-debugger on a Windows x86 system or a Windows on Snapdragon system, ensure that you have configured the environment. Specify the host and target machine as x86_64-windows-msvc or wos, respectively

  • Note that qairt-accuracy-debugger on Windows x86 systems is currently tested only for the CPU runtime
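The --input_list (and --calibration_input_list) files referenced in the sample commands are plain text files with one line per inference. A hedged sketch of creating one is shown below; the "tensor_name:=path" form is assumed from qnn-net-run input-list conventions, and the file names are illustrative, so verify the exact format against your SDK's qnn-net-run documentation.

```shell
# Create an input list with two inferences for a model whose input tensor
# is named "input". Each line supplies the raw file(s) for one inference;
# multiple inputs on a line would be space-separated (assumed convention).
cat > image_list.txt << 'EOF'
input:=InceptionV3Model/data/chairs.raw
input:=InceptionV3Model/data/table.raw
EOF
```

The same file format is used for the calibration list consumed by the quantizer.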

More example commands with different configurations:

Sample Commands

# source stage: same as examples from above section (default for stage is "source")

# Running from converted stage (Android DSP):
qairt-accuracy-debugger \
    --inference_engine \
    --stage converted \
    --input_dlc converted_model.dlc \
    --runtime dspv75 \
    --deviceId f366ce60 \
    --architecture aarch64-android \
    --input_list InceptionV3Model/data/image_list.txt \
    --quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json

# Running from quantized stage (x86 CPU):
qairt-accuracy-debugger \
    --inference_engine \
    --stage quantized \
    --input_dlc quantized_model.dlc \
    --runtime cpu \
    --architecture x86_64-linux-clang \
    --input_list InceptionV3Model/data/image_list.txt \
    --quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json

# Running with --extra_converter_args argument for enabling preserve_io and passing onnx symbols (Android DSP):
qairt-accuracy-debugger \
    --inference_engine \
    --runtime dspv75 \
    --architecture aarch64-android \
    --model_path model.onnx \
    --input_list InceptionV3Model/data/image_list.txt \
    --quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json \
    --extra_converter_args 'onnx_define_symbol seq_length=384;onnx_define_symbol batch_size=1'

# Run onnx model with custom operator (Android DSP):
qairt-accuracy-debugger \
    --inference_engine \
    --runtime dspv75 \
    --architecture aarch64-android \
    --model_path model.onnx \
    --input_list InceptionV3Model/data/image_list.txt \
    --quantization_overrides InceptionV3Model/data/AIMET_quantization_encodings.json \
    --executor_type qnn \
    --extra_converter_args 'op_package_config=CustomPreTopKOpPackageCPU_v2.xml;converter_op_package_lib=libCustomPreTopKOpPackageHtp.so:CustomPreTopKOpPackageHtpInterfaceProvider:' \
    --extra_contextbin_args 'op_packages=libQnnCustomPreTopKOpPackageHtp.so:CustomPreTopKOpPackageHtpInterfaceProvider:' \
    --extra_runtime_args 'op_packages=libQnnCustomPreTopKOpPackageHtp_v75.so:CustomPreTopKOpPackageHtpInterfaceProvider'

Outputs

Once the Inference Engine has finished running, it stores the outputs in the specified working directory. By default, output is stored under working_directory/inference_engine in the current working directory. A directory named latest is created in working_directory/inference_engine as a symbolic link to the most recent run, named YYYY-MM-DD_HH:mm:ss. Users may override the directory name by passing it to --output_dirname (e.g., --output_dirname myTest1). The following figure shows a sample output folder from an Inference Engine run.

../_static/resources/qairt_inference_engine.png

The "output" directory contains raw files. Each raw file is the output of an operation in the network. In addition to generating the .raw files, the inference_engine also generates the model's graph structure in a .json file. The name of the file is the same as the name of the protobuf model file. The model_graph_struct.json provides structure-related information about the converted model graph during the verification step. Specifically, it helps with ordering the nodes (e.g., the beginning nodes should come earlier than the ending nodes).
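As a rough illustration, a dumped .raw tensor file can be read back with NumPy, assuming the tensor's dtype (float32 here) and shape are already known from the graph structure file. The helper below is a hypothetical sketch, not part of the SDK:

```python
import numpy as np

def load_raw_output(path, shape, dtype=np.float32):
    """Load a flat .raw tensor dump and reshape it.

    Assumes the file holds the tensor's elements in `dtype` with no
    header; the shape must come from the model graph (e.g., the
    model_graph_struct.json produced by the inference engine step).
    """
    data = np.fromfile(path, dtype=dtype)
    return data.reshape(shape)
```

For example, `load_raw_output("output/Result_0/conv1_out.raw", (1, 64, 112, 112))` would recover a convolution output for inspection, provided the shape matches the file's element count.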

The inference_engine_options.json file contains all the options with which the run was launched. The base_quantized_encoding.json contains quantization encodings used by the model.

Finally, the tensor_mapping file contains a mapping of the various intermediate output file names generated from the framework runner step and the inference engine step.

Verification

The Verification step compares the output (from the intermediate tensors of a given model) produced by the framework runner step with the output produced by the inference engine step. Once the comparison is complete, the verification results are compiled and displayed visually in a format that can be easily interpreted by the user.

There are different types of verifiers, e.g., CosineSimilarity, RtolAtol, etc. To see the available verifiers, use the --help option (qairt-accuracy-debugger --verification --help). Each verifier compares the Framework Runner and Inference Engine outputs using an error metric. It also prepares reports and/or visualizations to help the user analyze the network's error data.
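To make the notion of an error metric concrete, here is a minimal sketch of two common ones, cosine similarity and SQNR. These are illustrative reimplementations for intuition only, not the SDK's own verifier code:

```python
import numpy as np

def cosine_similarity(golden, target):
    """Cosine similarity between two flattened tensors: 1.0 means
    identical direction; values near 0 indicate large divergence."""
    g = np.asarray(golden, dtype=np.float64).ravel()
    t = np.asarray(target, dtype=np.float64).ravel()
    denom = np.linalg.norm(g) * np.linalg.norm(t)
    return float(np.dot(g, t) / denom) if denom else 1.0

def sqnr_db(golden, target):
    """Signal-to-quantization-noise ratio in dB; higher means the
    target output is closer to the golden reference."""
    g = np.asarray(golden, dtype=np.float64).ravel()
    noise = g - np.asarray(target, dtype=np.float64).ravel()
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        return float("inf")
    return float(10 * np.log10(np.mean(g ** 2) / noise_power))
```

A small quantization error yields a high SQNR and a cosine similarity near 1; a badly broken tensor drives both metrics down.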

Usage

usage: qairt-accuracy-debugger --verification [-h]
                              --default_verifier DEFAULT_VERIFIER
                              [DEFAULT_VERIFIER ...]
                              --golden_output_reference_directory
                              GOLDEN_OUTPUT_REFERENCE_DIRECTORY
                              --inference_results INFERENCE_RESULTS
                              [--tensor_mapping TENSOR_MAPPING]
                              [--dlc_path DLC_PATH]
                              [--verifier_config VERIFIER_CONFIG]
                              [--graph_struct GRAPH_STRUCT] [-v]
                              [-w WORKING_DIR]
                              [--output_dirname OUTPUT_DIRNAME]
                              [--args_config ARGS_CONFIG]
                              [--target_encodings TARGET_ENCODINGS]
                              [-e ENGINE [ENGINE ...]]

Script to run verification.

required arguments:
  --default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
                        Default verifier used for verification. The options
                        "RtolAtol", "AdjustedRtolAtol", "TopK", "L1Error",
                        "CosineSimilarity", "MSE", "MAE", "SQNR", "ScaledDiff"
                        are supported. An optional list of hyperparameters can
                        be appended. For example: --default_verifier
                        rtolatol,rtolmargin,0.01,atolmargin,0.01. An optional
                        list of placeholders can be appended. For example:
                        --default_verifier CosineSimilarity param1 1 param2 2.
                        To use multiple verifiers, add an additional
                        --default_verifier CosineSimilarity
  --golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --framework_results GOLDEN_OUTPUT_REFERENCE_DIRECTORY
                        Path to root directory of golden output files. Paths
                        may be absolute, or relative to the working directory.
  --inference_results INFERENCE_RESULTS
                        Path to root directory generated from inference engine
                        diagnosis. Paths may be absolute, or relative to the
                        working directory.

optional arguments:
  --tensor_mapping TENSOR_MAPPING
                        Path to the file describing the tensor name mapping
                        between inference and golden tensors.
  --dlc_path DLC_PATH   Path to the dlc file, used for transforming axis of
                        golden outputs w.r.t to target outputs. Note:
                        Applicable for QAIRT/SNPE
  --verifier_config VERIFIER_CONFIG
                        Path to the verifiers' config file
  --graph_struct GRAPH_STRUCT
                        Path to the inference graph structure .json file. This
                        file provides structure-related information about the
                        converted model graph during this stage. Note: this
                        file is mandatory when using the ScaledDiff verifier
  -v, --verbose         Verbose printing
  -w WORKING_DIR, --working_dir WORKING_DIR
                        Working directory for the verification to store
                        temporary files. Creates a new directory if the
                        specified working directory does not exist
  --output_dirname OUTPUT_DIRNAME
                        output directory name for the verification to store
                        temporary files under <working_dir>/verification.
                        Creates a new directory if the specified working
                        directory does not exist
  --args_config ARGS_CONFIG
                        Path to a config file with arguments. This can be used
                        to feed arguments to the AccuracyDebugger as an
                        alternative to supplying them on the command line.
  --target_encodings TARGET_ENCODINGS
                        Path to target encodings json file.

Arguments for generating Tensor mapping (required when --tensor_mapping is not specified):
  -e ENGINE [ENGINE ...], --engine ENGINE [ENGINE ...]
                        Name of engine(qnn/snpe) that is used for running
                        inference.

 Please note: All command line arguments should be provided either on the command line or through the config file. Command line arguments will not override those in the config file if there is overlap.

Note

The standalone verification process run using qairt-accuracy-debugger --verification optionally uses --tensor_mapping and --graph_struct to find files to compare. These files are generated by the inference engine step and should be supplied to verification for best results. By default they are named tensor_mapping.json and {model name}_graph_struct.json, and can be found in the output directory of the inference engine results.

Sample Commands

# Compare output of framework runner with inference engine
qairt-accuracy-debugger \
     --verification \
     --default_verifier CosineSimilarity param1 1 param2 2 \
     --default_verifier SQNR param1 5 param2 1 \
     --golden_output_reference_directory working_directory/framework_runner/latest/ \
     --inference_results working_directory/inference_engine/latest/output/Result_0/ \
     --tensor_mapping working_directory/inference_engine/latest/tensor_mapping.json \
     --graph_struct working_directory/inference_engine/latest/qnn_model_graph_struct.json
Tip:
  • If you passed multiple images in image_list.txt when running the inference engine diagnosis, you will get multiple output/Result_x directories. Choose the result that matches the input you used for the framework runner (e.g., if you used chair.raw in the framework step and chair.raw was the first item in image_list.txt, choose output/Result_0; if chair.raw was the second item, choose output/Result_1).

  • It is recommended to always supply 'graph_struct' and 'tensor_mapping' to the command, as they are used to line up the report and find the corresponding files for comparison. If tensor_mapping was not generated by the previous steps, you can supply 'model_path', 'engine', and 'framework' to have the module generate 'tensor_mapping' at runtime.

  • If target and golden output filenames match exactly, you do not need to provide a tensor_mapping file.

Verifier Config:

The verifier config file is a JSON file that tells verification which verifiers (aside from the default verifier) to use, with which parameters, and on which specific tensors. If no config file is provided, the tool uses only the default verifier specified on the command line, with its default parameters, on all tensors. The JSON file is keyed by verifier names, with each verifier as its own dictionary keyed by "parameters" and "tensors".

Config File

```json
{
    "MeanIOU": {
        "parameters": {
            "background_classification": 1.0
        },
        "tensors": [["Postprocessor/BatchMultiClassNonMaxSuppression_boxes", "detection_classes:0"]]
    },
    "TopK": {
        "parameters": {
            "k": 5,
            "ordered": false
        },
        "tensors": [["Reshape_1:0"], ["detection_classes:0"]]
    }
}
```

Note that the "tensors" field is a list of lists. This is done because some verifiers run on two tensors at a time, so the two tensors are placed together in one list. If a verifier runs on only one tensor, it will have a list of lists with only one tensor name in each list. MeanIOU is not supported as a verifier in the Debugger.

Tensor Mapping:

Tensor mapping is a JSON file keyed by inference tensor names, with framework tensor names as values. If the tensor mapping is not provided, the tool assumes the inference and golden tensor names are identical.

Tensor Mapping File

```json
{
    "Postprocessor/BatchMultiClassNonMaxSuppression_boxes": "detection_boxes:0",
    "Postprocessor/BatchMultiClassNonMaxSuppression_scores": "detection_scores:0"
}
```
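A sketch of how such a mapping might be applied when pairing files for comparison, falling back to identical names when no mapping file is given. The helper name is hypothetical, not an SDK function:

```python
import json

def golden_name_for(inference_name, mapping_path=None):
    """Resolve the golden (framework) tensor name for an inference
    tensor name. Without a mapping file, names are assumed identical,
    matching the tool's documented default behavior."""
    if mapping_path is None:
        return inference_name
    with open(mapping_path) as f:
        mapping = json.load(f)
    # Unmapped tensors fall back to their own name.
    return mapping.get(inference_name, inference_name)
```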

Outputs

Once the Verification has finished running, it stores the outputs in the specified working directory. By default, output is stored under working_directory/verification in the current working directory. A directory named latest is created in working_directory/verification as a symbolic link to the most recent run, named YYYY-MM-DD_HH:mm:ss. Users may override the directory name by passing it to --output_dirname (e.g., --output_dirname myTest1). The following figure shows a sample output folder from a Verification run.

../_static/resources/qairt_verification.png

Verification’s output is divided into different verifiers. For example, if both mse and sqnr verifiers are used, there will be two sub-directories named “mse” and “sqnr”. Under each sub-directory, for each tensor, a CSV and HTML file is generated.

In addition to the tensor-specific analysis, the tool also generates a summary CSV and HTML file that aggregates the data from all verifiers and their tensors. The following figure shows a sample summary generated in the verification step. Each row in this summary corresponds to one tensor name identified by the framework runner and inference engine steps. The final column shows the CosineSimilarity score, which ranges from 0 to 1 (the range may differ for other verifiers). Higher scores denote similarity, while lower scores indicate divergence. The developer can then investigate those specific tensor details further. Tensors should be inspected in top-to-bottom order: if a tensor is broken at an earlier node, anything generated after that node is unreliable until the earlier node is fixed.

../_static/resources/verification_results.png

Compare Encodings

The Compare Encodings feature compares Target and AIMET encodings. It takes a Target DLC file and an AIMET encoding JSON file as inputs and executes in the following order.

  1. Extracts encodings from the given DLC file

  2. Compares extracted DLC encodings with given AIMET encodings

  3. Writes results to an Excel file that highlights mismatches

  4. Throws warnings if some encodings are present in DLC but not in AIMET and vice-versa

  5. Writes the extracted DLC encodings JSON file (for reference)
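Steps 2 and 4 above can be sketched as follows, assuming encodings are represented as simple {tensor_name: {"min": ..., "max": ...}} dictionaries. This is an illustrative assumption; the real DLC and AIMET encoding formats carry more fields (scale, offset, bitwidth, etc.):

```python
def compare_encodings(dlc_enc, aimet_enc, precision=17):
    """Compare two encoding dicts of the form
    {tensor_name: {"min": float, "max": float}}.

    Returns per-tensor mismatches (values compared after rounding to
    `precision` decimal places, mirroring the --precision option) and
    the tensor names present on only one side.
    """
    mismatches, only_dlc, only_aimet = {}, [], []
    for name in sorted(set(dlc_enc) | set(aimet_enc)):
        if name not in aimet_enc:
            only_dlc.append(name)          # present in DLC only
        elif name not in dlc_enc:
            only_aimet.append(name)        # present in AIMET only
        else:
            diff = {
                k: (dlc_enc[name][k], aimet_enc[name][k])
                for k in dlc_enc[name]
                if k in aimet_enc[name]
                and round(dlc_enc[name][k], precision)
                    != round(aimet_enc[name][k], precision)
            }
            if diff:
                mismatches[name] = diff
    return mismatches, only_dlc, only_aimet
```

The real tool additionally writes the results to encodings_diff.xlsx with the mismatched cells highlighted.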

Usage

usage: qairt-accuracy-debugger --compare_encodings [-h]
                             --input INPUT
                             --aimet_encodings_json AIMET_ENCODINGS_JSON
                             [--precision PRECISION]
                             [--params_only]
                             [--activations_only]
                             [--specific_node SPECIFIC_NODE]
                             [--working_dir WORKING_DIR]
                             [--output_dirname OUTPUT_DIRNAME]
                             [-v]

Script to compare DLC encodings with AIMET encodings

optional arguments:
  -h, --help            Show this help message and exit

required arguments:
  --input INPUT
                        Path to DLC file
  --aimet_encodings_json AIMET_ENCODINGS_JSON
                        Path to AIMET encodings JSON file

optional arguments:
  --precision PRECISION
                        Number of decimal places up to which comparison will be done (default: 17)
  --params_only         Compare only parameters in the encodings
  --activations_only    Compare only activations in the encodings
  --specific_node SPECIFIC_NODE
                        Display encoding differences for the given node
  --working_dir WORKING_DIR
                        Working directory for the compare_encodings to store temporary files.
                        Creates a new directory if the specified working directory does not exist.
  --output_dirname OUTPUT_DIRNAME
                        Output directory name for the compare_encodings to store temporary files
                        under <working_dir>/compare_encodings. Creates a new directory if the
                        specified working directory does not exist.
  -v, --verbose         Verbose printing

Sample Commands

# Compare both params and activations
qairt-accuracy-debugger \
    --compare_encodings \
    --input quantized_model.dlc \
    --aimet_encodings_json aimet_encodings.json

# Compare only params
qairt-accuracy-debugger \
    --compare_encodings \
    --input quantized_model.dlc \
    --aimet_encodings_json aimet_encodings.json \
    --params_only

# Compare only activations
qairt-accuracy-debugger \
    --compare_encodings \
    --input quantized_model.dlc \
    --aimet_encodings_json aimet_encodings.json \
    --activations_only

# Compare only a specific encoding
qairt-accuracy-debugger \
    --compare_encodings \
    --input quantized_model.dlc \
    --aimet_encodings_json aimet_encodings.json \
    --specific_node _2_22_Conv_output_0

Tip

A working_directory is generated in the directory from which this script is called, unless otherwise specified.

Outputs

Once the Compare Encodings has finished running, it stores the outputs in the specified working directory. By default, output is stored under working_directory/compare_encodings in the current working directory. A directory named latest is created in working_directory/compare_encodings as a symbolic link to the most recent run, named YYYY-MM-DD_HH:mm:ss. Users may override the directory name by passing it to --output_dirname (e.g., --output_dirname myTest1). The following figure shows a sample output folder from a Compare Encodings run.

../_static/resources/compare_encodings.png
The following details what each file contains.
  • compare_encodings_options.json contains all the options used to run this feature

  • encodings_diff.xlsx contains comparison results with mismatches highlighted

  • log.txt contains log statements for the run

  • extracted_encodings.json contains extracted DLC encodings

Tensor inspection

Tensor inspection compares given reference (golden) and target output tensors and dumps various statistics representing the differences between them.

The Tensor inspection feature can:

  1. Plot histograms for golden and target tensors

  2. Plot a graph indicating deviation between golden and target tensors

  3. Plot a cumulative distribution graph (CDF) for golden vs. target tensors

  4. Plot a density (KDE) graph for target tensor highlighting target min/max and calibrated min/max values

  5. Create a CSV file containing information about: target min/max; calibrated min/max; golden output min/max; target/calibrated min/max differences; and computed metrics (verifiers).

Note

Only data with matching target/golden filenames is inspected; other data is ignored.
This feature expects the golden and target tensors to have the same dimensions, datatypes, and layouts.
Calibrated min/max values are extracted from a user-provided encodings file. If no encodings file is provided, the density plot is skipped and the CSV summary does not include calibrated min/max information.
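One row of the CSV summary described above might be assembled as in this sketch. The function and field names are illustrative, not the tool's actual schema:

```python
import numpy as np

def summary_row(golden, target, calib_min=None, calib_max=None):
    """Build one tensor-inspection summary row: observed min/max for
    golden and target, plus (when encodings are available) the gap
    between the target range and the calibrated range."""
    g, t = np.asarray(golden), np.asarray(target)
    row = {
        "golden_min": float(g.min()), "golden_max": float(g.max()),
        "target_min": float(t.min()), "target_max": float(t.max()),
    }
    # Calibrated range only appears when an encodings file was given.
    if calib_min is not None and calib_max is not None:
        row["calibrated_min"] = calib_min
        row["calibrated_max"] = calib_max
        row["min_diff"] = row["target_min"] - calib_min
        row["max_diff"] = row["target_max"] - calib_max
    return row
```

A large `min_diff`/`max_diff` suggests the target tensor is clipping or drifting outside the range the quantizer calibrated for.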

Usage

usage: qairt-accuracy-debugger --tensor_inspection [-h]
                        --golden_data GOLDEN_DATA
                        --target_data TARGET_DATA
                        --verifier VERIFIER [VERIFIER ...]
                        [-w WORKING_DIR]
                        [--data_type {int8,uint8,int16,uint16,float32}]
                        [--target_encodings TARGET_ENCODINGS]
                        [-v]

Script to inspect tensors.

required arguments:
  --golden_data GOLDEN_DATA
                        Path to golden/framework outputs folder. Paths may be absolute or
                        relative to the working directory.
  --target_data TARGET_DATA
                        Path to target outputs folder. Paths may be absolute or relative to the
                        working directory.
  --verifier VERIFIER [VERIFIER ...]
                        Verifier used for verification. The options "RtolAtol",
                        "AdjustedRtolAtol", "TopK", "L1Error", "CosineSimilarity", "MSE", "MAE",
                        "SQNR", "ScaledDiff" are supported.
                        An optional list of hyperparameters can be appended, for example:
                        --verifier rtolatol,rtolmargin,0.01,atolmargin,0.01.
                        To use multiple verifiers, add additional --verifier CosineSimilarity

optional arguments:
  -w WORKING_DIR, --working_dir WORKING_DIR
                        Working directory to save results. Creates a new directory if the
                        specified working directory does not exist
  --data_type {int8,uint8,int16,uint16,float32}
                        DataType of the output tensor.
  --target_encodings TARGET_ENCODINGS
                        Path to target encodings json file.
  -v, --verbose         Verbose printing

Sample Commands

# Basic run
qairt-accuracy-debugger --tensor_inspection \
    --golden_data golden_tensors_dir \
    --target_data target_tensors_dir \
    --verifier sqnr

# Pass target encodings file and enable multiple verifiers
qairt-accuracy-debugger --tensor_inspection \
    --golden_data golden_tensors_dir \
    --target_data target_tensors_dir \
    --verifier mse \
    --verifier sqnr \
    --verifier rtolatol,rtolmargin,0.01,atolmargin,0.01 \
    --target_encodings qnn_encoding.json

Tip

A working_directory is generated in the directory from which this script is called, unless otherwise specified.

Outputs

Once the Tensor Inspection has finished running, it stores the outputs in the specified working directory. By default, output is stored under working_directory/tensor_inspection in the current working directory. A directory named latest is created in working_directory/tensor_inspection as a symbolic link to the most recent run, named YYYY-MM-DD_HH:mm:ss. Users may override the directory name by passing it to --output_dirname (e.g., --output_dirname myTest1). The following figure shows a sample output folder from a Tensor Inspection run.

../_static/resources/tensor_inspection.png

The following details what each file contains.

  • Each tensor will have its own directory; the directory name matches the tensor name.

    • CDF_plots.html – Golden vs. target CDF graph

    • Diff_plots.html – Golden and target deviation graph

    • Distribution_min-max.png – Density plot for target tensor highlighting target vs. calibrated min/max values

    • Histograms.html – Golden and target histograms

    • golden_data.csv – Golden tensor data

    • target_data.csv – Target tensor data

  • log.txt – Log statements from the entire run

  • summary.csv – Target min/max, calibrated min/max, golden output min/max, target vs. calibrated min/max differences, and verifier outputs

Histogram Plots

  1. Comparison: We compare histograms for both the golden data and the target data.

  2. Overlay: To enhance clarity, we overlay the histograms bin by bin.

  3. Binned Ranges: Each bin represents a value range, showing the frequency of occurrence.

  4. Visual Insight: Overlapping histograms reveal differences or similarities between the datasets.

  5. Interactive: Hover over histograms to get tensor range and frequencies for the dataset.
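The bin-by-bin overlay described above requires both tensors to be histogrammed over a shared set of bin edges, as in this NumPy sketch (illustrative only; the tool renders the result as an interactive HTML plot):

```python
import numpy as np

def overlaid_histograms(golden, target, bins=50):
    """Histogram both tensors over one shared set of bin edges so the
    counts can be overlaid bin by bin."""
    g = np.asarray(golden, dtype=np.float64).ravel()
    t = np.asarray(target, dtype=np.float64).ravel()
    # Shared edges span the combined range of both tensors.
    lo = min(g.min(), t.min())
    hi = max(g.max(), t.max())
    edges = np.linspace(lo, hi, bins + 1)
    g_counts, _ = np.histogram(g, bins=edges)
    t_counts, _ = np.histogram(t, bins=edges)
    return edges, g_counts, t_counts
```

Without shared edges, the two histograms' bins would cover different value ranges and a bin-by-bin comparison would be meaningless.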

Cumulative Distribution Function (CDF) Plots

  1. Overview: CDF plots display the cumulative probability distribution.

  2. Overlay: We superimpose CDF plots for golden and target data.

  3. Percentiles: These plots illustrate data distribution across different percentiles.

  4. Hover Details: Exact cumulative probabilities are available on hover.
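An empirical CDF curve like the ones overlaid in these plots can be computed as in this sketch (illustrative only):

```python
import numpy as np

def empirical_cdf(values):
    """Return sorted values and their cumulative probabilities,
    suitable for plotting one curve of an overlaid golden/target CDF."""
    x = np.sort(np.asarray(values, dtype=np.float64).ravel())
    # Step height 1/N per sample; last point reaches probability 1.
    y = np.arange(1, x.size + 1) / x.size
    return x, y
```

Plotting `empirical_cdf(golden)` and `empirical_cdf(target)` on the same axes makes percentile-level shifts between the two distributions visible at a glance.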

Tensor Difference Plots

  1. Inspection: We generate plots highlighting differences between golden and target data tensors.

  2. Scatter and Line: Scatter plots represent tensor values, while line plots show differences at each index.

  3. Interactive: Hover over points to access precise values.

Snooping

Snooping algorithms help find inaccuracies in a neural network at the layer level. The following snooping options are available:

  1. oneshot-layerwise

  2. cumulative-layerwise

  3. layerwise

  4. binary

oneshot-layerwise Snooping

This algorithm is designed to debug all layers of the model at once by performing the following steps:

  1. Execute framework runner to collect reference outputs from all intermediate tensors of a model in fp32 precision

  2. Execute inference engine to collect target outputs from all intermediate tensors of a model in provided target precision

  3. Execute verification for comparison of intermediate outputs from the above two steps

This algorithm provides a quick analysis of whether layers in the model are quantization sensitive.

../_static/resources/oneshot_diagram.png

Usage

usage: qairt-accuracy-debugger --snooping oneshot-layerwise [-h]
                              --default_verifier DEFAULT_VERIFIER
                              [--result_csv RESULT_CSV]
                              [--verifier_config VERIFIER_CONFIG]
                              [--run_tensor_inspection] -r
                              {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
                              -a
                              {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
                              -l INPUT_LIST [--input_network MODEL_PATH]
                              [--input_tensor INPUT_TENSOR [INPUT_TENSOR ...]]
                              [--out_tensor_node OUTPUT_TENSOR]
                              [--io_config IO_CONFIG]
                              [--converter_float_bitwidth {32,16}]
                              [--extra_converter_args EXTRA_CONVERTER_ARGS]
                              [--calibration_input_list CALIBRATION_INPUT_LIST]
                              [-bbw {8,32}] [-abw {8,16}] [-wbw {8,4}]
                              [--quantizer_float_bitwidth {32,16}]
                              [--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                              [--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                              [--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                              [--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                              [--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
                              [--use_per_channel_quantization]
                              [--use_per_row_quantization] [--float_fallback]
                              [--extra_quantizer_args EXTRA_QUANTIZER_ARGS]
                              [--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
                              [--profiling_level PROFILING_LEVEL]
                              [--userlogs {warn,verbose,info,error,fatal}]
                              [--log_level {error,warn,info,debug,verbose}]
                              [--extra_runtime_args EXTRA_RUNTIME_ARGS]
                              [--executor_type {qnn,snpe}]
                              [--stage {source,converted,quantized}]
                              [-p ENGINE_PATH] [--deviceId DEVICEID] [-v]
                              [--host_device {x86,x86_64-windows-msvc,wos}]
                              [-w WORKING_DIR]
                              [--output_dirname OUTPUT_DIRNAME]
                              [--debug_mode_off] [--args_config ARGS_CONFIG]
                              [--remote_server REMOTE_SERVER]
                              [--remote_username REMOTE_USERNAME]
                              [--remote_password REMOTE_PASSWORD]
                              [--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY]
                              [--disable_offline_prepare]
                              [--backend_extension_config BACKEND_EXTENSION_CONFIG]
                              [--context_config_params CONTEXT_CONFIG_PARAMS]
                              [--graph_config_params GRAPH_CONFIG_PARAMS]
                              [--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS]
                              [--disable_graph_optimization]
                              [--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB]
                              [-f FRAMEWORK [FRAMEWORK ...]]
                              [-qo QUANTIZATION_OVERRIDES]
                              [--start_layer START_LAYER]
                              [--end_layer END_LAYER]
                              [--add_layer_outputs ADD_LAYER_OUTPUTS]
                              [--add_layer_types ADD_LAYER_TYPES]
                              [--skip_layer_types SKIP_LAYER_TYPES]
                              [--skip_layer_outputs SKIP_LAYER_OUTPUTS]

Script to run oneshot-layerwise snooping.

options:
  -h, --help            show this help message and exit

Verifier Arguments:
  --default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
                        Default verifier used for verification. The options
                        "RtolAtol", "AdjustedRtolAtol", "TopK", "L1Error",
                        "CosineSimilarity", "MSE", "MAE", "SQNR", "ScaledDiff"
                        are supported. An optional list of hyperparameters can
                        be appended. For example: --default_verifier
                        rtolatol,rtolmargin,0.01,atolmargin,0.01. An optional
                        list of placeholders can be appended. For example:
                        --default_verifier CosineSimilarity param1 1 param2 2.
                        To use multiple verifiers, add an additional
                        --default_verifier CosineSimilarity
  --result_csv RESULT_CSV
                        Path to the csv summary report comparing the inference
                        vs framework. Paths may be absolute, or relative to
                        the working directory. If not specified, then a
                        --problem_inference_tensor must be specified
  --verifier_config VERIFIER_CONFIG
                        Path to the verifiers' config file
  --run_tensor_inspection
                        To run tensor inspection, pass this argument

Required Arguments:
  -r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}, --runtime {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
                        Runtime to be used. Note: In case of SNPE
                        execution(--executor_type snpe), aic runtime is not
                        supported.
  -a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}, --architecture {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
                        Name of the architecture to use for inference engine.
                        Note: In case of SNPE execution(--executor_type snpe),
                        aarch64-qnx architecture is not supported.
  -l INPUT_LIST, --input_list INPUT_LIST
                        Path to the input list text file to run inference(used
                        with net-run). Note: When having multiple entries in
                        text file, in order to save memory and time, you can
                        pass --debug_mode_off to skip intermediate outputs
                        dump.

QAIRT Converter Arguments:
  --input_network MODEL_PATH, --model_path MODEL_PATH
                        Path to the model file(s).
  --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
                        The name and dimension of all the input buffers to the
                        network specified in the format [input_name comma-
                        separated-dimensions sample-data data-type] Note:
                        sample-data and data-type are optional for example:
                        'data' 1,224,224,3. Note that the quotes should always
                        be included in order to handle special characters,
                        spaces, etc. For multiple inputs, specify multiple
                        --input_tensor on the command line like:
                        --input_tensor "data1" 1,224,224,3 sample1.raw float32
                        --input_tensor "data2" 1,50,100,3 sample2.raw int64
                        NOTE: Required for TensorFlow and PyTorch. Optional
                        for Onnx and Tflite. In case of Onnx, this feature
                        works only with Onnx 1.6.0 and above.
  --out_tensor_node OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
                        Name of the graph's output Tensor Names. Multiple
                        output names should be provided separately like:
                        --out_tensor_node out_1 --out_tensor_node out_2 NOTE:
                        Required for TensorFlow. Optional for Onnx, Tflite and
                        PyTorch
  --io_config IO_CONFIG
                        Use this option to specify a yaml file for input and
                        output options.
  --converter_float_bitwidth {32,16}
                        Use this option to convert the graph to the specified
                        float bitwidth, either 32 (default) or 16. Note:
                        Cannot be used with --calibration_input_list and
                        --quantization_overrides
  --extra_converter_args EXTRA_CONVERTER_ARGS
                        additional converter arguments in a quoted string.
                        example: --extra_converter_args
                        'arg1=value1;arg2=value2'
  -qo QUANTIZATION_OVERRIDES, --quantization_overrides QUANTIZATION_OVERRIDES
                        Path to quantization overrides json file.

QAIRT Quantizer Arguments:
  --calibration_input_list CALIBRATION_INPUT_LIST
                        Path to the inputs list text file to run
                        quantization(used with qairt-quantizer).
  -bbw {8,32}, --bias_bitwidth {8,32}
                        option to select the bitwidth to use when quantizing
                        the bias. default 8
  -abw {8,16}, --act_bitwidth {8,16}
                        option to select the bitwidth to use when quantizing
                        the activations. default 8
  -wbw {8,4}, --weights_bitwidth {8,4}
                        option to select the bitwidth to use when quantizing
                        the weights. default 8
  --quantizer_float_bitwidth {32,16}
                        Use this option to select the bitwidth to use for
                        float tensors, either 32 (default) or 16.
  --act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                        Specify which quantization calibration method to use
                        for activations. This option has to be paired with
                        --act_quantizer_schema.
  --param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                        Specify which quantization calibration method to use
                        for parameters. This option has to be paired with
                        --param_quantizer_schema.
  --act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                        Specify which quantization schema to use for
                        activations. Note: Default is asymmetric.
  --param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                        Specify which quantization schema to use for
                        parameters. Note: Default is asymmetric.
  --percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
                        Value must lie between 90 and 100. Default is 99.99
  --use_per_channel_quantization
                        Use per-channel quantization for convolution-based op
                        weights. Note: This will replace built-in model QAT
                        encodings when used for a given weight.
  --use_per_row_quantization
                        Use this option to enable rowwise quantization of
                        Matmul and FullyConnected ops.
  --float_fallback      Use this option to enable fallback to floating point
                        (FP) instead of fixed point. This option can be paired
                        with --quantizer_float_bitwidth to indicate the
                        bitwidth for FP (by default 32). If this option is
                        enabled, then input list must not be provided and
                        --ignore_encodings must not be provided. The external
                        quantization encodings (encoding file/FakeQuant
                        encodings) might be missing quantization parameters
                        for some interim tensors. First it will try to fill
                        the gaps by propagating across math-invariant
                        functions. If the quantization params are still
                        missing, then it will apply fallback to nodes to
                        floating point.
  --extra_quantizer_args EXTRA_QUANTIZER_ARGS
                        additional quantizer arguments in a quoted string.
                        example: --extra_quantizer_args
                        'arg1=value1;arg2=value2'

Net-run Arguments:
  --perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
                        Specifies perf profile to set. Valid settings are
                        "low_balanced" , "balanced" , "default",
                        "high_performance" ,"sustained_high_performance",
                        "burst", "low_power_saver", "power_saver",
                        "high_power_saver", "extreme_power_saver", and
                        "system_settings". Note: perf_profile argument is now
                        deprecated for HTP backend, user can specify
                        performance profile through backend extension config
                        now.
  --profiling_level PROFILING_LEVEL
                        Enables profiling and sets its level. For QNN
                        executor, valid settings are "basic", "detailed" and
                        "client" For SNPE executor, valid settings are "off",
                        "basic", "moderate", "detailed", and "linting".
                        Default is detailed.
  --userlogs {warn,verbose,info,error,fatal}
                        Enable verbose logging. Note: This argument is
                        applicable only when --executor_type snpe
  --log_level {error,warn,info,debug,verbose}
                        Enable verbose logging. Note: This argument is
                        applicable only when --executor_type qnn
  --extra_runtime_args EXTRA_RUNTIME_ARGS
                        additional net runner arguments in a quoted string.
                        example: --extra_runtime_args
                        'arg1=value1;arg2=value2'

Other optional Arguments:
  --executor_type {qnn,snpe}
                        Choose between qnn(qnn-net-run) and snpe(snpe-net-run)
                        execution. If not provided, qnn-net-run will be
                        executed for QAIRT or QNN SDK, or else snpe-net-run
                        will be executed for SNPE SDK.
  --stage {source,converted,quantized}
                        Specifies the starting stage in the Accuracy Debugger
                        pipeline. source: starting with a source framework
                        model, converted: starting with a converted model,
                        quantized: starting with a quantized model. Default is
                        source.
  -p ENGINE_PATH, --engine_path ENGINE_PATH
                        Path to SDK folder.
  --deviceId DEVICEID   The serial number of the device to use. If not passed,
                        the first in a list of queried devices will be used
                        for validation.
  -v, --verbose         Set verbose logging at debugger tool level
  --host_device {x86,x86_64-windows-msvc,wos}
                        The device that will be running conversion. Set to x86
                        by default.
  -w WORKING_DIR, --working_dir WORKING_DIR
                        Working directory for the snooping to store temporary
                        files. Creates a new directory if the specified
                        working directory does not exist
  --output_dirname OUTPUT_DIRNAME
                        output directory name for the snooping to store
                        temporary files under <working_dir>/snooping. Creates
                        a new directory if the specified working directory
                        does not exist
  --debug_mode_off      This option can be used to avoid dumping intermediate
                        outputs.
  --args_config ARGS_CONFIG
                        Path to a config file with arguments. This can be used
                        to feed arguments to the AccuracyDebugger as an
                        alternative to supplying them on the command line.
  --remote_server REMOTE_SERVER
                        ip address of remote machine
  --remote_username REMOTE_USERNAME
                        username of remote machine
  --remote_password REMOTE_PASSWORD
                        password of remote machine
  --golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --golden_dir_for_mapping GOLDEN_OUTPUT_REFERENCE_DIRECTORY
                        Optional parameter to indicate the directory of the
                        goldens, it's used for tensor mapping without running
                        model with framework runtime.
  --disable_offline_prepare
                        Use this option to disable offline preparation. Note:
                        By default offline preparation will be done for
                        DSP/HTP runtimes.
  --backend_extension_config BACKEND_EXTENSION_CONFIG
                        Path to config to be used with qnn-context-binary-
                        generator. Note: This argument is applicable only when
                        --executor_type qnn
  --context_config_params CONTEXT_CONFIG_PARAMS
                        optional context config params in a quoted string.
                        example: --context_config_params
                        'context_priority=high;
                        cache_compatibility_mode=strict' Note: This argument
                        is applicable only when --executor_type qnn
  --graph_config_params GRAPH_CONFIG_PARAMS
                        optional graph config params in a quoted string.
                        example: --graph_config_params 'graph_priority=low;
                        graph_profiling_num_executions=10'
  --extra_contextbin_args EXTRA_CONTEXTBIN_ARGS
                        Additional context binary generator arguments in a
                        quoted string(applicable only when --executor_type
                        qnn). example: --extra_contextbin_args
                        'arg1=value1;arg2=value2'
  --disable_graph_optimization
                        Disables basic model optimization
  --onnx_custom_op_lib ONNX_CUSTOM_OP_LIB
                        path to onnx custom operator library
  -f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
                        Framework type and version, version is optional.
                        Currently supported frameworks are [tensorflow,
                        tflite, onnx, pytorch]. For example, tensorflow 2.10.1
  --start_layer START_LAYER
                        save all intermediate layer outputs from provided
                        start layer to bottom layer of model. Can be used in
                        conjunction with --end_layer.
  --end_layer END_LAYER
                        save all intermediate layer outputs from top layer to
                        provided end layer of model. Can be used in
                        conjunction with --start_layer.
  --add_layer_outputs ADD_LAYER_OUTPUTS
                        Output layers to be dumped. e.g: node1,node2
  --add_layer_types ADD_LAYER_TYPES
                        outputs of layer types to be dumped. e.g
                        :Resize,Transpose. All enabled by default.
  --skip_layer_types SKIP_LAYER_TYPES
                        comma delimited layer types to skip dumping. e.g
                        :Resize,Transpose
  --skip_layer_outputs SKIP_LAYER_OUTPUTS
                        comma delimited layer output names to skip dumping.
                        e.g: node1,node2

Note

The --run_tensor_inspection argument significantly increases overall execution time when used with large models. To speed up execution, omit this argument.

Sample Commands

qairt-accuracy-debugger \
  --snooping oneshot-layerwise \
  --runtime dspv75 \
  --architecture aarch64-android \
  --framework onnx \
  --model_path artifacts/mobilenet-v2.onnx \
  --input_list artifacts/list.txt \
  --input_tensor "input.1" 1,3,224,224 artifacts/inputFiles/dog.raw \
  --output_tensor "473" \
  --default_verifier mse \
  --quantization_overrides artifacts/quantized_encoding.json \
  --executor_type qnn \
  --run_tensor_inspection

Tip

A working_directory is created in the directory from which this script is called, unless otherwise specified.

Output

Below is the output directory structure:

working_directory
├── framework_runner
│   ├── 2024-08-07_15-34-08
│   └── latest
├── inputs_32
│   ├── dog.raw
│   └── input_list.txt
├── snooping
│   └── 2024-08-07_15-34-08
└── verification
    ├── 2024-08-07_15-34-23
    └── latest
        ├── base.json
        ├── mse
        ├── tensor_inspection
        ├── summary.csv
        ├── summary.html
        ├── summary.json
        └── verification_options.json
  • framework_runner contains a timestamped directory with the intermediate layer outputs of the framework, stored in .raw format as described in the framework runner step.

  • snooping contains a timestamped directory with the intermediate layer outputs of the inference engine, stored in .raw format as described in the inference engine step.

  • verification contains a timestamped directory with the following:

    • A directory named after each verifier specified for the oneshot run; it contains CSV and HTML files with metric details for each layer output

    • tensor_inspection – Individual directories for each layer’s output with the following contents:

      • CDF_plots.html – Golden vs target CDF graph

      • Diff_plots.html – Golden and target deviation graph

      • Histograms.html – Golden and target histograms

      • golden_data.csv – Golden tensor data

      • target_data.csv – Target tensor data

    • summary.csv – Report of the verification results for each layer’s output

Note: All directories have a folder called latest, which is a symlink to the timestamped directory of the latest run.

Snapshot of summary.csv file:

../_static/resources/oneshot_summary.png

Understanding the oneshot-layerwise summary report:

Column             Description
Name               Output name of the current layer
Layer Type         Type of the current layer
Size               Size of this layer’s output
Tensor_dims        Shape of this layer’s output
<Verifier name>    Verifier value of the current layer output compared to the reference output
golden_min         Minimum value in the reference output for the current layer
golden_max         Maximum value in the reference output for the current layer
target_min         Minimum value in the target output for the current layer
target_max         Maximum value in the target output for the current layer
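As a quick triage step, the worst-scoring layers can be pulled out of summary.csv with standard shell tools. A minimal sketch, assuming the verifier score is the fifth CSV column (as in the layout above with a single MSE verifier); the field numbers are assumptions to adjust for your report:

```shell
# worst_layers: print the N layer outputs with the highest verifier score,
# worst first. Assumes field 1 is the layer name and field 5 is the verifier
# score (e.g. MSE); adjust -k5,5 and -f1,5 to match your summary.csv layout.
worst_layers() {  # usage: worst_layers <summary.csv> [N]
  tail -n +2 "$1" \
    | sort -t, -k5,5 -g -r \
    | cut -d, -f1,5 \
    | head -"${2:-10}"
}

# e.g.: worst_layers working_directory/verification/latest/summary.csv 10
```

For verifiers where lower scores indicate a worse match (such as CosineSimilarity), drop the -r flag so the lowest scores sort first.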

cumulative-layerwise Snooping

This algorithm debugs one layer at a time by performing the steps below:

  1. Execute the framework runner to collect reference outputs from all intermediate tensors of the model in FP32 precision

  2. Execute the inference engine and verification steps iteratively to perform the following operations:

    - Collect target outputs in target precision for each layer while removing the effect of its preceding layers on the final output
    - Compare intermediate outputs from the framework runner and the inference engine

It provides deeper analysis to identify which layers of the model cause accuracy deviation, and can be used to measure the quantization sensitivity of each layer/op with respect to the final output of the model.

../_static/resources/cumulative_diagram.png

Note

Currently this algorithm is supported only for ONNX models

Debugging Accuracy Issues with a Quantized Model Using Cumulative-Layerwise Snooping

  • With quantized models, some mismatch is expected at the most data-intensive layers, arising from quantization error.

  • The debugger can be used to identify the most sensitive operators (those with high verifier scores) and run them at higher precision to improve overall accuracy.

  • Sensitivity is determined by the verifier score observed at that layer relative to the reference platform (such as ONNX Runtime).

  • Note that cumulative-layerwise debugging takes considerable time, because the partitioned model must be quantized and compiled at every layer that does not have a 100% match with the reference.

  • Below is one strategy for debugging larger models:

    • Run oneshot-layerwise on the model to identify the starting point of sensitivity in the model.

    • Run cumulative-layerwise on different parts of the model using the start-layer and end-layer options. For example, if the model has 100 nodes: run 1 uses the starting node from the oneshot-layerwise run as the start layer and the 25th node as the end layer; run 2 uses nodes 26 to 50; run 3 uses nodes 51 to 75; and so on. The final reports of all runs together identify the most sensitive layers in the model. Say nodes A, B, and C have high verifier scores, indicating high sensitivity:

      • Run the original model with those specific layers (A/B/C, one at a time or in combination) in FP16 and observe the improvement in accuracy.
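The partitioned runs described above can be sketched as a small dry-run loop. This is a sketch only: node_1 through node_100 are placeholder output-tensor names to replace with real names from the oneshot-layerwise report, and the model, runtime, and verifier options are taken from the earlier sample command.

```shell
# Hypothetical plan for a ~100-node model: four cumulative-layerwise runs of
# roughly 25 nodes each. The commands are echoed as a dry run; remove "echo"
# to actually launch each run.
for range in "node_1 node_25" "node_26 node_50" "node_51 node_75" "node_76 node_100"; do
  set -- $range   # $1 = start layer, $2 = end layer
  echo qairt-accuracy-debugger --snooping cumulative-layerwise \
    --runtime dspv75 --architecture aarch64-android --framework onnx \
    --model_path artifacts/mobilenet-v2.onnx --input_list artifacts/list.txt \
    --default_verifier mse --start_layer "$1" --end_layer "$2"
done
```

Each run produces its own report; comparing them highlights which partition contains the most sensitive layers.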

Usage

usage: qairt-accuracy-debugger --snooping cumulative-layerwise [-h]
                              --default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
                              [--result_csv RESULT_CSV]
                              [--verifier_threshold VERIFIER_THRESHOLD]
                              [--verifier_config VERIFIER_CONFIG] -r
                              {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
                              -a
                              {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
                              -l INPUT_LIST [--input_network MODEL_PATH]
                              [--input_tensor INPUT_TENSOR [INPUT_TENSOR ...]]
                              [--out_tensor_node OUTPUT_TENSOR]
                              [--io_config IO_CONFIG]
                              [--converter_float_bitwidth {32,16}]
                              [--extra_converter_args EXTRA_CONVERTER_ARGS]
                              [--calibration_input_list CALIBRATION_INPUT_LIST]
                              [-bbw {8,32}] [-abw {8,16}] [-wbw {8,4}]
                              [--quantizer_float_bitwidth {32,16}]
                              [--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                              [--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                              [--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                              [--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                              [--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
                              [--use_per_channel_quantization]
                              [--use_per_row_quantization] [--float_fallback]
                              [--extra_quantizer_args EXTRA_QUANTIZER_ARGS]
                              [--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
                              [--profiling_level PROFILING_LEVEL]
                              [--userlogs {warn,verbose,info,error,fatal}]
                              [--log_level {error,warn,info,debug,verbose}]
                              [--extra_runtime_args EXTRA_RUNTIME_ARGS]
                              [--executor_type {qnn,snpe}]
                              [--stage {source,converted,quantized}]
                              [-p ENGINE_PATH] [--deviceId DEVICEID] [-v]
                              [--host_device {x86,x86_64-windows-msvc,wos}]
                              [-w WORKING_DIR]
                              [--output_dirname OUTPUT_DIRNAME]
                              [--debug_mode_off] [--args_config ARGS_CONFIG]
                              [--remote_server REMOTE_SERVER]
                              [--remote_username REMOTE_USERNAME]
                              [--remote_password REMOTE_PASSWORD]
                              [--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY]
                              [--disable_offline_prepare]
                              [--backend_extension_config BACKEND_EXTENSION_CONFIG]
                              [--context_config_params CONTEXT_CONFIG_PARAMS]
                              [--graph_config_params GRAPH_CONFIG_PARAMS]
                              [--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS]
                              [--disable_graph_optimization]
                              [--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB]
                              [-f FRAMEWORK [FRAMEWORK ...]]
                              [-qo QUANTIZATION_OVERRIDES]
                              [--step_size STEP_SIZE]
                              [--start_layer START_LAYER]
                              [--end_layer END_LAYER]

Script to run cumulative-layerwise snooping.

options:
  -h, --help            show this help message and exit

Verifier Arguments:
  --default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
                        Default verifier used for verification. The options
                        "RtolAtol", "AdjustedRtolAtol", "TopK", "L1Error",
                        "CosineSimilarity", "MSE", "MAE", "SQNR", "ScaledDiff"
                        are supported. An optional list of hyperparameters can
                        be appended. For example: --default_verifier
                        rtolatol,rtolmargin,0.01,atolmargin,0.01 An optional
                        list of placeholders can be appended. For example:
                        --default_verifier CosineSimilarity param1 1 param2 2.
                        to use multiple verifiers, add additional
                        --default_verifier CosineSimilarity
  --result_csv RESULT_CSV
                        Path to the csv summary report comparing the
                        inference vs framework. Paths may be absolute, or
                        relative to the working directory. If not specified,
                        then --problem_inference_tensor must be specified
  --verifier_threshold VERIFIER_THRESHOLD
                        Verifier threshold for problematic tensor to be
                        chosen.
  --verifier_config VERIFIER_CONFIG
                        Path to the verifiers' config file

Required Arguments:
  -r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}, --runtime {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
                        Runtime to be used. Note: In case of SNPE
                        execution(--executor_type snpe), aic runtime is not
                        supported.
  -a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}, --architecture {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
                        Name of the architecture to use for inference engine.
                        Note: In case of SNPE execution(--executor_type snpe),
                        aarch64-qnx architecture is not supported.
  -l INPUT_LIST, --input_list INPUT_LIST
                        Path to the input list text file to run inference(used
                        with net-run). Note: When having multiple entries in
                        text file, in order to save memory and time, you can
                        pass --debug_mode_off to skip intermediate outputs
                        dump.

QAIRT Converter Arguments:
  --input_network MODEL_PATH, --model_path MODEL_PATH
                        Path to the model file(s).
  --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
                        The name and dimension of all the input buffers to the
                        network specified in the format [input_name comma-
                        separated-dimensions sample-data data-type] Note:
                        sample-data and data-type are optional for example:
                        'data' 1,224,224,3. Note that the quotes should always
                        be included in order to handle special characters,
                        spaces, etc. For multiple inputs, specify multiple
                        --input_tensor on the command line like:
                        --input_tensor "data1" 1,224,224,3 sample1.raw float32
                        --input_tensor "data2" 1,50,100,3 sample2.raw int64
                        NOTE: Required for TensorFlow and PyTorch. Optional
                        for Onnx and Tflite. In case of Onnx, this feature
                        works only with Onnx 1.6.0 and above.
  --out_tensor_node OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
                        Name of the graph's output Tensor Names. Multiple
                        output names should be provided separately like:
                        --out_tensor_node out_1 --out_tensor_node out_2 NOTE:
                        Required for TensorFlow. Optional for Onnx, Tflite and
                        PyTorch
  --io_config IO_CONFIG
                        Use this option to specify a yaml file for input and
                        output options.
  --converter_float_bitwidth {32,16}
                        Use this option to convert the graph to the specified
                        float bitwidth, either 32 (default) or 16. Note:
                        Cannot be used with --calibration_input_list and
                        --quantization_overrides
  --extra_converter_args EXTRA_CONVERTER_ARGS
                        additional converter arguments in a quoted string.
                        example: --extra_converter_args
                        'arg1=value1;arg2=value2'
  -qo QUANTIZATION_OVERRIDES, --quantization_overrides QUANTIZATION_OVERRIDES
                        Path to quantization overrides json file.

QAIRT Quantizer Arguments:
  --calibration_input_list CALIBRATION_INPUT_LIST
                        Path to the inputs list text file to run
                        quantization(used with qairt-quantizer).
  -bbw {8,32}, --bias_bitwidth {8,32}
                        option to select the bitwidth to use when quantizing
                        the bias. default 8
  -abw {8,16}, --act_bitwidth {8,16}
                        option to select the bitwidth to use when quantizing
                        the activations. default 8
  -wbw {8,4}, --weights_bitwidth {8,4}
                        option to select the bitwidth to use when quantizing
                        the weights. default 8
  --quantizer_float_bitwidth {32,16}
                        Use this option to select the bitwidth to use for
                        float tensors, either 32 (default) or 16.
  --act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                        Specify which quantization calibration method to use
                        for activations. This option has to be paired with
                        --act_quantizer_schema.
  --param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                        Specify which quantization calibration method to use
                        for parameters. This option has to be paired with
                        --param_quantizer_schema.
  --act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                        Specify which quantization schema to use for
                        activations. Note: Default is asymmetric.
  --param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                        Specify which quantization schema to use for
                        parameters. Note: Default is asymmetric.
  --percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
                        Value must lie between 90 and 100. Default is 99.99
  --use_per_channel_quantization
                        Use per-channel quantization for convolution-based op
                        weights. Note: This will replace built-in model QAT
                        encodings when used for a given weight.
  --use_per_row_quantization
                        Use this option to enable rowwise quantization of
                        Matmul and FullyConnected ops.
  --float_fallback      Use this option to enable fallback to floating point
                        (FP) instead of fixed point. This option can be paired
                        with --quantizer_float_bitwidth to indicate the
                        bitwidth for FP (by default 32). If this option is
                        enabled, then input list must not be provided and
                        --ignore_encodings must not be provided. The external
                        quantization encodings (encoding file/FakeQuant
                        encodings) might be missing quantization parameters
                        for some interim tensors. First it will try to fill
                        the gaps by propagating across math-invariant
                        functions. If the quantization params are still
                        missing, then it will apply fallback to nodes to
                        floating point.
  --extra_quantizer_args EXTRA_QUANTIZER_ARGS
                        additional quantizer arguments in a quoted string.
                        example: --extra_quantizer_args
                        'arg1=value1;arg2=value2'

Net-run Arguments:
  --perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
                        Specifies perf profile to set. Valid settings are
                        "low_balanced" , "balanced" , "default",
                        "high_performance" ,"sustained_high_performance",
                        "burst", "low_power_saver", "power_saver",
                        "high_power_saver", "extreme_power_saver", and
                        "system_settings". Note: perf_profile argument is now
                        deprecated for HTP backend, user can specify
                        performance profile through backend extension config
                        now.
  --profiling_level PROFILING_LEVEL
                        Enables profiling and sets its level. For QNN
                        executor, valid settings are "basic", "detailed" and
                        "client" For SNPE executor, valid settings are "off",
                        "basic", "moderate", "detailed", and "linting".
                        Default is detailed.
  --userlogs {warn,verbose,info,error,fatal}
                        Enable verbose logging. Note: This argument is
                        applicable only when --executor_type snpe
  --log_level {error,warn,info,debug,verbose}
                        Enable verbose logging. Note: This argument is
                        applicable only when --executor_type qnn
  --extra_runtime_args EXTRA_RUNTIME_ARGS
                        additional net runner arguments in a quoted string.
                        example: --extra_runtime_args
                        'arg1=value1;arg2=value2'

Other optional Arguments:
  --executor_type {qnn,snpe}
                        Choose between qnn(qnn-net-run) and snpe(snpe-net-run)
                        execution. If not provided, qnn-net-run will be
                        executed for QAIRT or QNN SDK, or else snpe-net-run
                        will be executed for SNPE SDK.
  --stage {source,converted,quantized}
                        Specifies the starting stage in the Accuracy Debugger
                        pipeline. source: starting with a source framework
                        model, converted: starting with a converted model,
                        quantized: starting with a quantized model. Default is
                        source.
  -p ENGINE_PATH, --engine_path ENGINE_PATH
                        Path to SDK folder.
  --deviceId DEVICEID   The serial number of the device to use. If not passed,
                        the first in a list of queried devices will be used
                        for validation.
  -v, --verbose         Set verbose logging at debugger tool level
  --host_device {x86,x86_64-windows-msvc,wos}
                        The device that will be running conversion. Set to x86
                        by default.
  -w WORKING_DIR, --working_dir WORKING_DIR
                        Working directory for the snooping to store temporary
                        files. Creates a new directory if the specified
                        working directory does not exist
  --output_dirname OUTPUT_DIRNAME
                        Output directory name for the snooping to store
                        temporary files under <working_dir>/snooping. Creates
                        a new directory if the specified directory does not
                        exist.
  --debug_mode_off      This option can be used to avoid dumping intermediate
                        outputs.
  --args_config ARGS_CONFIG
                        Path to a config file with arguments. This can be used
                        to feed arguments to the AccuracyDebugger as an
                        alternative to supplying them on the command line.
  --remote_server REMOTE_SERVER
                        ip address of remote machine
  --remote_username REMOTE_USERNAME
                        username of remote machine
  --remote_password REMOTE_PASSWORD
                        password of remote machine
  --golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --golden_dir_for_mapping GOLDEN_OUTPUT_REFERENCE_DIRECTORY
                        Optional parameter to indicate the directory of the
                        goldens, it's used for tensor mapping without running
                        model with framework runtime.
  --disable_offline_prepare
                        Use this option to disable offline preparation. Note:
                        By default offline preparation will be done for
                        DSP/HTP runtimes.
  --backend_extension_config BACKEND_EXTENSION_CONFIG
                        Path to config to be used with qnn-context-binary-
                        generator. Note: This argument is applicable only when
                        --executor_type qnn
  --context_config_params CONTEXT_CONFIG_PARAMS
                        optional context config params in a quoted string.
                        example: --context_config_params
                        'context_priority=high;
                        cache_compatibility_mode=strict' Note: This argument
                        is applicable only when --executor_type qnn
  --graph_config_params GRAPH_CONFIG_PARAMS
                        optional graph config params in a quoted string.
                        example: --graph_config_params 'graph_priority=low;
                        graph_profiling_num_executions=10'
  --extra_contextbin_args EXTRA_CONTEXTBIN_ARGS
                        Additional context binary generator arguments in a
                        quoted string(applicable only when --executor_type
                        qnn). example: --extra_contextbin_args
                        'arg1=value1;arg2=value2'
  --disable_graph_optimization
                        Disables basic model optimization
  --onnx_custom_op_lib ONNX_CUSTOM_OP_LIB
                        path to onnx custom operator library
  -f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
                        Framework type and version, version is optional.
                        Currently supported frameworks are [tensorflow,
                        tflite, onnx, pytorch]. For example, tensorflow 2.10.1
  --step_size STEP_SIZE
                        number of layers to skip in each iteration of
                        debugging. Applicable only for cumulative-layerwise
                        algorithm. --step_size (> 1) should not be used along
                        with --add_layer_outputs, --add_layer_types,
                        --skip_layer_outputs, --skip_layer_types,
                        --start_layer, --end_layer
  --start_layer START_LAYER
                        save all intermediate layer outputs from provided
                        start layer to bottom layer of model. Can be used in
                        conjunction with --end_layer.
  --end_layer END_LAYER
                        save all intermediate layer outputs from top layer to
                        provided end layer of model. Can be used in
                        conjunction with --start_layer.

Sample Commands

qairt-accuracy-debugger \
  --snooping cumulative-layerwise \
  --runtime dspv75 \
  --architecture aarch64-android \
  --framework onnx \
  --model_path artifacts/mobilenet-v2.onnx \
  --input_list artifacts/list.txt \
  --input_tensor "input.1" 1,3,224,224 artifacts/inputFiles/dog.raw \
  --output_tensor "473" \
  --default_verifier mse \
  --quantization_overrides artifacts/quantized_encoding.json \
  --executor_type qnn

Output

Below is the output directory structure:

working_directory
├── framework_runner
│   ├── 2024-08-07_16-23-50
│   └── latest
├── inputs_32
│   ├── dog.raw
│   └── input_list.txt
└── snooping
    └── 2024-08-07_16-23-49
        ├── base_quantized.json
        ├── cumulative_layerwise.csv
        ├── extracted_model.onnx
        ├── inference_engine
        ├── log.txt
        ├── snooping_options.json
        ├── temp-list.txt
        └── transformed.onnx
  • The framework_runner directory contains a timestamped directory with the intermediate layer outputs stored in .raw format, as described in the Framework Runner step.

  • The snooping directory contains intermediate outputs obtained from the inference engine step, stored in separate directories named after the respective layers. It also contains the final report, cumulative_layerwise.csv, which lists verifier scores for each layer; the layers with the most deviating scores can be identified as problematic nodes.
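The report can also be inspected programmatically. Below is a minimal sketch (not part of the SDK) that ranks layers by their verifier score; the column names "O/P Name" and the verifier score column (e.g. "MSE") are assumptions and should be matched to the headers in your generated report:

```python
import csv

def worst_layers(report_path, score_column, top_n=5):
    """Rank layers in a snooping report by verifier score, descending."""
    with open(report_path, newline="") as f:
        rows = list(csv.DictReader(f))
    scored = []
    for row in rows:
        try:
            # Rows without a parsable score (e.g. Status == 'skip') are ignored.
            scored.append((float(row[score_column]), row["O/P Name"]))
        except (KeyError, ValueError):
            continue
    scored.sort(reverse=True)
    return scored[:top_n]
```

Sorting descending assumes an error-style verifier such as MSE, where larger values mean larger deviation; for similarity-style verifiers (e.g. CosineSimilarity), sort ascending instead.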

Snapshot of cumulative_layerwise.csv:

../_static/resources/cumulative_layerwise_report.png

Understanding the cumulative-layerwise report:

Column

Description

O/P Name

Output name of the current layer.

Status

If empty, indicates normal execution. Other possible values:
  • skip - This layer was not debugged, as requested by the user.

  • part - Due to a mismatch at this layer, the model was partitioned after this layer.

  • err_part - An error occurred while partitioning the model at this layer.

  • err_con - A converter error occurred at this layer.

  • err_lib - A lib-generator error occurred at this layer.

  • err_cntx - A context-bin-generator error occurred at this layer.

  • err-exec - Failed to execute the compiled model at this layer.

  • err-compare - Failed to compare the backend output of this layer with the reference.

Layer Type

Type of the current layer.

Shape

Shape of this layer’s output.

Activations

The min, max, and median of this layer's outputs, taken from the reference execution.

<Verifier name>

Absolute verifier value of the current layer compared to reference platform.

Orig outputs

The verifier score for the original model outputs, observed when the model was run with the current layer's output enabled, starting from the last partitioned layer.

Info

Displays information for the output verifiers, if the values are abnormal.

layerwise Snooping

This algorithm debugs one single-layer model at a time by performing the following steps:

  1. Get golden per-layer reference outputs from an external tool or, if no golden reference is given, run the framework runner to collect reference outputs for all intermediate tensors of the model in FP32 precision.

  2. Iteratively execute the inference engine and verification to:
     • Collect target outputs in target precision for each single-layer model by removing all preceding and subsequent layers.
     • Compare each intermediate output from the golden reference with the corresponding single-layer partitioned model output.

Layer-wise snooping provides deeper analysis to identify all model layers causing accuracy deviation on hardware with respect to framework/simulation outputs. This algorithm can be used to identify kernel issues for layers/ops present in the model.
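Conceptually, the per-layer loop above can be sketched as follows. This is a simplified illustration with hypothetical `run_on_target` and `verify` callables standing in for the inference-engine and verification stages; the actual tool extracts and executes single-layer ONNX models internally:

```python
from collections import namedtuple

# Hypothetical stand-in: the real tool extracts single-layer ONNX models;
# here a Layer is just a name.
Layer = namedtuple("Layer", "name")

def layerwise_snoop(layers, golden_outputs, run_on_target, verify):
    """For each layer, run its single-layer partitioned model on the target
    and compare the result against the golden reference for that tensor."""
    report = {}
    for layer in layers:
        target_out = run_on_target(layer)  # target precision, single-layer model
        report[layer.name] = verify(golden_outputs[layer.name], target_out)
    return report
```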

../_static/resources/layerwise_diagram.png

Note

Currently, this algorithm is supported only for ONNX models.

Debugging accuracy discrepancies between a golden reference (e.g., AIMET/framework runtime output) and target output using layerwise snooping

  • A popular use case for layerwise snooping is debugging accuracy differences between AIMET and the target.
    • Even though tools like AIMET closely simulate the hardware, a very small mismatch is still expected due to environment differences: the simulation executes on GPU FP32 kernels and injects quantization noise, rather than actually executing on integer kernels as in hardware execution.

    • If there is a larger deviation between simulation and hardware, layerwise snooping can point to the nodes with the highest deviations; the nodes showing the highest deviations in layerwise.csv can be identified as the erroneous nodes.

  • Other use cases include debugging deviations between the framework runtime's FP32 output and the target's INT16 output.

Usage

usage: qairt-accuracy-debugger --snooping layerwise [-h]
                              --default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
                              [--result_csv RESULT_CSV]
                              [--verifier_threshold VERIFIER_THRESHOLD]
                              [--verifier_config VERIFIER_CONFIG] -r
                              {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
                              -a
                              {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
                              -l INPUT_LIST [--input_network MODEL_PATH]
                              [--input_tensor INPUT_TENSOR [INPUT_TENSOR ...]]
                              [--out_tensor_node OUTPUT_TENSOR]
                              [--io_config IO_CONFIG]
                              [--converter_float_bitwidth {32,16}]
                              [--extra_converter_args EXTRA_CONVERTER_ARGS]
                              [--calibration_input_list CALIBRATION_INPUT_LIST]
                              [-bbw {8,32}] [-abw {8,16}] [-wbw {8,4}]
                              [--quantizer_float_bitwidth {32,16}]
                              [--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                              [--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                              [--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                              [--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                              [--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
                              [--use_per_channel_quantization]
                              [--use_per_row_quantization] [--float_fallback]
                              [--extra_quantizer_args EXTRA_QUANTIZER_ARGS]
                              [--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
                              [--profiling_level PROFILING_LEVEL]
                              [--userlogs {warn,verbose,info,error,fatal}]
                              [--log_level {error,warn,info,debug,verbose}]
                              [--extra_runtime_args EXTRA_RUNTIME_ARGS]
                              [--executor_type {qnn,snpe}]
                              [--stage {source,converted,quantized}]
                              [-p ENGINE_PATH] [--deviceId DEVICEID] [-v]
                              [--host_device {x86,x86_64-windows-msvc,wos}]
                              [-w WORKING_DIR]
                              [--output_dirname OUTPUT_DIRNAME]
                              [--debug_mode_off] [--args_config ARGS_CONFIG]
                              [--remote_server REMOTE_SERVER]
                              [--remote_username REMOTE_USERNAME]
                              [--remote_password REMOTE_PASSWORD]
                              [--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY]
                              [--disable_offline_prepare]
                              [--backend_extension_config BACKEND_EXTENSION_CONFIG]
                              [--context_config_params CONTEXT_CONFIG_PARAMS]
                              [--graph_config_params GRAPH_CONFIG_PARAMS]
                              [--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS]
                              [--disable_graph_optimization]
                              [--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB]
                              [-f FRAMEWORK [FRAMEWORK ...]]
                              [-qo QUANTIZATION_OVERRIDES]
                              [--start_layer START_LAYER]
                              [--end_layer END_LAYER]

Script to run layerwise snooping.

options:
  -h, --help            show this help message and exit

Verifier Arguments:
  --default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
                        Default verifier used for verification. The options
                        "RtolAtol", "AdjustedRtolAtol", "TopK", "L1Error",
                        "CosineSimilarity", "MSE", "MAE", "SQNR", "ScaledDiff"
                        are supported. An optional list of hyperparameters can
                        be appended. For example: --default_verifier
                        rtolatol,rtolmargin,0.01,atolmargin,0.01 An optional
                        list of placeholders can be appended. For example:
                        --default_verifier CosineSimilarity param1 1 param2 2.
                        to use multiple verifiers, add additional
                        --default_verifier CosineSimilarity
  --result_csv RESULT_CSV
                        Path to the CSV summary report comparing the inference
                        vs framework outputs. Paths may be absolute, or
                        relative to the working directory. If not specified,
                        then --problem_inference_tensor must be specified.
  --verifier_threshold VERIFIER_THRESHOLD
                        Verifier threshold for problematic tensor to be
                        chosen.
  --verifier_config VERIFIER_CONFIG
                        Path to the verifiers' config file

Required Arguments:
  -r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}, --runtime {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
                        Runtime to be used. Note: in case of SNPE execution
                        (--executor_type snpe), the aic runtime is not
                        supported.
  -a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}, --architecture {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
                        Name of the architecture to use for the inference
                        engine. Note: in case of SNPE execution
                        (--executor_type snpe), the aarch64-qnx architecture
                        is not supported.
  -l INPUT_LIST, --input_list INPUT_LIST
                        Path to the input list text file to run inference
                        (used with net-run). Note: when the text file has
                        multiple entries, you can pass --debug_mode_off to
                        skip dumping intermediate outputs and save memory and
                        time.

QAIRT Converter Arguments:
  --input_network MODEL_PATH, --model_path MODEL_PATH
                        Path to the model file(s).
  --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
                        The name and dimension of all the input buffers to the
                        network specified in the format [input_name comma-
                        separated-dimensions sample-data data-type] Note:
                        sample-data and data-type are optional for example:
                        'data' 1,224,224,3. Note that the quotes should always
                        be included in order to handle special characters,
                        spaces, etc. For multiple inputs, specify multiple
                        --input_tensor on the command line like:
                        --input_tensor "data1" 1,224,224,3 sample1.raw float32
                        --input_tensor "data2" 1,50,100,3 sample2.raw int64
                        NOTE: Required for TensorFlow and PyTorch. Optional
                        for Onnx and Tflite. In case of Onnx, this feature
                        works only with Onnx 1.6.0 and above.
  --out_tensor_node OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
                        Names of the graph's output tensors. Multiple output
                        names should be provided separately, like:
                        --out_tensor_node out_1 --out_tensor_node out_2 NOTE:
                        Required for TensorFlow. Optional for Onnx, Tflite and
                        PyTorch.
  --io_config IO_CONFIG
                        Use this option to specify a yaml file for input and
                        output options.
  --converter_float_bitwidth {32,16}
                        Use this option to convert the graph to the specified
                        float bitwidth, either 32 (default) or 16. Note:
                        Cannot be used with --calibration_input_list and
                        --quantization_overrides
  --extra_converter_args EXTRA_CONVERTER_ARGS
                        additional converter arguments in a quoted string.
                        example: --extra_converter_args
                        'arg1=value1;arg2=value2'
  -qo QUANTIZATION_OVERRIDES, --quantization_overrides QUANTIZATION_OVERRIDES
                        Path to quantization overrides json file.

QAIRT Quantizer Arguments:
  --calibration_input_list CALIBRATION_INPUT_LIST
                        Path to the input list text file to run quantization
                        (used with qairt-quantizer).
  -bbw {8,32}, --bias_bitwidth {8,32}
                        option to select the bitwidth to use when quantizing
                        the bias. default 8
  -abw {8,16}, --act_bitwidth {8,16}
                        option to select the bitwidth to use when quantizing
                        the activations. default 8
  -wbw {8,4}, --weights_bitwidth {8,4}
                        option to select the bitwidth to use when quantizing
                        the weights. default 8
  --quantizer_float_bitwidth {32,16}
                        Use this option to select the bitwidth to use for
                        float tensors, either 32 (default) or 16.
  --act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                        Specify which quantization calibration method to use
                        for activations. This option has to be paired with
                        --act_quantizer_schema.
  --param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                        Specify which quantization calibration method to use
                        for parameters. This option has to be paired with
                        --param_quantizer_schema.
  --act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                        Specify which quantization schema to use for
                        activations. Note: Default is asymmetric.
  --param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                        Specify which quantization schema to use for
                        parameters. Note: Default is asymmetric.
  --percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
                        Value must lie between 90 and 100. Default is 99.99
  --use_per_channel_quantization
                        Use per-channel quantization for convolution-based op
                        weights. Note: This will replace built-in model QAT
                        encodings when used for a given weight.
  --use_per_row_quantization
                        Use this option to enable rowwise quantization of
                        Matmul and FullyConnected ops.
  --float_fallback      Use this option to enable fallback to floating point
                        (FP) instead of fixed point. This option can be paired
                        with --quantizer_float_bitwidth to indicate the
                        bitwidth for FP (by default 32). If this option is
                        enabled, then input list must not be provided and
                        --ignore_encodings must not be provided. The external
                        quantization encodings (encoding file/FakeQuant
                        encodings) might be missing quantization parameters
                        for some interim tensors. First it will try to fill
                        the gaps by propagating across math-invariant
                        functions. If the quantization params are still
                        missing, then it will fall back to floating point for
                        those nodes.
  --extra_quantizer_args EXTRA_QUANTIZER_ARGS
                        additional quantizer arguments in a quoted string.
                        example: --extra_quantizer_args
                        'arg1=value1;arg2=value2'

Net-run Arguments:
  --perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
                        Specifies perf profile to set. Valid settings are
                        "low_balanced" , "balanced" , "default",
                        "high_performance" ,"sustained_high_performance",
                        "burst", "low_power_saver", "power_saver",
                        "high_power_saver", "extreme_power_saver", and
                        "system_settings". Note: the perf_profile argument is
                        now deprecated for the HTP backend; specify the
                        performance profile through the backend extension
                        config instead.
  --profiling_level PROFILING_LEVEL
                        Enables profiling and sets its level. For QNN
                        executor, valid settings are "basic", "detailed" and
                        "client" For SNPE executor, valid settings are "off",
                        "basic", "moderate", "detailed", and "linting".
                        Default is detailed.
  --userlogs {warn,verbose,info,error,fatal}
                        Enable verbose logging. Note: This argument is
                        applicable only when --executor_type snpe
  --log_level {error,warn,info,debug,verbose}
                        Enable verbose logging. Note: This argument is
                        applicable only when --executor_type qnn
  --extra_runtime_args EXTRA_RUNTIME_ARGS
                        additional net runner arguments in a quoted string.
                        example: --extra_runtime_args
                        'arg1=value1;arg2=value2'

Other optional Arguments:
  --executor_type {qnn,snpe}
                        Choose between qnn(qnn-net-run) and snpe(snpe-net-run)
                        execution. If not provided, qnn-net-run will be
                        executed for QAIRT or QNN SDK, or else snpe-net-run
                        will be executed for SNPE SDK.
  --stage {source,converted,quantized}
                        Specifies the starting stage in the Accuracy Debugger
                        pipeline. source: starting with a source framework
                        model, converted: starting with a converted model,
                        quantized: starting with a quantized model. Default is
                        source.
  -p ENGINE_PATH, --engine_path ENGINE_PATH
                        Path to SDK folder.
  --deviceId DEVICEID   The serial number of the device to use. If not passed,
                        the first in a list of queried devices will be used
                        for validation.
  -v, --verbose         Set verbose logging at debugger tool level
  --host_device {x86,x86_64-windows-msvc,wos}
                        The device that will be running conversion. Set to x86
                        by default.
  -w WORKING_DIR, --working_dir WORKING_DIR
                        Working directory for the snooping to store temporary
                        files. Creates a new directory if the specified
                        working directory does not exist
  --output_dirname OUTPUT_DIRNAME
                        Output directory name for the snooping to store
                        temporary files under <working_dir>/snooping. Creates
                        a new directory if the specified directory does not
                        exist.
  --debug_mode_off      This option can be used to avoid dumping intermediate
                        outputs.
  --args_config ARGS_CONFIG
                        Path to a config file with arguments. This can be used
                        to feed arguments to the AccuracyDebugger as an
                        alternative to supplying them on the command line.
  --remote_server REMOTE_SERVER
                        ip address of remote machine
  --remote_username REMOTE_USERNAME
                        username of remote machine
  --remote_password REMOTE_PASSWORD
                        password of remote machine
  --golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --golden_dir_for_mapping GOLDEN_OUTPUT_REFERENCE_DIRECTORY
                        Optional parameter to indicate the directory of the
                        goldens, it's used for tensor mapping without running
                        model with framework runtime.
  --disable_offline_prepare
                        Use this option to disable offline preparation. Note:
                        By default offline preparation will be done for
                        DSP/HTP runtimes.
  --backend_extension_config BACKEND_EXTENSION_CONFIG
                        Path to config to be used with qnn-context-binary-
                        generator. Note: This argument is applicable only when
                        --executor_type qnn
  --context_config_params CONTEXT_CONFIG_PARAMS
                        optional context config params in a quoted string.
                        example: --context_config_params
                        'context_priority=high;
                        cache_compatibility_mode=strict' Note: This argument
                        is applicable only when --executor_type qnn
  --graph_config_params GRAPH_CONFIG_PARAMS
                        optional graph config params in a quoted string.
                        example: --graph_config_params 'graph_priority=low;
                        graph_profiling_num_executions=10'
  --extra_contextbin_args EXTRA_CONTEXTBIN_ARGS
                        Additional context binary generator arguments in a
                        quoted string(applicable only when --executor_type
                        qnn). example: --extra_contextbin_args
                        'arg1=value1;arg2=value2'
  --disable_graph_optimization
                        Disables basic model optimization
  --onnx_custom_op_lib ONNX_CUSTOM_OP_LIB
                        path to onnx custom operator library
  -f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
                        Framework type and version, version is optional.
                        Currently supported frameworks are [tensorflow,
                        tflite, onnx, pytorch]. For example, tensorflow 2.10.1
  --start_layer START_LAYER
                        save all intermediate layer outputs from provided
                        start layer to bottom layer of model. Can be used in
                        conjunction with --end_layer.
  --end_layer END_LAYER
                        save all intermediate layer outputs from top layer to
                        provided end layer of model. Can be used in
                        conjunction with --start_layer.

Sample Commands

qairt-accuracy-debugger \
    --snooping layerwise \
    --runtime dspv75 \
    --architecture aarch64-android \
    --framework onnx \
    --model_path artifacts/mobilenet-v2.onnx \
    --input_list artifacts/list.txt \
    --input_tensor "input.1" 1,3,224,224 artifacts/inputFiles/dog.raw \
    --output_tensor "473" \
    --default_verifier mse \
    --quantization_overrides artifacts/quantized_encoding.json \
    --executor_type qnn

Output

Below is the output directory structure:

working_directory
├── framework_runner
│   ├── 2024-08-07_15-58-09
│   └── latest
├── inputs_32
│   ├── dog.raw
│   └── input_list.txt
└── snooping
    └── 2024-08-07_15-58-09
        ├── base_quantized.json
        ├── extracted_model.onnx
        ├── inference_engine
        ├── layerwise.csv
        ├── log.txt
        ├── snooping_options.json
        └── temp-list.txt
  • The framework_runner directory contains a timestamped directory with the intermediate layer outputs stored in .raw format, as described in the Framework Runner step.

  • The snooping directory contains each single-layer model's outputs obtained from the inference engine stage, stored in separate directories, along with the final report, layerwise.csv, which lists verifier scores for each single-layer model. The layers with the most deviating scores can be identified as problematic nodes.

  • layerwise.csv is similar to the cumulative-layerwise report (cumulative_layerwise.csv), except that the Orig outputs column is not present in layerwise snooping. Please refer to the cumulative-layerwise report for more details.

Snapshot of layerwise.csv:

../_static/resources/layerwise_report.png

Understanding the layerwise report:

Column

Description

O/P Name

Output name of the current layer.

Status

If empty, indicates normal execution. Other possible values:
  • skip - This layer was not debugged, as requested by the user.

  • part - Due to a mismatch at this layer, the model was partitioned after this layer.

  • err_part - An error occurred while partitioning the model at this layer.

  • err_con - A converter error occurred at this layer.

  • err_lib - A lib-generator error occurred at this layer.

  • err_cntx - A context-bin-generator error occurred at this layer.

  • err-exec - Failed to execute the compiled model at this layer.

  • err-compare - Failed to compare the backend output of this layer with the reference.

Layer Type

Type of the current layer.

Shape

Shape of this layer’s output.

Activations

The Min, Max and Median of the outputs at this layer taken from reference execution.

<Verifier name>

Absolute verifier value of the current layer compared to reference platform.

Info

Displays information for the output verifiers, if the values are abnormal.

Binary Snooping

The binary snooping tool debugs the given ONNX graph in a binary search fashion.

For the graph under analysis, the tool quantizes one half of the graph and runs the other half in fp16/32. The final model output is used to measure the quantization effect of the subgraph. If the subgraph has a high effect on the final model output due to quantization (its verifier score is greater than 60% of the sum of the two subgraphs' scores), the process repeats on that subgraph until the subgraph size is less than min_graph_size or the subgraph cannot be divided again. If both subgraphs have similar scores (each greater than 40% of the sum of the two scores), both subgraphs are investigated further.
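The partitioning logic above can be sketched as follows. This is only an illustration of the described algorithm, not the tool's actual implementation; `score` is a hypothetical callback standing in for "quantize the subgraph [start, end), run the model, and return the verifier score":

```python
def binary_snoop(score, start, end, min_graph_size, culprits):
    """Binary-search for quantization-sensitive subgraphs (illustrative)."""
    size = end - start
    if size <= min_graph_size:
        # Subgraph is small enough to report as a candidate culprit.
        culprits.append((start, end))
        return
    mid = start + size // 2
    s_left, s_right = score(start, mid), score(mid, end)
    total = s_left + s_right
    # A half contributing > 60% of the aggregate score is pursued alone;
    # otherwise both halves are above the 40% threshold and both are pursued.
    if s_left > 0.6 * total:
        binary_snoop(score, start, mid, min_graph_size, culprits)
    elif s_right > 0.6 * total:
        binary_snoop(score, mid, end, min_graph_size, culprits)
    else:
        binary_snoop(score, start, mid, min_graph_size, culprits)
        binary_snoop(score, mid, end, min_graph_size, culprits)
```

Each recursion halves the quantized region, so the number of model evaluations grows logarithmically with graph size rather than linearly as in layerwise snooping.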

Usage

usage: qairt-accuracy-debugger --snooping binary [-h]
                              -r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
                              -a
                              {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
                              -l INPUT_LIST [--input_network MODEL_PATH]
                              [--input_tensor INPUT_TENSOR [INPUT_TENSOR ...]]
                              [--out_tensor_node OUTPUT_TENSOR]
                              [--io_config IO_CONFIG]
                              [--converter_float_bitwidth {32,16}]
                              [--extra_converter_args EXTRA_CONVERTER_ARGS]
                              [--calibration_input_list CALIBRATION_INPUT_LIST]
                              [-bbw {8,32}] [-abw {8,16}] [-wbw {8,4}]
                              [--quantizer_float_bitwidth {32,16}]
                              [--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                              [--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                              [--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                              [--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                              [--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
                              [--use_per_channel_quantization]
                              [--use_per_row_quantization] [--float_fallback]
                              [--extra_quantizer_args EXTRA_QUANTIZER_ARGS]
                              [--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
                              [--profiling_level PROFILING_LEVEL]
                              [--userlogs {warn,verbose,info,error,fatal}]
                              [--log_level {error,warn,info,debug,verbose}]
                              [--extra_runtime_args EXTRA_RUNTIME_ARGS]
                              [--executor_type {qnn,snpe}]
                              [--stage {source,converted,quantized}]
                              [-p ENGINE_PATH] [--deviceId DEVICEID] [-v]
                              [--host_device {x86,x86_64-windows-msvc,wos}]
                              [-w WORKING_DIR]
                              [--output_dirname OUTPUT_DIRNAME]
                              [--debug_mode_off] [--args_config ARGS_CONFIG]
                              [--remote_server REMOTE_SERVER]
                              [--remote_username REMOTE_USERNAME]
                              [--remote_password REMOTE_PASSWORD]
                              [--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY]
                              [--disable_offline_prepare]
                              [--backend_extension_config BACKEND_EXTENSION_CONFIG]
                              [--context_config_params CONTEXT_CONFIG_PARAMS]
                              [--graph_config_params GRAPH_CONFIG_PARAMS]
                              [--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS]
                              [--disable_graph_optimization]
                              [--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB]
                              [-f FRAMEWORK [FRAMEWORK ...]] -qo
                              QUANTIZATION_OVERRIDES
                              [--min_graph_size MIN_GRAPH_SIZE]
                              [--subgraph_relative_weight SUBGRAPH_RELATIVE_WEIGHT]
                              [--verifier VERIFIER]

Script to run binary snooping.

options:
  -h, --help            show this help message and exit

Required Arguments:
  -r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}, --runtime {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic}
                        Runtime to be used. Note: In case of SNPE
                        execution(--executor_type snpe), aic runtime is not
                        supported.
  -a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}, --architecture {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
                        Name of the architecture to use for inference engine.
                        Note: In case of SNPE execution(--executor_type snpe),
                        aarch64-qnx architecture is not supported.
  -l INPUT_LIST, --input_list INPUT_LIST
                        Path to the input list text file to run inference(used
                        with net-run). Note: When having multiple entries in
                        text file, in order to save memory and time, you can
                        pass --debug_mode_off to skip intermediate outputs
                        dump.
  -qo QUANTIZATION_OVERRIDES, --quantization_overrides QUANTIZATION_OVERRIDES
                        Path to quantization overrides json file. Note: This
                        is used with converter as well.

QAIRT Converter Arguments:
  --input_network MODEL_PATH, --model_path MODEL_PATH
                        Path to the model file(s).
  --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
                        The name and dimension of all the input buffers to the
                        network specified in the format [input_name comma-
                        separated-dimensions sample-data data-type] Note:
                        sample-data and data-type are optional for example:
                        'data' 1,224,224,3. Note that the quotes should always
                        be included in order to handle special characters,
                        spaces, etc. For multiple inputs, specify multiple
                        --input_tensor on the command line like:
                        --input_tensor "data1" 1,224,224,3 sample1.raw float32
                        --input_tensor "data2" 1,50,100,3 sample2.raw int64
                        NOTE: Required for TensorFlow and PyTorch. Optional
                        for Onnx and Tflite. In case of Onnx, this feature
                        works only with Onnx 1.6.0 and above.
  --out_tensor_node OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
                        Name of the graph's output Tensor Names. Multiple
                        output names should be provided separately like:
                        --out_tensor_node out_1 --out_tensor_node out_2 NOTE:
                        Required for TensorFlow. Optional for Onnx, Tflite and
                        PyTorch
  --io_config IO_CONFIG
                        Use this option to specify a yaml file for input and
                        output options.
  --converter_float_bitwidth {32,16}
                        Use this option to convert the graph to the specified
                        float bitwidth, either 32 (default) or 16. Note:
                        Cannot be used with --calibration_input_list and
                        --quantization_overrides
  --extra_converter_args EXTRA_CONVERTER_ARGS
                        additional converter arguments in a quoted string.
                        example: --extra_converter_args
                        'arg1=value1;arg2=value2'

QAIRT Quantizer Arguments:
  --calibration_input_list CALIBRATION_INPUT_LIST
                        Path to the inputs list text file to run
                        quantization(used with qairt-quantizer).
  -bbw {8,32}, --bias_bitwidth {8,32}
                        option to select the bitwidth to use when quantizing
                        the bias. default 8
  -abw {8,16}, --act_bitwidth {8,16}
                        option to select the bitwidth to use when quantizing
                        the activations. default 8
  -wbw {8,4}, --weights_bitwidth {8,4}
                        option to select the bitwidth to use when quantizing
                        the weights. default 8
  --quantizer_float_bitwidth {32,16}
                        Use this option to select the bitwidth to use for
                        float tensors, either 32 (default) or 16.
  --act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                        Specify which quantization calibration method to use
                        for activations. This option has to be paired with
                        --act_quantizer_schema.
  --param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                        Specify which quantization calibration method to use
                        for parameters. This option has to be paired with
                        --param_quantizer_schema.
  --act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                        Specify which quantization schema to use for
                        activations. Note: Default is asymmetric.
  --param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                        Specify which quantization schema to use for
                        parameters. Note: Default is asymmetric.
  --percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
                        Value must lie between 90 and 100. Default is 99.99
  --use_per_channel_quantization
                        Use per-channel quantization for convolution-based op
                        weights. Note: This will replace built-in model QAT
                        encodings when used for a given weight.
  --use_per_row_quantization
                        Use this option to enable rowwise quantization of
                        Matmul and FullyConnected ops.
  --float_fallback      Use this option to enable fallback to floating point
                        (FP) instead of fixed point. This option can be paired
                        with --quantizer_float_bitwidth to indicate the
                        bitwidth for FP (by default 32). If this option is
                        enabled, then input list must not be provided and
                        --ignore_encodings must not be provided. The external
                        quantization encodings (encoding file/FakeQuant
                        encodings) might be missing quantization parameters
                        for some interim tensors. First it will try to fill
                        the gaps by propagating across math-invariant
                        functions. If the quantization params are still
                        missing, then it will apply fallback to nodes to
                        floating point.
  --extra_quantizer_args EXTRA_QUANTIZER_ARGS
                        additional quantizer arguments in a quoted string.
                        example: --extra_quantizer_args
                        'arg1=value1;arg2=value2'

Net-run Arguments:
  --perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
                        Specifies perf profile to set. Valid settings are
                        "low_balanced" , "balanced" , "default",
                        "high_performance" ,"sustained_high_performance",
                        "burst", "low_power_saver", "power_saver",
                        "high_power_saver", "extreme_power_saver", and
                        "system_settings". Note: perf_profile argument is now
                        deprecated for HTP backend, user can specify
                        performance profile through backend extension config
                        now.
  --profiling_level PROFILING_LEVEL
                        Enables profiling and sets its level. For QNN
                        executor, valid settings are "basic", "detailed" and
                        "client" For SNPE executor, valid settings are "off",
                        "basic", "moderate", "detailed", and "linting".
                        Default is detailed.
  --userlogs {warn,verbose,info,error,fatal}
                        Enable verbose logging. Note: This argument is
                        applicable only when --executor_type snpe
  --log_level {error,warn,info,debug,verbose}
                        Enable verbose logging. Note: This argument is
                        applicable only when --executor_type qnn
  --extra_runtime_args EXTRA_RUNTIME_ARGS
                        additional net runner arguments in a quoted string.
                        example: --extra_runtime_args
                        'arg1=value1;arg2=value2'

Other optional Arguments:
  --executor_type {qnn,snpe}
                        Choose between qnn(qnn-net-run) and snpe(snpe-net-run)
                        execution. If not provided, qnn-net-run will be
                        executed for QAIRT or QNN SDK, or else snpe-net-run
                        will be executed for SNPE SDK.
  --stage {source,converted,quantized}
                        Specifies the starting stage in the Accuracy Debugger
                        pipeline. source: starting with a source framework
                        model, converted: starting with a converted model,
                        quantized: starting with a quantized model. Default is
                        source.
  -p ENGINE_PATH, --engine_path ENGINE_PATH
                        Path to SDK folder.
  --deviceId DEVICEID   The serial number of the device to use. If not passed,
                        the first in a list of queried devices will be used
                        for validation.
  -v, --verbose         Set verbose logging at debugger tool level
  --host_device {x86,x86_64-windows-msvc,wos}
                        The device that will be running conversion. Set to x86
                        by default.
  -w WORKING_DIR, --working_dir WORKING_DIR
                        Working directory for the snooping to store temporary
                        files. Creates a new directory if the specified
                        working directory does not exist
  --output_dirname OUTPUT_DIRNAME
                        Output directory name for the snooping to store
                        temporary files under <working_dir>/snooping. Creates
                        a new directory if the specified working directory
                        does not exist.
  --debug_mode_off      This option can be used to avoid dumping intermediate
                        outputs.
  --args_config ARGS_CONFIG
                        Path to a config file with arguments. This can be used
                        to feed arguments to the AccuracyDebugger as an
                        alternative to supplying them on the command line.
  --remote_server REMOTE_SERVER
                        ip address of remote machine
  --remote_username REMOTE_USERNAME
                        username of remote machine
  --remote_password REMOTE_PASSWORD
                        password of remote machine
  --golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --golden_dir_for_mapping GOLDEN_OUTPUT_REFERENCE_DIRECTORY
                        Optional parameter to indicate the directory of the
                        goldens, it's used for tensor mapping without running
                        model with framework runtime.
  --disable_offline_prepare
                        Use this option to disable offline preparation. Note:
                        By default offline preparation will be done for
                        DSP/HTP runtimes.
  --backend_extension_config BACKEND_EXTENSION_CONFIG
                        Path to config to be used with qnn-context-binary-
                        generator. Note: This argument is applicable only when
                        --executor_type qnn
  --context_config_params CONTEXT_CONFIG_PARAMS
                        optional context config params in a quoted string.
                        example: --context_config_params
                        'context_priority=high;
                        cache_compatibility_mode=strict' Note: This argument
                        is applicable only when --executor_type qnn
  --graph_config_params GRAPH_CONFIG_PARAMS
                        optional graph config params in a quoted string.
                        example: --graph_config_params 'graph_priority=low;
                        graph_profiling_num_executions=10'
  --extra_contextbin_args EXTRA_CONTEXTBIN_ARGS
                        Additional context binary generator arguments in a
                        quoted string(applicable only when --executor_type
                        qnn). example: --extra_contextbin_args
                        'arg1=value1;arg2=value2'
  --disable_graph_optimization
                        Disables basic model optimization
  --onnx_custom_op_lib ONNX_CUSTOM_OP_LIB
                        path to onnx custom operator library
  -f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
                        Framework type and version, version is optional.
                        Currently supported frameworks are [tensorflow,
                        tflite, onnx, pytorch]. For example, tensorflow 2.10.1
  --min_graph_size MIN_GRAPH_SIZE
                        Provide the minimum subgraph size
  --subgraph_relative_weight SUBGRAPH_RELATIVE_WEIGHT
                        Helps in deciding whether a subgraph is further
                        debugged or not. If a subgraph scores > 40 percent of
                        the aggregate score of the two subgraphs, the
                        subgraph is investigated further.
  --verifier VERIFIER   Choose a verifier among [sqnr, mse] for the comparison

Sample Commands

Sample command to run binary snooping on mv2 large model

qairt-accuracy-debugger \
  --snooping binary \
  --framework onnx \
  --model_path models/mv2/mobilenet-v2.onnx \
  --architecture aarch64-android \
  --input_list models/mv2/inputs/input_list_1.txt \
  --calibration_input_list models/mv2/inputs/input_list_1.txt \
  --input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/harsraj/models/mv2/inputs/data1.raw \
  --output_tensor "473" \
  --engine_path $QAIRT_SDK_ROOT \
  --working_dir tmp/QAIRT_BINARY \
  --runtime dspv75 \
  --verifier mse \
  --quantization_overrides /local/mnt/workspace/harsraj/models/mv2/quantized_encoding.json \
  --min_graph_size 16

Outputs

The algorithm provides two JSON files:

  1. graph_result.json (for each subgraph) - Contains verifier scores for the two child subgraphs; for example, 318_473 has child subgraphs 318_392 and 393_473.

  2. subgraph_result.json (for each subgraph) - Contains the corresponding verifier scores, sorted.

Keys in both files have the form “subgraph_start_node_activation_name” + “_” + “subgraph_end_node_activation_name”.

For example, 318_473 means a subgraph starts at node activation 318 and ends at node activation 473. Only the subgraph from 318 to 473 is quantized while the rest of the model runs in fp16/32.
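Assuming subgraph_result.json maps such keys to verifier scores (an assumption about the file layout; inspect your own output to confirm), the keys can be split and ranked like this:

```python
import json

def load_subgraph_scores(path):
    """Parse "start_end" keys from a subgraph_result.json-style file.

    Assumes the two activation names are joined by a single underscore, as
    in "318_473"; activation names that themselves contain underscores would
    make this split ambiguous.
    """
    with open(path) as f:
        results = json.load(f)
    parsed = []
    for key, score in results.items():
        start, _, end = key.partition("_")
        parsed.append((start, end, score))
    # Highest-scoring subgraphs are the most likely culprits.
    return sorted(parsed, key=lambda t: t[2], reverse=True)
```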

Debugging accuracy issues with binary snooping results

Subgraphs with the highest verifier scores in subgraph_result.json are the culprit subgraphs.

One subgraph can be a subset of another. In this case, prioritize the subgraph size you are comfortable debugging; the details of a subset can be found in graph_result.json.

qnn-platform-validator

qnn-platform-validator checks the QNN compatibility/capability of a device. The results are saved as a CSV file in the “output” directory; basic logs are also displayed on the console.

DESCRIPTION:
------------
Helper script to set up the environment for and launch the qnn-platform-
validator executable.

REQUIRED ARGUMENTS:
-------------------
--backend            <BACKEND>          Specify the backend to validate: <gpu>, <dsp>,
                                        or <all>.

--directory          <DIR>              Path to the root of the unpacked SDK directory containing
                                        the executable and library files

--dsp_type           <DSP_VERSION>      Specify DSP variant: v66 or v68

OPTIONAL ARGUMENTS:
--------------------
--buildVariant       <TOOLCHAIN>        Specify the build variant
                                        aarch64-android or aarch64-windows-msvc to be validated.
                                        Default: aarch64-android

--testBackend                           Runs a small program on the runtime and checks if QNN is
                                        supported for the backend.

--deviceId           <DEVICE_ID>        Uses the specified device for running the adb command.
                                        Defaults to the first device in the adb devices list.

--coreVersion                           Outputs the version of the runtime that is present on the target.

--libVersion                            Outputs the library version of the runtime that is present on the target.

--targetPath          <DIR>             The path to be used on the device.
                                        Defaults to /data/local/tmp/platformValidator

--remoteHost         <REMOTEHOST>       Run on remote host through remote adb server.
                                        Defaults to localhost.

--debug                                 Set to turn on Debug log
Additional details:
  • The following files need to be pushed to the device for the DSP to pass the validator test.
    Note that the stub and skel libraries are specific to the DSP architecture version (e.g., v73):
    // Android
    bin/aarch64-android/qnn-platform-validator
    lib/aarch64-android/libQnnHtpV73CalculatorStub.so
    lib/hexagon-${DSP_ARCH}/unsigned/libCalculator_skel.so
    
    // Windows
    bin/aarch64-windows-msvc/qnn-platform-validator.exe
    lib/aarch64-windows-msvc/QnnHtpV73CalculatorStub.dll
    lib/hexagon-${DSP_ARCH}/unsigned/libCalculator_skel.so
    
  • The following example pushes the aarch64-android variant to /data/local/tmp/platformValidator

    adb push $QNN_SDK_ROOT/bin/aarch64-android/qnn-platform-validator /data/local/tmp/platformValidator/bin/qnn-platform-validator
    adb push $QNN_SDK_ROOT/lib/aarch64-android/ /data/local/tmp/platformValidator/lib
    adb push $QNN_SDK_ROOT/lib/dsp /data/local/tmp/platformValidator/dsp
    

qnn-profile-viewer

The qnn-profile-viewer tool parses profiling data generated by qnn-net-run. The parsed data can additionally be saved to a CSV file.

usage: qnn-profile-viewer --input_log PROFILING_LOG [--help] [--output=CSV_FILE] [--extract_opaque_objects] [--reader=CUSTOM_READER_SHARED_LIB] [--schematic=SCHEMATIC_BINARY]

Reads profiling logs and outputs the contents to stdout

Note: The IPS calculation takes the following into account: graph execute time, tensor file IO time, and misc. time for quantization, callbacks, etc.
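As a rough numeric illustration of this accounting (the function name and microsecond units here are made up for the example; actual profiling logs define their own fields and units):

```python
def inferences_per_second(num_inferences, execute_us, tensor_io_us, misc_us):
    """IPS over the total of execute, tensor file IO, and misc. time.

    All times are taken in microseconds for this sketch.
    """
    total_s = (execute_us + tensor_io_us + misc_us) / 1e6
    return num_inferences / total_s
```

Because tensor file IO and misc. time are included, the reported IPS is lower than the figure you would get from graph execute time alone.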

required arguments:
  --input_log                     PROFILING_LOG1,PROFILING_LOG2
                                  Provides a comma-separated list of Profiling log files

optional arguments:
  --output                        PATH
                                  Output file with processed profiling data. File formats vary depending upon the reader used
                                  (see --reader). If not provided, no output file is created.

  --help                          Displays this help message.

  --reader                        CUSTOM_READER_SHARED_LIB
                                  Path to a reader library. If not specified, the default reader outputs a CSV file.

  --schematic                     SCHEMATIC_BINARY
                                  Path to the schematic binary file.
                                  Please note that this option is specific to the QnnHtpOptraceProfilingReader library.

  --config                        CONFIG_JSON_FILE
                                  Path to the config json file.
                                  Please note that this option is specific to the QnnHtpOptraceProfilingReader library.

  --dlc                           DLC_FILE
                                  Path to the dlc file.
                                  Please note that this option is specific to the QnnHtpOptraceProfilingReader library.

  --zoom_start                    PROFILE_SUBMODULE_START_NODE
                                  Name of starting node for a profile submodule optrace. If you specify this option you must also specify --zoom_end.
                                  Please note that this option is specific to the QnnHtpOptraceProfilingReader library.

  --zoom_end                      PROFILE_SUBMODULE_END_NODE
                                  Name of ending node for a profile submodule optrace. If you specify this option you must also specify --zoom_start.
                                  Please note that this option is specific to the QnnHtpOptraceProfilingReader library.

  --version                       Displays version information.

  --extract_opaque_objects        Specifies that the opaque objects will be dumped to output files

qnn-netron (Beta)

Overview

The QNN Netron tool makes model debugging and visualization less daunting. qnn-netron is an extension of the Netron graph tool, providing easier graph debugging and convenient runtime information. There are currently two key functionalities:

  1. The Visualize section allows customers to view their desired models after using the QNN Converter by importing the JSON representation of the model

  2. The Diff section allows customers to run networks of their choosing on different runtimes in order to compare network accuracy and performance

Launching Tool

Dependencies

The QNN Netron tool uses the Electron JS framework for its GUI frontend and requires npm/Node.js to be available on the system. Additionally, the tool's backend requires Python libraries for accuracy analysis. A convenience script is available in the QNN SDK to download the dependencies needed to build and run the tool.

# Note: the following command should be run as administrator/root to be able to install system libraries
$ sudo bash ${QNN_SDK_ROOT}/bin/check-linux-dependency.sh
$ ${QNN_SDK_ROOT}/bin/check-python-dependency

Launching Application

The qnn-netron script builds and launches the QNN Netron application. This script:

  1. Clones the vanilla Netron git project

  2. Applies custom patches enabling Netron for QNN

  3. Builds the npm project

  4. Launches the application

$ qnn-netron -h
usage: qnn-netron [-h] [-w <working_dir>]
Script to build and launch QNN Netron tool for visualizing and running analysis on Qnn Models.

Optional argument(s):
 -w <working_dir>                      Location for building QNN Netron tool. Default: current_dir


# To build and run application use
$ qnn-netron -w <my_working_dir>

QNN Netron Visualize Deep Dive

First, the user is prompted to open a JSON file that represents their converted model. This JSON comes from the converter tool. Please refer to this Overview for more details.

../_static/resources/landing_page_netron.jpg

Once the file is loaded into the tool, the graph should be displayed in the UI as shown below:

After loading in the model, the user can click on any of the nodes and a side pop-up section will display node information such as the type and name as well as vital parameter information such as inputs and outputs (datatypes, encodings, and shapes)

../_static/resources/netron_detailed_nodes_visualization.jpg

Netron Diff Customization Deep Dive

Limitations

  1. Diff Tool comparison against source framework goldens only works for goldens that use a spatial-first axis order (NHWC).

  2. For use cases where a source framework golden is used for comparison, the Diff Tool is tested to work only with TensorFlow and TensorFlow-variant frameworks.

To open the Diff Customization tool, the user can either click File and then “Open Diff…”, or click “Diff…” on tool startup, as shown below:

../_static/resources/netron_diff_ui_opening.jpg
../_static/resources/open_diff_tool_netron.png

Upon launching the Diff Customization tool, the user is prompted at the top to select a use case for the tool. There are 3 options to choose from:

../_static/resources/use_case_netron.png

For the purposes of this documentation, only Inference vs Inference is detailed; the setup procedure for the other use cases is similar. The three use cases are:

  1. Golden vs Inference: Runs inference and compares the output of a QNN backend against goldens from a particular ML framework

  2. Output vs Output: Compares existing inference results against ML framework goldens, or compares two existing inference results against each other

  3. Inference vs Inference: Compares inference between two converted QNN models, or the same QNN model on different QNN backends

Inference vs Inference

If this use case is selected, the user is presented with various form fields for the purposes of running two jobs asynchronously with the option of choosing different runtimes for each QNN network being run.

qnn-netron

A more detailed view of what the user is prompted for is displayed below:

qnn-netron

In order to execute the networks, the user has two options:

Running on Host machine

When the Target Device is selected as “host”, the user can only use the CPU as a runtime. In addition, the user can only select “x86_64-linux-clang” as the architecture in this use case.

qnn-netron

Running On-Device

When the Target Device is selected as “on-device”, a Device ID is required to connect to the device via adb. Thereafter, the user can select any of the three QNN backend runtimes available (CPU, GPU, or DSP v68/v69/v73) and the “aarch64-android” architecture.

qnn-netron

After choosing the desired target device and runtime configurations, the rest of the fields are explained in detail below:


Note

Users can click any of the path fields again to change the location.


Setup Parameters

Configurations to Select

The available verifiers to run on the model outputs are listed below. (See the Note below for custom verifier (accuracy and performance) thresholds, and the table below for providing custom accuracy verifier hyperparameters.)

RtolAtol, AdjustedRtolAtol, TopK, MeanIOU, L1Error, CosineSimilarity, MSE, SQNR
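For intuition, two of these verifiers can be sketched in plain Python using their common textbook definitions; the tool's exact formulas may differ:

```python
import math

def mse(golden, actual):
    """Mean squared error between golden and backend outputs."""
    return sum((g - a) ** 2 for g, a in zip(golden, actual)) / len(golden)

def sqnr_db(golden, actual):
    """Signal-to-quantization-noise ratio in dB.

    Higher is better: the golden signal energy divided by the energy of the
    deviation introduced by the backend.
    """
    signal = sum(g * g for g in golden)
    noise = sum((g - a) ** 2 for g, a in zip(golden, actual))
    return 10.0 * math.log10(signal / noise)
```

Note the opposite polarities: a large MSE indicates a problem, while a large SQNR indicates a close match.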

Model JSON
    Upload the <model>_net.json file output by the QNN converters.

Model Cpp
    Upload the <model>.cpp file output by the QNN converters.

Model Bin
    Upload the <model>.bin file output by the QNN converters.

NDK Path
    Provide the path to your Android NDK.

Devices Engine Path
    Provide the path to the top level of the unzipped QNN SDK.

Input List
    Provide the path to the input list file for the model.

Save Run Configurations
    Provide a location where the inference and runtime results from the Diff Customization tool will be stored.

Note

Users have the option of providing a custom accuracy and performance verifier threshold when running diff. A custom accuracy verifier threshold can be provided for any of the accuracy verifiers. By default the verifier thresholds are 0.01. The custom thresholds can be provided in the text boxes labelled “Accuracy Threshold” and “Perf Threshold”.

Users can now enter verifier-specific hyperparameters in text boxes. The default values are displayed inside the text boxes and can be customized as needed. The table below lists the hyperparameters that can be customized for each verifier.

Verifier hyperparameters:
  • AdjustedRtolAtol: Number of Levels
  • RtolAtol: Rtol Margin, Atol Margin
  • TopK: K, Ordered
  • MeanIOU: Background Classification
  • L1Error: Multiplier, Scale
  • CosineSimilarity: Multiplier, Scale
  • MSE (Mean Square Error): N/A
  • SQNR (Signal-To-Noise Ratio): N/A

Below is an example of what the fields should look like once filled to completion:

qnn-netron

After running the Diff Customization tool, the output directories and files will be present in the working directory path provided in the last field.

qnn-netron

Results and Outputs:

After pressing the Run button as mentioned above, the visualization of the network should pop-up. Nodes will be highlighted if there are any accuracy and/or performance variations. Clicking on each node will show more information about the accuracy and performance diff information as shown below.

qnn-netron

Performance and Accuracy Diff Visualizations:

qnn-netron

As seen above, the performance and accuracy diff information is shown under the Diff section of any given node. The color of the node boundary in the viewer represents whether a performance or accuracy error (above the default verifier threshold of 0.01) was reported. For example, in the Conv2d node shown below, there are two boundaries of orange and red indicating that this node has both an accuracy and performance difference across the runs. The FullyConnected node shown only has a yellow boundary indicating that only a performance difference was found.

qnn-netron
qnn-netron

QNN Netron Diff Navigation

QNN Netron can locate the first node in the graph with any performance or accuracy diffs. When the user clicks the next and previous arrows, the graph visualization zooms to the node with the first performance or accuracy difference. This makes debugging large models much easier, since the user does not have to search the graph manually to find where the network performance and accuracy errors start to diverge.

qnn-netron

qnn-context-binary-utility

The qnn-context-binary-utility tool validates a serialized context binary and writes its metadata to a JSON file. This JSON file can then be used to inspect the context binary, which aids debugging. A QNN context can be serialized to a binary using the QNN APIs or the qnn-context-binary-generator tool.

usage: qnn-context-binary-utility --context_binary CONTEXT_BINARY_FILE --json_file JSON_FILE_NAME [--help] [--version]

Reads a serialized context binary and validates its metadata.
If --json_file is provided, it outputs the metadata to a json file

required arguments:
  --context_binary  CONTEXT_BINARY_FILE
                    Path to cached context binary from which the binary info will be extracted
                    and written to json.

  --json_file       JSON_FILE_NAME
                    Provide path along with the file name <DIR>/<FILE_NAME> to serialize
                    context binary info into json.
                    The directory path must exist. File with the FILE_NAME will be created at DIR.

optional arguments:
  --help          Displays this help message.

  --version       Displays version information.
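The resulting JSON can be inspected with standard tooling. The sketch below loads such a file and summarizes it, assuming a hypothetical layout with a "graphs" list; the actual schema produced by qnn-context-binary-utility may differ, so treat the key names as placeholders.

```python
import json

def summarize_metadata(path):
    # Load the JSON written by qnn-context-binary-utility and report its
    # top-level keys plus a count of graph entries, if present.
    with open(path) as f:
        meta = json.load(f)
    graphs = meta.get("graphs", [])  # hypothetical key, for illustration only
    return {"keys": sorted(meta.keys()), "num_graphs": len(graphs)}

# Demo with a stand-in file mimicking the assumed layout:
with open("context_meta.json", "w") as f:
    json.dump({"version": "x.y", "graphs": [{"name": "g0"}]}, f)
print(summarize_metadata("context_meta.json"))
```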

Accuracy Evaluator plugins

File-based plugins

This section lists the built-in file-based plugins.

Dataset plugins

create_squad_examples - Extracts examples from a given SQuAD dataset file and saves them to a file.

Parameters:
  • squad_version (Integer, default: 1): SQuAD version, 1 or 2

filter_dataset - Filters the dataset, including the input list, calibration, and annotation files.

Parameters:
  • max_inputs (Integer, mandatory): Maximum number of inputs in the input list to be considered for execution
  • max_calib (Integer, mandatory): Maximum number of inputs in calibration to be considered for execution
  • random (Boolean, default: False): Shuffles the input list and calibration files
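Conceptually, filtering an input list with these parameters reduces to an optional shuffle followed by a cap on the entry count. A simplified illustration, not the plugin's actual implementation:

```python
import random

def filter_input_list(lines, max_inputs, shuffle=False, seed=0):
    # Optionally shuffle (seeded here for reproducibility), then keep
    # at most max_inputs entries.
    lines = list(lines)
    if shuffle:
        random.Random(seed).shuffle(lines)
    return lines[:max_inputs]
```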

gpt2_tokenizer - Tokenizes data from files using GPT2TokenizerFast.

Parameters:
  • vocab_file (String, mandatory): Path to the vocabulary file
  • merges_file (String, mandatory): Path to the merges file
  • seq_length (Integer, mandatory): Sequence length for the generated model inputs
  • past_seq_length (Integer, mandatory): Sequence length for the 'past' inputs
  • past_shape (List): Shape of the 'past' inputs
  • num_past (Integer, default: 0): Number of 'past' inputs

split_txt_data - Saves individual text files for each line present in the given input text file.

Preprocessing plugins

centernet_preproc - Performs preprocessing on CenterNet dataset examples.

Parameters:
  • dims (String, mandatory): Height and width; comma delimited, e.g., 416,416
  • scale (Float, default: 1.0): Scale factor for the image
  • fix_res (Boolean, default: True): Resolution of the image
  • pad (Integer, default: 0): Image padding

convert_nchw - Transposes WHC to CHW or CHW to WHC and adds an extra N dimension.

Parameters:
  • expand-dims (Boolean, default: True): Add the Nth dimension
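The layout change performed by convert_nchw amounts to moving the channel axis first, plus an added batch dimension. A pure-Python sketch of the index math (the plugin itself operates on numpy arrays):

```python
def to_chw(img_hwc):
    # Move the channel axis first: HWC -> CHW.
    h, w, c = len(img_hwc), len(img_hwc[0]), len(img_hwc[0][0])
    return [[[img_hwc[i][j][k] for j in range(w)] for i in range(h)]
            for k in range(c)]

def expand_dims(img_chw):
    # Add a leading N (batch) dimension: CHW -> NCHW.
    return [img_chw]
```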

create_batch - Concatenates raw input files into a single file using numpy.

Parameters:
  • delete_prior (Boolean, default: True): Delete prior unbatched data to save space
  • truncate (Boolean, default: False): If the number of inputs is not a multiple of the batch size, truncate the leftover inputs in the last batch
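Batching raw input files is essentially byte-level concatenation. A minimal stdlib sketch (the plugin uses numpy; the delete_prior and truncate handling is omitted here):

```python
def create_batch(input_paths, output_path):
    # Append the raw bytes of each input file into one batched file.
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as f:
                out.write(f.read())

# Demo with two stand-in raw files:
for name, data in [("a.raw", b"\x01\x02"), ("b.raw", b"\x03\x04")]:
    with open(name, "wb") as f:
        f.write(data)
create_batch(["a.raw", "b.raw"], "batch.raw")
```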

crop - Center crops an image to the given dimensions using numpy or torchvision based on the library parameter.

Parameters:
  • dims (String, mandatory): Height and width; comma delimited, e.g., 640,640
  • library (String, default: numpy): Python library used to crop the given input; valid values: numpy | torchvision
  • typecasting_required (Boolean, default: True): Whether to convert the final output to numpy. Note: this option is specific to the torchvision library
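The center-crop window follows directly from the input and target dimensions. A sketch of the index arithmetic only (the plugin itself works on numpy or torchvision tensors):

```python
def center_crop_box(height, width, target_h, target_w):
    # Return (top, left, bottom, right) of the centered crop window.
    top = (height - target_h) // 2
    left = (width - target_w) // 2
    return top, left, top + target_h, left + target_w
```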

expand_dims - Adds the N dimension for images, e.g., HWC to NHWC.

image_transformers_input - Creates input files with image and/or text for image transformer models like ViT and CLIP.

Parameters:
  • dims (String, mandatory): Expected processed output dimension in CHW format
  • num_base_class (Integer, default: total classes available): Number of base classes in classification; used when text input is also provided
  • num_prompt (Integer, default: total classes available): Number of prompts for text classes; used when text input is also provided
  • image_only (Boolean, default: False): Data type of raw data

normalize - Normalizes input per the given scheme; data must be of NHWC format.

Parameters:
  • library (String, default: numpy): Python library used to normalize the given input; valid values: numpy | torchvision
  • norm (float32, default: 255): Normalization factor; all values are divided by norm
  • means (RGB dictionary, default: {"R":0, "G":0, "B":0}): Dictionary of means to be subtracted, e.g., {"R":0.485, "G":0.456, "B":0.406}
  • std (RGB dictionary, default: {"R":1, "G":1, "B":1}): Dictionary of std-dev for rescaling the values, e.g., {"R":0.229, "G":0.224, "B":0.225}
  • channel_order (String, default: RGB): Channel order to specify means and std values per channel: RGB | BGR
  • normalize_first (Boolean, default: True): Whether to perform normalization before mean subtraction and std-dev rescaling; normalize_first=True performs normalization first. Note: the torchvision library does not use this option
  • typecasting_required (Boolean, default: True): Whether to convert the final output to numpy. Note: this option is specific to the torchvision library
  • pil_to_tensor_input (Boolean, default: True): Whether to convert the input to a tensor before normalization. Note: this option is specific to the torchvision library
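With normalize_first enabled, each pixel value x in channel C effectively becomes (x / norm - means[C]) / std[C]. A per-pixel sketch of that formula (using the ImageNet-style example values from the table, not the plugin's zero/one defaults):

```python
def normalize_pixel(x, channel, norm=255.0, means=None, std=None):
    # normalize_first=True ordering: divide by norm, subtract mean, divide by std.
    means = means if means is not None else {"R": 0.485, "G": 0.456, "B": 0.406}
    std = std if std is not None else {"R": 0.229, "G": 0.224, "B": 0.225}
    return (x / norm - means[channel]) / std[channel]
```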

onmt_preprocess - Performs preprocessing on the WMT dataset for the FasterTransformer OpenNMT model.

Parameters:
  • vocab_path (String, mandatory): Path to the OpenNMT model vocabulary file (pickle file)
  • src_seq_len (Integer, default: 128): The maximum total input sequence length
  • skip_sentencepiece (Boolean, default: True): Skip sentencepiece encoding
  • sentencepiece_model_path (String, default: None): Path to the sentencepiece model for the WMT dataset (mandatory when skip_sentencepiece is False)

pad - Pads an image with a constant pad size or based on target dimensions.

Parameters:
  • type (String, mandatory): Type of padding. Valid options:
      - constant: Add constant padding on 4 sides (pad_size must be provided)
      - target_dims: Add padding based on the difference between image size and target size (dims must be provided)
  • dims (String, mandatory): Height and width, comma delimited, e.g., 416,416, for 'target_dims' padding
  • pad_size (Integer, default: None): Size of padding for 'constant' padding
  • img_position (String, default: center): Position of the image, either 'center' or 'corner' (top-left); padding is added accordingly. Currently used for 'target_dims' padding
  • color (Integer, default: 114): Padding value for all planes
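For target_dims padding with img_position='center', the pad amounts follow from the size deficit on each axis. A sketch of the arithmetic as a hypothetical helper (the plugin may distribute odd pixels to the other side):

```python
def target_dims_padding(h, w, target_h, target_w):
    # Split the height/width deficit evenly between the two sides.
    pad_h, pad_w = target_h - h, target_w - w
    top, left = pad_h // 2, pad_w // 2
    return top, pad_h - top, left, pad_w - left  # top, bottom, left, right
```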

resize - Resizes an image using the specified library parameter: opencv (default), pillow, or torchvision.

Parameters:
  • dims (String, mandatory): Height and width; comma delimited, e.g., 640,640
  • library (String, default: opencv): Python library used to resize the given input; valid values: opencv | pillow | torchvision
  • channel_order (String, default: RGB): Convert the image to the specified channel order. At present this parameter only accepts 'RGB'
  • interp (String, default: bilinear for opencv and torchvision, bicubic for pillow): Interpolation type. Options:
      - bilinear (supported by opencv, torchvision, pillow)
      - area (supported by opencv only)
      - nearest (supported by opencv, torchvision, pillow)
      - bicubic (supported by torchvision, pillow)
      - box (supported by pillow only)
      - hamming (supported by pillow only)
      - lanczos (supported by pillow only)
  • type (String, default: auto-resize): Type of resize. Note: torchvision does not use this option. Options:
      - letterbox: Used for YOLO models
      - imagenet: Scale followed by resize
      - aspect_ratio: Resize while keeping aspect ratio
      - None: The default behavior is to auto-resize the image to the target dims
  • resize_before_typecast (Boolean, default: True): Whether to resize before or after conversion to the target datatype, e.g., fp32
  • typecasting_required (Boolean, default: True): Whether to convert the final output to numpy. Note: this option is specific to the torchvision library
  • mean (RGB dictionary, default: {"R":0, "G":0, "B":0}): Dictionary of means to be subtracted, e.g., {"R":0.485, "G":0.456, "B":0.406}. Note: this option is specific to the Tensorflow library
  • std (RGB dictionary, default: {"R":0, "G":0, "B":0}): Dictionary of std-dev for rescaling the values, e.g., {"R":0.229, "G":0.224, "B":0.225}. Note: this option is specific to the Tensorflow library
  • normalize_before_resize (Boolean, default: False): Whether to perform normalization before the resize. Note: this option is specific to the Tensorflow library
  • crop_before_resize (Boolean, default: False): Whether to perform cropping before the resize. Note: this option is specific to the Tensorflow library
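The letterbox resize used for YOLO models scales by the limiting dimension and pads the remainder. A sketch of the geometry only, with no actual image I/O:

```python
def letterbox_dims(h, w, target_h, target_w):
    # Scale so the image fits inside the target while preserving aspect
    # ratio; the leftover space is what gets padded.
    scale = min(target_h / h, target_w / w)
    new_h, new_w = round(h * scale), round(w * scale)
    pad_h, pad_w = target_h - new_h, target_w - new_w
    return new_h, new_w, pad_h, pad_w
```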

squad_read - Reads the SQuAD dataset JSON file and preprocesses the question-context pairs into features for language models like BERT-Large.

Parameters:
  • vocab_path (String, mandatory): Path to a local directory containing vocabulary files
  • max_seq_length (Integer, default: 384): The maximum total input sequence length after WordPiece tokenization; longer sequences are truncated and shorter ones padded
  • max_query_length (Integer, default: 64): The maximum number of tokens for the question; longer questions are truncated to this length
  • doc_stride (Integer, default: 128): When splitting a long document into chunks, how much stride to take between chunks
  • packing_strategy (Boolean, default: False): Set this flag when using the packing strategy for BERT-based models
  • max_sequence_per_pack (Integer, default: 3): The maximum number of sequences that can be packed together
  • mask_type (String, default: None): One of 'None', 'Boolean', or 'Compressed', depending on the masking to be applied to input_mask
  • compressed_mask_length (Integer, default: None): Set this value if mask_type is set to 'Compressed'

Postprocessing plugins

bert_predict - Predicts answers for a SQuAD dataset given start and end logits.

Parameters:
  • vocab_path (String, mandatory): Path to a local directory containing vocabulary files
  • max_seq_length (Integer, default: 384): The maximum total input sequence length after WordPiece tokenization; longer sequences are truncated and shorter ones padded (optional if preprocessing is run)
  • doc_stride (Integer, default: 128): When splitting a long document into chunks, how much stride to take between chunks (optional if preprocessing is run)
  • max_query_length (Integer, default: 64): The maximum number of tokens for the question; longer questions are truncated to this length (optional if preprocessing is run)
  • n_best_size (Integer, default: 20): The total number of n-best predictions to generate in the post.json output file
  • max_answer_length (Integer, default: 30): The maximum length of an answer that can be generated; needed because the start and end predictions are not conditioned on one another
  • packing_strategy (Boolean, default: False): Set to True if using the packing strategy

centerface_postproc - Processes the inference outputs to parse detections and generates a detections file for the metric evaluator. Used for processing CenterFace face detector.

Parameters:
  • dims (String, mandatory): Height and width; comma delimited, e.g., 640,640
  • dtypes (List, default: datatypes from the outputs_info section of the model config.yaml): List of datatypes for bounding boxes, scores, and labels, in order, e.g., [float32, float32, int64]
  • heatmap_threshold (Float, default: 0.05): User input for the heatmap threshold
  • nms_threshold (Float, default: 0.3): User input for the NMS threshold

centernet_postprocess - Processes the inference outputs to parse detections and generate a detections file for the metric evaluator. Used for processing CenterNet detector.

Parameters:
  • dtypes (String, mandatory): List of datatypes (at least 3) used to infer outputs
  • output_dims (String, mandatory): Height and width; comma delimited, e.g., 640,640
  • top_k (Integer, default: 100): Top K proposals returned by the postprocess plugin
  • num_classes (Integer, default: 1): Number of classes
  • score (Integer, default: 1): Threshold used to filter the detections

lprnet_predict - Used for LPRNET license plate prediction.

object_detection - Processes the inference outputs to parse detections and generate a detections file for the metric evaluator.

Parameters:
  • dims (String, mandatory): Height and width; comma delimited, e.g., 640,640
  • type (String, default: None): Type of post-processing (e.g., letterbox, stretch)
  • label_offset (Integer, default: 0): Offset for the labels information
  • score_threshold (Float, default: 0.001): Threshold limit for the detection scores
  • xywh_to_xyxy (Boolean, default: False): Convert bounding boxes from box-center (xywh) to box-corner (xyxy) format
  • xy_swap (Boolean, default: False): Swap the X and Y coordinates of the bounding box
  • dtypes (List, default: datatypes from the outputs_info section of the model config.yaml): List of datatypes for bounding boxes, scores, and labels, in order, e.g., [float32, float32, int64]
  • mask (Boolean, default: False): Perform postprocessing on the mask
  • mask_dims (String, default: None): Output dims of the model; provide only if mask = True, e.g., 100,80,28,28
  • padded_outputs (Boolean, default: False): Pad the outputs
  • scale (String, default: '1'): Comma-separated scale values
  • skip_padding (Boolean, default: False): Skip padding while rescaling to the original image shape
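The xywh_to_xyxy option is a small coordinate transform from box center plus size to corner coordinates, which can be written as:

```python
def xywh_to_xyxy(box):
    # (cx, cy, w, h) -> (x1, y1, x2, y2): center and size to corners.
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```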

onmt_postprocess - Performs postprocessing for OpenNMT model outputs.

Parameters:
  • sentencepiece_model_path (String, mandatory): Path to the sentencepiece model for the WMT dataset
  • unrolled_count (Integer, default: 26): Upper limit on the unrolls required for the output (number of output tokens considered for the metric)
  • vocab_path (String, default: None): Path to the OpenNMT model vocabulary file (pickle file); optional if preprocessing is run
  • skip_sentencepiece (Boolean, default: None): Skip sentencepiece encoding; optional if preprocessing is run

Metric plugins

bleu - Evaluates the BLEU score using the sacrebleu library.

Parameters:
  • round (Integer, default: 1): Number of decimal places to round the result to

map_coco - Evaluates the mAP score at IoU 50 and 50:5:95 for the COCO dataset.

Parameters:
  • map_80_to_90 (Boolean, default: False): Map classes in range 0-80 to 0-90
  • segm (Boolean, default: False): Calculate mAP for masks
  • keypoint_map (Boolean, default: False): Calculate mAP for keypoints

perplexity - Calculates the perplexity metric. Model outputs are expected to be the logits of proper shape. Ground truth data is expected to be in tokenized format and in the form of token IDs. The ground truth will be automatically generated, if using the “gpt2_tokenizer” dataset plugin.

Parameters:
  • logits_index (Integer, default: 0): Index of the logits output if the model has multiple outputs
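Perplexity is the exponential of the average negative log-likelihood that the logits assign to the ground-truth token IDs. A stdlib sketch of that computation (simplified; the plugin handles real tensor shapes and batching):

```python
import math

def perplexity(logits_per_step, token_ids):
    # logits_per_step: one list of vocabulary logits per position;
    # token_ids: the ground-truth token ID at each position.
    nll = 0.0
    for logits, tok in zip(logits_per_step, token_ids):
        log_z = math.log(sum(math.exp(l) for l in logits))  # unstabilized log-sum-exp
        nll += log_z - logits[tok]
    return math.exp(nll / len(token_ids))
```

With uniform logits over a vocabulary of size V, this reduces to a perplexity of exactly V, a useful sanity check.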

precision - Calculates the precision metric, i.e., (correct predictions / total predictions). Ground truth data is expected in the format “filename <space> correct_text”. The postprocessed model outputs are expected to be text files with just the “predicted_text”.

Parameters:
  • round (Integer, default: 7): Number of decimal places to round the result to
  • input_image_index (Integer, default: 0): For multi-input models, the index of the image file in the input file list CSV
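The metric itself is simply correct predictions over total predictions. A sketch using dictionaries from filename to text (a hypothetical data layout standing in for the ground-truth and prediction files):

```python
def precision_metric(ground_truth, predictions, round_to=7):
    # Fraction of files whose predicted text exactly matches the ground truth.
    correct = sum(1 for name, text in ground_truth.items()
                  if predictions.get(name) == text)
    return round(correct / len(ground_truth), round_to)
```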

squad_em - Calculates the exact match for SQuAD v1.1 dataset predictions and ground truth.

squad_f1 - Calculates F1 score for SQuAD v1.1 dataset predictions and ground truth.

topk - Evaluates topk value by comparing results and annotations.

Parameters:
  • kval (String, default: 5): Top k values, e.g., 1,5 evaluates top1 and top5
  • softmax_index (Integer, default: 0): Index of the softmax output in the results file list
  • label_offset (Integer, default: 0): Offset required in the labels' scores, e.g., if the shape is 1x1001 then label_offset=1
  • round (Integer, default: 3): Number of decimal places to round the result to
  • input_image_index (Integer, default: 0): For multi-input models, the index of the image file in the input file list CSV
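Top-k accuracy counts how often the true label appears among the k highest-scoring classes. A sketch with label_offset applied (simplified relative to the plugin):

```python
def topk_accuracy(scores_list, labels, k=5, label_offset=0):
    # scores_list: per-sample class scores; labels: ground-truth class indices.
    hits = 0
    for scores, label in zip(scores_list, labels):
        topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += (label + label_offset) in topk
    return hits / len(labels)
```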

widerface_AP - Computes average precision for easy, medium, and hard cases.

Parameters:
  • IoU_threshold (Float, default: 0.4): User input for the IoU threshold
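widerface_AP matches detections to ground truth using this IoU threshold. The intersection-over-union of two axis-aligned boxes in (x1, y1, x2, y2) form is:

```python
def iou(a, b):
    # Intersection area over union area of two boxes (x1, y1, x2, y2).
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

A pair of boxes counts as a match when this value meets or exceeds IoU_threshold.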

Memory-based plugins

This section lists the built-in memory-based plugins.

Dataset plugins

create_squad_examples - Extracts examples from a given SQuAD dataset file and saves them to a file.

Parameters:
  • squad_version (Integer, default: 1): SQuAD version, 1 or 2
  • max_inputs (Integer, default: -1, i.e., the complete dataset): Maximum number of inputs in the input list to be considered for execution
  • max_calib (Integer, default: -1, i.e., the complete dataset): Maximum number of inputs in calibration to be considered for execution

filter_dataset - Filters the dataset including the input list, calibration, and annotation files.

Parameters:
  • max_inputs (Integer, mandatory): Maximum number of inputs in the input list to be considered for execution
  • max_calib (Integer, mandatory): Maximum number of inputs in calibration to be considered for execution
  • random (Boolean, default: False): Shuffles the input list and calibration files

tokenize_wikitext_2 - Tokenizes wikitext-2 dataset into model inputs.

Parameters:
  • seq_length (Integer, mandatory): Sequence length for the generated model inputs
  • tokenizer_name (String, mandatory): Name of the tokenizer used for generating model inputs
  • past_shape (List, default: 0): Shape of the 'past' inputs
  • num_past (Integer, default: 0): Number of 'past' inputs
  • pos_id (Bool, default: True): Flag to configure whether position ids are required
  • mask_dtype (String, default: 'float32'): Data type of the mask used
  • cached_path (String): Path to a cached tokenizer file (if available)

split_txt_data - Saves individual text files for each line present in the given input text file.

Preprocessing memory plugins

centernet_preproc - Performs preprocessing on CenterNet dataset examples.

Parameters:
  • dims (String, mandatory): Height and width; comma delimited, e.g., 416,416
  • scale (Float, default: 1.0): Scale factor for the image
  • fix_res (Boolean, default: True): Resolution of the image
  • pad (Integer, default: 0): Image padding

convert_nchw - Transposes WHC to CHW or CHW to WHC and adds an extra N dimension.

Parameters:
  • expand-dims (Boolean, default: True): Add the Nth dimension

create_batch - Concatenates raw input files into a single file using numpy.

crop - Center crops an image to the given dimensions using numpy or torchvision based on the library parameter.

Parameters:
  • dims (String, mandatory): Height and width; comma delimited, e.g., 640,640
  • library (String, default: numpy): Python library used to crop the given input; valid values: numpy | torchvision
  • typecasting_required (Boolean, default: True): Whether to convert the final output to numpy. Note: this option is specific to the torchvision library

expand_dims - Adds the N dimension for images, e.g., HWC to NHWC.

image_transformers_input - Creates input files with image and/or text for image transformer models like ViT and CLIP. (Note: this plugin requires Pillow package version 10.0.0.)

Parameters:
  • dims (String, mandatory): Expected processed output dimension in CHW format
  • image_only (Boolean, default: True): Data type of raw data

normalize - Normalizes input per the given scheme; data must be of NHWC format.

Parameters:
  • library (String, default: numpy): Python library used to normalize the given input; valid values: numpy | torchvision
  • norm (float32, default: 255.0): Normalization factor; all values are divided by norm
  • means (RGB dictionary, default: {"R":0, "G":0, "B":0}): Dictionary of means to be subtracted, e.g., {"R":0.485, "G":0.456, "B":0.406}
  • std (RGB dictionary, default: {"R":1, "G":1, "B":1}): Dictionary of std-dev for rescaling the values, e.g., {"R":0.229, "G":0.224, "B":0.225}
  • channel_order (String, default: 'RGB'): Channel order to specify means and std values per channel: RGB | BGR
  • normalize_first (Boolean, default: True): Whether to perform normalization before mean subtraction and std-dev rescaling; normalize_first=True performs normalization first. Note: the torchvision library does not use this option
  • typecasting_required (Boolean, default: True): Whether to convert the final output to numpy. Note: this option is specific to the torchvision library
  • pil_to_tensor_input (Boolean, default: True): Whether to convert the input to a tensor before normalization. Note: this option is specific to the torchvision library

pad - Pads an image with a constant pad size or based on target dimensions.

Parameters:
  • dims (String, mandatory): Height and width, comma delimited, e.g., 416,416, for 'target_dims' padding
  • type (String, mandatory): Type of padding. Valid options:
      - constant: Add constant padding on 4 sides (pad_size must be provided)
      - target_dims: Add padding based on the difference between image size and target size (dims must be provided)
  • pad_size (Integer, default: None): Size of padding for 'constant' padding
  • img_position (String, default: 'center'): Position of the image, either 'center' or 'corner' (top-left); padding is added accordingly. Currently used for 'target_dims' padding
  • color (Integer, default: 114): Padding value for all planes

resize - Resizes an image using the specified library parameter: opencv (default), pillow, or torchvision.

Parameters:
  • dims (String, mandatory): Height and width; comma delimited, e.g., 640,640
  • library (String, default: opencv): Python library used to resize the given input; valid values: opencv | pillow | torchvision
  • channel_order (String, default: RGB): Convert the image to the specified channel order. At present this parameter only accepts 'RGB'
  • interp (String, default: bilinear for opencv and torchvision, bicubic for pillow): Interpolation type. Options:
      - bilinear (supported by opencv, torchvision, pillow)
      - area (supported by opencv only)
      - nearest (supported by opencv, torchvision, pillow)
      - bicubic (supported by torchvision, pillow)
      - box (supported by pillow only)
      - hamming (supported by pillow only)
      - lanczos (supported by pillow only)
  • type (String, default: auto-resize): Type of resize. Note: torchvision does not use this option. Options:
      - letterbox: Used for YOLO models
      - imagenet: Scale followed by resize
      - aspect_ratio: Resize while keeping aspect ratio
      - None: The default behavior is to auto-resize the image to the target dims
  • resize_before_typecast (Boolean, default: True): Whether to resize before or after conversion to the target datatype, e.g., fp32
  • typecasting_required (Boolean, default: True): Whether to convert the final output to numpy. Note: this option is specific to the torchvision library
  • mean (RGB dictionary, default: {"R":0, "G":0, "B":0}): Dictionary of means to be subtracted, e.g., {"R":0.485, "G":0.456, "B":0.406}. Note: this option is specific to the Tensorflow library
  • std (RGB dictionary, default: {"R":0, "G":0, "B":0}): Dictionary of std-dev for rescaling the values, e.g., {"R":0.229, "G":0.224, "B":0.225}. Note: this option is specific to the Tensorflow library
  • normalize_before_resize (Boolean, default: False): Whether to perform normalization before the resize. Note: this option is specific to the Tensorflow library
  • crop_before_resize (Boolean, default: False): Whether to perform cropping before the resize. Note: this option is specific to the Tensorflow library
  • norm (float32, default: 255.0): Normalization factor; all values are divided by norm
  • normalize_first (Boolean, default: True): Whether to perform normalization before mean subtraction and std-dev rescaling; normalize_first=True performs normalization first. Note: the torchvision library does not use this option

squad_preprocess - Reads the processed files created by the create_squad_examples plugin.

Parameters:
  • mask_type (String, default: None): The type of masking to apply; 'bool' applies boolean masking, None applies no masking

Postprocessing memory plugins

squad_postprocess - Predicts answers for a SQuAD dataset for the given start and end scores.

Parameters:
  • packing_strategy (Boolean, default: False): Set to True if using the packing strategy

centerface_postproc - Processes the inference outputs to parse detections and generates a detections file for the metric evaluator. Used for processing CenterFace face detector.

Parameters:
  • dims (String, mandatory): Height and width; comma delimited, e.g., 640,640
  • heatmap_threshold (Float, default: 0.05): User input for the heatmap threshold
  • nms_threshold (Float, default: 0.3): User input for the NMS threshold

centernet_postprocess - Processes the inference outputs to parse detections and generate a detections file for the metric evaluator. Used for processing CenterNet detector.

Parameters:
  • output_dims (String, mandatory): Height and width; comma delimited, e.g., 640,640
  • top_k (Integer, default: 100): Top K proposals returned by the postprocess plugin
  • num_classes (Integer, default: 1): Number of classes
  • score (Integer, default: 1): Threshold used to filter the detections

lprnet_predict - Used for LPRNET license plate prediction.

object_detection - Processes the inference outputs to parse detections and generate a detections file for the metric evaluator.

Parameters:
  • dims (String, mandatory): Height and width; comma delimited, e.g., 640,640
  • type (String, default: None): Type of post-processing (e.g., letterbox, stretch)
  • label_offset (Integer, default: 0): Offset for the labels information
  • score_threshold (Float, default: 0.001): Threshold limit for the detection scores
  • xywh_to_xyxy (Boolean, default: False): Convert bounding boxes from box-center (xywh) to box-corner (xyxy) format
  • xy_swap (Boolean, default: False): Swap the X and Y coordinates of the bounding box
  • dtypes (List, default: datatypes from the outputs_info section of the model config.yaml): List of datatypes for bounding boxes, scores, and labels, in order, e.g., [float32, float32, int64]
  • mask (Boolean, default: False): Perform postprocessing on the mask
  • mask_dims (String, default: None): Output dims of the model; provide only if mask = True, e.g., 100,80,28,28
  • padded_outputs (Boolean, default: False): Pad the outputs
  • scale (String, default: '1'): Comma-separated scale values
  • skip_padding (Boolean, default: False): Skip padding while rescaling to the original image shape

Metric memory plugins

map_coco - Evaluates the mAP score at IoU 50 and 50:5:95 for the COCO dataset.

Parameters:
  • map_80_to_90 (Boolean, default: False): Map classes in range 0-80 to 0-90
  • segm (Boolean, default: False): Calculate mAP for masks
  • keypoint_map (Boolean, default: False): Calculate mAP for keypoints
  • data (String, default: 'coco'): Dataset used for evaluation; must be one of 'openimages' or 'coco'

perplexity - Calculates the perplexity metric. Model outputs are expected to be the logits of proper shape. Ground truth data is expected to be in tokenized format and in the form of token IDs. The ground truth will be automatically generated, if using the “gpt2_tokenizer” dataset plugin.

Parameters:
  • logits_index (Integer, default: 0): Index of the logits output if the model has multiple outputs

precision - Calculates the precision metric, i.e., (correct predictions / total predictions). Ground truth data is expected in the format “filename <space> correct_text”. The postprocessed model outputs are expected to be text files with just the “predicted_text”.

Parameters:
  • round (Integer, default: 7): Number of decimal places to round the result to
  • input_image_index (Integer, default: 0): For multi-input models, the index of the image file in the input file list CSV

squad_eval - Calculates F1 score and exact match scores for SQuAD dataset based on predictions and ground truth.

Parameters:
  • vocabulary (String, mandatory): Vocabulary used to create the tokenizer for evaluation. Options:
      - a vocabulary from huggingface.co and cache (e.g., "bert-base-uncased")
      - a user-uploaded vocabulary from huggingface.co and cache (e.g., "deepset/roberta-base-squad2")
      - a path to a local directory containing vocabulary files (tokenizer saved using _save_pretrained('./test/saved_model/'))
  • max_answer_length (Integer, default: 30): The maximum length of an answer after tokenization. In SQuAD v2 this was set to 30 tokens; in SQuAD v1 it was not specified, so a default of 30 is used
  • n_best_size (Integer, default: 20): How many of the possible answers to return for a given question, along with corresponding confidence scores
  • do_lower_case (Bool, default: False): Whether to lowercase all text before processing
  • squad_version (Integer, default: 1): Which version of SQuAD-style questions and answers is being used ("v1" or "v2")
  • round (Integer, default: 6): Number of decimal places to round the result to
  • cached_vocab_path (String, default: None): Path to a cached vocab_file used to create the tokenizer

topk - Evaluates topk value by comparing results and annotations.

Parameters:
  • kval (String, default: '1,5'): Top k values, e.g., 1,5 evaluates top1 and top5
  • softmax_index (Integer, default: 0): Index of the softmax output in the results file list
  • label_offset (Integer, default: 0): Offset required in the labels' scores, e.g., if the shape is 1x1001 then label_offset=1
  • round (Integer, default: 3): Number of decimal places to round the result to
  • input_image_index (Integer, default: 0): For multi-input models, the index of the image file in the input file list CSV

widerface_AP - Computes average precision for easy, medium, and hard cases.

Parameters:
  • IoU_threshold (Float, default: 0.4): User input for the IoU threshold

SDK Compatibility Verification

The model generated by the converter must be executed with the net-run tools from the same SDK as the converter. You can quickly check the SDK info embedded in model.cpp or model.so by running these string-grep commands:

strings model.cpp  | grep qaisw
strings libqnn_model.so  | grep qaisw
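Equivalently, the check can be scripted. This stdlib sketch scans a file's raw bytes for the qaisw marker, similar to what strings piped through grep does:

```python
def find_sdk_marker(path, marker=b"qaisw"):
    # Return decoded segments of the file that contain the marker,
    # with a little surrounding context for each occurrence.
    data = open(path, "rb").read()
    hits = []
    i = data.find(marker)
    while i != -1:
        hits.append(data[max(0, i - 8): i + 40].decode("latin-1"))
        i = data.find(marker, i + 1)
    return hits

# Demo with a stand-in file containing an embedded marker string:
with open("model_demo.so", "wb") as f:
    f.write(b"\x00qaisw-demo\x00")
```

Comparing the markers reported for the converter artifacts and the net-run tools is a quick way to confirm they come from the same SDK release.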