Tools

This page describes the various SDK tools and features available to Linux/Android and Windows developers. For the integration flow for each type of developer, please refer to the Overview page for further information.

Category

Tool

Developer

Linux/Android

Windows

Ubuntu

WSL x86

Device

WSL x86

Windows x86_64

Windows on Snapdragon

Model Conversion

qnn-tensorflow-converter

YES

YES

YES

YES

YES**

qnn-tflite-converter

YES

YES

YES

qnn-pytorch-converter

YES

YES

YES

qnn-onnx-converter

YES

YES

YES

YES

YES**

qairt-converter

YES

YES

YES

YES

YES**

Model Preparation

Quantization Support

YES

YES

YES

YES

YES

qnn-model-lib-generator

YES

YES

YES

YES

qnn-op-package-generator

YES

YES

YES

qnn-context-binary-generator

YES

YES

YES

YES

YES

YES

Execution

qnn-net-run

YES

YES

YES

YES

YES

qnn-throughput-net-run

YES

YES

YES

YES

Analysis

qairt-accuracy-evaluator (Beta)

YES

YES****

qnn-architecture-checker (Beta)

YES

YES

YES

YES

YES**

qnn-accuracy-debugger (Beta)

YES

YES

YES***

YES

qairt-accuracy-debugger (Beta)

YES

YES

qnn-platform-validator

YES

YES

YES

qnn-profile-viewer

YES

YES

YES

YES*

YES*

Benchmarking

YES

qnn-netron (Beta)

YES

qnn-context-binary-utility

YES

YES

YES

Note

The Beta designation indicates pre-production quality. The component is still undergoing rigorous testing and may not yet fully satisfy the compatibility requirements expected of the production version. In other words, incompatible changes (such as alterations in behavior or interface) between releases are allowed without prior notice, although every effort is made to minimize such changes.

Note

* When using converter tools in Windows PowerShell, make sure a virtual environment with the required Python packages (see Setup for more details) is activated and that converters are executed via python, as shown in the following example.
(venv-3.10) > python qnn-onnx-converter <options>
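
For example, a full PowerShell session might look like the following sketch. The environment name, model file, and output path are placeholders, and the package installation described in Setup is assumed to have been completed inside the environment; the converter is invoked with the same --input_network/-o options as the converters documented below.

PS > python -m venv venv-3.10
PS > .\venv-3.10\Scripts\Activate.ps1
(venv-3.10) > python qnn-onnx-converter --input_network model.onnx -o converted\model.cpp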

Note

  • Library extension naming: Windows developers should replace all ‘.so’ files with the analogous ‘.dll’ files in the following sections. Please refer to Platform Differences for more details.

  • For more detailed information on converters, please refer to Converters.

  • [*] libQnnGpuProfilingReader.dll is not supported on the Windows platform for qnn-profile-viewer.

  • [**] Requires the Python scripts and the executables from the Windows x86_64 binary folder (bin\x86_64-windows-msvc).

  • [***] The accuracy debugger on Windows x86 systems is currently tested only for the CPU runtime.

  • [****] The Accuracy Evaluator on Windows on Snapdragon has been tested and verified for both CPU and HTP runtimes.

  • PyTorch models and preprocessing/postprocessing stages that depend upon the torch library are currently not supported in the Windows version of the Accuracy Evaluator.

  • TFLite conversion using qairt-converter is not supported on Windows x86_64 and Windows on Snapdragon due to a TVM library dependency.

Model Conversion

qnn-tensorflow-converter

The qnn-tensorflow-converter tool converts a model from the TensorFlow framework to a CPP file representing the model as a series of QNN API calls. Additionally, a binary file containing static weights of the model is produced.

usage: qnn-tensorflow-converter -d INPUT_NAME INPUT_DIM --out_node OUT_NAMES
                                [--input_type INPUT_NAME INPUT_TYPE]
                                [--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding  ...]
                                [--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
                                [--show_unconsumed_nodes] [--saved_model_tag SAVED_MODEL_TAG]
                                [--saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY]
                                [--quantization_overrides QUANTIZATION_OVERRIDES]
                                [--keep_quant_nodes] [--disable_batchnorm_folding]
                                [--expand_lstm_op_structure]
                                [--keep_disconnected_nodes] [--input_list INPUT_LIST]
                                [--param_quantizer PARAM_QUANTIZER] [--act_quantizer ACT_QUANTIZER]
                                [--algorithms ALGORITHMS [ALGORITHMS ...]]
                                [--bias_bitwidth BIAS_BITWIDTH] [--bias_bw BIAS_BW]
                                [--act_bitwidth ACT_BITWIDTH] [--act_bw ACT_BW]
                                [--weights_bitwidth WEIGHTS_BITWIDTH] [--weight_bw WEIGHT_BW]
                                [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_encodings]
                                [--use_per_channel_quantization] [--use_per_row_quantization]
                                [--enable_per_row_quantized_bias]
                                [--float_fallback] [--use_native_input_files] [--use_native_dtype]
                                [--use_native_output_files] [--disable_relu_squashing]
                                [--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
                                --input_network INPUT_NETWORK [--debug [DEBUG]]
                                [-o OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
                                [--float_bitwidth FLOAT_BITWIDTH] [--float_bw FLOAT_BW]
                                [--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
                                [--exclude_named_tensors] [--op_package_lib OP_PACKAGE_LIB]
                                [--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
                                [-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
                                [-h] [--arch_checker]

Script to convert TF model into QNN

required arguments:
  -d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
                        The names and dimensions of the network input layers specified in the format
                        [input_name comma-separated-dimensions], for example:
                            'data' 1,224,224,3
                        Note that the quotes should always be included in order to
                        handle special characters, spaces, etc.
                        For multiple inputs specify multiple --input_dim on the command line like:
                            --input_dim 'data1' 1,224,224,3 --input_dim 'data2' 1,50,100,3
  --out_node OUT_NAMES, --out_name OUT_NAMES
                        Names of the graph's output nodes. Multiple output nodes should be
                        provided separately like:
                            --out_node out_1 --out_node out_2
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
                        Path to the source framework model.

optional arguments:
  --input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
                        Type of data expected by each input op/layer. Type for each input is
                        |default| if not specified. For example: "data" image. Note that the quotes
                        should always be included in order to handle special characters, spaces, etc.
                        For multiple inputs specify multiple --input_type on the command line.
                        Eg:
                            --input_type "data1" image --input_type "data2" opaque
                        These options are used by the DSP runtime, and the following descriptions
                        state how the input will be handled for each option.
                        Image:
                        Input is a float between 0-255; the input's mean is 0.0f and the input's
                        max is 255.0f. The floats are cast to uint8_t and passed to the DSP.
                        Default:
                        Pass the input as floats to the DSP directly and the DSP will quantize it.
                        Opaque:
                        Assumes the input is float because the consumer layer (i.e. the next layer)
                        requires it as float, therefore it won't be quantized.
                        Choices supported:
                            image
                            default
                            opaque
  --input_dtype INPUT_NAME INPUT_DTYPE
                        The names and datatype of the network input layers specified in the format
                        [input_name datatype], for example:
                            'data' 'float32'.
                        Default is float32 if not specified.
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For multiple inputs specify multiple --input_dtype on the command line like:
                            --input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
  --input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
                        Usage:     --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
                        [INPUT_ENCODING_OUT]
                        Input encoding of the network inputs. Default is bgr.
                        e.g.
                            --input_encoding "data" rgba
                        Quotes must wrap the input node name to handle special characters,
                        spaces, etc. To specify encodings for multiple inputs, invoke
                        --input_encoding for each one.
                        e.g.
                            --input_encoding "data1" rgba --input_encoding "data2" other
                        Optionally, an output encoding may be specified for an input node by
                        providing a second encoding. The default output encoding is bgr.
                        e.g.
                            --input_encoding "data3" rgba rgb
                        Input encoding types:
                            image color encodings: bgr, rgb, nv21, nv12, ...
                            time_series: for inputs of rnn models;
                            other: not available above or is unknown.
                        Supported encodings:
                            bgr
                            rgb
                            rgba
                            argb32
                            nv21
                            nv12
                            time_series
                            other
  --input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
                        Layout of each input tensor. If not specified, it will use the default
                        based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                            NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        NONTRIVIAL for everything else. For multiple inputs specify multiple
                        --input_layout on the command line.
                        Eg:
                           --input_layout "data1" NCHW --input_layout "data2" NCHW
  --custom_io CUSTOM_IO
                        Use this option to specify a yaml file for custom IO
  --show_unconsumed_nodes
                        Displays a list of unconsumed nodes, if any are found. Nodes which are
                        unconsumed do not violate the structural fidelity of the generated graph.
  --saved_model_tag SAVED_MODEL_TAG
                        Specify the tag to select a MetaGraph from the SavedModel, e.g.
                        --saved_model_tag serve. The default value is 'serve' when it is not
                        assigned.
  --saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY
                        Specify the signature key to select the inputs and outputs of the model,
                        e.g. --saved_model_signature_key serving_default. The default value is
                        'serving_default' when it is not assigned.
  --disable_batchnorm_folding
  --expand_lstm_op_structure
                        Enables optimization that breaks the LSTM op to equivalent math ops
  --keep_disconnected_nodes
                        Disable Optimization that removes Ops not connected to the main graph.
                        This optimization uses output names provided over commandline OR
                        inputs/outputs extracted from the Source model to determine the main graph
  --debug [DEBUG]       Run the converter in debug mode.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path where the converted output model should be saved. If not specified,
                        the converted model will be written to a file with the same name as the
                        input model.
  --copyright_file COPYRIGHT_FILE
                        Path to copyright file. If provided, the content of the file will be added
                        to the output model.
  --float_bitwidth FLOAT_BITWIDTH
                        Selects the bitwidth to use when using float for parameters (weights/bias)
                        and activations for all ops or a specific op (via encodings) selected
                        through encoding; 32 (default) or 16.
  --float_bw FLOAT_BW   Deprecated; use --float_bitwidth.
  --float_bias_bw FLOAT_BIAS_BW
                        Deprecated; use --float_bias_bitwidth.
  --overwrite_model_prefix
                        If this option is passed, the model generator will use the output path name
                        as the model prefix to name functions in <qnn_model_name>.cpp (useful for
                        running multiple models at once), e.g. ModelName_composeGraphs. Default is to
                        use the generic "QnnModel_".
  --exclude_named_tensors
                        Do not use source framework tensor names; instead use a counter for naming
                        tensors. Note: this can potentially help to reduce the size of the final
                        model library that will be generated (recommended for deploying the model).
                        Default is False.
  -h, --help            show this help message and exit

Quantizer Options:
  --quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters to use for
                        quantization. These will override any quantization data carried from
                        conversion (eg TF fake quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET specification.
  --keep_quant_nodes    Use this option to keep activation quantization nodes in the graph rather
                        than stripping them.
  --input_list INPUT_LIST
                        Path to a file specifying the input data. This file should be a plain text
                        file, containing one or more absolute file paths per line. Each path is
                        expected to point to a binary file containing one input in the "raw" format,
                        ready to be consumed by the quantizer without any further preprocessing.
                        Multiple files per line separated by spaces indicate multiple inputs to the
                        network. See documentation for more details. Must be specified for
                        quantization. All subsequent quantization options are ignored when this is
                        not provided.
  --param_quantizer PARAM_QUANTIZER
                        Optional parameter to indicate the weight/bias quantizer to use. Must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "adjusted": Deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                                     Data will be stored as int#_t data such that the offset is always 0.
  --act_quantizer ACT_QUANTIZER
                        Optional parameter to indicate the activation quantizer to use. Must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "adjusted": Deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                                     Data will be stored as int#_t data such that the offset is always 0.
  --algorithms ALGORITHMS [ALGORITHMS ...]
                        Use this option to enable new optimization algorithms. Usage is:
                            --algorithms <algo_name1> ... The available optimization algorithms are:
                        "cle" - Cross layer equalization includes a number of methods for equalizing
                        weights and biases across layers in order to rectify imbalances that cause
                        quantization errors.
  --bias_bitwidth BIAS_BITWIDTH
                        Selects the bitwidth to use when quantizing the biases; 8 (default) or 32.
  --bias_bw BIAS_BW     Deprecated; use --bias_bitwidth.
  --act_bitwidth ACT_BITWIDTH
                        Selects the bitwidth to use when quantizing the activations; 8 (default) or 16.
  --act_bw ACT_BW       Deprecated; use --act_bitwidth.
  --weights_bitwidth WEIGHTS_BITWIDTH
                        Selects the bitwidth to use when quantizing the weights; 4 or 8 (default).
  --weight_bw WEIGHT_BW
                        Deprecated; use --weights_bitwidth.
  --float_bias_bitwidth FLOAT_BIAS_BITWIDTH
                        Selects the bitwidth to use when biases are in float; 32 or 16.
  --ignore_encodings    Use only quantizer generated encodings, ignoring any user or model provided
                        encodings.
                        Note: Cannot use --ignore_encodings with --quantization_overrides
  --use_per_channel_quantization
                        Enables per-channel quantization for convolution-based op weights.
                        This replaces the built-in model QAT encodings when used for a given weight.
  --use_per_row_quantization
                        Enables row wise quantization of Matmul and FullyConnected ops.
  --enable_per_row_quantized_bias
                        Enables row wise quantization of bias for FullyConnected op, when weights are per-row quantized.
  --float_fallback      Enables fallback to floating point (FP) instead of fixed point.
                        This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
                        If this option is enabled, --input_list must not be provided and --ignore_encodings must not be provided.
                        The external quantization encodings (encoding file/FakeQuant encodings) might be missing
                        quantization parameters for some interim tensors. First it will try to fill the gaps by
                        propagating across math-invariant functions. If the quantization params are still missing,
                        those nodes fall back to floating point.
  --use_native_input_files
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native: reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_dtype    Note: This option is deprecated, use --use_native_input_files option in
                        future.
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native: reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_output_files
                        Use this option to indicate the data type of the output files
                        1. float (default): output the file as floats.
                        2. native: outputs the file that is native to the model. For ex.,
                        uint8_t.
  --disable_relu_squashing
                        Disables squashing of ReLU against convolution-based ops for quantized models.
  --restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
                        Specifies the number of steps to use for computing quantization encodings
                        such that scale = (max - min) / number of quantization steps.
                        The option should be passed as a space separated pair of hexadecimal string
                        minimum and maximum values. i.e. --restrict_quantization_steps "MIN MAX".
                        Please note that this is a hexadecimal string literal and not a signed
                        integer, to supply a negative value an explicit minus sign is required.
                        E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8 bit range,
                        --restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16
                        bit range. This argument is required for 16-bit Matmul operations.

Custom Op Package Options:
  --op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
                        Use this argument to pass an op package library for quantization. Must be in
                        the form
                        <op_package_lib_path:interfaceProviderName> and be separated by a
                        comma for multiple package libs
  -p PACKAGE_NAME, --package_name PACKAGE_NAME
                        A global package name to be used for each node in the Model.cpp file.
                        Defaults to Qnn header defined package name
  --converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
                        Absolute path to converter op package library compiled by the OpPackage
                        generator. Must be separated by a comma for multiple package libraries.
                        Note: Libraries must follow the same order as the xml files.
                        E.g.1: --converter_op_package_lib absolute_path_to/libExample.so
                        E.g.2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
  --op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...], -opc OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]
                        Path to a Qnn Op Package XML configuration file that contains user defined
                        custom operations.

Architecture Checker Options (Experimental):
  --arch_checker        Note: This option will soon be deprecated. Use the qnn-architecture-checker tool to achieve the same result.

Note: Only one of: {'package_name', 'op_package_config'} can be specified

Basic command line usage looks like:

$ qnn-tensorflow-converter -i <path>/frozen_graph.pb
                    -d <network_input_name> <dims>
                    --out_node <network_output_name>
                    -o <optional_output_path>
                    --allow_unconsumed_nodes  # optional, but most likely will be needed for larger models
                    -p <optional_package_name> # Defaults to "qti.aisw"
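
As an illustration, a quantizing conversion of a hypothetical frozen graph could look like the following sketch; the file, node names, and dimensions are placeholders, and only options documented above are used:

$ qnn-tensorflow-converter -i frozen_graph.pb \
                           -d 'input' 1,224,224,3 \
                           --out_node 'softmax' \
                           --input_list calibration_list.txt \
                           --act_bitwidth 8 \
                           --weights_bitwidth 8 \
                           -o model/model.cpp

Here calibration_list.txt is the plain text file described under --input_list, containing one absolute path to a raw calibration input per line, for example:

/data/calibration/image_0001.raw
/data/calibration/image_0002.raw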

qnn-tflite-converter

The qnn-tflite-converter tool converts a TFLite model to a CPP file representing the model as a series of QNN API calls. Additionally, a binary file containing static weights of the model is produced.

usage: qnn-tflite-converter [-d INPUT_NAME INPUT_DIM] [--signature_name SIGNATURE_NAME]
                            [--out_node OUT_NAMES] [--input_type INPUT_NAME INPUT_TYPE]
                            [--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding  ...]
                            [--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
                            [--dump_relay DUMP_RELAY]
                            [--quantization_overrides QUANTIZATION_OVERRIDES] [--keep_quant_nodes]
                            [--disable_batchnorm_folding] [--expand_lstm_op_structure]
                            [--keep_disconnected_nodes]
                            [--input_list INPUT_LIST] [--param_quantizer PARAM_QUANTIZER]
                            [--act_quantizer ACT_QUANTIZER]
                            [--algorithms ALGORITHMS [ALGORITHMS ...]]
                            [--bias_bitwidth BIAS_BITWIDTH] [--bias_bw BIAS_BW]
                            [--act_bitwidth ACT_BITWIDTH] [--act_bw ACT_BW]
                            [--weights_bitwidth WEIGHTS_BITWIDTH] [--weight_bw WEIGHT_BW]
                            [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_encodings]
                            [--use_per_channel_quantization] [--use_per_row_quantization]
                            [--enable_per_row_quantized_bias]
                            [--float_fallback] [--use_native_input_files] [--use_native_dtype]
                            [--use_native_output_files] [--disable_relu_squashing]
                            [--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
                            --input_network INPUT_NETWORK [--debug [DEBUG]]
                            [-o OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
                            [--float_bitwidth FLOAT_BITWIDTH] [--float_bw FLOAT_BW]
                            [--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
                            [--exclude_named_tensors] [--op_package_lib OP_PACKAGE_LIB]
                            [--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
                            [-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
                            [-h] [--arch_checker]

Script to convert TFLite model into QNN

required arguments:
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
                        Path to the source framework model.

optional arguments:
  -d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
                        The names and dimensions of the network input layers specified in the format
                        [input_name comma-separated-dimensions], for example:
                            'data' 1,224,224,3
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For multiple inputs specify multiple --input_dim on the command line like:
                            --input_dim 'data1' 1,224,224,3 --input_dim 'data2' 1,50,100,3
  --signature_name SIGNATURE_NAME, -sn SIGNATURE_NAME
                        Specifies a specific subgraph signature to convert.
  --out_node OUT_NAMES, --out_name OUT_NAMES
                        Names of the graph's output tensors. Multiple output names should be
                        provided separately like:
                            --out_name out_1 --out_name out_2
  --input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
                        Type of data expected by each input op/layer. Type for each input is
                        |default| if not specified. For example: "data" image. Note that the quotes
                        should always be included in order to handle special characters, spaces, etc.
                        For multiple inputs specify multiple --input_type on the command line.
                        Eg:
                            --input_type "data1" image --input_type "data2" opaque
                        These options are used by the DSP runtime, and the following descriptions
                        state how the input will be handled for each option.
                        Image:
                        Input is a float between 0-255; the input's mean is 0.0f and the input's
                        max is 255.0f. The floats are cast to uint8_t and passed to the DSP.
                        Default:
                        Pass the input as floats to the DSP directly and the DSP will quantize it.
                        Opaque:
                        Assumes the input is float because the consumer layer (i.e. the next layer)
                        requires it as float, therefore it won't be quantized.
                        Choices supported:
                            image
                            default
                            opaque
  --input_dtype INPUT_NAME INPUT_DTYPE
                        The names and datatype of the network input layers specified in the format
                        [input_name datatype], for example:
                            'data' 'float32'
                        Default is float32 if not specified
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For multiple inputs specify multiple --input_dtype on the command line like:
                            --input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
  --input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
                        Usage:     --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
                        [INPUT_ENCODING_OUT]
                        Input encoding of the network inputs. Default is bgr.
                        e.g.
                            --input_encoding "data" rgba
                        Quotes must wrap the input node name to handle special characters,
                        spaces, etc. To specify encodings for multiple inputs, invoke
                        --input_encoding for each one.
                        e.g.
                            --input_encoding "data1" rgba --input_encoding "data2" other
                        Optionally, an output encoding may be specified for an input node by
                        providing a second encoding. The default output encoding is bgr.
                        e.g.
                            --input_encoding "data3" rgba rgb
                        Input encoding types:
                            image color encodings: bgr, rgb, nv21, nv12, ...
                            time_series: for inputs of rnn models;
                            other: not available above or is unknown.
                        Supported encodings:
                            bgr
                            rgb
                            rgba
                            argb32
                            nv21
                            nv12
                            time_series
                            other
  --input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
                        Layout of each input tensor. If not specified, it will use the default
                        based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                            NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        NONTRIVIAL for everything else. For multiple inputs specify multiple
                        --input_layout on the command line.
                        Eg:
                           --input_layout "data1" NCHW --input_layout "data2" NCHW
  --custom_io CUSTOM_IO
                        Use this option to specify a yaml file for custom IO.
  --dump_relay DUMP_RELAY
                        Dump Relay ASM and Params at the path provided with the argument
                        Usage: --dump_relay <path_to_dump>
  --show_unconsumed_nodes
                        Displays a list of unconsumed nodes, if any are
                        found. Nodes which are unconsumed do not violate the
                        structural fidelity of the generated graph.
  --disable_batchnorm_folding
  --expand_lstm_op_structure
                        Enables optimization that breaks the LSTM op to equivalent math ops
  --keep_disconnected_nodes
                        Disable Optimization that removes Ops not connected to the main graph.
                        This optimization uses output names provided over commandline OR
                        inputs/outputs extracted from the Source model to determine the main graph
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path where the converted output model should be saved. If not specified,
                        the converted model will be written to a file with the same name as the
                        input model.
  --copyright_file COPYRIGHT_FILE
                        Path to copyright file. If provided, the content of the file will be added
                        to the output model.
  --float_bitwidth FLOAT_BITWIDTH
                        Selects the bitwidth to use when using float for parameters (weights/bias)
                        and activations for all ops or a specific op (via encodings) selected
                        through encoding; 32 (default) or 16.
  --float_bw FLOAT_BW   Deprecated; use --float_bitwidth.
  --float_bias_bw FLOAT_BIAS_BW
                        Deprecated; use --float_bias_bitwidth.
  --overwrite_model_prefix
                        If this option is passed, the model generator will use the output path name
                        as the model prefix to name functions in <qnn_model_name>.cpp (useful for
                        running multiple models at once), e.g. ModelName_composeGraphs. Default is to
                        use the generic "QnnModel_".
  --exclude_named_tensors
                        Do not use source framework tensor names; instead use a counter for naming
                        tensors. Note: this can potentially help to reduce the size of the final
                        model library that will be generated (recommended for deploying the model).
                        Default is False.
  -h, --help            show this help message and exit

Quantizer Options:
  --quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters to use for
                        quantization. These will override any quantization data carried from
                        conversion (eg TF fake quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET specification.
  --keep_quant_nodes    Use this option to keep activation quantization nodes in the graph rather
                        than stripping them.
  --input_list INPUT_LIST
                        Path to a file specifying the input data. This file should be a plain text
                        file, containing one or more absolute file paths per line. Each path is
                        expected to point to a binary file containing one input in the "raw" format,
                        ready to be consumed by the quantizer without any further preprocessing.
                        Multiple files per line separated by spaces indicate multiple inputs to the
                        network. See documentation for more details. Must be specified for
                        quantization. All subsequent quantization options are ignored when this is
                        not provided.
  --param_quantizer PARAM_QUANTIZER
                        Optional parameter to indicate the weight/bias quantizer to use. Must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "adjusted": Deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                                     Data will be stored as int#_t data such that the offset is always 0.
  --act_quantizer ACT_QUANTIZER
                        Optional parameter to indicate the activation quantizer to use. Must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "adjusted": Deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                                     Data will be stored as int#_t data such that the offset is always 0.
  --algorithms ALGORITHMS [ALGORITHMS ...]
                        Use this option to enable new optimization algorithms. Usage is:
                            --algorithms <algo_name1> ... The available optimization algorithms are:
                        "cle" - Cross layer equalization includes a number of methods for equalizing
                        weights and biases across layers in order to rectify imbalances that cause
                        quantization errors.
  --bias_bitwidth BIAS_BITWIDTH
                        Selects the bitwidth to use when quantizing the biases; 8 (default) or 32.
  --bias_bw BIAS_BW     Deprecated; use --bias_bitwidth.
  --act_bitwidth ACT_BITWIDTH
                        Selects the bitwidth to use when quantizing the activations; 8 (default) or 16.
  --act_bw ACT_BW       Deprecated; use --act_bitwidth.
  --weights_bitwidth WEIGHTS_BITWIDTH
                        Selects the bitwidth to use when quantizing the weights; 4 or 8 (default).
  --weight_bw WEIGHT_BW
                        Deprecated; use --weights_bitwidth.
  --float_bias_bitwidth FLOAT_BIAS_BITWIDTH
                        Selects the bitwidth to use when biases are in float; 32 or 16.
  --ignore_encodings    Use only quantizer generated encodings, ignoring any user or model provided
                        encodings.
                        Note: Cannot use --ignore_encodings with --quantization_overrides
  --use_per_channel_quantization
                        Enables per-channel quantization for convolution-based op weights.
                        This replaces the built-in model QAT encodings when used for a given weight.
  --use_per_row_quantization
                        Enables row wise quantization of Matmul and FullyConnected ops.
  --enable_per_row_quantized_bias
                        Enables row wise quantization of bias for FullyConnected op, when weights are per-row quantized.
  --float_fallback      Enables fallback to floating point (FP) instead of fixed point.
                        This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
                        If this option is enabled, --input_list must not be provided and --ignore_encodings must not be provided.
                        The external quantization encodings (encoding file/FakeQuant encodings) might be missing
                        quantization parameters for some interim tensors. First it will try to fill the gaps by
                        propagating across math-invariant functions. If the quantization params are still missing,
                        those nodes fall back to floating point.
  --use_native_input_files
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native: reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_dtype    Note: This option is deprecated, use --use_native_input_files option in
                        future.
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native: reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_output_files
                        Use this option to indicate the data type of the output files
                        1. float (default): output the file as floats.
                        2. native: outputs the file that is native to the model. For ex.,
                        uint8_t.
  --disable_relu_squashing
                        Disables squashing of ReLU against convolution-based ops for quantized models.
  --restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
                        Specifies the number of steps to use for computing quantization encodings
                        such that scale = (max - min) / number of quantization steps.
                        The option should be passed as a space separated pair of hexadecimal string
                        minimum and maximum values. i.e. --restrict_quantization_steps "MIN MAX".
                        Please note that this is a hexadecimal string literal and not a signed
                        integer, to supply a negative value an explicit minus sign is required.
                        E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8 bit range,
                        --restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16
                        bit range.

Custom Op Package Options:
  --op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
                        Use this argument to pass an op package library for quantization. Must be in
                        the form <op_package_lib_path:interfaceProviderName> and be separated by a
                        comma for multiple package libs
  --converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
                        Absolute path to converter op package library compiled by the OpPackage
                        generator. Must be separated by a comma for multiple package libraries.
                        Note: Libraries must follow the same order as the xml files.
                        E.g.1: --converter_op_package_lib absolute_path_to/libExample.so
                        E.g.2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
  -p PACKAGE_NAME, --package_name PACKAGE_NAME
                        A global package name to be used for each node in the Model.cpp file.
                        Defaults to Qnn header defined package name
  --op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...], -opc OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]
                        Path to a Qnn Op Package XML configuration file that contains user defined
                        custom operations.

Architecture Checker Options (Experimental):
  --arch_checker        Note: This option will soon be deprecated. Use the qnn-architecture-checker tool to achieve the same result.

Note: Only one of: {'package_name', 'op_package_config'} can be specified

Basic command line usage looks like:

$ qnn-tflite-converter -i <path>/model.tflite
                       -d <optional_network_input_name> <dims>
                       -o <optional_output_path>
                       -p <optional_package_name> # Defaults to "qti.aisw"
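
As an illustration, a quantizing TFLite conversion might look like the following sketch; the model file, calibration list, and output path are placeholders, and only options documented above are used:

$ qnn-tflite-converter -i mobilenet_v2.tflite \
                       --input_list calibration_list.txt \
                       --param_quantizer enhanced \
                       --act_bitwidth 8 \
                       -o model/mobilenet_v2.cpp

If --input_list is omitted, the quantizer options are ignored and a non-quantized model is generated.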

qnn-pytorch-converter

The qnn-pytorch-converter tool converts a PyTorch model to a CPP file representing the model as a series of QNN API calls. Additionally, a binary file containing static weights of the model is produced.

usage: qnn-pytorch-converter -d INPUT_NAME INPUT_DIM [--out_node OUT_NAMES]
                             [--input_type INPUT_NAME INPUT_TYPE]
                             [--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding  ...]
                             [--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
                             [--preserve_io [PRESERVE_IO [PRESERVE_IO ...]]]
                             [--dump_relay DUMP_RELAY] [--dry_run] [--dump_out_names]
                             [--pytorch_custom_op_lib PYTORCH_CUSTOM_OP_LIB]
                             [--quantization_overrides QUANTIZATION_OVERRIDES] [--keep_quant_nodes]
                             [--disable_batchnorm_folding] [--expand_lstm_op_structure]
                             [--keep_disconnected_nodes]
                             [--input_list INPUT_LIST] [--param_quantizer PARAM_QUANTIZER]
                             [--act_quantizer ACT_QUANTIZER]
                             [--algorithms ALGORITHMS [ALGORITHMS ...]]
                             [--bias_bitwidth BIAS_BITWIDTH] [--bias_bw BIAS_BW]
                             [--act_bitwidth ACT_BITWIDTH] [--act_bw ACT_BW]
                             [--weights_bitwidth WEIGHTS_BITWIDTH] [--weight_bw WEIGHT_BW]
                             [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_encodings]
                             [--use_per_channel_quantization] [--use_per_row_quantization]
                             [--enable_per_row_quantized_bias]
                             [--float_fallback] [--use_native_input_files] [--use_native_dtype]
                             [--use_native_output_files] [--disable_relu_squashing]
                             [--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
                             --input_network INPUT_NETWORK [--debug [DEBUG]]
                             [-o OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
                             [--float_bitwidth FLOAT_BITWIDTH] [--float_bw FLOAT_BW]
                             [--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
                             [--exclude_named_tensors] [--op_package_lib OP_PACKAGE_LIB]
                             [--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
                             [-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
                             [-h] [--arch_checker]

Script to convert PyTorch model into QNN

required arguments:
  -d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
                        The names and dimensions of the network input layers specified in the format
                        [input_name comma-separated-dimensions], for example:
                            'data' 1,3,224,224
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For multiple inputs specify multiple --input_dim on the command line like:
                            --input_dim 'data1' 1,3,224,224 --input_dim 'data2' 1,50,100,3
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
                        Path to the source framework model.

optional arguments:
  --out_node OUT_NAMES, --out_name OUT_NAMES
                        Names of the graph's output tensors. Multiple output names should be
                        provided separately like:
                            --out_name out_1 --out_name out_2
  --input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
                        Type of data expected by each input op/layer. Type for each input is
                        |default| if not specified. For example: "data" image. Note that the quotes
                        should always be included in order to handle special characters, spaces, etc.
                        For multiple inputs specify multiple --input_type on the command line.
                        Eg:
                            --input_type "data1" image --input_type "data2" opaque
                        These options are used by the DSP runtime, and the following descriptions
                        state how the input will be handled for each option.
                        Image:
                        Input is a float between 0-255; the input's mean is 0.0f and the input's
                        max is 255.0f. The floats are cast to uint8_t and passed to the DSP.
                        Default:
                        Pass the input as floats to the DSP directly and the DSP will quantize it.
                        Opaque:
                        Assumes the input is float because the consumer layer (i.e. the next layer)
                        requires it as float, therefore it won't be quantized.
                        Choices supported:
                            image
                            default
                            opaque
  --input_dtype INPUT_NAME INPUT_DTYPE
                        The names and datatype of the network input layers specified in the format
                        [input_name datatype], for example:
                            'data' 'float32'
                        Default is float32 if not specified
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For multiple inputs specify multiple --input_dtype on the command line like:
                            --input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
  --input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
                        Usage:     --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
                        [INPUT_ENCODING_OUT]
                        Input encoding of the network inputs. Default is bgr.
                        e.g.
                            --input_encoding "data" rgba
                        Quotes must wrap the input node name to handle special characters,
                        spaces, etc. To specify encodings for multiple inputs, invoke
                        --input_encoding for each one.
                        e.g.
                            --input_encoding "data1" rgba --input_encoding "data2" other
                        Optionally, an output encoding may be specified for an input node by
                        providing a second encoding. The default output encoding is bgr.
                        e.g.
                            --input_encoding "data3" rgba rgb
                        Input encoding types:
                            image color encodings: bgr, rgb, nv21, nv12, ...
                            time_series: for inputs of rnn models;
                            other: not available above or is unknown.
                        Supported encodings:
                            bgr
                            rgb
                            rgba
                            argb32
                            nv21
                            nv12
                            time_series
                            other
  --input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
                        Layout of each input tensor. If not specified, it will use the default
                        based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                            NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        NONTRIVIAL for everything else. For multiple inputs specify multiple
                        --input_layout on the command line.
                        Eg:
                           --input_layout "data1" NCHW --input_layout "data2" NCHW
  --custom_io CUSTOM_IO
                        Use this option to specify a yaml file for custom IO.
  --preserve_io [PRESERVE_IO [PRESERVE_IO ...]]
                        Use this option to preserve IO layout and datatype. The different ways of
                        using this option are as follows:
                            --preserve_io layout <space separated list of names of inputs and
                        outputs of the graph>
                            --preserve_io datatype <space separated list of names of inputs and
                        outputs of the graph>
                        In this case, user should also specify the string - layout or datatype in
                        the command to indicate that converter needs to
                        preserve the layout or datatype. e.g.
                        --preserve_io layout input1 input2 output1
                        --preserve_io datatype input1 input2 output1
                        Optionally, the user may choose to preserve the layout and/or datatype for
                        all the inputs and outputs of the graph.
                        This can be done in the following two ways:
                            --preserve_io layout
                            --preserve_io datatype
                        Additionally, the user may choose to preserve both layout and datatypes for
                        all IO tensors by just passing the option as follows:
                            --preserve_io
                        Note: Only one of the above usages is allowed at a time.
                        Note: --custom_io gets higher precedence than --preserve_io.
  --dump_relay DUMP_RELAY
                        Dump Relay ASM and Params at the path provided with the argument
                        Usage: --dump_relay <path_to_dump>
  --dry_run             Evaluates the model without actually converting any ops, and
                         returns unsupported ops if any.
  --dump_out_names      Dump output names mapped from QNN CPP stored names to converter used
                        names and save to file 'model_output_names.json'.
  --pytorch_custom_op_lib PYTORCH_CUSTOM_OP_LIB, -pcl PYTORCH_CUSTOM_OP_LIB
                        Absolute path to the PyTorch library containing the custom op definition.
                        Multiple custom op libraries must be comma-separated.
                        For PyTorch custom op details, refer to:
                             https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html
                        For custom C++ extension details, refer to:
                             https://pytorch.org/tutorials/advanced/cpp_extension.html
                        Eg. 1: --pytorch_custom_op_lib absolute_path_to/Example.so
                        Eg. 2: -pcl absolute_path_to/Example1.so,absolute_path_to/Example2.so
  --disable_batchnorm_folding
  --expand_lstm_op_structure
                        Enables optimization that breaks the LSTM op to equivalent math ops
  --keep_disconnected_nodes
                        Disable Optimization that removes Ops not connected to the main graph.
                        This optimization uses output names provided over commandline OR
                        inputs/outputs extracted from the Source model to determine the main graph
  --debug [DEBUG]       Run the converter in debug mode.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path where the converted output model should be saved. If not specified, the
                        converted model will be written to a file with the same name as the input model
  --copyright_file COPYRIGHT_FILE
                        Path to copyright file. If provided, the content of the file will be added
                        to the output model.
  --float_bitwidth FLOAT_BITWIDTH
                        Selects the bitwidth to use when using float for parameters (weights/bias)
                        and activations, either for all ops or for specific ops selected via
                        encodings; 32 (default) or 16.
  --float_bw FLOAT_BW   Deprecated; use --float_bitwidth.
  --float_bias_bw FLOAT_BIAS_BW
                        Deprecated; use --float_bias_bitwidth.
  --overwrite_model_prefix
                        If this option is passed, the model generator will use the output path name
                        as the model prefix when naming functions in <qnn_model_name>.cpp (useful for
                        running multiple models at once), e.g. ModelName_composeGraphs. Default is the
                        generic "QnnModel_" prefix.
  --exclude_named_tensors
                        Do not use source framework tensor names; instead use a counter for naming
                        tensors. Note: This can potentially reduce the size of the final generated
                        model library (recommended when deploying a model). Default is False.
  -h, --help            show this help message and exit

Quantizer Options:
  --quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters to use for
                        quantization. These will override any quantization data carried from
                        conversion (eg TF fake quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET specification.
  --keep_quant_nodes    Use this option to keep activation quantization nodes in the graph rather
                        than stripping them.
  --input_list INPUT_LIST
                        Path to a file specifying the input data. This file should be a plain text
                        file, containing one or more absolute file paths per line. Each path is
                        expected to point to a binary file containing one input in the "raw" format,
                        ready to be consumed by the quantizer without any further preprocessing.
                        Multiple files per line separated by spaces indicate multiple inputs to the
                        network. See documentation for more details. Must be specified for
                        quantization. All subsequent quantization options are ignored when this is
                        not provided. An illustrative sketch of this file appears after this
                        options list.
  --param_quantizer PARAM_QUANTIZER
                        Optional parameter to indicate the weight/bias quantizer to use. Must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "adjusted": Deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                                     Data will be stored as int#_t data such that the offset is always 0.
  --act_quantizer ACT_QUANTIZER
                        Optional parameter to indicate the activation quantizer to use. Must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "adjusted": Deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                                     Data will be stored as int#_t data such that the offset is always 0.
  --algorithms ALGORITHMS [ALGORITHMS ...]
                        Use this option to enable new optimization algorithms. Usage is:
                            --algorithms <algo_name1> ... The available optimization algorithms are:
                        "cle" - Cross layer equalization includes a number of methods for equalizing
                        weights and biases across layers in order to rectify imbalances that cause
                        quantization errors.
  --bias_bitwidth BIAS_BITWIDTH
                        Selects the bitwidth to use when quantizing the biases; 8 (default) or 32.
  --bias_bw BIAS_BW     Deprecated; use --bias_bitwidth.
  --act_bitwidth ACT_BITWIDTH
                        Selects the bitwidth to use when quantizing the activations; 8 (default) or 16.
  --act_bw ACT_BW       Deprecated; use --act_bitwidth.
  --weights_bitwidth WEIGHTS_BITWIDTH
                        Selects the bitwidth to use when quantizing the weights; 4 or 8 (default).
  --weight_bw WEIGHT_BW
                        Deprecated; use --weights_bitwidth.
  --float_bias_bitwidth FLOAT_BIAS_BITWIDTH
                        Selects the bitwidth to use when biases are in float; 32 or 16.
  --ignore_encodings    Use only quantizer generated encodings, ignoring any user or model provided
                        encodings.
                        Note: Cannot use --ignore_encodings with --quantization_overrides
  --use_per_channel_quantization
                        Enables per-channel quantization for convolution-based op weights.
                        This replaces the built-in model QAT encodings when used for a given weight.
  --use_per_row_quantization
                        Enables row wise quantization of Matmul and FullyConnected ops.
  --enable_per_row_quantized_bias
                        Enables row wise quantization of bias for FullyConnected op, when weights are per-row quantized.
  --float_fallback      Enables fallback to floating point (FP) instead of fixed point.
                        This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
                        If this option is enabled, --input_list must not be provided and --ignore_encodings must not be provided.
                        The external quantization encodings (encoding file/FakeQuant encodings) might be missing
                        quantization parameters for some interim tensors. First it will try to fill the gaps by
                        propagating across math-invariant functions. If the quantization params are still missing,
                        those nodes will fall back to floating point.
  --use_native_input_files
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native: reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_dtype    Note: This option is deprecated, use --use_native_input_files option in
                        future.
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native: reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_output_files
                        Use this option to indicate the data type of the output files
                        1. float (default): output the file as floats.
                        2. native: outputs the file that is native to the model. For ex.,
                        uint8_t.
  --disable_relu_squashing
                        Disables squashing of ReLU against convolution-based ops for quantized models.
  --restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
                        Specifies the number of steps to use for computing quantization encodings
                        such that scale = (max - min) / number of quantization steps.
                        The option should be passed as a space separated pair of hexadecimal string
                        minimum and maximum values. i.e. --restrict_quantization_steps "MIN MAX".
                        Please note that this is a hexadecimal string literal and not a signed
                        integer, to supply a negative value an explicit minus sign is required.
                        E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8 bit range,
                        --restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16
                        bit range.
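
An --input_list file is plain text with one set of inputs per line; a minimal sketch for a hypothetical network with two input tensors (the absolute paths below are placeholders) could look like:

    /data/calibration/input_a_0.raw /data/calibration/input_b_0.raw
    /data/calibration/input_a_1.raw /data/calibration/input_b_1.raw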

Custom Op Package Options:
  --op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
                        Use this argument to pass an op package library for quantization. Must be in
                        the form <op_package_lib_path:interfaceProviderName> and be separated by a
                        comma for multiple package libs
  --converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
                        Absolute path to converter op package library compiled by the OpPackage
                        generator. Must be separated by a comma for multiple package libraries.
                        Note: Libraries must follow the same order as the xml files.
                        E.g.1: --converter_op_package_lib absolute_path_to/libExample.so
                        E.g.2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
  -p PACKAGE_NAME, --package_name PACKAGE_NAME
                        A global package name to be used for each node in the Model.cpp file.
                        Defaults to Qnn header defined package name
  --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...], -opc CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]
                        Path to a Qnn Op Package XML configuration file that contains user defined
                        custom operations.

Architecture Checker Options(Experimental):
  --arch_checker        Note: This option will be soon deprecated. Use the qnn-architecture-checker tool to achieve the same result.

Note: Only one of: {‘package_name’, ‘op_package_config’} can be specified

Basic command line usage looks like:

$ qnn-pytorch-converter -i <path>/model.pt
                       -d <network_input_name> <dims>
                       -o <optional_output_path>
                       -p <optional_package_name> # Defaults to "qti.aisw"
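
A quantized conversion additionally supplies a calibration --input_list and, optionally, the bitwidth options described above. A hypothetical invocation (paths and names are placeholders) might look like:

$ qnn-pytorch-converter -i <path>/model.pt \
                        -d <network_input_name> <dims> \
                        --input_list <path>/input_list.txt \
                        --act_bitwidth 8 \
                        --weights_bitwidth 8 \
                        -o <optional_output_path>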

qnn-onnx-converter

The qnn-onnx-converter tool converts a model from the ONNX framework to a CPP file representing the model as a series of QNN API calls. Additionally, a binary file containing static weights of the model is produced.

usage: qnn-onnx-converter [--out_node OUT_NAMES] [--input_type INPUT_NAME INPUT_TYPE]
                          [--input_dtype INPUT_NAME INPUT_DTYPE] [--input_encoding [ ...]]
                          [--input_layout INPUT_NAME INPUT_LAYOUT] [--custom_io CUSTOM_IO]
                          [--preserve_io [PRESERVE_IO ...]]
                          [--dump_qairt_io_config_yaml [DUMP_QAIRT_IO_CONFIG_YAML]]
                          [--enable_framework_trace] [--dry_run [DRY_RUN]] [-d INPUT_NAME INPUT_DIM]
                          [-n] [-b BATCH] [-s SYMBOL_NAME VALUE]
                          [--dump_custom_io_config_template DUMP_CUSTOM_IO_CONFIG_TEMPLATE]
                          [--quantization_overrides QUANTIZATION_OVERRIDES] [--keep_quant_nodes]
                          [--disable_batchnorm_folding] [--expand_lstm_op_structure]
                          [--keep_disconnected_nodes] [--preserve_onnx_output_order]
                          [--apply_masked_softmax {compressed,uncompressed}]
                          [--packed_masked_softmax_inputs PACKED_MASKED_SOFTMAX_INPUTS [PACKED_MASKED_SOFTMAX_INPUTS ...]]
                          [--packed_max_seq PACKED_MAX_SEQ] [--input_list INPUT_LIST]
                          [--param_quantizer PARAM_QUANTIZER] [--act_quantizer ACT_QUANTIZER]
                          [--algorithms ALGORITHMS [ALGORITHMS ...]] [--bias_bitwidth BIAS_BITWIDTH]
                          [--bias_bw BIAS_BITWIDTH] [--act_bitwidth ACT_BITWIDTH]
                          [--act_bw ACT_BITWIDTH] [--weights_bitwidth WEIGHTS_BITWIDTH]
                          [--weight_bw WEIGHTS_BITWIDTH] [--ignore_encodings]
                          [--use_per_channel_quantization] [--use_per_row_quantization]
                          [--enable_per_row_quantized_bias] [--float_fallback]
                          [--use_native_input_files] [--use_native_dtype]
                          [--use_native_output_files] [--disable_relu_squashing]
                          [--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
                          [--pack_4_bit_weights] [--keep_weights_quantized]
                          [--act_quantizer_calibration ACT_QUANTIZER_CALIBRATION]
                          [--param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION]
                          [--act_quantizer_schema ACT_QUANTIZER_SCHEMA]
                          [--param_quantizer_schema PARAM_QUANTIZER_SCHEMA]
                          [--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
                          [--dump_qairt_quantizer_command DUMP_QAIRT_QUANTIZER_COMMAND]
                          [--quantizer_log QUANTIZER_LOG]
                          [--quantizer_log_level {LogLevel.NONE,LogLevel.TRACE,LogLevel.INFO}]
                          --input_network INPUT_NETWORK [--debug [DEBUG]] [-o OUTPUT_PATH]
                          [--copyright_file COPYRIGHT_FILE] [--float_bitwidth FLOAT_BITWIDTH]
                          [--float_bw FLOAT_BW] [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH]
                          [--float_bias_bw FLOAT_BIAS_BW] [--overwrite_model_prefix]
                          [--exclude_named_tensors] [--model_version MODEL_VERSION]
                          [--op_package_lib OP_PACKAGE_LIB]
                          [--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
                          [-p PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
                          [--arch_checker] [-h] [--validate_models]

Script to convert ONNX model into QNN

required arguments:
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
                        Path to the source framework model.

optional arguments:
  --out_node OUT_NAMES, --out_name OUT_NAMES
                        Name of the graph's output Tensor Names. Multiple output names should be
                        provided separately like:
                            --out_name out_1 --out_name out_2
  --input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
                        Type of data expected by each input op/layer. Type for each input is
                        |default| if not specified. For example: "data" image. Note that the quotes
                        should always be included in order to handle special characters, spaces, etc.
                        For multiple inputs specify multiple --input_type on the command line.
                        Eg:
                           --input_type "data1" image --input_type "data2" opaque
                        These options are used by the DSP runtime, and the following descriptions
                        state how the input will be handled for each option.
                        Image:
                        Input is float between 0-255 and the input's mean is 0.0f and the input's
                        max is 255.0f. We will cast the float to uint8ts and pass the uint8ts to the
                        DSP.
                        Default:
                        Pass the input as floats to the DSP directly and the DSP will quantize it.
                        Opaque:
                        Assumes input is float because the consumer layer (i.e. the next layer)
                        requires it as float, therefore it won't be quantized.
                        Choices supported:
                           image
                           default
                           opaque
  --input_dtype INPUT_NAME INPUT_DTYPE
                        The names and datatype of the network input layers specified in the format
                        [input_name datatype], for example:
                            'data' 'float32'
                        Default is float32 if not specified
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For multiple inputs specify multiple --input_dtype on the command line like:
                            --input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'
  --input_encoding INPUT_ENCODING [INPUT_ENCODING ...], -e INPUT_ENCODING [INPUT_ENCODING ...]
                        Usage:     --input_encoding "INPUT_NAME" INPUT_ENCODING_IN
                        [INPUT_ENCODING_OUT]
                        Input encoding of the network inputs. Default is bgr.
                        e.g.
                           --input_encoding "data" rgba
                        Quotes must wrap the input node name to handle special characters,
                        spaces, etc. To specify encodings for multiple inputs, invoke
                        --input_encoding for each one.
                        e.g.
                            --input_encoding "data1" rgba --input_encoding "data2" other
                        Optionally, an output encoding may be specified for an input node by
                        providing a second encoding. The default output encoding is bgr.
                        e.g.
                            --input_encoding "data3" rgba rgb
                        Input encoding types:
                            image color encodings: bgr, rgb, nv21, nv12, ...
                            time_series: for inputs of rnn models;
                            other: not available above or is unknown.
                        Supported encodings:
                           bgr
                           rgb
                           rgba
                           argb32
                           nv21
                           nv12
                           time_series
                           other
  --input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
                        Layout of each input tensor. If not specified, it will use the default
                        based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                            NCDHW, NDHWC, NCHW, NHWC, HWIO, OIHW, NFC, NCF, NTF, TNF, NF, NC, F,
                        NONTRIVIAL
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T =
                        Time
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        NONTRIVIAL for everything else. For multiple inputs specify multiple
                        --input_layout on the command line.
                        Eg:
                            --input_layout "data1" NCHW --input_layout "data2" NCHW
  --custom_io CUSTOM_IO
                        Use this option to specify a yaml file for custom IO.
  --preserve_io [PRESERVE_IO ...]
                        Use this option to preserve IO layout and datatype. The different ways of
                        using this option are as follows:
                            --preserve_io layout <space separated list of names of inputs and
                        outputs of the graph>
                            --preserve_io datatype <space separated list of names of inputs and
                        outputs of the graph>
                        In this case, user should also specify the string - layout or datatype in
                        the command to indicate that converter needs to
                        preserve the layout or datatype. e.g.
                           --preserve_io layout input1 input2 output1
                           --preserve_io datatype input1 input2 output1
                        Optionally, the user may choose to preserve the layout and/or datatype for
                        all the inputs and outputs of the graph.
                        This can be done in the following two ways:
                            --preserve_io layout
                            --preserve_io datatype
                        Additionally, the user may choose to preserve both layout and datatypes for
                        all IO tensors by just passing the option as follows:
                            --preserve_io
                        Note: Only one of the above usages is allowed at a time.
                        Note: --custom_io gets higher precedence than --preserve_io.
  --dump_qairt_io_config_yaml [DUMP_QAIRT_IO_CONFIG_YAML]
                        Use this option to dump a yaml file which contains the equivalent I/O
                        configurations of QAIRT Converter along with the QAIRT Converter Command and
                        can be passed to QAIRT Converter using the option --io_config.
  --enable_framework_trace
                        Use this option to enable converter to trace the op/tensor change
                        information.
                        Currently framework op trace is supported only for ONNX converter.
  --dry_run [DRY_RUN]   Evaluates the model without actually converting any ops, and returns
                        unsupported ops/attributes as well as unused inputs and/or outputs if any.
                        Leave empty or specify "info" to see dry run as a table, or specify "debug"
                        to show more detailed messages only"
  -d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
                        The name and dimension of all the input buffers to the network specified in
                        the format [input_name comma-separated-dimensions],
                        for example: 'data' 1,224,224,3.
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For scalar inputs, use a single dimension `0` to indicate that the input is a scalar value.
                        For multiple inputs specify multiple --input_dim on the command line like:
                            --input_dim 'data1' 1,224,224,3 --input_dim 'data2' 0
                        NOTE: This feature works only with Onnx 1.6.0 and above
  -n, --no_simplification
                        Do not attempt to simplify the model automatically. This may prevent some
                        models from properly converting
                        when sequences of unsupported static operations are present.
  -b BATCH, --batch BATCH
                        The batch dimension override. This will take the first dimension of all
                        inputs and treat it as a batch dim, overriding it with the value provided
                        here. For example:
                        --batch 6
                        will result in a shape change from [1,3,224,224] to [6,3,224,224].
                        If there are inputs without batch dim this should not be used and each input
                        should be overridden independently using -d option for input dimension
                        overrides.
  -s SYMBOL_NAME VALUE, --define_symbol SYMBOL_NAME VALUE
                        This option allows overriding specific input dimension symbols. For instance
                        you might see input shapes specified with variables such as :
                        data: [1,3,height,width]
                        To override these simply pass the option as:
                        --define_symbol height 224 --define_symbol width 448
                        which results in dimensions that look like:
                        data: [1,3,224,448]
  --dump_custom_io_config_template DUMP_CUSTOM_IO_CONFIG_TEMPLATE
                        Dumps the yaml template for Custom I/O configuration. This file can be edited
                        as per the custom requirements and passed using the option --custom_io. Use
                        this option to specify a yaml file to which the custom IO config template is
                        dumped.
  --disable_batchnorm_folding
  --expand_lstm_op_structure
                        Enables optimization that breaks the LSTM op to equivalent math ops
  --keep_disconnected_nodes
                        Disable Optimization that removes Ops not connected to the main graph.
                        This optimization uses output names provided over commandline OR
                        inputs/outputs extracted from the Source model to determine the main graph
  --preserve_onnx_output_order
                        Preserve the ONNX output order in the converted graph. Note: This may
                        slightly impact performance.
  --debug [DEBUG]       Run the converter in debug mode.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path where the converted output model should be saved. If not specified, the
                        converted model will be written to a file with the same name as the input model
  --copyright_file COPYRIGHT_FILE
                        Path to copyright file. If provided, the content of the file will be added
                        to the output model.
  --float_bitwidth FLOAT_BITWIDTH
                        Use the --float_bitwidth option to convert the graph to the specified float
                        bitwidth, either 32 (default) or 16.
  --float_bw FLOAT_BW   Note: --float_bw is deprecated, use --float_bitwidth.
  --float_bias_bitwidth FLOAT_BIAS_BITWIDTH
                        Use the --float_bias_bitwidth option to select the bitwidth to use for float
                        bias tensor
  --float_bias_bw FLOAT_BIAS_BW
                        Note: --float_bias_bw is deprecated, use --float_bias_bitwidth.
  --overwrite_model_prefix
                        If this option is passed, the model generator will use the output path name
                        as the model prefix when naming functions in <qnn_model_name>.cpp (useful for
                        running multiple models at once), e.g. ModelName_composeGraphs. Default is the
                        generic "QnnModel_" prefix.
  --exclude_named_tensors
                        Do not use source framework tensor names; instead use a counter for naming
                        tensors. Note: This can potentially reduce the size of the final generated
                        model library (recommended when deploying a model). Default is False.
  --model_version MODEL_VERSION
                        User-defined ASCII string to identify the model; only the first 64 bytes will
                        be stored
  -h, --help            show this help message and exit
  --validate_models     Validate the original onnx model against optimized onnx model.
                        Constant inputs with all value 1s will be generated and will be used
                        by both models and their outputs are checked against each other.
                        The % average error and 90th percentile of output differences will be
                        calculated for this.
                        Note: Usage of this flag will incur extra time due to inference of the
                        models.
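
As a sketch of the shape-override options described above (-d/--input_dim and -s/--define_symbol), a model declared with symbolic dimensions such as data: [1,3,height,width] might be converted by pinning the symbols (paths and values below are placeholders):

$ qnn-onnx-converter -i <path>/model.onnx \
                     --define_symbol height 224 \
                     --define_symbol width 448 \
                     -o <optional_output_path>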

Custom Op Package Options:
  --op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
                        Use this argument to pass an op package library for quantization. Must be in
                        the form <op_package_lib_path:interfaceProviderName> and be separated by a
                        comma for multiple package libs
  --converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
                        Absolute path to converter op package library compiled by the OpPackage
                        generator. Must be separated by a comma for multiple package libraries.
                        Note: Order of converter op package libraries must follow the order of xmls.
                        Ex1: --converter_op_package_lib absolute_path_to/libExample.so
                        Ex2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
  -p PACKAGE_NAME, --package_name PACKAGE_NAME
                        A global package name to be used for each node in the Model.cpp file.
                        Defaults to Qnn header defined package name
  --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...], -opc CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]
                        Path to a Qnn Op Package XML configuration file that contains user defined
                        custom operations.

Quantizer Options:
  --quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters to use for
                        quantization. These will override any quantization data carried from
                        conversion (eg TF fake quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET specification.
  --keep_quant_nodes    Use this option to keep activation quantization nodes in the graph rather
                        than stripping them.
  --input_list INPUT_LIST
                        Path to a file specifying the input data. This file should be a plain text
                        file, containing one or more absolute file paths per line. Each path is
                        expected to point to a binary file containing one input in the "raw" format,
                        ready to be consumed by the quantizer without any further preprocessing.
                        Multiple files per line separated by spaces indicate multiple inputs to the
                        network. See documentation for more details. Must be specified for
                        quantization. All subsequent quantization options are ignored when this is
                        not provided.
  --param_quantizer PARAM_QUANTIZER
                        Optional parameter to indicate the weight/bias quantizer to use. Must be
                        followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails
                        present in the weight distribution.
                        "adjusted": Note: "adjusted" mode is deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                        Data will be stored as int#_t data such that the offset is always 0.
                        Note: The legacy option --param_quantizer will be deprecated; use
                        --param_quantizer_calibration instead
  --act_quantizer ACT_QUANTIZER
                        Optional parameter to indicate the activation quantizer to use. Must be
                        followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails
                        present in the weight distribution.
                        "adjusted": Note: "adjusted" mode is deprecated.
                        "symmetric": Ensures min and max have the same absolute values about zero.
                        Data will be stored as int#_t data such that the offset is always 0.
                        Note: The legacy option --act_quantizer will be deprecated; use
                        --act_quantizer_calibration instead
  --algorithms ALGORITHMS [ALGORITHMS ...]
                        Use this option to enable new optimization algorithms. Usage is:
                        --algorithms <algo_name1> ... The available optimization algorithms are:
                        "cle" - Cross layer equalization includes a number of methods for equalizing
                        weights and biases across layers in order to rectify imbalances that cause
                        quantization errors.
  --bias_bitwidth BIAS_BITWIDTH
                        Use the --bias_bitwidth option to select the bitwidth to use when quantizing
                        the biases, either 8 (default) or 32.
  --bias_bw BIAS_BITWIDTH
                        Note: --bias_bw is deprecated, use --bias_bitwidth.
  --act_bitwidth ACT_BITWIDTH
                        Use the --act_bitwidth option to select the bitwidth to use when quantizing
                        the activations, either 8 (default) or 16.
  --act_bw ACT_BITWIDTH
                        Note: --act_bw is deprecated, use --act_bitwidth.
  --weights_bitwidth WEIGHTS_BITWIDTH
                        Use the --weights_bitwidth option to select the bitwidth to use when
                        quantizing the weights, either 4 or 8 (default).
  --weight_bw WEIGHTS_BITWIDTH
                        Note: --weight_bw is deprecated, use --weights_bitwidth.
  --ignore_encodings    Use only quantizer generated encodings, ignoring any user or model provided
                        encodings.
                        Note: Cannot use --ignore_encodings with --quantization_overrides
  --use_per_channel_quantization
                        Use this option to enable per-channel quantization for convolution-based op
                        weights.
                        Note: This will replace built-in model QAT encodings when used for a given
                        weight.
  --use_per_row_quantization
                        Use this option to enable rowwise quantization of Matmul and FullyConnected
                        ops.
  --enable_per_row_quantized_bias
                        Use this option to enable rowwise quantization of bias for FullyConnected
                        op, when weights are per-row quantized.
  --float_fallback      Use this option to enable fallback to floating point (FP) instead of fixed
                        point.
                        This option can be paired with --float_bitwidth to indicate the bitwidth for
                        FP (by default 32).
                        If this option is enabled, then --input_list must not be provided and
                        --ignore_encodings must not be provided.
                        The external quantization encodings (encoding file/FakeQuant encodings)
                        might be missing quantization parameters for some interim tensors.
                        First it will try to fill the gaps by propagating across math-invariant
                        functions. If the quantization params are still missing,
                        then those nodes will fall back to floating point.
  --use_native_input_files
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native:          reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_dtype    Note: This option is deprecated, use --use_native_input_files option in
                        future.
                        Boolean flag to indicate how to read input files:
                        1. float (default): reads inputs as floats and quantizes if necessary based
                        on quantization parameters in the model.
                        2. native:          reads inputs assuming the data type to be native to the
                        model. For ex., uint8_t.
  --use_native_output_files
                        Use this option to indicate the data type of the output files
                        1. float (default): output the file as floats.
                        2. native:          outputs the file that is native to the model. For ex.,
                        uint8_t.
  --disable_relu_squashing
                        Disables squashing of ReLU against convolution-based ops for quantized
                        models
  --restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
                        Specifies the number of steps to use for computing quantization encodings
                        such that scale = (max - min) / number of quantization steps.
                        The option should be passed as a space separated pair of hexadecimal string
                        minimum and maximum values, i.e. --restrict_quantization_steps "MIN MAX".
                        Please note that this is a hexadecimal string literal and not a signed
                        integer, to supply a negative value an explicit minus sign is required.
                        E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8 bit
                        range,
                            --restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16
                        bit range.
                        This argument is required for 16-bit Matmul operations.
  --pack_4_bit_weights  Store 4-bit quantized weights in packed format in a single byte, i.e. two
                        4-bit quantized values can be stored in one byte
  --keep_weights_quantized
                        Use this option to keep the weights quantized even when the output of the op
                        is in floating point. Bias will be converted to floating point as per the
                        output of the op. Required to enable wFxp_actFP configurations according to
                        the provided bitwidth for weights and activations
                        Note: These modes are not supported by all runtimes. Please check
                        corresponding Backend OpDef supplement if these are supported
  --act_quantizer_calibration ACT_QUANTIZER_CALIBRATION
                        Specify which quantization calibration method to use for activations
                        supported values: min-max (default), sqnr, entropy, mse, percentile
                        This option can be paired with --act_quantizer_schema to override the
                        quantization schema to use for activations; otherwise the default schema
                        (asymmetric) will be used
  --param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION
                        Specify which quantization calibration method to use for parameters
                        supported values: min-max (default), sqnr, entropy, mse, percentile
                        This option can be paired with --param_quantizer_schema to override the
                        quantization schema to use for parameters; otherwise the default schema
                        (asymmetric) will be used
  --act_quantizer_schema ACT_QUANTIZER_SCHEMA
                        Specify which quantization schema to use for activations
                        supported values: asymmetric (default), symmetric, unsignedsymmetric
                        This option cannot be used with legacy quantizer option --act_quantizer
  --param_quantizer_schema PARAM_QUANTIZER_SCHEMA
                        Specify which quantization schema to use for parameters
                        supported values: asymmetric (default), symmetric, unsignedsymmetric
                        This option cannot be used with legacy quantizer option --param_quantizer
  --percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
                        Specify the percentile value to be used with Percentile calibration method
                        The specified float value must lie between 90 and 100, default: 99.99
  --dump_qairt_quantizer_command DUMP_QAIRT_QUANTIZER_COMMAND
                        Use this option to dump a file which contains the equivalent Commandline
                        input for QAIRT Quantizer
  --quantizer_log QUANTIZER_LOG
                        Enable logging in quantizer v2, logging to the file <QUANTIZER_LOG>.
                        E.g., --quantizer_log my_model_name.csv will produce the file
                        my_model_name.csv. See --quantizer_log_level.
  --quantizer_log_level {LogLevel.NONE,LogLevel.TRACE,LogLevel.INFO}
                        Sets the logging level in quantizer v2.
                        INFO: Emits a file in the CSV format. Requires --quantizer_log
                        <file_name.csv> to be set. Warnings and errors are emitted to the console.
                        TRACE: Emits a file in the TXT format. Requires --quantizer_log
                        <file_name.txt> to be set. Warnings and errors are emitted to the console.
                        NONE: Default value. No file is emitted. Warnings and errors are emitted to
                        the console.
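
As a hedged illustration of the calibration-style quantizer options above (file names are placeholders), percentile calibration of activations combined with a symmetric parameter schema might be requested as:

$ qnn-onnx-converter -i <path>/model.onnx \
                     --input_list <path>/input_list.txt \
                     --act_quantizer_calibration percentile \
                     --percentile_calibration_value 99.99 \
                     --param_quantizer_schema symmetric \
                     -o <optional_output_path>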

Masked Softmax Optimization Options:
  --apply_masked_softmax {compressed,uncompressed}
                        This flag enables the pass that creates a MaskedSoftmax Op and
                        rewrites the graph to include this Op. MaskedSoftmax Op may not
                        be supported by all the QNN backends. Please check the
                        supplemental backend XML for the targeted backend.
                        This argument takes a string parameter input that selects
                        the mode of MaskedSoftmax Op.
                        'compressed' value rewrites the graph with the compressed version of
                        MaskedSoftmax Op.
                        'uncompressed' value rewrites the graph with the uncompressed version of
                        MaskedSoftmax Op.
  --packed_masked_softmax_inputs PACKED_MASKED_SOFTMAX_INPUTS [PACKED_MASKED_SOFTMAX_INPUTS ...]
                        Specify the name of the input ids tensor that will be packed in a single
                        inference.
                        This is applicable only for Compressed MaskedSoftmax Op.
                        This will create a new input to the graph named 'position_ids'
                        with same shape as the provided input name in this flag.
                        During runtime, this input shall be provided with the token
                        locations for individual sequences so that the same will be
                        internally passed to positional embedding layer.
                        E.g. If 2 sequences of length 20 and 30 are packed together
                        in single batch of 64 tokens then this new input 'position_ids' should have
                        value [0, 1, ..., 19, 0, 1, ..., 29, 0, 0, 0, ..., 0]
                        Usage: --packed_masked_softmax_inputs input_ids
                        Packed model will enable the user to pack multiple sequences into
                        single batch of inference.
  --packed_max_seq PACKED_MAX_SEQ
                        Number of sequences packed in the single input ids and
                        single attention mask inputs. Applicable only for
                        Compressed MaskedSoftmax Op.
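
For example (a sketch only; input_ids is a placeholder tensor name, and backend support should be verified as noted above), compressed MaskedSoftmax with two packed sequences might be enabled as:

$ qnn-onnx-converter -i <path>/model.onnx \
                     --apply_masked_softmax compressed \
                     --packed_masked_softmax_inputs input_ids \
                     --packed_max_seq 2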

Architecture Checker Options(Experimental):
  --arch_checker        Pass this option to enable architecture checker tool.
                        This is an experimental option for models that are intended to run on HTP
                        backend.

Note: Only one of: {'op_package_config', 'package_name'} can be specified
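
A minimal command line invocation, analogous to the other converters (placeholders shown in angle brackets), might look like:

$ qnn-onnx-converter -i <path>/model.onnx \
                     -d <network_input_name> <dims> \
                     -o <optional_output_path> \
                     -p <optional_package_name>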

qairt-converter

The qairt-converter tool converts a model from one of the ONNX/TensorFlow/TFLite/PyTorch frameworks to a DLC file representing the QNN graph format, which can enable inference on Qualcomm AI IP/HW. The converter auto-detects the framework based on the source model extension.

Basic command line usage looks like:

usage: qairt-converter [--source_model_input_shape INPUT_NAME INPUT_DIM]
                       [--out_tensor_node OUT_NAMES]
                       [--source_model_input_datatype INPUT_NAME INPUT_DTYPE]
                       [--source_model_input_layout INPUT_NAME INPUT_LAYOUT]
                       [--desired_input_layout INPUT_NAME DESIRED_INPUT_LAYOUT]
                       [--source_model_output_layout OUTPUT_NAME OUTPUT_LAYOUT]
                       [--desired_output_layout OUTPUT_NAME DESIRED_OUTPUT_LAYOUT]
                       [--desired_input_color_encoding [ ...]]
                       [--preserve_io_datatype [PRESERVE_IO_DATATYPE ...]]
                       [--dump_config_template DUMP_IO_CONFIG_TEMPLATE] [--config IO_CONFIG]
                       [--dry_run [DRY_RUN]] [--enable_framework_trace] [--remove_unused_inputs]
                       [--gguf_config GGUF_CONFIG] [--quantizer_log QUANTIZER_LOG]
                       [--quantizer_log_level {LogLevel.NONE,LogLevel.TRACE,LogLevel.INFO}]
                       [--quantization_overrides QUANTIZATION_OVERRIDES]
                       [--lora_weight_list LORA_WEIGHT_LIST]
                       [--quant_updatable_mode {none,adapter_only,all}] [--onnx_skip_simplification]
                       [--onnx_override_batch BATCH] [--onnx_define_symbol SYMBOL_NAME VALUE]
                       [--onnx_validate_models] [--onnx_summary]
                       [--onnx_perform_sequence_construct_optimizer] [--tf_summary]
                       [--tf_override_batch BATCH] [--tf_disable_optimization]
                       [--tf_show_unconsumed_nodes] [--tf_saved_model_tag SAVED_MODEL_TAG]
                       [--tf_saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY]
                       [--tf_validate_models] [--tflite_signature_name SIGNATURE_NAME]
                       [--dump_exported_onnx] --input_network INPUT_NETWORK [--debug [DEBUG]]
                       [--output_path OUTPUT_PATH] [--copyright_file COPYRIGHT_FILE]
                       [--float_bitwidth FLOAT_BITWIDTH] [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH]
                       [--set_model_version MODEL_VERSION] [--export_format EXPORT_FORMAT]
                       [--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
                       [--package_name PACKAGE_NAME | --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]
                       [--target_backend BACKEND] [--target_soc_model SOC_MODEL] [-h]
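
For instance, a minimal invocation for an ONNX model (file names are placeholders; the DLC output path is optional) might be:

$ qairt-converter -i <path>/model.onnx \
                  --output_path <path>/model.dlc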

required arguments:
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
                        Path to the source framework model.

optional arguments:
  --source_model_input_shape INPUT_NAME INPUT_DIM, -s INPUT_NAME INPUT_DIM
                        The name and dimension of all the input buffers to the network specified in
                        the format [input_name comma-separated-dimensions],
                        for example: --source_model_input_shape 'data' 1,224,224,3.
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For scalar inputs, use a single dimension `0` to indicate that the input is
                        a scalar value. This representation is supported for ONNX models only.
                        For multiple inputs specify multiple --source_model_input_shape on the command line like:
                            --source_model_input_shape 'data1' 1,224,224,3 --source_model_input_shape 'data2' 0
                        NOTE: Required for TensorFlow and PyTorch. Optional for Onnx and Tflite.
                        In case of Onnx, this feature works only with Onnx 1.6.0 and above
  --out_tensor_node OUT_NAMES, --out_tensor_name OUT_NAMES
                        Name of the graph's output Tensor Names. Multiple output names should be
                        provided separately like:
                            --out_tensor_name out_1 --out_tensor_name out_2
                        NOTE: Required for TensorFlow. Optional for Onnx, Tflite and PyTorch
  --source_model_input_datatype INPUT_NAME INPUT_DTYPE
                        The names and datatype of the network input layers specified in the format
                        [input_name datatype], for example:
                            'data' 'float32'
                        Default is float32 if not specified
                        Note that the quotes should always be included in order to handle special
                        characters, spaces, etc.
                        For multiple inputs specify multiple --source_model_input_datatype on the
                        command line like:
                            --source_model_input_datatype 'data1' 'float32'
                        --source_model_input_datatype 'data2' 'float32'
  --source_model_input_layout INPUT_NAME INPUT_LAYOUT
                        Layout of each input tensor. If not specified, it will use the default based
                        on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                            NCDHW, NDHWC, NCHW, NHWC, HWIO, OIHW, NFC, NCF, NTF, TNF, NF, NC, F
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature,
                        T = Time, I = Input, O = Output
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        HWIO/IOHW used for Weights of Conv Ops
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        For multiple inputs specify multiple --source_model_input_layout on the
                        command line.
                        Eg:
                            --source_model_input_layout "data1" NCHW --source_model_input_layout
                        "data2" NCHW
  --desired_input_layout INPUT_NAME DESIRED_INPUT_LAYOUT
                        Desired Layout of each input tensor. If not specified, it will use the
                        default based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                            NCDHW, NDHWC, NCHW, NHWC, HWIO, OIHW, NFC, NCF, NTF, TNF, NF, NC, F
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature,
                        T = Time, I = Input, O = Output
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        HWIO/IOHW used for Weights of Conv Ops
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        For multiple inputs specify multiple --desired_input_layout on the command
                        line.
                        Eg:
                            --desired_input_layout "data1" NCHW --desired_input_layout "data2" NCHW
  --source_model_output_layout OUTPUT_NAME OUTPUT_LAYOUT
                        Layout of each output tensor. If not specified, it will use the default
                        based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                            NCDHW, NDHWC, NCHW, NHWC, HWIO, OIHW, NFC, NCF, NTF, TNF, NF, NC, F
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T =
                        Time
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        For multiple inputs specify multiple --source_model_output_layout on the
                        command line.
                        Eg:
                            --source_model_output_layout "data1" NCHW --source_model_output_layout
                        "data2" NCHW
  --desired_output_layout OUTPUT_NAME DESIRED_OUTPUT_LAYOUT
                        Desired Layout of each output tensor. If not specified, it will use the
                        default based on the Source Framework.
                        Accepted values are-
                            NCDHW, NDHWC, NCHW, NHWC, HWIO, OIHW, NFC, NCF, NTF, TNF, NF, NC, F
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T =
                        Time
                        NDHWC/NCDHW used for 5d outputs
                        NHWC/NCHW used for 4d image-like outputs
                        NFC/NCF used for outputs to Conv1D or other 1D ops
                        NTF/TNF used for outputs with time steps like the ones used for LSTM op
                        NF used for 2D outputs, like the outputs to Dense/FullyConnected layers
                        NC used for 2D outputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D outputs, e.g. Bias tensor
                        For multiple outputs specify multiple --desired_output_layout on the command
                        line.
                        Eg:
                            --desired_output_layout "data1" NCHW --desired_output_layout "data2"
                        NCHW
  --desired_input_color_encoding [ ...], -e [ ...]
                        Usage:     --input_color_encoding "INPUT_NAME" INPUT_ENCODING_IN
                        [INPUT_ENCODING_OUT]
                        Input encoding of the network inputs. Default is bgr.
                        e.g.
                           --input_color_encoding "data" rgba
                        Quotes must wrap the input node name to handle special characters,
                        spaces, etc. To specify encodings for multiple inputs, invoke
                        --input_color_encoding for each one.
                        e.g.
                            --input_color_encoding "data1" rgba --input_color_encoding "data2" other
                        Optionally, an output encoding may be specified for an input node by
                        providing a second encoding. The default output encoding is bgr.
                        e.g.
                            --input_color_encoding "data3" rgba rgb
                        Input encoding types:
                            image color encodings: bgr, rgb, nv21, nv12, ...
                            time_series: for inputs of rnn models;
                            other: not available above or is unknown.
                        Supported encodings:
                           bgr
                           rgb
                           rgba
                           argb32
                           nv21
                           nv12
  --preserve_io_datatype [PRESERVE_IO_DATATYPE ...]
                        Use this option to preserve IO datatype. The different ways of using this
                        option are as follows:
                            --preserve_io_datatype <space separated list of names of inputs and
                        outputs of the graph>
                        e.g.
                            --preserve_io_datatype input1 input2 output1
                        To preserve the datatype for all the inputs and outputs
                        of the graph, pass the option without any arguments:
                            --preserve_io_datatype
                        Note: --config gets higher precedence than --preserve_io_datatype.
  --dump_config_template DUMP_IO_CONFIG_TEMPLATE
                        Dumps the yaml template for I/O configuration. This file can be edited as
                        per the custom requirements and passed using the --config option. Use this
                        option to specify the yaml file to which the IO config template is dumped.
  --config IO_CONFIG    Use this option to specify a yaml file for input and output options.
  --dry_run [DRY_RUN]   Evaluates the model without actually converting any ops, and returns
                        unsupported ops/attributes as well as unused inputs and/or outputs if any.
  --enable_framework_trace
                        Use this option to enable converter to trace the op/tensor change
                        information.
                        Currently framework op trace is supported only for ONNX converter.
  --remove_unused_inputs
                        Use this option to remove the disconnected graph input nodes after the
                        conversion
  --gguf_config GGUF_CONFIG
                        This is an optional argument that can be used when the input network is a
                        GGUF file. It specifies the path to the config file for building the GenAI
                        model (the config.json file generated when saving the huggingface model).
  --quantizer_log QUANTIZER_LOG
                        Valid for use with v2.0.0 JSON schema for quantization overrides or when
                        --use_quantize_v2 is provided. Enable logging in the quantizer, logging to
                        the file <QUANTIZER_LOG>.
                        E.g., --quantizer_log my_model_name.csv will produce the file
                        my_model_name.csv. See --quantizer_log_level.
  --quantizer_log_level {LogLevel.NONE,LogLevel.TRACE,LogLevel.INFO}
                        Sets the logging level in the quantizer. See --quantizer_log.
                        INFO: Emits a file in the CSV format. Requires --quantizer_log
                        <file_name.csv> to be set. Warnings and errors are emitted to the console.
                        TRACE: Emits a file in the TXT format. Requires --quantizer_log
                        <file_name.txt> to be set. Warnings and errors are emitted to the console.
                        NONE: Default value. No file is emitted. Warnings and errors are emitted to
                        the console.
  --debug [DEBUG]       Run the converter in debug mode.
  --output_path OUTPUT_PATH, -o OUTPUT_PATH
                        Path where the converted output model should be saved. If not specified, the
                        converted model will be written to a file with the same name as the input model.
  --copyright_file COPYRIGHT_FILE
                        Path to copyright file. If provided, the content of the file will be added
                        to the output model.
  --float_bitwidth FLOAT_BITWIDTH
                        Use the --float_bitwidth option to convert the graph to the specified float
                        bitwidth, either 32 (default) or 16.
  --float_bias_bitwidth FLOAT_BIAS_BITWIDTH
                        Use the --float_bias_bitwidth option to select the bitwidth to use for float
                        bias tensor, either 32 or 16 (default '0' if not provided).
  --set_model_version MODEL_VERSION
                        User-defined ASCII string to identify the model, only first 64 bytes will be
                        stored
  --export_format EXPORT_FORMAT
                        DLC_DEFAULT (default)
                        - Produce a Float graph given a Float Source graph
                        - Produce a Quant graph given a Source graph with provided Encodings
                        DLC_STRIP_QUANT
                        - Produce a Float graph, discarding Quant data
  -h, --help            show this help message and exit

Custom Op Package Options:
  --converter_op_package_lib CONVERTER_OP_PACKAGE_LIB, -cpl CONVERTER_OP_PACKAGE_LIB
                        Absolute path to converter op package library compiled by the OpPackage
                        generator. Must be separated by a comma for multiple package libraries.
                        Note: Order of converter op package libraries must follow the order of xmls.
                        Ex1: --converter_op_package_lib absolute_path_to/libExample.so
                        Ex2: -cpl absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
  --package_name PACKAGE_NAME, -p PACKAGE_NAME
                        A global package name to be used for each node in the Model.cpp file.
                        Defaults to Qnn header defined package name
  --op_package_config CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...], -opc CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]
                        Path to a Qnn Op Package XML configuration file that contains user defined
                        custom operations.

Quantizer Options:
  --quantization_overrides QUANTIZATION_OVERRIDES, -q QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters to use for
                        quantization. These will override any quantization data carried from
                        conversion (eg TF fake quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET specification.

LoRA Converter Options:
  --lora_weight_list LORA_WEIGHT_LIST
                        Path to a file specifying a list of tensor names that should be updateable.
  --quant_updatable_mode {none,adapter_only,all}
                        Specify whether/for which tensors the quantization encodings change across
                        use-cases. In none mode, no quantization encodings are updatable. In
                        adapter_only mode quantization encodings for only lora/adapter branch
                        (Conv->Mul->Conv) change across use-case, the base branch quantization
                        encodings remain the same. In all mode, all quantization encodings are
                        updatable.

Onnx Converter Options:
  --onnx_skip_simplification, -oss
                        Do not attempt to simplify the model automatically. This may prevent some
                        models from properly converting when sequences of unsupported static
                        operations are present.
  --onnx_override_batch BATCH
                        The batch dimension override. This will take the first dimension of all
                        inputs and treat it as a batch dim, overriding it with the value provided
                        here. For example:
                        --onnx_override_batch 6
                        will result in a shape change from [1,3,224,224] to [6,3,224,224].
                        If there are inputs without batch dim this should not be used and each input
                        should be overridden independently using -s option for input dimension
                        overrides.
  --onnx_define_symbol SYMBOL_NAME VALUE
                        This option allows overriding specific input dimension symbols. For instance
                        you might see input shapes specified with variables such as :
                        data: [1,3,height,width]
                        To override these simply pass the option as:
                        --onnx_define_symbol height 224 --onnx_define_symbol width 448
                        which results in dimensions that look like:
                        data: [1,3,224,448]
  --onnx_validate_models
                        Validate the original ONNX model against optimized ONNX model.
                        Constant inputs with all value 1s will be generated and will be used
                        by both models and their outputs are checked against each other.
                        The % average error and 90th percentile of output differences will be
                        calculated for this.
                        Note: Usage of this flag will incur extra time due to inference of the
                        models.
  --onnx_summary        Summarize the original onnx model and optimized onnx model.
                        Summary will print the model information such as number of parameters,
                        number of operators and their count, input-output tensor name, shape and
                        dtypes.
  --onnx_perform_sequence_construct_optimizer
                        This option allows optimization on SequenceConstruct Op.
                        When SequenceConstruct op is one of the outputs of the graph, it removes
                        SequenceConstruct op and makes its inputs as graph outputs to replace the
                        original output of SequenceConstruct.
  --tf_summary          Summarize the original TF model and optimized TF model.
                        Summary will print the model information such as number of parameters,
                        number of operators and their count, input-output tensor name, shape and
                        dtypes.

TensorFlow Converter Options:
  --tf_override_batch BATCH
                        The batch dimension override. This will take the first dimension of all
                        inputs and treat it as a batch dim, overriding it with the value provided
                        here. For example:
                        --tf_override_batch 6
                        will result in a shape change from [1,224,224,3] to [6,224,224,3].
                        If there are inputs without batch dim this should not be used and each input
                        should be overridden independently using -s option for input dimension
                        overrides.
  --tf_disable_optimization
                        Do not attempt to optimize the model automatically.
  --tf_show_unconsumed_nodes
                        Displays a list of unconsumed nodes, if any are found. Nodes which are
                        unconsumed do not violate the structural fidelity of the generated graph.
  --tf_saved_model_tag SAVED_MODEL_TAG
                        Specify the tag to select a MetaGraph from the SavedModel. ex:
                        --tf_saved_model_tag serve. Default value will be 'serve' when it is not
                        assigned.
  --tf_saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY
                        Specify signature key to select input and output of the model. ex:
                        --tf_saved_model_signature_key serving_default. Default value will be
                        'serving_default' when it is not assigned.
  --tf_validate_models  Validate the original TF model against optimized TF model.
                        Constant inputs with all value 1s will be generated and will be used
                        by both models and their outputs are checked against each other.
                        The % average error and 90th percentile of output differences will be
                        calculated for this.
                        Note: Usage of this flag will incur extra time due to inference of the
                        models.

Tflite Converter Options:
  --tflite_signature_name SIGNATURE_NAME
                      Use this option to specify a specific Subgraph signature to convert

PyTorch Converter Options:
  --dump_exported_onnx  Dump the exported Onnx model from input Torchscript model

Backend Options:
  --target_backend BACKEND
                        Use this option to specify the backend on which the model needs to run.
                        Providing this option will generate a graph optimized for the given backend
                        and this graph may not run on other backends. The default backend is HTP.
                        Supported backends are CPU, GPU, DSP, HTP, HTA, LPAI.
  --target_soc_model SOC_MODEL
                        Use this option to specify the SOC on which the model needs to run.
                        This can be found from SOC info of the device and it starts with strings
                        such as SDM, SM, QCS, IPQ, SA, QC, SC, SXR, SSG, STP, QRB, or AIC.
                        NOTE: --target_backend option must be provided to use --target_soc_model
                        option.

Note: Only one of: {'package_name', 'op_package_config'} can be specified
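
As an illustration of combining some of the options above, a conversion command might look like the following (the model path, tensor names, and dimensions are placeholders):

$ qnn-tensorflow-converter -i <path>/frozen_graph.pb
                    -d <network_input_name> <dims>
                    --out_node <network_output_name>
                    --desired_input_layout "<network_input_name>" NCHW
                    --preserve_io_datatype <network_input_name> <network_output_name>
                    -o <optional_output_path>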

Model Preparation

Quantization Support

Quantization is supported through the converter interface and is performed at conversion time. The only required option to enable quantization along with conversion is the --input_list option, which provides the quantizer with the required input data for the given model. The following options are available in each converter listed above to enable and configure quantization:

Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters
                        to use for quantization. These will override any
                        quantization data carried from conversion (eg TF fake
                        quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET
                        specification.
--input_list INPUT_LIST
                      Path to a file specifying the input data. This file
                      should be a plain text file, containing one or more
                      absolute file paths per line. Each path is expected to
                      point to a binary file containing one input in the
                      "raw" format, ready to be consumed by the quantizer
                      without any further preprocessing. Multiple files per
                      line separated by spaces indicate multiple inputs to
                      the network. See documentation for more details. Must
                      be specified for quantization. All subsequent
                      quantization options are ignored when this is not
                      provided.
--param_quantizer PARAM_QUANTIZER
                      Optional parameter to indicate the weight/bias
                      quantizer to use. Must be followed by one of the
                      following options: "tf": Uses the real min/max of the
                      data and specified bitwidth (default) "enhanced": Uses
                      an algorithm useful for quantizing models with long
                      tails present in the weight distribution "adjusted":
                      Uses an adjusted min/max for computing the range,
                      particularly good for denoise models "symmetric":
                      Ensures min and max have the same absolute values
                      about zero. Data will be stored as int#_t data such
                      that the offset is always 0.
--act_quantizer ACT_QUANTIZER
                      Optional parameter to indicate the activation
                      quantizer to use. Must be followed by one of the
                      following options: "tf": Uses the real min/max of the
                      data and specified bitwidth (default) "enhanced": Uses
                      an algorithm useful for quantizing models with long
                      tails present in the weight distribution "adjusted":
                      Uses an adjusted min/max for computing the range,
                      particularly good for denoise models "symmetric":
                      Ensures min and max have the same absolute values
                      about zero. Data will be stored as int#_t data such
                      that the offset is always 0.
--algorithms ALGORITHMS [ALGORITHMS ...]
                      Use this option to enable new optimization algorithms.
                      Usage is: --algorithms <algo_name1> ... The
                      available optimization algorithms are: "cle" - Cross
                      layer equalization includes a number of methods for
                      equalizing weights and biases across layers in order
                      to rectify imbalances that cause quantization errors.
--bias_bitwidth BIAS_BITWIDTH
                      Use the --bias_bitwidth option to select the bitwidth to use
                      when quantizing the biases, either 8 (default) or 32.
--act_bitwidth ACT_BITWIDTH
                      Use the --act_bitwidth option to select the bitwidth to use
                      when quantizing the activations, either 8 (default) or
                      16.
--weight_bitwidth WEIGHT_BITWIDTH
                      Use the --weight_bitwidth option to select the bitwidth to
                      use when quantizing the weights, either 4, 8 (default) or 16.
--float_bitwidth FLOAT_BITWIDTH
                      Use the --float_bitwidth option to select the bitwidth to use for float
                      tensors, either 32 (default) or 16.
--float_bias_bitwidth FLOAT_BIAS_BITWIDTH
                      Use the --float_bias_bitwidth option to select the bitwidth to
                      use when biases are in float, either 32 or 16.
--ignore_encodings    Use only quantizer generated encodings, ignoring any
                      user or model provided encodings. Note: Cannot use
                      --ignore_encodings with --quantization_overrides
--use_per_channel_quantization [USE_PER_CHANNEL_QUANTIZATION [USE_PER_CHANNEL_QUANTIZATION ...]]
                      Use per-channel quantization for
                      convolution-based op weights. Note: This will replace
                      built-in model QAT encodings when used for a given
                      weight. Usage: "--use_per_channel_quantization" to
                      enable or "--use_per_channel_quantization false"
                      (default) to disable.
--use_per_row_quantization [USE_PER_ROW_QUANTIZATION [USE_PER_ROW_QUANTIZATION ...]]
                      Use this option to enable rowwise quantization of Matmul and
                      FullyConnected op. Usage "--use_per_row_quantization" to enable
                      or "--use_per_row_quantization false" (default) to
                      disable. This option may not be supported by all backends.

Basic command line usage to convert and quantize a model using the TF converter would look like:

$ qnn-tensorflow-converter -i <path>/frozen_graph.pb
                    -d <network_input_name> <dims>
                    --out_node <network_output_name>
                    -o <optional_output_path>
                    --allow_unconsumed_nodes  # optional, but will most likely be needed for larger models
                    -p <optional_package_name> # Defaults to "qti.aisw"
                    --input_list input_list.txt

This will quantize the network using the default quantizer and bitwidths (8 bits for activations, weights, and biases).
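
To override the defaults, the bitwidth and per-channel options documented above can be appended to the same command; the values and paths shown here are illustrative:

$ qnn-tensorflow-converter -i <path>/frozen_graph.pb
                    -d <network_input_name> <dims>
                    --out_node <network_output_name>
                    --input_list input_list.txt
                    --act_bitwidth 16
                    --weight_bitwidth 8
                    --use_per_channel_quantization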

For more detailed information on quantization, options, and algorithms please refer to Quantization.

qairt-quantizer

The qairt-quantizer tool converts non-quantized DLC models into quantized DLC models.

Basic command line usage looks like:

usage: qairt-quantizer --input_dlc INPUT_DLC [--output_dlc OUTPUT_DLC] [--input_list INPUT_LIST]
                       [--enable_float_fallback] [--apply_algorithms ALGORITHMS [ALGORITHMS ...]]
                       [--bias_bitwidth BIAS_BITWIDTH] [--act_bitwidth ACT_BITWIDTH]
                       [--weights_bitwidth WEIGHTS_BITWIDTH] [--float_bitwidth FLOAT_BITWIDTH]
                       [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_quantization_overrides]
                       [--use_per_channel_quantization] [--use_per_row_quantization]
                       [--enable_per_row_quantized_bias]
                       [--preserve_io_datatype [PRESERVE_IO_DATATYPE ...]]
                       [--use_native_input_files] [--use_native_output_files]
                       [--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
                       [--keep_weights_quantized] [--adjust_bias_encoding]
                       [--act_quantizer_calibration ACT_QUANTIZER_CALIBRATION]
                       [--param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION]
                       [--act_quantizer_schema ACT_QUANTIZER_SCHEMA]
                       [--param_quantizer_schema PARAM_QUANTIZER_SCHEMA]
                       [--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
                       [--use_aimet_quantizer] [--op_package_lib OP_PACKAGE_LIB]
                       [--dump_encoding_json] [--config CONFIG_FILE] [--export_stripped_dlc] [-h]
                       [--target_backend BACKEND] [--target_soc_model SOC_MODEL] [--debug [DEBUG]]

required arguments:
  --input_dlc INPUT_DLC, -i INPUT_DLC
                        Path to the dlc container containing the model for which fixed-point
                        encoding metadata should be generated. This argument is required

optional arguments:
  --output_dlc OUTPUT_DLC, -o OUTPUT_DLC
                        Path at which the metadata-included quantized model container should be
                        written. If this argument is omitted, the quantized model will be written at
                        <unquantized_model_name>_quantized.dlc
  --input_list INPUT_LIST, -l INPUT_LIST
                        Path to a file specifying the input data. This file should be a plain text
                        file, containing one or more absolute file paths per line. Each path is
                        expected to point to a binary file containing one input in the "raw" format,
                        ready to be consumed by the quantizer without any further preprocessing.
                        Multiple files per line separated by spaces indicate multiple inputs to the
                        network. See documentation for more details. Must be specified for
                        quantization. All subsequent quantization options are ignored when this is
                        not provided.
  --enable_float_fallback, -f
                        Use this option to enable fallback to floating point (FP) instead of fixed
                        point.
                        This option can be paired with --float_bitwidth to indicate the bitwidth for
                        FP (by default 32).
                        If this option is enabled, then input list must not be provided and
                        --ignore_quantization_overrides must not be provided.
                        The external quantization encodings (encoding file/FakeQuant encodings)
                        might be missing quantization parameters for some interim tensors.
                        First it will try to fill the gaps by propagating across math-invariant
                        functions. If the quantization params are still missing,
                        the affected nodes will fall back to floating point.
  --apply_algorithms ALGORITHMS [ALGORITHMS ...]
                        Use this option to enable new optimization algorithms. Usage is:
                        --apply_algorithms <algo_name1> ... The available optimization algorithms
                        are: "cle" - Cross layer equalization includes a number of methods for
                        equalizing weights and biases across layers in order to rectify imbalances
                        that cause quantization errors.
  --bias_bitwidth BIAS_BITWIDTH
                        Use the --bias_bitwidth option to select the bitwidth to use when quantizing
                        the biases, either 8 (default) or 32.
  --act_bitwidth ACT_BITWIDTH
                        Use the --act_bitwidth option to select the bitwidth to use when quantizing
                        the activations, either 8 (default) or 16.
  --weights_bitwidth WEIGHTS_BITWIDTH
                        Use the --weights_bitwidth option to select the bitwidth to use when
                        quantizing the weights, either 4, 8 (default) or 16.
  --float_bitwidth FLOAT_BITWIDTH
                        Use the --float_bitwidth option to select the bitwidth to use for float
                        tensors, either 32 (default) or 16.
  --float_bias_bitwidth FLOAT_BIAS_BITWIDTH
                        Use the --float_bias_bitwidth option to select the bitwidth to use when
                        biases are in float, either 32 or 16 (default '0' if not provided).
  --ignore_quantization_overrides
                        Use only quantizer generated encodings, ignoring any user or model provided
                        encodings.
                        Note: Cannot use --ignore_quantization_overrides with
                        --quantization_overrides (argument of Qairt Converter)
  --use_per_channel_quantization
                        Use this option to enable per-channel quantization for convolution-based op
                        weights.
                        Note: This will only be used if built-in model Quantization-Aware Trained
                        (QAT) encodings are not present for a given weight.
  --use_per_row_quantization
                        Use this option to enable rowwise quantization of Matmul and FullyConnected
                        ops.
  --enable_per_row_quantized_bias
                        Use this option to enable rowwise quantization of bias for FullyConnected
                        ops, when weights are per-row quantized.
  --preserve_io_datatype [PRESERVE_IO_DATATYPE ...]
                        Use this option to preserve IO datatype. The different ways of using this
                        option are as follows:
                            --preserve_io_datatype <space separated list of names of inputs and
                        outputs of the graph>
                        e.g.
                           --preserve_io_datatype input1 input2 output1
                        To preserve the datatype for all the inputs and outputs
                        of the graph, pass the option without any arguments:
                            --preserve_io_datatype
  --use_native_input_files
                        Boolean flag to indicate how to read input files.
                        If not provided, reads inputs as floats and quantizes if necessary based on
                        quantization parameters in the model. (default)
                        If provided, reads inputs assuming the data type to be native to the model.
                        For ex., uint8_t.
  --use_native_output_files
                        Boolean flag to indicate the data type of the output files
                        If not provided, outputs the file as floats. (default)
                        If provided, outputs the file that is native to the model. For ex., uint8_t.
  --restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
                        Specifies the number of steps to use for computing quantization encodings
                        such that scale = (max - min) / number of quantization steps.
                        The option should be passed as a space separated pair of hexadecimal string
                        minimum and maximum values, i.e. --restrict_quantization_steps "MIN MAX".
                        Please note that this is a hexadecimal string literal and not a signed
                        integer; to supply a negative value an explicit minus sign is required.
                        E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8 bit
                        range,
                            --restrict_quantization_steps "-0x8000 0x7F7F" indicates an example 16
                        bit range.
                        This argument is required for 16-bit Matmul operations.
  --keep_weights_quantized
                        Use this option to keep the weights quantized even when the output of the op
                        is in floating point. Bias will be converted to floating point as per the
                        output of the op. Required to enable wFxp_actFP configurations according to
                        the provided bitwidth for weights and activations
                        Note: These modes are not supported by all runtimes. Please check
                        corresponding Backend OpDef supplement if these are supported
  --adjust_bias_encoding
                        Use --adjust_bias_encoding option to modify bias encoding and weight
                        encoding to ensure that the bias value is in the range of the bias encoding.
                        This option is only applicable for per-channel quantized weights.
                        NOTE: This may result in clipping of the weight values
  --act_quantizer_calibration ACT_QUANTIZER_CALIBRATION
                        Specify which quantization calibration method to use for activations.
                        Supported values: min-max (default), sqnr, entropy, mse, percentile.
                        This option can be paired with --act_quantizer_schema to override the
                        quantization schema to use for activations; otherwise the default
                        schema (asymmetric) will be used.
  --param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION
                        Specify which quantization calibration method to use for parameters.
                        Supported values: min-max (default), sqnr, entropy, mse, percentile.
                        This option can be paired with --param_quantizer_schema to override the
                        quantization schema to use for parameters; otherwise the default
                        schema (asymmetric) will be used.
  --act_quantizer_schema ACT_QUANTIZER_SCHEMA
                        Specify which quantization schema to use for activations
                        supported values: asymmetric (default), symmetric, unsignedsymmetric
  --param_quantizer_schema PARAM_QUANTIZER_SCHEMA
                        Specify which quantization schema to use for parameters
                        supported values: asymmetric (default), symmetric, unsignedsymmetric
  --percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
                        Specify the percentile value to be used with Percentile calibration method
                        The specified float value must lie between 90 and 100, default: 99.99
  --use_aimet_quantizer
                        Use AIMET for Quantization instead of QNN IR quantizer
  --op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
                        Use this argument to pass an op package library for quantization. Must be in
                        the form <op_package_lib_path:interfaceProviderName> and be separated by a
                        comma for multiple package libs
  --dump_encoding_json  Use this argument to dump encoding of all the tensors in a json file
  --config CONFIG_FILE, -c CONFIG_FILE
                        Use this argument to pass the path of the config YAML file with quantizer
                        options
  --export_stripped_dlc
                        Use this argument to export a DLC which strips out data not needed for graph
                        composition
  -h, --help            show this help message and exit
  --debug [DEBUG]       Run the quantizer in debug mode.

Backend Options:
  --target_backend BACKEND
                        Use this option to specify the backend on which the model needs to run.
                        Providing this option will generate a graph optimized for the given backend
                        and this graph may not run on other backends. The default backend is HTP.
                        Supported backends are CPU, GPU, DSP, HTP, HTA, LPAI.
  --target_soc_model SOC_MODEL
                        Use this option to specify the SOC on which the model needs to run.
                        This can be found from SOC info of the device and it starts with strings
                        such as SDM, SM, QCS, IPQ, SA, QC, SC, SXR, SSG, STP, QRB, or AIC.
                        NOTE: --target_backend option must be provided to use --target_soc_model
                        option.

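A minimal quantization run could look like the following (file names are illustrative):

$ qairt-quantizer --input_dlc model.dlc
                  --input_list input_list.txt
                  --output_dlc model_quantized.dlc
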
For more information on usage, please refer to SNPE documentation on the snpe-dlc-quant tool.

qnn-model-lib-generator

Note

For developers who want to execute the model preparation tools on a Windows PC, or on a Qualcomm device with a Windows operating system:
qnn-model-lib-generator is located under /bin/x86_64-windows-msvc within the SDK for native Windows-PC usage.
For developers who want to run qnn-model-lib-generator on a device with a Windows OS, it is located under /bin/aarch64-windows-msvc.
qnn-model-lib-generator will try to use the CMake command from your platform to generate libraries.
Please make sure CMake works on Windows by verifying that the required compile tools (Windows platform compiling tools) are installed.

The qnn-model-lib-generator tool compiles QNN model source code into artifacts for a specific target.

usage: qnn-model-lib-generator [-h] [-c <QNN_MODEL>.cpp] [-b <QNN_MODEL>.bin]
       [-t LIB_TARGETS ] [-l LIB_NAME] [-o OUTPUT_DIR]
Script compiles provided Qnn Model artifacts for specified targets.

Required argument(s):
 -c <QNN_MODEL>.cpp                    Filepath for the qnn model .cpp file

optional argument(s):
 -b <QNN_MODEL>.bin                    Filepath for the qnn model .bin file
                                       (Note: if not passed, runtime will fail if .cpp needs any items from a .bin file.)

 -t LIB_TARGETS                        Specifies the targets to build the models for. Default: aarch64-android x86_64-linux-clang
 -l LIB_NAME                           Specifies the name to use for libraries. Default: uses name in <model.bin> if provided,
                                       else generic qnn_model.so
 -o OUTPUT_DIR                         Location for saving output libraries.
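
For example, the following invocation (with illustrative file names) compiles a converted model for a single target:

$ qnn-model-lib-generator -c qnn_model.cpp
                          -b qnn_model.bin
                          -t aarch64-android
                          -o model_libs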

Note

For Windows users, please execute this tool with python3.

qnn-op-package-generator

The qnn-op-package-generator tool is used to generate skeleton code for a QNN op package using an XML config file that describes the attributes of the package. The tool creates the package as a directory containing skeleton source code and makefiles that can be compiled to create a shared library object.

usage: qnn-op-package-generator [-h] --config_path CONFIG_PATH [--debug]
                                [--output_path OUTPUT_PATH] [-f]

optional arguments:
  -h, --help            show this help message and exit

required arguments:
  --config_path CONFIG_PATH, -p CONFIG_PATH
                        The path to a config file that defines a QNN Op
                        package(s).

optional arguments:
  --debug               Returns debugging information from generating the
                        package
  --output_path OUTPUT_PATH, -o OUTPUT_PATH
                        Path where the package should be saved
  -f, --force-generation
                        This option will delete the entire existing package.
                        Note: appropriate file permissions must be set to use
                        this option.
  --converter_op_package, -cop
                        Generates Converter Op Package skeleton code needed
                        by converters for output shape inference
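
For instance, generating a package from an XML config could look like the following (the file and directory names are illustrative):

$ qnn-op-package-generator --config_path ExampleOpPackage.xml
                           --output_path ./op_packages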

qnn-context-binary-generator

The qnn-context-binary-generator tool is used to create a context binary by using a particular backend and consuming a model library created by the qnn-model-lib-generator.

usage: qnn-context-binary-generator --model QNN_MODEL.so --backend QNN_BACKEND.so
                                    --binary_file BINARY_FILE_NAME
                                    [--model_prefix MODEL_PREFIX]
                                    [--output_dir OUTPUT_DIRECTORY]
                                    [--op_packages ONE_OR_MORE_OP_PACKAGES]
                                    [--config_file CONFIG_FILE.json]
                                    [--profiling_level PROFILING_LEVEL]
                                    [--verbose] [--version] [--help]

REQUIRED ARGUMENTS:
-------------------
  --model                         <FILE>      Path to the <qnn_model_name.so> file containing a QNN network.
                                              To create a context binary with multiple graphs, use
                                              comma-separated list of model.so files. The syntax is
                                              <qnn_model_name_1.so>,<qnn_model_name_2.so>.

  --backend                       <FILE>      Path to a QNN backend .so library to create the context binary.

  --binary_file                   <VAL>       Name of the binary file to save the context binary to with
                                              .bin file extension.
                                              If absolute path is provided, binary is saved in this path.
                                              Else binary is saved in the same path as --output_dir option.


OPTIONAL ARGUMENTS:
-------------------
  --model_prefix                              Function prefix to use when loading <qnn_model_name.so> file
                                              containing a QNN network. Default: QnnModel.

  --output_dir                    <DIR>       The directory to save output to. Defaults to ./output.

  --op_packages                   <VAL>       Provide a comma separated list of op packages
                                              and interface providers to register. The syntax is:
                                              op_package_path:interface_provider[,op_package_path:interface_provider...]

  --profiling_level               <VAL>       Enable profiling. Valid Values:
                                              1. basic:    captures execution and init time.
                                              2. detailed: in addition to basic, captures per Op timing
                                                  for execution.
                                              3. backend:  backend-specific profiling level specified
                                                  in the backend extension related JSON config file.

  --profiling_option              <VAL>       Set profiling options:
                                              1. optrace:    Generates an optrace of the run.

  --config_file                   <FILE>      Path to a JSON config file. The config file currently
                                              supports options related to backend extensions and
                                              context priority. Please refer to SDK documentation
                                              for more details.

  --enable_intermediate_outputs               Enable all intermediate nodes to be output along with
                                              default outputs in the saved context.
                                              Note that options --enable_intermediate_outputs and --set_output_tensors
                                              are mutually exclusive. Only one of the options can be specified at a time.

  --set_output_tensors            <VAL>       Provide a comma-separated list of intermediate output tensor names, for which the outputs
                                              will be written in addition to final graph output tensors.
                                              Note that options --enable_intermediate_outputs and --set_output_tensors
                                              are mutually exclusive. Only one of the options can be specified at a time.
                                              The syntax is: graphName0:tensorName0,tensorName1;graphName1:tensorName0,tensorName1.
                                              In case of a single graph, its name is not necessary and a list of comma separated tensor
                                              names can be provided, e.g.: tensorName0,tensorName1.
                                              The same format can be provided in a .txt file.

  --backend_binary                <VAL>       Name of the binary file to save a backend-specific context binary to with
                                              .bin file extension. If not provided, no backend binary is created.
                                              If absolute path is provided, binary is saved in this path.
                                              Else binary is saved in the same path as --output_dir option.

  --log_level                                 Specifies max logging level to be set. Valid settings:
                                              "error", "warn", "info" and "verbose"

  --dlc_path                     <VAL>        Paths to a comma separated list of Deep Learning Containers (DLC) from which to load the models.
                                              Necessitates libQnnModelDlc.so as the --model argument.
                                              To compose multiple graphs in the context, use comma-separated list of DLC files.
                                              The syntax is <qnn_model_name_1.dlc>,<qnn_model_name_2.dlc>
                                              Default: None

  --input_output_tensor_mem_type  <VAL>       Specifies mem type to be used for input and output tensors during graph creation.
                                              Valid settings:"raw" and "memhandle"

  --platform_options              <VAL>       Specifies values to pass as platform options. Multiple platform options can be provided
                                              using the syntax: key0:value0;key1:value1;key2:value2

  --data_format_config            <VAL>        Path to a JSON config file, specifying the data formats of certain tensors.
                                               Please refer to SDK documentation for more details.

  --adapter_weight_config         <VAL>        Path to a YAML config file containing adapter weight information for LoRA.
                                               Config should specify the use case name, graph name, the location of safetensor weights and encodings,
                                               and optionally whether the use case should be encodings and/or weights only, e.g.

                                                use_case:
                                                        - name: <use_case>
                                                          graph: <graph>
                                                          weights: <path_to_safetensors>.safetensors
                                                          encodings: <path_to_encodings>.encodings
                                                          encodings_only: <true/false>
                                                          weights_only: <true/false>

  --soc_model                     <VAL>       Specifies simulated soc model value.
                                              A valid soc model value can be chosen from the Supported Snapdragon Devices table in the Overview section.
                                              Default: 0 (use default soc model set by the backend).

  --version                                   Print the QNN SDK version.

  --help                                      Show this help message.

See qnn-net-run section for more details about --op_packages and --config_file options.
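
For example, serializing a model library with the HTP backend might look like the following (library names and paths are illustrative):

$ qnn-context-binary-generator --model libs/x86_64-linux-clang/libqnn_model.so
                               --backend libQnnHtp.so
                               --binary_file my_model.serialized
                               --output_dir ./output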

Execution

qnn-net-run

The qnn-net-run tool is used to consume a model library compiled from the output of the QNN converter, and run it on a particular backend.
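
An illustrative invocation, using the arguments described below (paths and library names are placeholders), might be:

$ qnn-net-run --model libs/aarch64-android/libqnn_model.so
              --backend libQnnHtp.so
              --input_list input_list.txt
              --output_dir ./output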

DESCRIPTION:
------------
Example application demonstrating how to load and execute a neural network
using QNN APIs.


REQUIRED ARGUMENTS:
-------------------
  --model             <FILE>       Path to the model containing a QNN network.
                                   To compose multiple graphs, use comma-separated list of
                                   model.so files. The syntax is
                                   <qnn_model_name_1.so>,<qnn_model_name_2.so>.

  --backend           <FILE>       Path to a QNN backend to execute the model.

  --input_list        <FILE>       Path to a file listing the inputs for the network.
                                   If there are multiple graphs in model.so, this has
                                   to be comma-separated list of input list files.
                                   When multiple graphs are present, to skip execution of a graph use
                                   "__"(double underscore without quotes) as the file name in the
                                   comma-seperated list of input list files.

  --retrieve_context  <VAL>       Path to cached binary from which to load a saved
                                  context from and execute graphs. --retrieve_context and
                                  --model are mutually exclusive. Only one of the options
                                  can be specified at a time.


OPTIONAL ARGUMENTS:
-------------------
  --model_prefix                             Function prefix to use when loading <qnn_model_name.so>.
                                             Default: QnnModel

  --debug                                    Specifies that output from all layers of the network
                                             will be saved. Note that options --debug and --set_output_tensors
                                             are mutually exclusive. Only one of the options can be specified
                                             at a time. This option cannot be used when loading a saved context
                                             through --retrieve_context or --retrieve_context_list option.

  --output_dir                   <DIR>       The directory to save output to. Defaults to ./output.

  --use_native_output_files                  Specifies that the output files will be generated in the data
                                             type native to the graph. If not specified, output files will
                                             be generated in floating point.

  --use_native_input_files                   Specifies that the input files will be parsed in the data
                                             type native to the graph. If not specified, input files will
                                             be parsed in floating point. Note that options --use_native_input_files
                                             and --native_input_tensor_names are mutually exclusive.
                                             Only one of the options can be specified at a time.

  --native_input_tensor_names    <VAL>       Provide a comma-separated list of input tensor names,
                                             for which the input files would be read/parsed in native format.
                                             Note that options --use_native_input_files and
                                             --native_input_tensor_names are mutually exclusive.
                                             Only one of the options can be specified at a time.
                                             The syntax is: graphName0:tensorName0,tensorName1;graphName1:tensorName0,tensorName1

  --op_packages                  <VAL>       Provide a comma-separated list of op packages, interface
                                             providers, and, optionally, targets to register. Valid values
                                             for target are CPU and HTP. The syntax is:
                                             op_package_path:interface_provider:target[,op_package_path:interface_provider:target...]

  --profiling_level              <VAL>       Enable profiling. Valid Values:
                                               1. basic:    captures execution and init time.
                                               2. detailed: in addition to basic, captures per Op timing
                                                            for execution, if a backend supports it.
                                               3. client:   captures only the performance metrics
                                                            measured by qnn-net-run.
                                               4. backend:  backend-specific profiling level
                                                            specified in the backend extension
                                                            related JSON config file.

  --profiling_option             <VAL>       Set profiling options:
                                               1. optrace:      Generates an optrace of the run.

  --perf_profile                 <VAL>       Specifies performance profile to be used. Valid settings are
                                             low_balanced, balanced, default, high_performance,
                                             sustained_high_performance, burst, low_power_saver,
                                             power_saver, high_power_saver, extreme_power_saver
                                             and system_settings.
                                             Note: perf_profile option will override any existing performance settings from backend config.


  --config_file                  <FILE>      Path to a JSON config file. The config file currently
                                             supports options related to backend extensions,
                                             context priority and graph configs. Please refer to SDK
                                             documentation for more details.

  --log_level                    <VAL>       Specifies max logging level to be set. Valid settings:
                                             error, warn, info, debug, and verbose.

  --shared_buffer                            Specifies creation of shared buffers for graph I/O between the application
                                             and the device/coprocessor associated with a backend directly.

  --synchronous                              Specifies that graphs should be executed synchronously rather than asynchronously.
                                             If a backend does not support asynchronous execution, this flag is unnecessary.

  --num_inferences               <VAL>       Specifies the number of inferences. Loops over the input_list until
                                             the number of inferences has transpired.

  --duration                     <VAL>       Specifies the duration of the graph execution in seconds.
                                             Loops over the input_list until this amount of time has transpired.

  --keep_num_outputs             <VAL>       Specifies the number of outputs to be saved.
                                             Once the number of outputs reach the limit, subsequent outputs would be just discarded.

  --batch_multiplier             <VAL>       Specifies the value with which the batch value in input and output tensors dimensions
                                             will be multiplied. The modified input and output tensors will be used only during
                                             graph execution. Composed graphs will still use the tensor dimensions from the model.

  --timeout                      <VAL>       Specifies the value of the timeout for execution of graph in micro seconds. Please note
                                             using this option with a backend that does not support timeout signals results in an error.

  --retrieve_context_timeout     <VAL>       Specifies the value of the timeout for initialization of graph in micro seconds. Please note
                                             using this option with a backend that does not support timeout signals results in an error.
                                             Also note that this option can only be used when loading a saved context through
                                             --retrieve_context or --retrieve_context_list option.

  --max_input_cache_tensor_sets  <VAL>       Specifies the maximum number of input tensor sets that can be cached.
                                             Use value "-1" to cache all the input tensors created.
                                             Note that options --max_input_cache_tensor_sets and --max_input_cache_size_mb are mutually exclusive.
                                             Only one of the options can be specified at a time.

  --max_input_cache_size_mb      <VAL>       Specifies the maximum cache size in megabytes (MB).
                                             Note that options --max_input_cache_tensor_sets and --max_input_cache_size_mb are mutually exclusive.
                                             Only one of the options can be specified at a time.

  --set_output_tensors           <VAL>       Provide a comma-separated list of intermediate output tensor names, for which the outputs
                                             will be written in addition to final graph output tensors. Note that options --debug and
                                             --set_output_tensors are mutually exclusive. Only one of the options can be specified at a time.
                                             Also note that this option cannot be used when the graph is retrieved from a context binary,
                                             since the graph is already finalized when retrieved from a context binary.
                                             The syntax is: graphName0:tensorName0,tensorName1;graphName1:tensorName0,tensorName1.
                                             In case of a single graph, its name is not necessary and a list of comma-separated tensor
                                             names can be provided, e.g.: tensorName0,tensorName1.
                                             The same format can be provided in a .txt file.

 --use_mmap                                  Specifies that the context binary that is being read should be loaded
                                             using memory-mapped (MMAP) file I/O. Please note that some platforms
                                             may not support this due to OS limitations, in which case an error
                                             is thrown when this option is used.

 --validate_binary                           Specifies that the context binary will be validated before creating a context.
                                             This option can only be used with backends that support binary validation.

 --platform_options             <VAL>        Specifies values to pass as platform options. Multiple platform options can be provided
                                             using the syntax: key0:value0;key1:value1;key2:value2

 --graph_profiling_start_delay  <VAL>        Specifies the graph profiling start delay in seconds. Please note that this option can only be used
                                             in conjunction with graph-level profiling handles.

 --dlc_path                     <VAL>        Path to a comma-separated list of Deep Learning Containers (DLCs) from which to load the models.
                                             Requires libQnnModelDlc.so to be passed as the --model argument.
                                             To compose multiple graphs in the context, use a comma-separated list of DLC files.
                                             The syntax is <qnn_model_name_1.dlc>,<qnn_model_name_2.dlc>
                                             Default: None

 --graph_profiling_num_executions  <VAL>     Specifies the maximum number of QnnGraph_execute/QnnGraph_executeAsync calls to be profiled.
                                             Please note that this option can only be used in conjunction with graph-level profiling handles.

 --io_tensor_mem_handle_type       <VAL>     Specifies the memory handle type to be used for input and output tensors during graph execution.
                                             Valid settings: "ion" and "dma_buf".

 --device_options                  <VAL>     Specifies values to pass as device options. Multiple device options can be provided using the
                                             syntax: key0:value0;key1:value1;key2:value2
                                             Currently supported options:
                                             device_id:<n> - selects a particular hardware device by ID to execute on. This ID will be used
                                                             during QnnDevice creation. A default device will be chosen by the backend if
                                                             an ID is not provided. This value will override a device ID selected in a
                                                             backend config file.
                                             core_id:<n> - selects a particular core by ID to execute on the selected device. This ID will
                                                           be used during QnnDevice creation. A default core will be chosen by the backend
                                                           if an ID is not provided. This value will override a core ID selected in a
                                                           backend config file.

 --retrieve_context_list           <VAL>     Provide the path to a YAML file that contains information about multiple contexts. --retrieve_context_list
                                             is mutually exclusive with --retrieve_context, --model and --dlc_path. Please refer to the SDK documentation
                                             for more details.

 --binary_updates                  <VAL>     Path to a YAML file that contains paths to binary updates.
                                             Updates are applied after the initial graph execution on
                                             a per-graph basis.

  --version                                  Print the QNN SDK version.

  --help                                     Show this help message.

EXIT CODES:
------------
List of exit codes used in qnn-net-run application.

Exit codes 1, 2, 126-165 and 255 should be avoided for user-defined exit codes since they have
special purposes, as described below:
1, 2    : Abnormal termination of a program.
126-165 : Specifically used to indicate segmentation faults, bus errors, etc.

 3  - Application failure reason unknown. See DSP logs (logcat).

 4  - Application failure due to invalid application argument.

 6  - Application failure during setting log level.

 7  - Application failure due to null or invalid function pointer etc.

 9  - Application failure during qnn_net_run_HtpVXXHexagon initialization.

 10 - Application failure during backend creation.

 11 - Application failure during device creation.

 12 - Application failure during Op Package registration.

 13 - Application failure during creating context.

 14 - Application failure during graph prepare.

 15 - Application failure during graph finalize.

 16 - Application failure during create from binary.

 17 - Application failure during graph execution.

 18 - Application failure during context free.

 19 - Application failure during device free.

 20 - Application failure during backend termination.

 21 - Application failure during graph execution abort.

 22 - Application failure during graph execution timeout.

 23 - Application failure during the create from binary with suboptimal cache.

 24 - Application failure during backend termination.

 25 - Application failure during processing binary section or updating binary section etc.

 26 - Application failure during binary update/execution.

See the <QNN_SDK_ROOT>/examples/QNN/NetRun folder for a reference example of how to use the qnn-net-run tool.

Typical arguments:

--backend - The appropriate argument depends on what target and backend you want to run on

Android (aarch64): <QNN_SDK_ROOT>/lib/aarch64-android/

  • CPU - libQnnCpu.so

  • GPU - libQnnGpu.so

  • HTA - libQnnHta.so

  • DSP (Hexagon v65) - libQnnDspV65Stub.so

  • DSP (Hexagon v66) - libQnnDspV66Stub.so

  • DSP - libQnnDsp.so

  • HTP (Hexagon v68) - libQnnHtp.so

  • [Deprecated] HTP Alternate Prepare (Hexagon v68) - libQnnHtpAltPrepStub.so

  • LPAI (Stub library) - libQnnLpaiStub.so

  • LPAI - libQnnLpai.so

  • Saver - libQnnSaver.so

Linux x86: <QNN_SDK_ROOT>/lib/x86_64-linux-clang/

  • CPU - libQnnCpu.so

  • HTP (Hexagon v68) - libQnnHtp.so

  • LPAI - libQnnLpai.so

  • Saver - libQnnSaver.so

Windows x86: <QNN_SDK_ROOT>/lib/x86_64-windows-msvc/

  • CPU - QnnCpu.dll

  • LPAI - QnnLpai.dll

  • Saver - QnnSaver.dll

WoS: <QNN_SDK_ROOT>/lib/aarch64-windows-msvc/

  • CPU - QnnCpu.dll

  • DSP (Hexagon v66) - QnnDspV66Stub.dll

  • DSP - QnnDsp.dll

  • HTP (Hexagon v68) - QnnHtp.dll

  • Saver - QnnSaver.dll

Note

Hexagon-based backend libraries are emulations on x86_64 platforms.

--input_list - This argument provides a file containing paths to input files to be used for graph execution. Input files can be specified with the below format:

<input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]
[<input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]]
...

Below is an example containing 3 sets of inputs with layer names “Input_1” and “Input_2”, and files located in the relative path “Placeholder_1/real_input_inputs_1/”:

Input_1:=Placeholder_1/real_input_inputs_1/0-0#e6fb51.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/0-1#8a171b.rawtensor
Input_1:=Placeholder_1/real_input_inputs_1/1-0#67c965.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/1-1#54f1ff.rawtensor
Input_1:=Placeholder_1/real_input_inputs_1/2-0#b42dc6.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/2-1#346a0e.rawtensor

Note: If the batch dimension of the model is greater than 1, the number of batch elements in the input file has to either match the batch dimension specified in the model or it has to be one. In the latter case, qnn-net-run will combine multiple lines into a single input tensor.

--op_packages - This argument is only needed if you are using custom op packages. The native QNN ops are already included as part of the backend libraries.

When using custom op packages, each provided op package requires a colon-separated command line argument containing the path to the op package shared library (.so) file, as well as the name of the interface provider, formatted as <op_package_path>:<interface_provider>.

The interface_provider argument must be the name of the function in the op package library that satisfies the QnnOpPackage_InterfaceProvider_t interface. In the skeleton code created by qnn-op-package-generator, this function will be named <package_name><backend>InterfaceProvider.

See Generating Op Packages for more information.
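
For example, a custom op package might be registered as follows (the library and interface provider names are illustrative only and depend on what qnn-op-package-generator produced for your package):

qnn-net-run --backend libQnnHtp.so \
            --retrieve_context qnngraph.serialized.bin \
            --input_list input_list.txt \
            --op_packages libMyOpPackage.so:MyOpPackageHtpInterfaceProvider

--config_file - This argument can be used to pass a JSON configuration file to qnn-net-run. The template of the JSON file is shown below: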

{
  "backend_extensions" :
    {
      "shared_library_path" :  "path_to_shared_library",
      "config_file_path" :  "path_to_config_file"
    },
  "context_configs" :
    {
      "context_priority" :  "low | normal | normal_high | high",
      "async_execute_queue_depth" : uint32_value,
      "enable_graphs" :  ["<graph_name_1>", "<graph_name_2>", ...],
      "memory_limit_hint"  : uint64_value,
      "is_persistent_binary" : boolean_value,
      "cache_compatibility_mode" : "permissive | strict",
      "spill_fill_buffer" : int64_value,
      "weights_buffer" : int64_value
    },
  "graph_configs" : [
    {
      "graph_name" :  "graph_name_1",
      "graph_priority" :  "low | normal | normal_high | high"
      "graph_profiling_start_delay" : double_value
      "graph_profiling_num_executions" : uint64_value
    }
  ],
  "profile_configs" :
    {
      "num_max_events" : uint64_value
    },
  "async_graph_execution_config" :
    {
      "input_tensors_creation_tasks_limit" : uint32_value,
      "execute_enqueue_tasks_limit" : uint32_value
    },
  "soc_configs" :
    {
      "soc_model" : int32_value
    }
}

All the options in the JSON file are optional. context_priority is used to specify the priority of the context as a context config. async_execute_queue_depth is used to specify the number of executions that can be in the queue at a given time. While using a context binary, enable_graphs is used to implement the graph selection functionality. memory_limit_hint is used to set the peak memory limit hint of a deserialized context in MBs. is_persistent_binary indicates that the context binary pointer is available from QnnContext_createFromBinary until QnnContext_free is called. spill_fill_buffer is used to store spill-fill values in a buffer shared between the application and the backend. weights_buffer is used to store weights in a buffer shared between the application and the backend.

Set Cache Compatibility Mode : cache_compatibility_mode specifies the mode used to check whether the cache record is optimal for the device. The available modes indicate when a binary cache is considered compatible:

  • “permissive”: Binary cache is compatible if it could run on the device; default.

  • “strict”: Binary cache is compatible if it could run on the device and fully utilize hardware capability. If it cannot fully utilize hardware, selecting this option results in a recommendation to prepare the cache again. This option returns an error if it is not supported by the selected backend.

Graph Selection : Allows a subset of the graphs in a context to be loaded and executed. If enable_graphs is specified, only those graphs are loaded. Selecting a graph name that does not exist results in an error. If enable_graphs is not specified, or is passed as an empty list, the default behaviour applies and all graphs in the context are loaded.

graph_configs can be used to specify asynchronous execution order and depth, if a backend supports asynchronous execution. Every set of graph configs has to be specified along with a graph name. graph_profiling_start_delay is used to set the profiling start delay time in seconds. graph_profiling_num_executions is used to set the maximum number of QnnGraph_execute/QnnGraph_executeAsync calls that will be profiled.

profile_configs can be used to specify the max profile events per profiling handle.

async_graph_execution_config can be used to specify limits on the number of tasks that run in parallel when graphs are executed asynchronously using graphExecuteAsync. input_tensors_creation_tasks_limit specifies the maximum number of tasks in which input tensor sets are populated for graph execution. execute_enqueue_tasks_limit specifies the maximum number of tasks in which the backend graphExecuteAsync will be called using the pre-populated input tensors. If unspecified, these values are set to the specified “async_execute_queue_depth”, or to 10, which is the default for “async_execute_queue_depth”.

backend_extensions is used to exercise custom options in a particular backend. This is done by providing an extensions shared library (.so) and, if necessary, a config file. This is also required to enable various performance modes, which can be exercised using the backend config. Currently, HTP supports it through the libQnnHtpNetRunExtensions.so shared library, DSP through libQnnDspNetRunExtensions.so, and GPU through libQnnGpuNetRunExtensions.so. For the different custom options that can be enabled with HTP, see HTP Backend Extensions.

soc_configs can be used to specify the simulated SoC model; see Supported Snapdragon Devices.
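
For example, a minimal configuration file that enables the HTP backend extensions and raises the context priority might look like the following (the extensions config file name is a placeholder):

{
  "backend_extensions" :
    {
      "shared_library_path" :  "libQnnHtpNetRunExtensions.so",
      "config_file_path" :  "htp_backend_config.json"
    },
  "context_configs" :
    {
      "context_priority" :  "high"
    }
}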

--shared_buffer - This argument instructs qnn-net-run to use shared buffers for the zero-copy use case with a device/coprocessor associated with a particular backend (for example, the DSP with the HTP backend) for graph input and output tensor data. This option is supported on Android only. qnn-net-run implements this feature using rpcmem APIs, which create shared buffers using the ION/DMA-BUF memory allocator on Android, available through the shared library libcdsprpc.so. In addition to specifying this option, for qnn-net-run to be able to discover libcdsprpc.so, the path in which the shared library is present needs to be appended to the LD_LIBRARY_PATH variable.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/vendor/lib64
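
For example, a run on an Android device using shared buffers with the HTP backend might be invoked as follows (paths and file names are illustrative):

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/vendor/lib64
qnn-net-run --retrieve_context qnngraph.serialized.bin \
            --backend libQnnHtp.so \
            --input_list input_list.txt \
            --output_dir output \
            --shared_buffer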

--retrieve_context_list - This argument is used to specify a YAML file that contains information about multiple contexts, each with its associated binary path, context configuration, and input files, enabling streamlined setup of contexts. The template of the YAML file is shown below:

version : 1
contexts:
- name: <context_name_1>
  binaryFilePath: <binary_file_path>
  contextConfig:
    context_priority:  <low | normal | normal_high | high>
    async_execute_queue_depth: <uint32_value>
    enable_graphs: ["<graph_name_1>", "<graph_name_2>", ...]
    memory_limit_hint: <uint64_value>
    is_persistent_binary: <boolean_value>
    cache_compatibility_mode: <permissive | strict>
    spill_fill_buffer: <int64_value>
    weights_buffer: <int64_value>
  inputFileList:
    - graphName: <string_value>
      inputFilePath: <input_list_file_path>
- name: <context_name_2>
  binaryFilePath: <binary_file_path>
  contextConfig:
    context_priority:  <low | normal | normal_high | high>
    async_execute_queue_depth: <uint32_value>
    enable_graphs: ["<graph_name_1>", "<graph_name_2>", ...]
    memory_limit_hint: <uint64_value>
    is_persistent_binary: <boolean_value>
    cache_compatibility_mode: <permissive | strict>
    spill_fill_buffer: <int64_value>
    weights_buffer: <int64_value>
  inputFileList:
    - graphName: <string_value>
      inputFilePath: <input_list_file_path>

  • version is used to specify the version of the configuration file.

  • contexts is used to specify a list of context configurations.

  • name is used to specify the name of the context.

  • binaryFilePath is used to specify the path to the serialized binary file for the context.

  • contextConfig is used to specify a dictionary containing context configuration options. Check context_config for more details.

  • inputFileList is used to specify a list of graphName and inputFilePath entries for the context; graphName is used to specify the name of the graph and inputFilePath is used to specify the path to the input file for the graph.
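
A minimal filled-in example with two contexts is shown below (context names, graph names and paths are hypothetical):

version : 1
contexts:
- name: context_0
  binaryFilePath: ./qnngraph_0.serialized.bin
  contextConfig:
    context_priority: normal
  inputFileList:
    - graphName: qnn_model_0
      inputFilePath: ./input_list_0.txt
- name: context_1
  binaryFilePath: ./qnngraph_1.serialized.bin
  inputFileList:
    - graphName: qnn_model_1
      inputFilePath: ./input_list_1.txt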

Running Quantized Model on HTP backend with qnn-net-run

The HTP backend currently allows finalizing / creating an optimized version of a quantized QNN model offline, on a Linux development host (using the x86_64-linux-clang backend library), and then executing the finalized model on device (using hexagon-v68 backend libraries).

First, configure the environment by following the instructions in the Setup section. Next, build the QNN model library from your network, using artifacts produced by one of the QNN converters. See Building Example Model for reference. Lastly, use the qnn-context-binary-generator utility to generate a serialized representation of the finalized graph, and execute the serialized binary on device.

# Generate the optimized serialized representation of the QNN model on a Linux development host.
$ qnn-context-binary-generator --binary_file qnngraph.serialized.bin \
                               --model <path_to_model_library>/libQnnModel.so \ # an x86_64-linux-clang built quantized QNN model
                               --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
                               --output_dir <output_dir_for_result_and_qnngraph_serialized_binary>

To use the produced serialized representation of the finalized graph (qnngraph.serialized.bin), ensure the binaries below are available on the Android device:

  • libQnnHtpV68Stub.so (ARM)

  • libQnnHtpPrepare.so (ARM)

  • libQnnModel.so (ARM)

  • libQnnHtpV68Skel.so (cDSP v68)

  • qnngraph.serialized.bin (serialized binary from run on Linux development host)

See the <QNN_SDK_ROOT>/examples/QNN/NetRun/android/android-qnn-net-run.sh script for reference on how to use the qnn-net-run tool on an Android device.
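
A minimal sketch of pushing the required artifacts to the device is shown below, assuming /data/local/tmp/qnn as the target directory and that the qnn-net-run binary is taken from the SDK's aarch64-android binary folder (adjust paths as needed):

adb shell mkdir -p /data/local/tmp/qnn
adb push ${QNN_SDK_ROOT}/bin/aarch64-android/qnn-net-run /data/local/tmp/qnn/
adb push libQnnHtpV68Stub.so /data/local/tmp/qnn/
adb push libQnnHtpPrepare.so /data/local/tmp/qnn/
adb push libQnnModel.so /data/local/tmp/qnn/
adb push libQnnHtpV68Skel.so /data/local/tmp/qnn/
adb push qnngraph.serialized.bin /data/local/tmp/qnn/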

# Run the optimized graph on HTP target
$ qnn-net-run --retrieve_context qnngraph.serialized.bin \
              --backend <path_to_backend_library>/libQnnHtp.so \
              --output_dir <output_dir_for_result> \
              --input_list <path_to_input_list.txt>

Running Float Model on HTP backend with qnn-net-run

The QNN HTP backend can support running float32 models on select Qualcomm SoCs using float16 math.

First, configure the environment by following the instructions in the Setup section. Next, build the QNN model library from your network, using artifacts produced by one of the QNN converters. See Building Example Model for reference.

Lastly, configure backend_extensions parameters through a JSON file to set custom options for the HTP backend. Pass this file to qnn-net-run using the --config_file argument. backend_extensions takes two parameters: an extensions shared library (.so) (for HTP, use libQnnHtpNetRunExtensions.so) and a config file for the backend.

Below is the template for the JSON file:

{
  "backend_extensions" :
    {
      "shared_library_path" :  "path_to_shared_library",
      "config_file_path" :  "path_to_config_file"
    }
}

For HTP backend extensions configurations, you can set “vtcm_mb” and “graph_names” through a config file.

Here is an example of the config file:

{
   "graphs": [
      {
        "vtcm_mb": 8,  // Provides performance infrastructure configuration options that are memory specific.
                       // Optional; if not set, QNN HTP defaults to 4.

        "graph_names": [ "qnn_model" ]  // Provide the list of names of the graphs for the inference as specified when using the qnn converter tools.
                                        // "qnn_model" must be the name of the .cpp file generated during the model conversion (without the .cpp file extension)
        .....
      },
      {
         .....  // Other graph object
      }
   ]
}
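
For instance, a minimal concrete instance of this template, assuming the converted model's generated file was qnn_model.cpp, would be:

{
   "graphs": [
      {
        "vtcm_mb": 8,
        "graph_names": [ "qnn_model" ]
      }
   ]
}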

Note

“fp16_relaxed_precision” is deprecated starting from the 2.35 release.

See the <QNN_SDK_ROOT>/examples/QNN/NetRun/android/android-qnn-net-run.sh script for reference on how to use the qnn-net-run tool on an Android device.

# Run the optimized graph on HTP target
$ qnn-net-run --model <path_to_model_library>/libQnnModel.so \ # an x86_64-linux-clang built float QNN model
              --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
              --config_file <path_to_JSON_file.json> \
              --output_dir <output_dir_for_result> \
              --input_list <path_to_input_list.txt>

qnn-throughput-net-run

The qnn-throughput-net-run tool is used to exercise the execution of multiple models on a QNN backend or on different backends in a multi-threaded fashion. It allows repeated execution of models on a specified backend for a specified duration or number of iterations.

Usage:
------
qnn-throughput-net-run [--config <config_file>.json]
                       [--output <results>.json]

REQUIRED argument(s):
 --config        <FILE>.json       Path to the JSON config file.

OPTIONAL argument(s):
 --output        <FILE>.json       Specifies the JSON file used to save the performance test results.

 --version                         Print the QNN SDK version.

 --help                            Show help message.

Configuration JSON File:

qnn-throughput-net-run uses a configuration file as input to run the models on the backends. The configuration JSON file comprises four required objects: backends, models, contexts and testCase.

Below is an example of a JSON configuration file. Please refer to the following sections for detailed information on the four configuration objects: backends, models, contexts and testCase.

{
  "backends": [
    {
      "backendName": "cpu_backend",
      "backendPath": "libQnnCpu.so",
      "profilingLevel": "BASIC",
      "backendExtensions": "libQnnHtpNetRunExtensions.so",
      "perfProfile": "high_performance"
    },
    {
      "backendName": "gpu_backend",
      "backendPath": "libQnnGpu.so",
      "profilingLevel": "OFF"
    }
  ],
  "models": [
    {
      "modelName": "model_1",
      "modelPath": "libqnn_model_1.so",
      "loadFromCachedBinary": false,
      "inputPath": "model_1-input_list.txt",
      "inputDataType": "FLOAT",
      "postProcessor": "MSE",
      "outputPath": "model_1-output",
      "outputDataType": "FLOAT_ONLY",
      "saveOutput": "NATIVE_ALL",
      "groundTruthPath": "model_1-golden_list.txt"
    },
    {
      "modelName": "model_2",
      "modelPath": "libqnn_model_2.so",
      "loadFromCachedBinary": false,
      "inputPath": "model_2-input_list.txt",
      "inputDataType": "FLOAT",
      "postProcessor": "MSE",
      "outputPath": "model_2-output",
      "outputDataType": "FLOAT_ONLY",
      "saveOutput": "NATIVE_LAST"
    }
  ],
  "contexts": [
    {
      "contextName": "cpu_context_1"
    },
    {
      "contextName": "gpu_context_1"
    }
  ],
  "testCase": {
    "iteration": 5,
    "logLevel": "error",
    "threads": [
      {
        "threadName": "cpu_thread_1",
        "backend": "cpu_backend",
        "context": "cpu_context_1",
        "model": "model_1",
        "interval": 10,
        "loopUnit": "count",
        "loop": 1
      },
      {
        "threadName": "gpu_thread_1",
        "backend": "gpu_backend",
        "context": "gpu_context_1",
        "model": "model_2",
        "interval": 0,
        "loopUnit": "count",
        "loop": 10
      }
    ]
  }
}

The backends object supports the following keys:

  • backendName (string, required) - A unique identifier used by the test case to designate on which backend the model should be run.

  • backendPath (string, required) - Specifies the on-device backend .so library file path.

  • profilingLevel (string, optional, default: OFF) - Sets the QNN profiling level for the backend. Possible values: OFF, BASIC, DETAILED. BASIC captures execution and init times. DETAILED additionally captures per-op timing for execution, if the backend supports it.

  • backendExtensions (string, optional) - Enables backend-specific options through an optional backend extensions shared library and config file. Syntax: path_to_shared_library. This is required to enable the various performance modes exercised using the perfProfile option. Currently, HTP supports it through the libQnnHtpNetRunExtensions.so shared library.

  • perfProfile (string, optional, default: default) - Specifies the performance profile to set. Possible values: low_balanced, balanced, default, high_performance, sustained_high_performance, burst, low_power_saver, power_saver, high_power_saver, extreme_power_saver and system_settings.

  • opPackagePath (string, optional, default: native QNN ops that are part of the backend libraries) - Comma-separated list of custom op packages and interface providers for registration. Syntax: op_package_1_path:interface_provider_1[,op_package_2_path:interface_provider_2…]

  • platformOption (string, optional) - Enables backend-specific platform options through QnnBackend_Config_t. Syntax: "key:value"

The models object supports the following keys:

  • modelName (string, required) - A unique identifier used by the test case to designate which model to run.

  • modelPath (string, required) - Specifies the <model>.so / <serialized_context>.bin file path.

  • loadFromCachedBinary (bool, optional, default: false) - Set to true if a <serialized_context>.bin is used in modelPath.

  • inputPath (string, optional) - Path to a file listing the inputs for the model. If there are multiple graphs in the <model>.so / <serialized_context>.bin, this has to be a comma-separated list of input paths, one per graph. Syntax: Graph1_input_path[,Graph2_input_path,…] If not set, random input data is used.

  • inputDataType (string, optional, default: NATIVE) - Possible values: NATIVE, FLOAT.

  • postProcessor (string, optional) - Possible values: NONE, MSE, MSE_FLOAT32, MSE_INT8, MSE_INT16. If there are multiple graphs in the <model>.so / <serialized_context>.bin, this has to be a comma-separated list of postProcessor values. Syntax: MSE[,NONE,…] MSE outputs a mean squared error result for each execution against the golden file specified by the groundTruthPath parameter. If groundTruthPath is not specified, the first execution output is used to compute the MSE. If the datatype of the file specified in groundTruthPath differs from the network's output type, users need to specify the relevant datatype in the postProcessor parameter.

  • outputPath (string, optional) - If postProcessor is not NONE, output files and profiling logs will be saved to this directory.

  • outputDataType (string, optional, default: NATIVE_ONLY) - Possible values: NATIVE_ONLY, FLOAT_ONLY, FLOAT_AND_NATIVE.

  • saveOutput (string, optional, default: NONE) - Possible values: NONE, NATIVE_LAST, NATIVE_ALL. NATIVE_LAST saves only the result of the last network execution to the outputPath. NATIVE_ALL saves the results of all network executions to the outputPath.

  • groundTruthPath (string, optional, default: NONE) - Specifies the golden file path for computing the MSE. If there are multiple graphs in the <model>.so / <serialized_context>.bin, this has to be a comma-separated list of ground truth paths, one per graph. Syntax: Graph1_ground_truth_path_[,Graph2_ground_truth_path_,…]

The contexts object supports the following keys:

  • contextName (string, required) - A unique identifier used by the test case to designate the context in which a model should be created.

  • priority (string, optional, default: DEFAULT) - Specifies the priority of the context. Possible values: DEFAULT, LOW, NORMAL, HIGH.

  • executeAsyncQueueDepth (int, optional) - Specifies the queue depth for async execution.

  • cacheCompatibilityMode (string, optional) - Specifies the cache compatibility check mode; valid values are: “permissive” (default) and “strict”.

The testCase object supports the following keys:

  • iteration (int, required) - Number of times the entire use case is repeated. If the value is negative, the test runs forever until keyboard interrupt.

  • logLevel (string, optional) - Specifies the maximum logging level to be set. Valid settings: error, warn, info, debug, and verbose.

  • threads (array of JSON objects, required) - Each object contains all the details of a thread to be executed by qnn-throughput-net-run. Each object of the array has the properties listed below as key/value pairs.

Each thread object supports the following keys:

  • threadName (string, required) - A unique identifier used by the test case to identify the thread and save the output results.

  • backend (string, required) - Specifies the backend to be used when this thread executes the graph. The value specified should match one of the backendName entries in the backends property of the configuration JSON.

  • context (string, required) - Specifies the context to be used when this thread executes the graph. The value specified should match one of the contextName entries in the contexts property of the configuration JSON.

  • model (string, required) - Specifies the model to be used by the thread for execution. The value specified should match one of the modelName entries in the models property of the configuration JSON.

  • initModelInLoop (bool, optional, default: false) - Set to true if the model needs to be initialized repeatedly for every iteration. The value cannot be set to true if loadFromCachedBinary in the models property is true.

  • loadInputDataInLoop (bool, optional, default: false) - Set to true if the input needs to be reloaded for every loop of execution.

  • useRandomData (bool, optional, default: false) - Set to true if random data is to be used as input.

  • interval (int, optional, default: 0) - Represents the interval (in microseconds) between each graph execution in the thread.

  • loopUnit (string, optional, default: count) - Possible values: count, second.

  • loop (int, optional, default: 1) - The value is taken either as seconds or as a count based on the value of loopUnit. If loopUnit is second, the value specifies the number of seconds the thread repeats execution. If loopUnit is count, the value specifies the number of times the thread repeats execution.

  • executeAsynchronous (bool, optional, default: false) - Set to true if the graphs should be executed asynchronously rather than synchronously. If the backend does not support asynchronous execution, this option results in an error.

  • backendConfig (string, optional) - Specifies the backend config file to enable backend-specific options through the backendExtensions shared library. Syntax: path_to_backend_config_file.

An example JSON configuration file, sample_config.json, can be found at <QNN_SDK_ROOT>/examples/QNN/ThroughputNetRun.
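
For example, the tool can be invoked with that sample configuration as follows (the output file name is illustrative):

qnn-throughput-net-run --config <QNN_SDK_ROOT>/examples/QNN/ThroughputNetRun/sample_config.json \
                       --output results.json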

Analysis

qairt-accuracy-evaluator (Beta)

The qairt-accuracy-evaluator tool provides a framework to evaluate end-to-end accuracy metrics for a model on a given dataset. In addition, the tool can be used to identify the best quantization options for a model on a given set of inputs.

Dependencies

The QNN Accuracy Evaluator assumes that the platform dependencies and environment setup instructions have been followed as outlined in the Setup page. Certain additional Python packages are required by this tool; refer to Optional Python packages.

Note: The qairt-accuracy-evaluator currently supports only ONNX models.

Usage

The user needs to set the QNN_SDK_ROOT environment variable to the root directory of the QNN SDK. The following environment variables might also need to be set with appropriate values:

QNN_MODEL_ZOO : Path to the model zoo. If not set, an absolute model path must be provided explicitly. Note: This environment variable is required only if the supplied model path is not absolute but relative to the configured model zoo path.

ADB_PATH : Path to the adb binary. If not set, it is queried and set from the adb executable path.
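
For example (all paths are placeholders):

export QNN_SDK_ROOT=/path/to/qnn-sdk
export QNN_MODEL_ZOO=/path/to/model-zoo
export ADB_PATH=/usr/bin/adb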

To conduct an accuracy analysis of a given model using a specific dataset, the user must create a configuration that specifies the backends, quantization options, and reference inference frameworks. Sample config files can be found at ${QNN_SDK_ROOT}/lib/python/qti/aisw/accuracy_evaluator/configs/samples/model_configs.

The high-level structure of a model config is shown below:

model
    info
    globals
    dataset
    preprocessing
    inference-engine
    adapter    # This is only applicable when use_memory_plugins is enabled
    postprocessing
    verifier
    metrics

Users can utilize the info section of the model configuration to provide a brief description of the model or dataset being evaluated and to specify the maximum number of calibration inputs for quantization. These fields are optional and default to None. Additionally, users can define constants to be used throughout their configuration. These variables can be overridden from the CLI using the -set_global option, offering convenience and flexibility. Note that the values provided are applicable only within the model configuration and are not accessible within the script itself. The evaluator replaces the strings (variable names) within the configuration with the user-defined values before the start of the evaluation.

Users can also enable the memory pipeline by setting the memory_pipeline field in the info section. This approach is recommended for x86-based evaluations or the AIC backend due to its optimized performance. The following parameters control the multi-threading and processing behavior of the memory pipeline; users can provide them under the info section of the evaluator configuration (a sketch follows the list below):

  • memory_pipeline: Flag to enable or disable memory pipeline.

  • dump_stages: Users can specify the list of stages they want to dump. For example: dump_stages: [ ‘preproc’, ‘infer’, ‘postproc’ ]. Note: When an Android-based schema is present for evaluation, the preprocessed files are always dumped to disk.

  • max_parallel_evaluations: Users can control the number of parallel evaluations they want to perform during evaluation. By default, num_parallel_evaluations would be max(number of CPU cores / 2, number of targets/devices supplied).

  • max_parallel_compilation: Users can control the number of parallel compilations they want to perform during evaluation. By default, num_parallel_compilation would be number of CPU cores / 2.

  • data_chunk_size: Users can specify the number of samples they want to evaluate at a time for Android/remote targets, which might be resource-constrained (storage/timeout). By default, data_chunk_size will be the same as the number of samples in the configured dataset.
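
A minimal sketch of an info section using these parameters is shown below (values are illustrative and assume the high-level model config structure described earlier):

model:
    info:
        memory_pipeline: True
        dump_stages: ['preproc', 'infer', 'postproc']
        max_parallel_evaluations: 4
        max_parallel_compilation: 2
        data_chunk_size: 500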

Users need to provide all dataset information under the dataset section in the model config file; otherwise, an error is thrown. An example of this is shown below:

dataset:
    name: COCO2014
    path: '/home/ml-datasets/COCO/2014/'
    inputlist_file: inputlist.txt
    calibration:
        type: index
        file: calibration-index.txt

Details of the dataset fields are as follows:

  • name - Name of the dataset.

  • path - Base directory of the dataset files.

  • inputlist_file - Text file containing all the pre-processed input files relative to the path field, one input per line. For models having multiple inputs, the inputs in each line have to be comma-separated.

  • calibration - Specifies the calibration file type to be used with quantization. Optional. It has the following params:
      • type: Can be ‘index’, ‘raw’ or ‘dataset’
          • index - The file provided contains the indexes to be picked from the input list for calibration.
          • raw - The file provided contains entries of pre-processed raw files for calibration.
          • dataset - The file provided contains images processed separately and passed to inference.
      • file: Pre-processed calibration file name.

The inference engine is used to run the model on multiple inference schemas. A sample inference engine section is shown below, followed by the description of the different configurable entries in the inference section.

inference-engine:
    model_path: MLPerfModels/ResNetV1.5/modelFiles/ONNX/resnet50_v1.onnx
    simplify_model : True
    inference_schemas:
        - inference_schema:
            name: qnn
            precision: quant
            target_arch: x86_64-linux-clang
            backend: htp
            tag: qnn_int8_htp_x86
            converter_params:
               float_bias_bitwidth: 32
            quantizer_params:
               param_quantizer_schema: symmetric
               act_quantizer_calibration: min-max
               use_per_channel_quantization: True
            backend_extensions:
               vtcm_mb: 4
               rpc_control_latency: 100
               dsp_arch: v75 #mandatory
    inputs_info:
        - input_tensor_0:
              type: float32
              shape: ["*", 3, 224, 224]
    outputs_info:
        - ArgMax_0:
              type: int64
              shape: ["*"]
        - softmax_tensor_0:
              type: float32
              shape: ["*", 1001]

Details of each configurable entry are given below:

  • model_path - Absolute or relative path of the model. If the path is relative, it is taken relative to MODEL_ZOO_PATH, if set; otherwise an absolute path is needed.

  • simplify_model - Flag to enable or disable model simplification for ONNX models. By default, this flag is set to True and the model is simplified. Note: Model simplification is skipped for models having custom operators or for inference schemas having the quantization_overrides parameter configured.

  • inference_schemas - List of inference schemas to perform inference on. Each inference_schema has the following entries:
      • name - Name of the inference schema. Options: qnn, onnxrt, tensorflow, torchscript, tensorflow-session
      • precision - Precision to run inference on. Options: fp32, fp16, int8/quant
      • target_arch - Target architecture on which to run inference. Options: x86_64-linux-clang, aarch64-android, wos
      • backend - Backend on which to run inference. Allowed backends for x86_64-linux-clang: {cpu, htp}, aarch64-android: {cpu, gpu, htp} and wos: {cpu, htp}.
      • tag - Tag unique to an inference schema
      • converter_params - Params to be passed as arguments to the converter
      • quantizer_params - Params to be passed as arguments to the quantizer
      • contextbin_params - Params to be passed as arguments to the context-binary-generator
      • netrun_params - Params to be passed as arguments to net-run
      • backend_extensions - Params to be passed as a backend extensions config file to the context-binary-generator and net-run

  • input_info - Information about each model input. Requires the following params in the given order:
      • type - numpy type (float16, float32, float64, int8, int16, int32, int64)
      • shape - list of dimensions

  • output_info - Information about each model output. Requires the following params in the given order:
      • type - numpy type (float16, float32, float64, int8, int16, int32, int64)
      • shape - list of dimensions

Note

For HTP backend emulation on host, set the backend to “htp” and target_arch as “x86_64-linux-clang” in the configuration file.
For HTP backend execution on Android device, set the backend to “htp” and target_arch as “aarch64-android” in the configuration file. Also, users must provide the dsp_arch version such as “v69”, “v73”, “v75” under the backend_extensions section.
For HTP backend execution on Windows on Snapdragon, set the backend to “htp” and target_arch as “wos” in the configuration file. Also, users must provide the dsp_arch version such as “v69”, “v73”, “v75” under the backend_extensions section.
The adapter section in the model configuration is valid only when the user enables use_memory_plugins.
The use_memory_plugins value is disregarded when the user provides the use_memory_pipeline CLI argument or specifies the memory_pipeline field in the info section of evaluator config.

Command line options available for config mode are as follows:

qairt-acc-evaluator options

options:
    -config CONFIG        path to model config yaml
    -work_dir WORK_DIR      working directory path. default is ./qacc_temp
    -onnx_symbol ONNX_SYMBOL [ONNX_SYMBOL ...]
                            Replace onnx symbols in input/output shapes. Can be passed as list of multiple items.
                            Default replaced by 1. Example: __unk_200:1
    -device_id DEVICE_ID    Target device id to be provided
    -inference_schema_type INFERENCE_SCHEMA_TYPE
                            run only the inference schemas with this name. Example: qnn, onnxrt
    -inference_schema_tag INFERENCE_SCHEMA_TAG
                            run only this inference schema tag
    -cleanup CLEANUP        end: deletes the files after all stages are completed.
                            intermediate: deletes after previous stage outputs are used. (default:'')
    -use_memory_plugins     Flag to enable memory plugins.
    -use_memory_pipeline    Flag to enable memory pipeline. use_memory_plugins is ignored.
    -silent                 Run in silent mode. Do not expect any CLI input from user.
    -debug                  Enable debug logs on console and the file. (default: False)
    -set_global SET_GLOBAL [SET_GLOBAL ...]
                            Option used to override global variables provided in the model configuration. Multiple global variables can be specified.
                            Example: -set_global count:10 -set_global calib:5 (default: None)

Note

Users can accelerate their evaluations using memory pipeline to minimize unnecessary reading and writing of data during evaluation by passing the -use_memory_pipeline flag to the evaluator command. This feature is currently supported for Linux only.

Config file options

- inference_schema:
    name: qnn
    target_arch: x86_64-linux-clang
    backend: cpu
    precision: fp32
    tag: qnn_cpu_x86

- inference_schema:
    name: qnn
    target_arch: aarch64-android
    backend: cpu
    precision: fp32
    tag: qnn_cpu_android

- inference_schema:
    name: qnn
    target_arch: wos
    backend: cpu
    precision: fp32
    tag: qnn_cpu_x86

- inference_schema:
    name: qnn
    target_arch: aarch64-android
    backend: gpu
    precision: fp32
    tag: qnn_gpu_android

- inference_schema:
    name: qnn
    target_arch: x86_64-linux-clang
    backend: htp
    precision: quant
    tag: htp_int8
    converter_params:
        quantization_overrides: "path to the ext quant json"
    quantizer_params:
        param_quantizer_calibration: min-max | sqnr
        param_quantizer_schema: asymmetric  | symmetric
        use_per_channel_quantization: True | False
        use_per_row_quantization: True | False
        act_bitwidth: 8 | 16
        bias_bitwidth: 8 | 32
        weights_bitwidth: 8 | 4
    backend_extensions:
        dsp_arch: v79 # mandatory
        vtcm_mb: 4
        rpc_control_latency: 100

- inference_schema:
    name: qnn
    target_arch: aarch64-android
    backend: htp
    precision: quant
    tag: htp_int8
    converter_params:
        quantization_overrides: "path to the ext quant json"
    quantizer_params:
        param_quantizer_calibration: min-max | sqnr
        param_quantizer_schema: asymmetric  | symmetric
        use_per_channel_quantization: True | False
        use_per_row_quantization: True | False
        act_bitwidth: 8 | 16
        bias_bitwidth: 8 | 32
        weights_bitwidth: 8 | 4
    backend_extensions:
        dsp_arch: v79 # mandatory
        vtcm_mb: 4
        rpc_control_latency: 100

- inference_schema:
    name: qnn
    target_arch: wos
    backend: htp
    precision: quant
    tag: htp_int8
    converter_params:
        quantization_overrides: "path to the ext quant json"
    quantizer_params:
        param_quantizer_calibration: min-max | sqnr
        param_quantizer_schema: asymmetric  | symmetric
        use_per_channel_quantization: True | False
        use_per_row_quantization: True | False
        act_bitwidth: 8 | 16
        bias_bitwidth: 8 | 32
        weights_bitwidth: 8 | 4
    backend_extensions:
        dsp_arch: v79 # mandatory
        vtcm_mb: 4
        rpc_control_latency: 100

Verifiers

The verifier section provides information about the verifier being used to compare the inference outputs, in case of multiple inference schemas. A sample verifier section is shown below, followed by the description of the different configurable entries in the section.

verifier:
    enabled: True
    fetch_top: 1
    type: average
    tol: 0.01

Details of each configurable entry are given below:

verifier - If multiple inference schemas are provided, the verifier compares the inference outputs with the reference inference schema. If a reference inference schema is not defined, the first inference schema is considered the reference. If only one inference schema is defined, the verifier is not executed. The following params need to be provided:

  • enabled - Enabled by default (True)

  • fetch_top - Fetch the top ‘n’ highest mismatching outputs. Default 1

  • type - One of the in-built verifiers (average, cosine, l1_norm, l2_norm). Default average

  • tol - Tolerance value. Default 0.001

Following are the verifiers that can be used to compare the outputs.

  1. cosine - Comparison between two tensors based on the Cosine Similarity score

  2. average - Comparison between two tensors based on the average difference between the two tensors

  3. l1_norm - Comparison between two tensors based on the L1 Norm of the difference

  4. l2_norm - Comparison between two tensors based on the L2 Norm of the difference

  5. standard_deviation - Comparison between two tensors based on the standard deviation difference

  6. mse - Comparison between two tensors based on the Mean Square Error between the tensors

  7. snr - Signal to Noise Ratio between the two tensors

  8. kl_divergence - KL Divergence value between the two tensors

Plugins

Plugins are Python classes used to implement different stages of the inference pipeline, such as dataset handling, preprocessing, postprocessing, and metrics logic.

Dataset and pre-processing plugins perform transformations to the input before they are passed to inference.

Adapter plugins convert the model’s inference outputs into standard formats for use by subsequent postprocessor or metric plugins. Note: This is applicable only when use_memory_plugins is enabled.

Post-processing plugins transform inference outputs.

Metric plugins analyze inference outputs to assess their accuracy.

Sample plugins are provided in the SDK at ${QNN_SDK_ROOT}/lib/python/qti/aisw/accuracy_evaluator/plugins.

Users can implement their own plugins (custom plugins) to meet their specific requirements. To include custom plugins, export the CUSTOM_PLUGIN_PATH environment variable pointing to the location of the custom plugin(s), so that they are also included while registering the plugin(s).

export CUSTOM_PLUGIN_PATH=/path/to/custom/plugins/directory

In the model configuration file, plugins are defined as a transformation chain, as shown below:

transformations:
    - plugin:
          name: resize
          params:
              dims: 416,416
              channel_order: RGB
              type: letterbox

    - plugin:
          name: normalize
    - plugin:
          name: convert_nchw

Plugins required for dataset transformation are configured in the dataset section as shown below.

dataset:
    name: ILSVRC2012
    path: '/home/ml-datasets/imageNet/'
    inputlist_file: inputlist.txt
    annotation_file: ground_truth.txt
    calibration:
        type: dataset
        file: calibration.txt
    transformations:
        - plugin:
              name: filter_dataset
              params:
                  random: False
                  max_inputs: -1
                  max_calib: -1

The preprocessing and postprocessing plugins that the user wishes to use are configured in the processing section as shown below:

preprocessing:
    transformations:
        - plugin:
              name: resize
              params:
                  dims: 416,416
                  channel_order: RGB
                  type: letterbox

        - plugin:
              name: normalize

postprocessing:
    squash_results: True
    transformations:
        - plugin:
              name: object_detection
              params:
                  dims: 416,416
                  type: letterbox
                  dtypes: [float32, float32, float32, float32]

Metric calculation plugins are configured in the metrics section as shown below.

metrics:
    transformations:
        - plugin:
              name: topk
              params:
                  kval: 1,5
                  softmax_index: 1
                  round: 7
                  label_offset: 1

Plugins that need to be executed for a pipeline stage are listed under ‘transformations’ and preceded by the ‘plugin’ keyword. The following lists the details of each configurable entry for a plugin:

  • name - Name of the plugin.

  • params - Parameters expected and required by the plugin.

A complete list of all plugins and their parameters can be found at Accuracy Evaluator Plugins.

Sample Command

qairt-accuracy-evaluator -config {path to configs}/qnn_resnet50_config.yaml
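
For example, to run only the QNN inference schemas from the config and override a global variable defined in the model configuration (option values are illustrative):

qairt-accuracy-evaluator -config {path to configs}/qnn_resnet50_config.yaml \
                         -work_dir ./qacc_temp \
                         -inference_schema_type qnn \
                         -set_global count:10 \
                         -silent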

Results

The tool displays a table with quantization options ordered by output match based on the selected verifier, and also generates a CSV file with the same data. The comparator column shows the output match percentage/value based on the selected verifier. The quant params column displays the quantization params used for that run. Other columns show the backend and the runtime/compile params used. The information is also stored in a CSV file at {work_dir}/metrics-info.csv.

Artifacts associated with each of the configured quantization options are stored at {work_dir}/infer/schema{i}_qnn_{backend}_{precision}_{j}. Model outputs are stored at {work_dir}/infer/schema{i}_qnn_{backend}_{precision}_{j}/Result_{k}.

Note

A snapshot of the console log (qnn_acc_eval_output.png) has been added for clarity.

Note

A snapshot of the CSV file (qnn_acc_eval_csv.png) has been added for clarity.

qnn-architecture-checker (Beta)

The Architecture Checker is a tool for models running with the HTP backend, including quantized 8-bit, quantized 16-bit and FP16 models. It outputs a list of issues in the model that keep the model from getting better performance when running on the HTP backend. The Architecture Checker can also be invoked with the modifier feature, which applies the recommended modifications for these issues. This helps in visualizing the changes that can be applied to the model to make it a better fit for the HTP backend.

X86-Linux/ WSL Usage:
$ qnn-architecture-checker -i <path>/model.json
                         -b <optional_path>/model.bin
                         -o <optional_output_path>
                         -m <optional_modifier_argument>

X86-Windows/ Windows on Snapdragon Usage:
$ python qnn-architecture-checker -i <path>/model.json
                         -b <optional_path>/model.bin
                         -o <optional_output_path>
                         -m <optional_modifier_argument>

 required arguments:
     -i INPUT_JSON, --input_json INPUT_JSON
                             Path to json file

 optional arguments:
     -b BIN, --bin BIN
                     Path to a bin file
     -o OUTPUT_PATH, --output_path OUTPUT_PATH
                     Path where the output csv should be saved. If not specified, the output csv will be written to the same path as the input file
     -m MODIFY, --modify MODIFY
                     The query to select the modifications to apply.
                         --modify or --modify show - To see all the possible modifications. Display list of rule names and details of the modifications.
                         --modify all - To apply all the possible modifications found for the model.
                         --modify apply=rule_name1,rule_name2 - To apply modifications for specified rule names. The list of rules should be comma separated without spaces
Note:
If running on a quantized model, a quantized model generated with one input image is sufficient to satisfy the quantization requirement for the tool to run properly.
The QNN_SDK_ROOT environment variable must be configured before running the tool.
Deprecation Note:
The option of enabling the architecture checker by passing ‘--arch_checker’ to each converter listed above will be deprecated. E.g.: Running qnn-tflite-converter -i <path>/model.tflite -d <network_input_name> <dims> -o <optional_output_path> -p <optional_package_name> --arch_checker will be deprecated.
To enable the Architecture Checker, run the converter tool without passing the ‘--arch_checker’ argument, then run the qnn-architecture-checker command to see the architecture checker output.
The usage of “--modify” is only supported with the qnn-architecture-checker command.

The output is a CSV file and will be saved as <optional_output_path>/<model_name>_architecture_checker.csv. An example output is shown below:

Row 1:

  • Graph/Node_name: Graph

  • Issue: This model uses 16-bit activation data. 16-bit activation data takes twice as much memory as 8-bit activation data.

  • Recommendation: Try to use a smaller datatype to get better performance, e.g., 8-bit.

  • Type, Input_tensor_name:[dims], Output_tensor_name:[dims], Parameters, Previous node, Next nodes, Modification, Modification_info: N/A

Row 2:

  • Graph/Node_name: Node_name_1

  • Issue: The number of channels in the input/output tensor of this convolution node is low (smaller than 32).

  • Recommendation: Try increasing the number of channels in the input/output tensor to 32 or greater to get better performance.

  • Type: Conv2d

  • Input_tensor_name:[dims]: input_1:[1, 250, 250, 3], __param_1:[5, 5, 3, 32], convolution_0_bias:[32]

  • Output_tensor_name:[dims]: output_1:[1, 123, 123, 32]

  • Parameters: {‘package’: ‘qti.aisw’, ‘type’: ‘Conv2d’, …}

  • Previous node: [‘previous_node_name’]

  • Next nodes: [‘next_node_name1’, ‘next_node_name2’]

  • Modification: N/A

  • Modification_info: N/A

How to read the example output CSV?

Row 1: This is an issue on the graph: the graph is using 16-bit activation data. As the recommendation states, changing the activation data from 16-bit to 8-bit gives better performance.

Row 2: The issue is on the node with QNN node name “Node_name_1”. This node has three inputs, input_1, __param_1 and convolution_0_bias, whose dimensions are [1, 250, 250, 3], [5, 5, 3, 32] and [32] respectively. This node has one output with QNN tensor name output_1, and the dimension of this tensor is [1, 123, 123, 32]. The type of this node is Conv2d. The previous/next node names and the full set of additional node parameters in the Parameters column can be used to locate the node inside the original model. The issue for this node is that the number of channels in the input tensor is low: as the channel count is smaller than 32, it is recommended to increase it to at least 32 to get better performance on the HTP backend. Currently the input dimension is [1, 250, 250, 3]; ideally it would be [1, x, x, 32]. The Modification and Modification_info columns provide details about the modifications applied to the node. If the Architecture Checker is not invoked with the modifier, or if there are no applicable modifications, these values will be N/A.

Is the QNN node/tensor name the same as in the original model?

It is not the same, but it should be similar. There is naming sanitization in the converter in order to meet the QNN naming standard. The input tensor, output tensor, previous node, next nodes and all the additional parameters are available in the output CSV file to help locate the correct node inside the original model.

Sample Command

qnn-architecture-checker --input_json ./model_net.json
                       --bin ./model.bin
                       --output_path ./archCheckerOutput

Architecture Checker - Model Modifier

To apply modifications to the model, the Architecture Checker can be invoked with “--modify” or “--modify show”, which will display a list of possible modifications. In this case, the Architecture Checker tool will only show the rule names and modification details; it will run without making any changes to the model and generate the CSV output. Using the rule names from that run, the Architecture Checker can then be invoked with “--modify all” or “--modify apply=rule_name1,rule_name2”. In this case, the rule-specific changes will be applied to the model and can be viewed in the updated model JSON. Additionally, the output CSV will also contain information related to the modifications.

Consider the CSV output below, generated after applying the “--modify apply=elwisediv” modification on an example model.

| Row | Graph/Node_name | Issue | Recommendation | Type | Input_tensor_name:[dims] | Output_tensor_name:[dims] | Parameters | Previous node | Next nodes | Modification | Modification_info |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Node_name_1 | ElementWiseDivide usually has poor performance compared to ElementWiseMultiply. | Try replacing ElementWiseDivide with ElementWiseMultiply using the reciprocal value to get better performance. | Eltwise_Binary | input_1:[1, 52, 52, 6], input_2:[1] | output_1:[1, 52, 52, 6] | {'package': 'qti.aisw', 'eltwise_type': 'ElementWiseDivide', ...} | ['previous_node_name'] | ['next_node_name1', 'next_node_name2'] | Done | ElementWiseDivide has been replaced by ElementWiseMultiply using the reciprocal value |
| 2 | Node_name_2 | The number of channels in the input/output tensor of this convolution node is low (smaller than 32). | Try increasing the number of channels in the input/output tensor to 32 or greater to get better performance. | Conv2d | input_3:[1, 250, 250, 3], __param_1:[5, 5, 3, 32], convolution_1_bias:[32] | output_2:[1, 123, 123, 32] | {'package': 'qti.aisw', 'type': 'Conv2d', ...} | ['previous_node_name'] | ['next_node_name1', 'next_node_name2'] | N/A | N/A |

How to read the example output csv?
Row 1: The issue on the node with QNN node name "Node_name_1" is that it uses ElementWiseDivide, which gives poorer performance than ElementWiseMultiply. After invoking the Architecture Checker with "--modify apply=elwisediv", the modification has been applied successfully, i.e. the ElementWiseDivide has been replaced by ElementWiseMultiply with the reciprocal value. This information is available in the Modification and Modification_info columns.
Row 2: The issue on the node with QNN node name "Node_name_2" is that the node has an input tensor with fewer than 32 channels. It is recommended to increase the number of channels to 32 or greater for better performance. For this issue, no modification is available through the tool, hence the Modification and Modification_info columns are N/A.
After modifying the model, the above run will generate an updated model.cpp, model_net.json and/or model.bin along with the csv output. Running the Architecture Checker on the updated model json will no longer show the ElementWiseDivide issue on Node_name_1 (a sketch of such a re-run follows).
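
As a minimal sketch of such a re-run (the location of the updated model_net.json and model.bin is an assumption here; use the updated files written by the modifier run):

qnn-architecture-checker --input_json ./archCheckerOutput/model_net.json
                       --bin ./archCheckerOutput/model.bin
                       --output_path ./archCheckerOutputRecheck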

The following commands invoke the Architecture Checker with the Modifier to display the list of possible modifications:

Sample Command

qnn-architecture-checker --input_json ./model_net.json
                       --bin ./model.bin
                       --output_path ./archCheckerOutput
                       --modify

Sample Command

qnn-architecture-checker --input_json ./model_net.json
                       --bin ./model.bin
                       --output_path ./archCheckerOutput
                       --modify show

The following commands apply the modifications, either all possible modifications or only specific rules:

Sample Command

qnn-architecture-checker --input_json ./model_net.json
                       --bin ./model.bin
                       --output_path ./archCheckerOutput
                       --modify all

Sample Command

qnn-architecture-checker --input_json ./model_net.json
                       --bin ./model.bin
                       --output_path ./archCheckerOutput
                       --modify apply=prelu,elwisediv
Note:
The Architecture Checker with the modifier is an enhancement to help visualize the changes that can be applied to the model to better fit it on the HTP. To see actual performance improvements, the model may require retraining/redesigning.

qnn-accuracy-debugger (Beta)

Dependencies

The Accuracy Debugger depends on the setup outlined in Setup. In particular, the following are required:

  1. Platform dependencies need to be met as per Platform Dependencies.

  2. The desired ML frameworks need to be installed. The Accuracy Debugger is verified to work with the ML framework versions mentioned in Environment Setup.

The following environment variables are used throughout this guide (users may change these paths depending on their needs; example export commands follow the list):

  1. RESOURCESPATH = {Path to the directory where all models and input files reside}

  2. PROJECTREPOPATH = {Path to your accuracy debugger project directory}
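
For example, on a Linux shell these could be set as follows (the paths shown are placeholders; substitute your own locations):

export RESOURCESPATH=/path/to/models_and_inputs
export PROJECTREPOPATH=/path/to/accuracy_debugger_project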

Supported models

The qnn-accuracy-debugger currently supports ONNX, TFLite, and TensorFlow 1.x models. PyTorch models are supported only by the oneshot-layerwise debugging algorithm of the tool.

Overview

The accuracy-debugger tool finds inaccuracies in a neural network at the layer level. The tool compares the golden outputs produced by running a model through a specific ML framework (e.g. TensorFlow, ONNX, TFLite) with the results produced by running the same model through Qualcomm's QNN Inference Engine. The inference engine can be run on a variety of compute targets, including GPU, CPU and DSP.

The following features are available in Accuracy Debugger. Each feature can be run with its corresponding option; for example, qnn-accuracy-debugger --{option}.

  1. qnn-accuracy-debugger --framework_runner: This feature uses an ML framework, e.g. TensorFlow, TFLite or ONNX, to run the model and obtain intermediate outputs. Note: The argument --framework_diagnosis has been replaced by --framework_runner; --framework_diagnosis will be deprecated in a future release.

  2. qnn-accuracy-debugger --inference_engine: This feature uses the QNN engine to run a model and retrieve intermediate outputs.

  3. qnn-accuracy-debugger --verification: This feature compares the output generated by the framework runner and inference engine features using verifiers such as CosineSimilarity, RtolAtol, etc.

  4. qnn-accuracy-debugger --compare_encodings: This feature extracts encodings from a given QNN net JSON file, compares them with the given AIMET encodings, and outputs an Excel sheet highlighting mismatches.

  5. qnn-accuracy-debugger --tensor_inspection: This feature compares given target outputs with reference outputs.

  6. qnn-accuracy-debugger --quant_checker: This feature analyzes the activations, weights, and biases of all the possible quantization options available in the qnn-converters for each subsequent layer of a given model.

Tip:
  • You can use --help after the bin commands to see what other options (required or optional) you can add.

  • If no option is provided, Accuracy Debugger runs framework_runner, inference_engine, and verification sequentially (a sketch of such a run follows).
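
A minimal sketch of such a single-call run, assuming a TensorFlow model laid out like the samples in the following sections (all paths and tensor names are placeholders, the verifier choice is an assumption, and the exact set of arguments accepted in this mode should be confirmed with --help):

qnn-accuracy-debugger \
    --framework tensorflow \
    --model_path $RESOURCESPATH/samples/MyModel/my_model.pb \
    --input_tensor "input:0" 1,224,224,3 $RESOURCESPATH/samples/MyModel/data/sample.raw \
    --output_tensor output:0 \
    --runtime dspv73 \
    --architecture x86_64-linux-clang \
    --input_list $RESOURCESPATH/samples/MyModel/data/input_list.txt \
    --default_verifier CosineSimilarity \
    --verbose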

Below are the instructions for running the Accuracy Debugger:

Framework Runner

The Framework Runner feature is designed to run models with different machine learning frameworks (e.g. TensorFlow). A selected model is run with a specific ML framework, and golden outputs are produced for later comparison with the inference results from the Inference Engine step.

Usage

usage: qnn-accuracy-debugger --framework_runner [-h]
                                   -f FRAMEWORK [FRAMEWORK ...]
                                   -m MODEL_PATH
                                   -i INPUT_TENSOR [INPUT_TENSOR ...]
                                   -o OUTPUT_TENSOR
                                   [-w WORKING_DIR]
                                   [--output_dirname OUTPUT_DIRNAME]
                                   [-v]
                                   [--disable_graph_optimization]
                                   [--onnx_custom_op_lib ONNX_CUSTOM_OP_LIB]
                                   [--add_layer_outputs ADD_LAYER_OUTPUTS]
                                   [--add_layer_types ADD_LAYER_TYPES]
                                   [--skip_layer_types SKIP_LAYER_TYPES]
                                   [--skip_layer_outputs SKIP_LAYER_OUTPUTS]
                                   [--start_layer START_LAYER]
                                   [--end_layer END_LAYER]
                                   [--use_native_output_files]

Script to generate intermediate tensors from an ML Framework.

optional arguments:
     -h, --help            show this help message and exit

required arguments:
     -f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
                             Framework type and version, version is optional. Currently
                             supported frameworks are ["tensorflow","onnx","tflite"] case
                             insensitive but spelling sensitive
     -m MODEL_PATH, --model_path MODEL_PATH
                             Path to the model file(s).
     -i INPUT_TENSOR [INPUT_TENSOR ...], --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
                             The name, dimensions, raw data, and optionally data
                              type of the network input tensor(s) specified in the
                             format "input_name" comma-separated-dimensions path-
                             to-raw-file, for example: "data" 1,224,224,3 data.raw
                             float32. Note that the quotes should always be
                             included in order to handle special characters,
                             spaces, etc. For multiple inputs specify multiple
                             --input_tensor on the command line like:
                             --input_tensor "data1" 1,224,224,3 data1.raw
                             --input_tensor "data2" 1,50,100,3 data2.raw float32.
     -o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
                             Name of the graph's specified output tensor(s).

     optional arguments:
     -w WORKING_DIR, --working_dir WORKING_DIR
                             Working directory for the framework_runner to store
                             temporary files. Creates a new directory if the
                             specified working directory does not exist
     --output_dirname OUTPUT_DIRNAME
                             output directory name for the framework_runner to
                             store temporary files under
                             <working_dir>/framework_runner. Creates a new
                             directory if the specified working directory does not
                             exist
     -v, --verbose           Verbose printing
     --disable_graph_optimization
                             Disables basic model optimization
     --onnx_custom_op_lib ONNX_CUSTOM_OP_LIB
                             path to onnx custom operator library

     (below options are supported only for onnx and ignored for other frameworks)
     --add_layer_outputs ADD_LAYER_OUTPUTS
                     Output layers to be dumped. example:1579,232
     --add_layer_types ADD_LAYER_TYPES
                           outputs of layer types to be dumped. e.g
                           :Resize,Transpose. All enabled by default.
     --skip_layer_types SKIP_LAYER_TYPES
                           comma delimited layer types to skip snooping. e.g
                           :Resize, Transpose
     --skip_layer_outputs SKIP_LAYER_OUTPUTS
                           comma delimited layer output names to skip debugging.
                           e.g :1171, 1174
     --start_layer START_LAYER
                           save all intermediate layer outputs from provided
                           start layer to bottom layer of model
     --end_layer END_LAYER
                           save all intermediate layer outputs from top layer to
                           provided end layer of model
     --use_native_output_files
                           Dumps outputs as per framework model's actual data types.

Please note: all command line arguments should be provided either on the command line or through the config file. If an argument appears in both, the command-line value will not override the value in the config file.

Sample Commands

qnn-accuracy-debugger \
    --framework_runner \
    --framework tensorflow \
    --model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
    --input_tensor "input:0" 1,299,299,3 $RESOURCESPATH/samples/InceptionV3Model/data/chairs.raw \
    --output_tensor InceptionV3/Predictions/Reshape_1:0

qnn-accuracy-debugger \
    --framework_runner \
    --framework onnx \
    --model_path $RESOURCESPATH/samples/dlv3onnx/dlv3plus_mbnet_513-513_op9_mod_basic.onnx \
    --input_tensor Input 1,3,513,513 $RESOURCESPATH/samples/dlv3onnx/data/00000_1_3_513_513.raw \
    --output_tensor Output

To run a model with a custom operator:
qnn-accuracy-debugger \
    --framework_runner \
    --framework onnx \
    --input_tensor "image" 1,3,640,640 $RESOURCESPATH/models/yolov3/batched-inp-107-0.raw \
    --model_path $RESOURCESPATH/models/yolov3/yolov3_640_640_with_abp_qnms.onnx \
    --output_tensor detection_boxes \
    --onnx_custom_op_lib $RESOURCESPATH/models/libCustomQnmsYoloOrt.so
Tip:
  • A working_directory, if not otherwise specified, is created in the directory from which you call the script. It is recommended to call all scripts from the same directory so that all outputs and results are stored under the same directory.

  • For TensorFlow it is sometimes necessary to add :0 after the input and output node names to signify the index of the node. Notice that the :0 is dropped for ONNX models.

Output

The program also creates a directory named latest in working_directory/framework_runner which is symbolically linked to the most recently generated directory. In the example below, latest will have data that is symlinked to the data in the most recent directory YYYY-MM-DD_HH:mm:ss. Users may choose to override the directory name by passing it to --output_dirname (e.g. --output_dirname myTest1Output).

The float data produced by the Framework Runner step provides a precise reference for the Verification component to diagnose the accuracy of the network generated by the Inference Engine. Unless a path is otherwise specified, the Accuracy Debugger will create directories within the working_directory/framework_runner directory found in the current working directory. The directories will be named with the date and time of the program's execution and contain tensor data. Depending on the tensor naming convention of the model, there may be numerous sub-directories within the new directory. This occurs when tensor names include a slash "/". For example, for the tensor names 'inception_3a/1x1/bn/sc', 'inception_3a/1x1/bn/sc_internal' and 'inception_3a/1x1/bn', subdirectories will be generated.

[Figure: ../_static/resources/framework_runner.png]

The figure above shows a sample output from a framework_runner run. InceptionV3 and Logits contain the outputs of each layer before the last layer. Each output directory contains the .raw files corresponding to each node. Every raw file that can be seen is the output of an operation. The outputs of the final layer are saved inside the Predictions directory. The file framework_runner_options.json contains all the options used to run this feature.
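
To inspect one of these dumped tensors manually, the .raw files can be loaded with NumPy; a minimal sketch (the file path and tensor shape are placeholders, and the float32 data type is an assumption that holds for the default floating point dumps but not when --use_native_output_files is used):

```python
import numpy as np

# Placeholder path to one of the .raw files dumped by framework_runner
raw_path = "working_directory/framework_runner/latest/InceptionV3/Predictions/Reshape_1.raw"

# Default dumps are floating point (assumed float32 here);
# --use_native_output_files dumps the framework's native data types instead.
data = np.fromfile(raw_path, dtype=np.float32)

# The shape is not stored in the .raw file, so supply the tensor's known shape.
data = data.reshape(1, 1001)  # placeholder shape

print(data.min(), data.max(), data.mean())
```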

Inference Engine

The Inference Engine feature is designed to find the outputs for a QNN model. The output produced by this step can be compared with the golden outputs produced by the framework runner step.

Usage

usage: qnn-accuracy-debugger --inference_engine [-h]
                            -l INPUT_LIST
                            -r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,htp}
                            -a {x86_64-linux-clang,aarch64-android,wos-remote,x86_64-windows-msvc,wos}
                            [--stage {source,converted,compiled}]
                            [-i INPUT_TENSOR [INPUT_TENSOR ...]]
                            [-o OUTPUT_TENSOR] [-m MODEL_PATH]
                            [-f FRAMEWORK [FRAMEWORK ...]]
                            [-qmcpp QNN_MODEL_CPP_PATH]
                            [-qmbin QNN_MODEL_BIN_PATH]
                            [-qmb QNN_MODEL_BINARY_PATH] [-p ENGINE_PATH]
                            [-e ENGINE_NAME [ENGINE_VERSION ...]]
                            [--deviceId DEVICEID] [-v]
                            [--host_device {x86,x86_64-windows-msvc,wos}]
                            [-w WORKING_DIR]
                            [--output_dirname OUTPUT_DIRNAME]
                            [--debug_mode_off] [-bbw {8,32}] [-abw {8,16}]
                            [-wbw {8,16}] [-nif] [-nof]
                            [-qo QUANTIZATION_OVERRIDES]
                            [--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY]
                            [-mn MODEL_NAME] [--args_config ARGS_CONFIG]
                            [--print_version PRINT_VERSION]
                            [--perf_profile {low_balanced,balanced,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
                            [--offline_prepare]
                            [--extra_converter_args EXTRA_CONVERTER_ARGS]
                            [--extra_runtime_args EXTRA_RUNTIME_ARGS]
                            [--remote_server REMOTE_SERVER]
                            [--remote_username REMOTE_USERNAME]
                            [--remote_password REMOTE_PASSWORD]
                            [--float_fallback]
                            [--profiling_level {basic,detailed,backend}]
                            [--lib_name LIB_NAME] [-bd BINARIES_DIR]
                            [-pq {tf,enhanced,adjusted,symmetric}]
                            [--act_quantizer {tf,enhanced,adjusted,symmetric}]
                            [--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                            [--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                            [--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                            [--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                            [--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
                            [-fbw {16,32}] [-rqs RESTRICT_QUANTIZATION_STEPS]
                            [--algorithms ALGORITHMS] [--ignore_encodings]
                            [--per_channel_quantization]
                            [--log_level {error,warn,info,debug,verbose}]
                            [--qnn_model_net_json QNN_MODEL_NET_JSON]
                            [--qnn_netrun_config_file QNN_NETRUN_CONFIG_FILE]
                            [--compiler_config COMPILER_CONFIG]
                            [--context_config_params CONTEXT_CONFIG_PARAMS]
                            [--graph_config_params GRAPH_CONFIG_PARAMS]
                            [--start_layer START_LAYER]
                            [--end_layer END_LAYER]
                            [--add_layer_outputs ADD_LAYER_OUTPUTS]
                            [--add_layer_types ADD_LAYER_TYPES]
                            [--skip_layer_types SKIP_LAYER_TYPES]
                            [--skip_layer_outputs SKIP_LAYER_OUTPUTS]
                            [--extra_contextbin_args EXTRA_CONTEXTBIN_ARGS]
                            [--precision {int8,fp16,fp32}]


Script to run QNN inference engine.

options:
  -h, --help            show this help message and exit

Core Arguments:
  --stage {source,converted,compiled}
                        Specifies the starting stage in the Accuracy Debugger
                        pipeline.
                        Source: starting with a source framework.
                        Converted: starting with a model's .cpp and .bin files.
                        Compiled: starting with a model's .so binary
  -l INPUT_LIST, --input_list INPUT_LIST
                        Path to the input list text.
  -r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,htp}, --runtime {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,htp}
                        Runtime to be used. Please use htp runtime for
                        emulation on x86 host
  -a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}, --architecture {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}
                        Name of the architecture to use for inference engine.

Arguments required for SOURCE stage:
  -i INPUT_TENSOR [INPUT_TENSOR ...], --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
                        The name, dimension, and raw data of the network input
                        tensor(s) specified in the format "input_name" comma-
                        separated-dimensions path-to-raw-file, for example:
                        "data" 1,224,224,3 data.raw. Note that the quotes
                        should always be included in order to handle special
                        characters, spaces, etc. For multiple inputs specify
                        multiple --input_tensor on the command line like:
                        --input_tensor "data1" 1,224,224,3 data1.raw
                        --input_tensor "data2" 1,50,100,3 data2.raw.
  -o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
                        Name of the graph's output tensor(s).
  -m MODEL_PATH, --model_path MODEL_PATH
                        Path to the model file(s).
  -f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
                        Framework type to be used, followed optionally by
                        framework version.

Arguments required for CONVERTED stage:
  -qmcpp QNN_MODEL_CPP_PATH, --qnn_model_cpp_path QNN_MODEL_CPP_PATH
                        Path to the qnn model .cpp file
  -qmbin QNN_MODEL_BIN_PATH, --qnn_model_bin_path QNN_MODEL_BIN_PATH
                        Path to the qnn model .bin file

Arguments required for COMPILED stage:
  -qmb QNN_MODEL_BINARY_PATH, --qnn_model_binary_path QNN_MODEL_BINARY_PATH
                        Path to the qnn model .so binary.

Optional Arguments:
  -p ENGINE_PATH, --engine_path ENGINE_PATH
                        Path to the inference engine.
  -e ENGINE_NAME [ENGINE_VERSION ...], --engine ENGINE_NAME [ENGINE_VERSION ...]
                        Name of engine that will be running inference,
                        optionally followed by the engine version. Used here
                        for tensor_mapping.
  --deviceId DEVICEID   The serial number of the device to use. If not
                        available, the first in a list of queried devices will
                        be used for validation.
  -v, --verbose         Verbose printing
  --host_device {x86,x86_64-windows-msvc,wos}
                        The device that will be running conversion. Set to x86
                        by default.
  -w WORKING_DIR, --working_dir WORKING_DIR
                        Working directory for the inference_engine to store
                        temporary files. Creates a new directory if the
                        specified working directory does not exist
  --output_dirname OUTPUT_DIRNAME
                        output directory name for the inference_engine to
                        store temporary files under
                        <working_dir>/inference_engine .Creates a new
                        directory if the specified working directory does not
                        exist
  --debug_mode_off      Specifies if wish to turn off debug_mode mode.
  -bbw {8,32}, --bias_bitwidth {8,32}
                        option to select the bitwidth to use when quantizing
                        the bias. default 8
  -abw {8,16}, --act_bitwidth {8,16}
                        option to select the bitwidth to use when quantizing
                        the activations. default 8
  -wbw {8,16}, --weights_bitwidth {8,16}
                        option to select the bitwidth to use when quantizing
                        the weights. default 8
  -nif, --use_native_input_files
                        Specifies that the input files will be parsed in the
                        data type native to the graph. If not specified, input
                        files will be parsed in floating point.
  -nof, --use_native_output_files
                        Specifies that the output files will be generated in
                        the data type native to the graph. If not specified,
                        output files will be generated in floating point.
  -qo QUANTIZATION_OVERRIDES, --quantization_overrides QUANTIZATION_OVERRIDES
                        Path to quantization overrides json file.
  --golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --golden_dir_for_mapping GOLDEN_OUTPUT_REFERENCE_DIRECTORY
                        Optional parameter to indicate the directory of the
                        goldens, it's used for tensor mapping without
                        framework.
  -mn MODEL_NAME, --model_name MODEL_NAME
                        Name of the desired output sdk specific model
  --args_config ARGS_CONFIG
                        Path to a config file with arguments. This can be used
                        to feed arguments to the AccuracyDebugger as an
                        alternative to supplying them on the command line.
  --print_version PRINT_VERSION
                        Print the QNN SDK version alongside the output.
  --perf_profile {low_balanced,balanced,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
  --offline_prepare     Use offline prepare to run QNN model.
  --extra_converter_args EXTRA_CONVERTER_ARGS
                        additional converter arguments in a quoted string.
                        example: --extra_converter_args 'input_dtype=data
                        float;input_layout=data1 NCHW'
  --extra_runtime_args EXTRA_RUNTIME_ARGS
                        additional net runner arguments in a quoted string.
                        example: --extra_runtime_args
                        'arg1=value1;arg2=value2'
  --remote_server REMOTE_SERVER
                        ip address of remote machine
  --remote_username REMOTE_USERNAME
                        username of remote machine
  --remote_password REMOTE_PASSWORD
                        password of remote machine
  --float_fallback      Use this option to enable fallback to floating point
                        (FP) instead of fixed point. This option can be paired
                        with --float_bitwidth to indicate the bitwidth for FP
                        (by default 32). If this option is enabled, then input
                        list must not be provided and --ignore_encodings must
                        not be provided. The external quantization encodings
                        (encoding file/FakeQuant encodings) might be missing
                        quantization parameters for some interim tensors.
                        First it will try to fill the gaps by propagating
                        across math-invariant functions. If the quantization
                        params are still missing, then it will apply fallback
                        to nodes to floating point.
  --profiling_level {basic,detailed,backend}
                        Enables profiling and sets its level.
  --lib_name LIB_NAME   Name to use for model library (.so file or .dll file)
  -bd BINARIES_DIR, --binaries_dir BINARIES_DIR
                        Directory to which to save model binaries, if they
                        don't yet exist.
  -pq {tf,enhanced,adjusted,symmetric}, --param_quantizer {tf,enhanced,adjusted,symmetric}
                        Param quantizer algorithm used.
  --act_quantizer {tf,enhanced,adjusted,symmetric}
                        Optional parameter to indicate the activation
                        quantizer to use
  --act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                        Specify which quantization calibration method to use
                        for activations. This option has to be paired with
                        --act_quantizer_schema.
  --param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                        Specify which quantization calibration method to use
                        for parameters. This option has to be paired with
                        --param_quantizer_schema.
  --act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                        Specify which quantization schema to use for
                        activations. Can not be used together with
                        act_quantizer. Note: This argument mandates
                        --act_quantizer_calibration to be passed
  --param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                        Specify which quantization schema to use for
                        parameters. Can not be used together with
                        param_quantizer. Note: This argument mandates
                        --param_quantizer_calibration to be passed
  --percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
                        Value must lie between 90-100
  -fbw {16,32}, --float_bias_bitwidth {16,32}
                        option to select the bitwidth to use when biases are
                        in float. default 32
  -rqs RESTRICT_QUANTIZATION_STEPS, --restrict_quantization_steps RESTRICT_QUANTIZATION_STEPS
                        ENCODING_MIN, ENCODING_MAX Specifies the number of
                        steps to use for computing quantization encodings such
                        that scale = (max - min) / number of quantization
                        steps. The option should be passed as a space
                        separated pair of hexadecimal string minimum and
                        maximum values. i.e. --restrict_quantization_steps
                        'MIN MAX'. Please note that this is a hexadecimal
                        string literal and not a signed integer, to supply a
                        negative value an explicit minus sign is required.
                        E.g.--restrict_quantization_steps '-0x80 0x7F'
                        indicates an example 8 bit range,
                        --restrict_quantization_steps '-0x8000 0x7F7F'
                        indicates an example 16 bit range.
  --algorithms ALGORITHMS
                        Use this option to enable new optimization algorithms.
                        Usage is: --algorithms <algo_name1> ... The available
                        optimization algorithms are: 'cle ' - Cross layer
                        equalization includes a number of methods for
                        equalizing weights and biases across layers in order
                        to rectify imbalances that cause quantization errors.
  --ignore_encodings    Use only quantizer generated encodings, ignoring any
                        user or model provided encodings.
  --per_channel_quantization
                        Use per-channel quantization for convolution-based op
                        weights.
  --log_level {error,warn,info,debug,verbose}
                        Enable verbose logging.
  --qnn_model_net_json QNN_MODEL_NET_JSON
                        Path to the qnn model net json. Only necessary if it's
                        being run from the converted stage. It has information
                        about what structure the data is in within the
                        framework_runner and inference_engine steps. This file
                        is required to generate the model_graph_struct.json
                        file which is used by the verification stage.
  --qnn_netrun_config_file QNN_NETRUN_CONFIG_FILE
                        allow backend_extention features to be applied during
                        qnn-net-run
  --compiler_config COMPILER_CONFIG
                        Path to the compiler config file.
  --context_config_params CONTEXT_CONFIG_PARAMS
                        optional context config params in a quoted string.
                        example: --context_config_params
                        'context_priority=high;
                        cache_compatibility_mode=strict'
  --graph_config_params GRAPH_CONFIG_PARAMS
                        optional graph config params in a quoted string.
                        example: --graph_config_params 'graph_priority=low;
                        graph_profiling_num_executions=10'
  --start_layer START_LAYER
                        save all intermediate layer outputs from provided
                        start layer to bottom layer of model. Can be used in
                        conjunction with --end_layer.
  --end_layer END_LAYER
                        save all intermediate layer outputs from top layer to
                        provided end layer of model. Can be used in
                        conjunction with --start_layer.
  --add_layer_outputs ADD_LAYER_OUTPUTS
                        Output layers to be dumped. example:1579,232
  --add_layer_types ADD_LAYER_TYPES
                        outputs of layer types to be dumped. e.g
                        :Resize,Transpose. All enabled by default.
  --skip_layer_types SKIP_LAYER_TYPES
                        comma delimited layer types to skip snooping. e.g
                        :Resize, Transpose
  --skip_layer_outputs SKIP_LAYER_OUTPUTS
                        comma delimited layer output names to skip debugging.
                        e.g :1171, 1174
  --extra_contextbin_args EXTRA_CONTEXTBIN_ARGS
                        additional context binary generator arguments in a
                        quoted string. example: --extra_contextbin_args
                        'arg1=value1;arg2=value2'
  --precision {int8,fp16,fp32}
                        Choose the precision. Default is int8. Note: This
                        option is not applicable when --stage is set to
                        converted or compiled.

Please note: all command line arguments should be provided either on the command line or through the config file. If an argument appears in both, the command-line value will not override the value in the config file.

The inference engine config file can be found in {accuracy_debugger tool root directory}/python/qti/aisw/accuracy_debugger/lib/inference_engine/configs/config_files and is a JSON file. This config file stores information that helps the inference engine determine which tool and parameters to read in.

Sample Command

qnn-accuracy-debugger \
    --inference_engine \
    --framework tensorflow \
    --runtime dspv73 \
    --model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
    --input_tensor "input:0" 1,299,299,3 $RESOURCESPATH/samples/InceptionV3Model/data/chairs.raw \
    --output_tensor InceptionV3/Predictions/Reshape_1 \
    --architecture x86_64-linux-clang \
    --input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
    --verbose

Sample Command

qnn-accuracy-debugger \
    --inference_engine \
    --framework tensorflow \
    --runtime dspv73 \
    --host_device wos \
    --model_path <RESOURCESPATH>\InceptionV3Model\inception_v3_2016_08_28_frozen.pb \
    --input_tensor "input:0" 1,299,299,3 <RESOURCESPATH>\samples\InceptionV3Model\data\chairs.raw \
    --output_tensor InceptionV3/Predictions/Reshape_1 \
    --architecture wos \
    --input_list <RESOURCESPATH>\samples\InceptionV3Model\data\image_list.txt \
    --verbose

Sample Command

qnn-accuracy-debugger \
    --inference_engine \
    --framework tensorflow \
    --runtime cpu \
    --host_device x86_64-windows-msvc \
    --model_path <RESOURCESPATH>\InceptionV3Model\inception_v3_2016_08_28_frozen.pb \
    --input_tensor "input:0" 1,299,299,3 <RESOURCESPATH>\samples\InceptionV3Model\data\chairs.raw \
    --output_tensor InceptionV3/Predictions/Reshape_1 \
    --architecture x86_64-windows-msvc \
    --input_list <RESOURCESPATH>\samples\InceptionV3Model\data\image_list.txt \
    --verbose

Sample Command

qnn-accuracy-debugger \
    --inference_engine \
    --deviceId 357415c4 \
    --framework tensorflow \
    --runtime dspv73 \
    --architecture aarch64-android \
    --model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
    --input_tensor "input:0" 1,299,299,3 $RESOURCESPATH/samples/InceptionV3Model/data/chairs.raw \
    --output_tensor InceptionV3/Predictions/Reshape_1 \
    --input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
    --verbose
Tip:
  • For --runtime, choose one of the supported values (e.g. 'cpu', 'gpu', 'dsp', 'dspv68', 'dspv69', 'dspv73', 'dspv75', 'dspv79', 'htp'). Make sure the runtime matches the target, e.g. dspv73 for Kailua, dspv69 for Waipio. Choose the htp runtime for emulation on an x86 host.

  • The input_tensor (-i) and output_tensor (-o) arguments do not need the :0 indexing used when running the TensorFlow framework runner.

  • Two files, tensor_mapping.json and qnn_model_graph_struct.json, are generated for use in verification; be sure to locate these two files in working_directory/inference_engine/latest.

  • Before running qnn-accuracy-debugger on a Windows x86 or Windows on Snapdragon system, ensure that you have configured the environment, and specify the host and target machine as x86_64-windows-msvc or wos respectively.

  • Note that qnn-accuracy-debugger on a Windows x86 system is currently tested only for the CPU runtime.
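
The --input_list file referenced in these commands is a plain text file that lists the raw input files to run, one inference per line. A minimal sketch (file names are placeholders; depending on the model, each line may also carry an "input_name:=" prefix, as accepted by qnn-net-run):

data/chairs.raw
data/next_image.raw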

More example commands running from different stages:

Sample Command

source file stage: same as example from above section (stage default is "source")

running from converted stage (x86):
qnn-accuracy-debugger \
    --inference_engine \
    --stage converted \
    -qmcpp $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.cpp \
    -qmbin $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.bin \
    --runtime dspv73 \
    --architecture x86_64-linux-clang \
    --input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
    --qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
    --verbose \
    --framework tensorflow \
    --golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/

Android Devices (ie. MTP):
qnn-accuracy-debugger \
    --inference_engine \
    --stage converted \
    -qmcpp $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.cpp \
    -qmbin $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model.bin \
    --deviceId f366ce60 \
    --runtime dspv73 \
    --architecture aarch64-android \
    --input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
    --qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
    --verbose \
    --framework tensorflow \
    --golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/


running in compiled stage (x86):

qnn-accuracy-debugger \
    --inference_engine \
    --stage compiled \
    --qnn_model_binary $RESOURCESPATH/samples/InceptionV3Model/qnn_model_binaries/x86_64-linux-clang/libqnn_model.so \
    --runtime dspv73 \
    --architecture x86_64-linux-clang \
    --input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
    --verbose \
    --qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
    --golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/

running in compiled stage (wos):

qnn-accuracy-debugger \
    --inference_engine \
    --stage compiled \
    --qnn_model_binary <RESOURCESPATH>\samples\InceptionV3Model\qnn_model_binaries\x86_64-linux-clang\libqnn_model.so \
    --runtime dspv73 \
    --architecture wos \
    --input_list <RESOURCESPATH>\samples\InceptionV3Model\data\image_list.txt \
    --verbose \
    --qnn_model_net_json <RESOURCESPATH>\samples\InceptionV3Model\inception_v3_2016_08_28_frozen_qnn_model_net.json \
    --golden_output_reference_directory <RESOURCESPATH>\samples\InceptionV3Model\golden_from_framework_runner\

Android devices (ie MTP):
qnn-accuracy-debugger \
    --inference_engine \
    --stage compiled \
    --qnn_model_binary $RESOURCESPATH/samples/InceptionV3Model/qnn_model_binaries/aarch64-android/libqnn_model.so \
    --runtime dspv73 \
    --architecture aarch64-android \
    --input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
    --verbose \
    --qnn_model_net_json $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen_qnn_model_net.json \
    --framework tensorflow \
    --golden_output_reference_directory $RESOURCESPATH/samples/InceptionV3Model/golden_from_framework_runner/

To run an ONNX model with a custom operator:
qnn-accuracy-debugger \
    --inference_engine \
    --framework onnx \
    --runtime dspv75 \
    --architecture aarch64-android \
    --model_path $RESOURCESPATH/AISW-77095/model.onnx \
    --input_tensor "image" 1,3,640,1794 $RESOURCESPATH/inputs/image.raw \
    --output_tensor uncertainty_jacobian_bb \
    --input_list $RESOURCESPATH/input_list.txt \
    --default_verifier mse \
    --engine QNN \
    --engine_path $QNN_SDK_ROOT \
    --extra_converter_args 'op_package_config=$RESOURCESPATH/CustomPreTopKOpPackageCPU_v2.xml;op_package_lib=$RESOURCESPATH/libCustomPreTopKOpPackageHtp.so:CustomPreTopKOpPackageHtpInterfaceProvider:' \
    --extra_contextbin_args 'op_packages=$RESOURCESPATH/libQnnCustomPreTopKOpPackageHtp.so:CustomPreTopKOpPackageHtpInterfaceProvider:' \
    --extra_runtime_args 'op_packages=$RESOURCESPATH/AISW-77095/libQnnCustomPreTopKOpPackageHtp_v75.so:CustomPreTopKOpPackageHtpInterfaceProvider' \
    --debug_mode_off \
    --offline_prepare \
    --verbose
Tip:
  • The qnn_model_net_json file is not required to run this step. However, it is needed to build the qnn_model_graph_struct.json, which can be used in the Verification step. The model_net.json file is generated when the original model is converted. Hence, if you are debugging from the converted stage, it is recommended to obtain this model_net.json file.

  • Providing framework and golden_dir_for_mapping, or just golden_dir_for_mapping by itself, is an alternative to providing the original model for generating tensor_mapping.json. However, when only golden_dir_for_mapping is provided, the get_tensor_mapping module will do a best-effort mapping that is not guaranteed to be 100% accurate.

Output

Once the inference engine has finished running, it stores its output files in the specified directory; by default this is working_directory/inference_engine under the current working directory.

[Figure: ../_static/resources/inference_engine.png]

The figure above shows the sample output from one run of the inference engine step. The output directory contains raw files; each raw file is the output of an operation in the network. The model.bin and model.cpp files are created by the model converter. The qnn_model_binaries directory contains the .so file generated by the qnn-model-lib-generator utility. The file image_list.txt contains the paths of the sample test images. The inference_engine_options.json file contains all the options with which this run was launched. In addition to generating the .raw files, the inference engine also generates the model's graph structure in a .json file; the name of the file is the same as the name of the protobuf model file. The model_graph_struct.json provides structure-related information about the converted model graph during the verification step. Specifically, it helps with organizing the nodes in order (i.e., the beginning nodes come before the ending nodes). The model_net.json has information about the structure of the data within the framework_runner and inference_engine steps (data can be in different formats, e.g. channels-first vs. channels-last); the verification step uses this information so that data can be properly transposed and compared. It is an optional parameter that can be provided during the inference engine step for generating the model_graph_struct.json file (mandatory only when running the inference engine from the converted stage). Finally, the tensor_mapping file contains a mapping between the intermediate output file names generated by the framework runner step and those generated by the inference engine step.

[Figure: ../_static/resources/inference_engine_2.png]

The created .raw files are organized in the same manner as framework_runner (see above).

Verification

The Verification step compares the output (from the intermediate tensors of a given model) produced by the framework runner step with the output produced by the inference engine step. Once the comparison is complete, the verification results are compiled and displayed visually in a format that can be easily interpreted by the user.

There are different types of verifiers, e.g. CosineSimilarity, RtolAtol, etc. To see the available verifiers, use the --help option (qnn-accuracy-debugger --verification --help). Each verifier compares the Framework Runner and Inference Engine outputs using an error metric. It also prepares reports and/or visualizations to help the user analyze the network's error data.

Usage

usage: qnn-accuracy-debugger --verification [-h]
                              --default_verifier DEFAULT_VERIFIER
                              [DEFAULT_VERIFIER ...]
                              --golden_output_reference_directory
                              GOLDEN_OUTPUT_REFERENCE_DIRECTORY
                              --inference_results INFERENCE_RESULTS
                              [--tensor_mapping TENSOR_MAPPING]
                              [--qnn_model_json_path QNN_MODEL_JSON_PATH]
                              [--dlc_path DLC_PATH]
                              [--verifier_config VERIFIER_CONFIG]
                              [--graph_struct GRAPH_STRUCT] [-v]
                              [-w WORKING_DIR]
                              [--output_dirname OUTPUT_DIRNAME]
                              [--args_config ARGS_CONFIG]
                              [--target_encodings TARGET_ENCODINGS]
                              [-e ENGINE [ENGINE ...]]
                              [--use_native_output_files]
                              [--disable_layout_transform]

Script to run verification.

required arguments:
  --default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
                        Default verifier used for verification. The options
                        "RtolAtol", "AdjustedRtolAtol", "TopK", "L1Error",
                        "CosineSimilarity", "MSE", "MAE", "SQNR", "ScaledDiff"
                        are supported. An optional list of hyperparameters can
                        be appended. For example: --default_verifier
                        rtolatol,rtolmargin,0.01,atolmargin,0.01 An optional
                        list of placeholders can be appended. For example:
                        --default_verifier CosineSimilarity param1 1 param2 2.
                        to use multiple verifiers, add additional
                        --default_verifier CosineSimilarity
  --golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY, --framework_results GOLDEN_OUTPUT_REFERENCE_DIRECTORY
                        Path to root directory of golden output files. Paths
                        may be absolute, or relative to the working directory.
  --inference_results INFERENCE_RESULTS
                        Path to root directory generated from inference engine
                        diagnosis. Paths may be absolute, or relative to the
                        working directory.

optional arguments:
  --tensor_mapping TENSOR_MAPPING
                        Path to the file describing the tensor name mapping
                        between inference and golden tensors.
  --qnn_model_json_path QNN_MODEL_JSON_PATH
                        Path to the qnn model net json, used for transforming
                        axis of golden outputs w.r.t to qnn outputs. Note:
                        Applicable only for QNN
  --dlc_path DLC_PATH   Path to the dlc file, used for transforming axis of
                        golden outputs w.r.t to target outputs. Note:
                        Applicable for QAIRT/SNPE
  --verifier_config VERIFIER_CONFIG
                        Path to the verifiers' config file
  --graph_struct GRAPH_STRUCT
                        Path to the inference graph structure .json file. This
                        file aids in providing structure related information
                        of the converted model graph during this stage.Note:
                        This file is mandatory when using ScaledDiff verifier
  -v, --verbose         Verbose printing
  -w WORKING_DIR, --working_dir WORKING_DIR
                        Working directory for the verification to store
                        temporary files. Creates a new directory if the
                        specified working directory does not exist
  --output_dirname OUTPUT_DIRNAME
                        output directory name for the verification to store
                        temporary files under <working_dir>/verification.
                        Creates a new directory if the specified working
                        directory does not exist
  --args_config ARGS_CONFIG
                        Path to a config file with arguments. This can be used
                        to feed arguments to the AccuracyDebugger as an
                        alternative to supplying them on the command line.
  --target_encodings TARGET_ENCODINGS
                        Path to target encodings json file.
  --use_native_output_files
                        Loads given outputs as per framework model's actual data types.
  --disable_layout_transform
                        Disables layout transformation of Target outputs. This
                        option has to be used when Golden/Framework
                        outputs and Target outputs are already in the same
                        layout.

Arguments for generating Tensor mapping (required when --tensor_mapping is not specified):
  -e ENGINE [ENGINE ...], --engine ENGINE [ENGINE ...]
                        Name of engine(qnn/snpe) that is used for running
                        inference.

Please note: all command line arguments should be provided either on the command line or through the config file. If an argument appears in both, the command-line value will not override the value in the config file.

The main verification process, run using qnn-accuracy-debugger --verification, optionally uses --tensor_mapping and --graph_struct to find files to compare. These files are generated by the inference engine step and should be supplied to verification for best results. By default they are named tensor_mapping.json and {model name}_graph_struct.json, and can be found in the output directory of the inference engine results.

Sample Command

# Compare output of framework runner with inference engine:

qnn-accuracy-debugger \
     --verification \
     --default_verifier CosineSimilarity \
     --default_verifier mse \
     --golden_output_reference_directory $PROJECTREPOPATH/working_directory/framework_runner/2022-10-31_17-07-58/ \
     --inference_results $PROJECTREPOPATH/working_directory/inference_engine/latest/output/Result_0/ \
     --tensor_mapping $PROJECTREPOPATH/working_directory/inference_engine/latest/tensor_mapping.json \
     --graph_struct $PROJECTREPOPATH/working_directory/inference_engine/latest/qnn_model_graph_struct.json \
     --qnn_model_json_path $PROJECTREPOPATH/working_directory/inference_engine/latest/qnn_model_net.json
# Compare outputs of two different inference engine outputs:

qnn-accuracy-debugger \
     --verification \
     --default_verifier mse \
     --golden_output_reference_directory $PROJECTREPOPATH/working_directory/framework_runner/2022-10-31_17-07-58/ \
     --inference_results $PROJECTREPOPATH/working_directory/inference_engine/latest/output/Result_0/ \
     --graph_struct $PROJECTREPOPATH/working_directory/inference_engine/latest/qnn_model_graph_struct.json \
     --disable_layout_transform
Tip:
  • If you passed multiple images in image_list.txt when running the inference engine, you will get multiple output/Result_x directories; choose the result that matches the input you used for the framework runner (i.e., if the framework run used chairs.raw and chairs.raw was the first item in image_list.txt, choose output/Result_0; if it was the second item, choose output/Result_1).

  • It is recommended to always supply graph_struct and tensor_mapping to the command, as they are used to line up the report and find the corresponding files for comparison. If tensor_mapping was not generated by the previous steps, you can instead supply model_path, engine, and framework to have the module generate tensor_mapping at runtime.

  • You can also compare inference_engine outputs to other inference_engine outputs by passing the /output directory of an inference_engine run as the framework_results. If the output names match exactly, you do not need to provide a tensor_mapping file.

  • Note that if you need to generate a tensor mapping instead of providing a path to a pre-existing tensor mapping file, you can provide the model_path option.

The verifier uses two optional config files. The first file sets parameters for specific verifiers, as well as which tensors to apply these verifiers to. The second file maps tensor names from framework_runner to inference_engine, since certain tensors generated by framework_runner may have different names than tensors generated by inference_engine.

Verifier Config:

The verifier config file is a JSON file that tells verification which verifiers (aside from the default verifier) to use, with which parameters, and on which specific tensors. If no config file is provided, the tool will only use the default verifier specified on the command line, with its default parameters, on all tensors. The JSON file is keyed by verifier names, with each verifier as its own dictionary keyed by "parameters" and "tensors".

Config File

```json
{
    "MeanIOU": {
        "parameters": {
            "background_classification": 1.0
        },
        "tensors": [["Postprocessor/BatchMultiClassNonMaxSuppression_boxes", "detection_classes:0"]]
    },
    "TopK": {
        "parameters": {
            "k": 5,
            "ordered": false
        },
        "tensors": [["Reshape_1:0"], ["detection_classes:0"]]
    }
}
```

Note that the "tensors" field is a list of lists. This is because some verifiers run on two tensors at a time, so the two tensors are placed in a list. If a verifier runs on only one tensor, it will have a list of lists with only one tensor name in each inner list. MeanIOU is not supported as a verifier in the Debugger.

Tensor Mapping:

Tensor mapping is a JSON file keyed by inference tensor names, with the corresponding framework tensor names as values. If the tensor mapping is not provided, the tool will assume that inference and golden tensor names are identical.

Tensor Mapping File

```json
{
    "Postprocessor/BatchMultiClassNonMaxSuppression_boxes": "detection_boxes:0",
    "Postprocessor/BatchMultiClassNonMaxSuppression_scores": "detection_scores:0"
}
```

Output

Verification's output is divided by verifier. For example, if both the RtolAtol and TopK verifiers are used, there will be two sub-directories named "RtolAtol" and "TopK". Available verifiers can be listed with the --help option.

[Figure: ../_static/resources/verification_2.png]

Under each sub-directory, the verification analysis for each tensor is organized similarly to the framework_runner (see above) and inference_engine outputs. For each tensor, a CSV and an HTML file are generated. In addition to the tensor-specific analysis, the tool also generates a summary CSV and HTML file that summarizes the data from all verifiers and their tensors. The following figure shows a sample summary generated in the verification step. Each row in this summary corresponds to one tensor name identified by the framework runner and inference engine steps. The final column shows the CosineSimilarity score, which ranges from 0 to 1 (this range may be different for other verifiers); higher scores denote similarity while lower scores indicate variance. The developer can then further investigate the details of those specific tensors. Tensors should be inspected in top-to-bottom order: if a tensor is broken at an earlier node, anything generated after that node is unreliable until that node is properly fixed.

../_static/resources/verification_results.png

Compare Encodings

The Compare Encodings feature is designed to compare QNN and AIMET encodings. It takes the QNN model net JSON and the AIMET encodings JSON file as inputs and executes in the following order.

  1. Extracts encodings from the given QNN model net JSON.

  2. Compares extracted QNN encodings with given AIMET encodings.

  3. Writes results to an Excel file that highlights mismatches.

  4. Throws warnings if some encodings are present in QNN but not in AIMET and vice-versa.

  5. Writes the extracted QNN encodings JSON file (for reference).

Usage

usage: qnn-accuracy-debugger --compare_encodings [-h]
                             --input INPUT
                             --aimet_encodings_json AIMET_ENCODINGS_JSON
                             [--precision PRECISION]
                             [--params_only]
                             [--activations_only]
                             [--specific_node SPECIFIC_NODE]
                             [--working_dir WORKING_DIR]
                             [--output_dirname OUTPUT_DIRNAME]
                             [-v]

Script to compare QNN encodings with AIMET encodings

optional arguments:
  -h, --help            Show this help message and exit

required arguments:
  --input INPUT
                        Path to QNN model net JSON file
  --aimet_encodings_json AIMET_ENCODINGS_JSON
                        Path to AIMET encodings JSON file

optional arguments:
  --precision PRECISION
                        Number of decimal places up to which comparison will be done (default: 17)
  --params_only         Compare only parameters in the encodings
  --activations_only    Compare only activations in the encodings
  --specific_node SPECIFIC_NODE
                        Display encoding differences for the given node
  --working_dir WORKING_DIR
                        Working directory for the compare_encodings to store temporary files.
                        Creates a new directory if the specified working directory does not exist.
  --output_dirname OUTPUT_DIRNAME
                        Output directory name for the compare_encodings to store temporary files
                        under <working_dir>/compare_encodings. Creates a new directory if the
                        specified working directory does not exist.
  -v, --verbose         Verbose printing

Sample Commands

# Compare both params and activations
qnn-accuracy-debugger \
    --compare_encodings \
    --input QNN_model_net.json \
    --aimet_encodings_json aimet_encodings.json

# Compare only params
qnn-accuracy-debugger \
    --compare_encodings \
    --input QNN_model_net.json \
    --aimet_encodings_json aimet_encodings.json \
    --params_only

# Compare only activations
qnn-accuracy-debugger \
    --compare_encodings \
    --input QNN_model_net.json \
    --aimet_encodings_json aimet_encodings.json \
    --activations_only

# Compare only a specific encoding
qnn-accuracy-debugger \
    --compare_encodings \
    --input QNN_model_net.json \
    --aimet_encodings_json aimet_encodings.json \
    --specific_node _2_22_Conv_output_0

Tip

Unless otherwise specified, a working_directory is created in the directory from which the script is called.

Output

The program creates a directory named latest in working_directory/compare_encodings which is symbolically linked to the most recently generated directory. In the example below, latest is symlinked to the data in the most recent directory YYYY-MM-DD_HH:mm:ss. Users may override the directory name by passing it to --output_dirname, e.g., --output_dirname myTest.

../_static/resources/compare_encodings.png

The figure above shows a sample output from a compare_encodings run. The following list details what each file contains; an illustrative directory layout follows the list.

  • compare_encodings_options.json contains all the options used to run this feature

  • encodings_diff.xlsx contains comparison results with mismatches highlighted

  • log.txt contains log statements for the run

  • extracted_encodings.json contains extracted QNN encodings
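
Putting this together, the layout below is an illustrative sketch of the compare_encodings output directory (the timestamped directory name is only an example):

working_directory
└── compare_encodings
    ├── 2025-01-01_12-00-00
    │   ├── compare_encodings_options.json
    │   ├── encodings_diff.xlsx
    │   ├── extracted_encodings.json
    │   └── log.txt
    └── latest -> 2025-01-01_12-00-00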

Tensor inspection

Tensor inspection compares given reference output and target output tensors and dumps various statistics to represent differences between them.

The Tensor inspection feature can:

  1. Plot histograms for golden and target tensors

  2. Plot a graph indicating deviation between golden and target tensors

  3. Plot a cumulative distribution graph (CDF) for golden vs target tensors

  4. Plot a density (KDE) graph for target tensor highlighting target min/max and calibrated min/max values

  5. Create a CSV file containing information about: target min/max; calibrated min/max; golden output min/max; target/calibrated min/max differences; and computed metrics (verifiers).

Note

Only data with matching target/golden filenames is inspected; other data is ignored.
This feature expects the golden and target tensors to have the same dimensions, data types, and layouts.
Calibrated min/max values are extracted from a user-provided encodings file. If an encodings file is not provided, the density plot is skipped and the CSV summary does not include calibrated min/max information.

Usage

usage: qnn-accuracy-debugger --tensor_inspection [-h]
                        --golden_data GOLDEN_DATA
                        --target_data TARGET_DATA
                        --verifier VERIFIER [VERIFIER ...]
                        [-w WORKING_DIR]
                        [--data_type {int8,uint8,int16,uint16,float32}]
                        [--target_encodings TARGET_ENCODINGS]
                        [-v]

Script to inspect tensors.

required arguments:
  --golden_data GOLDEN_DATA
                        Path to golden/framework outputs folder. Paths may be absolute or
                        relative to the working directory.
  --target_data TARGET_DATA
                        Path to target outputs folder. Paths may be absolute or relative to the
                        working directory.
  --verifier VERIFIER [VERIFIER ...]
                        Verifier used for verification. The options "RtolAtol",
                        "AdjustedRtolAtol", "TopK", "L1Error", "CosineSimilarity", "MSE", "MAE",
                        "SQNR", "ScaledDiff" are supported.
                        An optional list of hyperparameters can be appended, for example:
                        --verifier rtolatol,rtolmargin,0.01,atolmargin,0.01.
                        To use multiple verifiers, pass additional --verifier arguments, e.g., --verifier CosineSimilarity.

optional arguments:
  -w WORKING_DIR, --working_dir WORKING_DIR
                        Working directory to save results. Creates a new directory if the
                        specified working directory does not exist
  --data_type {int8,uint8,int16,uint16,float32}
                        DataType of the output tensor.
  --target_encodings TARGET_ENCODINGS
                        Path to target encodings json file.
  -v, --verbose         Verbose printing

Sample Commands

# Basic run
qnn-accuracy-debugger --tensor_inspection \
    --golden_data golden_tensors_dir \
    --target_data target_tensors_dir \
    --verifier sqnr

# Pass target encodings file and enable multiple verifiers
qnn-accuracy-debugger --tensor_inspection \
    --golden_data golden_tensors_dir \
    --target_data target_tensors_dir \
    --verifier mse \
    --verifier sqnr \
    --verifier rtolatol,rtolmargin,0.01,atolmargin,0.01 \
    --target_encodings qnn_encoding.json

Tip

Unless otherwise specified, a working_directory is created in the directory from which the script is called.

../_static/resources/tensor_inspection.png

The figure above shows a sample output from a Tensor inspection run. The following list details what each file contains; an illustrative directory layout follows the list.

  • Each tensor will have its own directory; the directory name matches the tensor name.

    • CDF_plots.html – Golden vs target CDF graph

    • Diff_plots.html – Golden and target deviation graph

    • Distribution_min-max.png – Density plot for target tensor highlighting target vs calibrated min/max values

    • Histograms.html – Golden and target histograms

    • golden_data.csv – Golden tensor data

    • target_data.csv – Target tensor data

  • log.txt – Log statements from the entire run

  • summary.csv – Target min/max, calibrated min/max, golden output min/max, target vs calibrated min/max differences, and verifier outputs
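
The layout below is an illustrative sketch of a Tensor inspection output directory, assuming the same working_directory/<feature>/<timestamp> layout used by the other components; the timestamp and tensor names are placeholders:

working_directory
└── tensor_inspection
    └── 2025-01-01_12-00-00
        ├── <tensor_name>
        │   ├── CDF_plots.html
        │   ├── Diff_plots.html
        │   ├── Distribution_min-max.png
        │   ├── Histograms.html
        │   ├── golden_data.csv
        │   └── target_data.csv
        ├── log.txt
        └── summary.csv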

Histogram Plots

  1. Comparison: We compare histograms for both the golden data and the target data.

  2. Overlay: To enhance clarity, we overlay the histograms bin by bin.

  3. Binned Ranges: Each bin represents a value range, showing the frequency of occurrence.

  4. Visual Insight: Overlapping histograms reveal differences or similarities between the datasets.

  5. Interactive: Hover over histograms to get tensor range and frequencies for the dataset.

Cumulative Distribution Function (CDF) Plots

  1. Overview: CDF plots display the cumulative probability distribution.

  2. Overlay: We superimpose CDF plots for golden and target data.

  3. Percentiles: These plots illustrate data distribution across different percentiles.

  4. Hover Details: Exact cumulative probabilities are available on hover.

Tensor Difference Plots

  1. Inspection: We generate plots highlighting differences between golden and target data tensors.

  2. Scatter and Line: Scatter plots represent tensor values, while line plots show differences at each index.

  3. Interactive: Hover over points to access precise values.

Run QNN Accuracy Debugger E2E

This feature is designed to run the framework runner, inference engine, and verification features sequentially with a single command to debug the model. The following debugging algorithms are available.

  1. Oneshot-layerwise (default):
    • This algorithm debugs all layers of the model at once by performing the following steps:
      • Execute framework runner to collect reference outputs in fp32

      • Execute inference engine to collect backend outputs in the provided target precision

      • Execute verification to compare the intermediate outputs from the above two steps

      • Execute tensor inspection (when --enable_tensor_inspection is passed) to dump various plots, e.g., scatter, line, CDF, etc., for intermediate outputs

    • It provides a quick analysis to identify the model layers causing accuracy deviation.

    • Users can choose Cumulative-layerwise (below) for a deeper analysis of accuracy deviation.

  2. Cumulative-layerwise:
    • This algorithm debugs one layer at a time by performing the following steps:
      • Execute framework runner to collect reference outputs from all layers of the model in fp32.

      • Execute inference engine and verification iteratively to:
        • collect backend outputs in target precision for each layer while removing the effect of its preceding layers on the final output

        • compare intermediate outputs from the framework runner and inference engine

    • It provides a deeper analysis to identify all model layers causing accuracy deviation.

    • Currently, this option supports only ONNX models.

  3. Layerwise:
    • This algorithm is designed to debug a single layer model at a time by performing the following steps
      • Get golden reference per layer outputs from an external tool or, if a golden reference is not given, run framework runner to collect intermediate layer outputs.

      • Iteratively execute inference engine and verification to:
        • Collect backend outputs in target precision for each single layer model by removing the preceding and following layers

        • Compare intermediate output from golden reference with inference engine single layer model output

    • Layerwise snooping provides deeper analysis to identify all model layers causing accuracy deviation on hardware with respect to framework/simulation outputs.

    • Layerwise snooping only supports ONNX models.

Usage

usage: qnn-accuracy-debugger [--framework_runner] [--inference_engine] [--verification] [-h]

Script that runs Framework Runner, Inference Engine or Verification.

Arguments to select which component of the tool to run.  Arguments are mutually exclusive (at
most 1 can be selected).  If none are selected, then all components are run:
--framework_runner Run framework
--inference_engine    Run inference engine
--verification        Run verification

optional arguments:
-h, --help              Show this help message. To show help for any of the components, run
                        script with --help and --<component>. For example, to show the help
                        for Framework Runner, run script with the following: --help
                        --framework_runner

usage: qnn-accuracy-debugger [-h] -f FRAMEWORK [FRAMEWORK ...] -m MODEL_PATH -i INPUT_TENSOR
                            [INPUT_TENSOR ...] -o OUTPUT_TENSOR -r RUNTIME -a
                            {aarch64-android,x86_64-linux-clang,aarch64-android-clang6.0}
                            -l INPUT_LIST --default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
                            [--debugging_algorithm {layerwise,cumulative-layerwise,oneshot-layerwise}]

Options for running the Accuracy Debugger components

optional arguments:
-h, --help            show this help message and exit

Arguments required by both Framework Runner and Inference Engine:
-f FRAMEWORK [FRAMEWORK ...], --framework FRAMEWORK [FRAMEWORK ...]
                        Framework type and version, version is optional. Currently supported
                        frameworks are [tensorflow, tflite, onnx]. For example, tensorflow
                        2.3.0
-m MODEL_PATH, --model_path MODEL_PATH
                        Path to the model file(s).
-i INPUT_TENSOR [INPUT_TENSOR ...], --input_tensor INPUT_TENSOR [INPUT_TENSOR ...]
                        The name, dimensions, raw data, and optionally data type of the
                        network input tensor(s) specified in the format "input_name" comma-
                        separated-dimensions path-to-raw-file, for example: "data"
                        1,224,224,3 data.raw float32. Note that the quotes should always be
                        included in order to handle special characters, spaces, etc. For
                        multiple inputs specify multiple --input_tensor on the command line
                        like: --input_tensor "data1" 1,224,224,3 data1.raw --input_tensor
                        "data2" 1,50,100,3 data2.raw float32.
-o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
                        Name of the graph's specified output tensor(s).

Arguments required by Inference Engine:
-r RUNTIME, --runtime RUNTIME
                        Runtime to be used for inference.
-a {aarch64-android,x86_64-linux-clang,aarch64-android-clang6.0}, --architecture {aarch64-android,x86_64-linux-clang,aarch64-android-clang6.0}
                        Name of the architecture to use for inference engine.
-l INPUT_LIST, --input_list INPUT_LIST
                        Path to the input list text.
Arguments required by Verification:
--default_verifier DEFAULT_VERIFIER [DEFAULT_VERIFIER ...]
                        Default verifier used for verification. The options "RtolAtol",
                        "AdjustedRtolAtol", "TopK", "L1Error", "CosineSimilarity", "MSE",
                        "MAE", "SQNR", "ScaledDiff" are supported. An optional
                        list of hyperparameters can be appended. For example:
                        --default_verifier rtolatol,rtolmargin,0.01,atolmargin,0.01. An
                        optional list of placeholders can be appended. For example:
                        --default_verifier CosineSimilarity param1 1 param2 2. To use
                        multiple verifiers, add an additional --default_verifier, e.g.,
                        --default_verifier CosineSimilarity

optional arguments:
--debugging_algorithm {layerwise,cumulative-layerwise,oneshot-layerwise}
                        Performs model debugging layerwise, cumulative-layerwise or in oneshot-
                        layerwise based on choice. Default is oneshot-layerwise.
-v, --verbose           Verbose printing
-w WORKING_DIR, --working_dir WORKING_DIR
                        Working directory for the wrapper to store temporary files. Creates
                        a new directory if the specified working directory does not exist.
--output_dirname OUTPUT_DIRNAME
                        output directory name for the wrapper to store temporary files under
                        <working_dir>/wrapper. Creates a new directory if the specified
                        working directory does not exist
--deep_analyzer {modelDissectionAnalyzer}
                        Deep Analyzer to perform deep analysis
--golden_output_reference_directory
                        Optional parameter to indicate the directory of the golden reference outputs.
                        When this option is provided, the framework runner stage is skipped.
                        In the inference stage, it is used to derive the tensor mapping without a framework.
                        In the verification stage, it is used as the reference against which outputs
                        produced in the inference engine stage are compared.
--enable_tensor_inspection
                        Plots graphs (line, scatter, CDF etc.) for each
                        layer's output. Additionally, the summary sheet will include
                        more details such as golden min/max and target min/max.

--step_size
                        Number of layers to skip in each iteration of debugging.
                        Applicable only for cumulative-layerwise algorithm.
                        --step_size (> 1) should not be used along with --add_layer_outputs,
                        --add_layer_types, --skip_layer_outputs, skip_layer_types,
                        --start_layer, --end_layer
(The options below are ignored for the framework_runner component in the case of layerwise and cumulative-layerwise runs.)
--add_layer_outputs ADD_LAYER_OUTPUTS
                        Output layers to be dumped, e.g., 1579,232
--add_layer_types ADD_LAYER_TYPES
                        Outputs of layer types to be dumped, e.g., Resize, Transpose; all enabled by default
--skip_layer_types SKIP_LAYER_TYPES
                        Comma delimited layer types to skip snooping, e.g., Resize, Transpose
--skip_layer_outputs SKIP_LAYER_OUTPUTS
                        Comma delimited layer output names to skip debugging, e.g., 1171, 1174
--start_layer START_LAYER
                        Extracts the given model from mentioned start layer
                        output name
--end_layer END_LAYER
                        Extracts the given model up to the mentioned end layer
                        output name
--use_native_output_files
                        Specifies that the output files will be generated in
                        the data type native to the graph. If not specified,
                        output files will be generated in floating point.
--disable_layout_transform
                      Disables layout transformation of Target outputs. This
                      option has to be used when Golden/Framework outputs and
                      Target outputs are already in the same layout.

Note: The --start_layer and --end_layer options are allowed only for Layerwise and Cumulative-layerwise runs.

Sample Command for oneshot-layerwise

Command for Oneshot-layerwise using DSP backend:

qnn-accuracy-debugger \
    --architecture aarch64-android \
    --runtime dspv73 \
    --framework tensorflow \
    --model_path $RESOURCESPATH/samples/InceptionV3Model/inception_v3_2016_08_28_frozen.pb \
    --input_tensor "input:0" 1,299,299,3 $PATHTOGOLDENI/samples/InceptionV3Model/data/chairs.raw \
    --output_tensor InceptionV3/Predictions/Reshape_1:0 \
    --debugging_algorithm oneshot-layerwise \
    --input_list $RESOURCESPATH/samples/InceptionV3Model/data/image_list.txt \
    --default_verifier CosineSimilarity \
    --enable_tensor_inspection \
    --verbose
Command for Oneshot-layerwise using HTP emulation on x86 host:

qnn-accuracy-debugger \
    --framework onnx \
    --runtime htp \
    --model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
    --input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
    --output_tensor 1597 \
    --architecture x86_64-linux-clang \
    --input_list /local/mnt/workspace/models/vit/list.txt \
    --default_verifier CosineSimilarity \
    --offline_prepare \
    --debugging_algorithm oneshot-layerwise \
    --enable_tensor_inspection \
    --verbose
Running pre-quantized models (tflite model example):

qnn-accuracy-debugger \
    --debugging_algorithm oneshot-layerwise \
    --runtime dspv75 \
    --architecture aarch64-android \
    --framework tflite \
    --model_path hand_regressor_random_weights.tflite \
    --input_list sample.txt \
    --input_tensor "serving_default_features:0" 1,160,160,1 1.raw uint8 \
    --output_tensor "StatefulPartitionedCall:4" \
    --output_tensor "StatefulPartitionedCall:3" \
    --output_tensor "StatefulPartitionedCall:5" \
    --output_tensor "StatefulPartitionedCall:0" \
    --output_tensor "StatefulPartitionedCall:2" \
    --output_tensor "StatefulPartitionedCall:1" \
    --default_verifier mse \
    --engine QNN \
    --engine_path $QNN_SDK_ROOT \
    --use_native_input_files \
    --use_native_output_files \
    --float_fallback
Example for using external golden outputs dumped by any frameworks like ONNX, TF:

qnn-accuracy-debugger \
    --debugging_algorithm cumulative-layerwise \
    --architecture aarch64-android \
    --runtime dspv75 \
    --framework onnx \
    --model_path /path/to/model.onnx \
    --input_tensor "input.1" 1,3,224,224 /path/to/input.raw \
    --output_tensor 1597 \
    --input_list /path/to/list.txt \
    --default_verifier CosineSimilarity \
    --offline_prepare \
    --golden_output_reference_directory /path/to/goldens
Example for using external golden outputs dumped by QNN:

qnn-accuracy-debugger \
    --debugging_algorithm cumulative-layerwise \
    --architecture aarch64-android \
    --runtime dspv75 \
    --framework onnx \
    --model_path /path/to/model.onnx \
    --input_tensor "input.1" 1,3,224,224 /path/to/input.raw \
    --output_tensor 1597 \
    --input_list /path/to/list.txt \
    --default_verifier CosineSimilarity \
    --offline_prepare \
    --golden_output_reference_directory /path/to/goldens \
    --disable_layout_transform

Note

The --enable_tensor_inspection argument significantly increases overall execution time when used with large models. To speed up execution, omit this argument.

Output

The program creates framework_runner, inference_engine, verification, and wrapper output directories as below:

../_static/resources/oneshot-layerwise.png
  • framework_runner – Contains a timestamped directory that contains the intermediate layer outputs (framework) stored in .raw format as described in the framework runner step.

  • inference_engine – Contains a timestamped directory that contains the intermediate layer outputs (inference engine) stored in .raw format as described in the inference engine step.

  • verification directory – Contains a timestamped directory that contains the following:

    • A directory for each verifier specified while running oneshot; it contains CSV and HTML files with metric details for each layer output

    • tensor_inspection – Individual directories for each layer’s output with the following contents:

      • CDF_plots.png – Golden vs target CDF graph

      • Diff_plots.png – Golden and target deviation graph

      • Histograms.png – Golden and target histograms

      • golden_data.csv – Golden tensor data

      • target_data.csv – Target tensor data

    • summary.csv – Report of the verification results for each layer's output

  • Wrapper directory containing log.txt with the entire log for the run.

Note: Except for the wrapper directory, all other directories contain a folder called latest, which is a symlink to the timestamped directory of the most recent run.

Snapshot of summary.csv file:

../_static/resources/oneshot_summary.png

Understanding the oneshot-layerwise report:

  • Name: Output name of the current layer

  • Layer Type: Type of the current layer

  • Size: Size of this layer's output

  • Tensor_dims: Shape of this layer's output

  • <Verifier name>: Verifier value of the current layer output compared to the reference output

  • golden_min: Minimum value in the reference output for the current layer

  • golden_max: Maximum value in the reference output for the current layer

  • target_min: Minimum value in the target output for the current layer

  • target_max: Maximum value in the target output for the current layer

Sample Command for cumulative-layerwise

Command for Cumulative-layerwise using DSP backend:

qnn-accuracy-debugger \
    --framework onnx \
    --runtime dspv73 \
    --model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
    --input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
    --output_tensor 1597 \
    --architecture x86_64-linux-clang \
    --input_list /local/mnt/workspace/models/vit/list.txt \
    --default_verifier CosineSimilarity \
    --offline_prepare \
    --debugging_algorithm cumulative-layerwise \
    --engine QNN \
    --verbose
Command for Cumulative-layerwise using HTP emulation on x86 host:

qnn-accuracy-debugger \
    --framework onnx \
    --runtime htp \
    --model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
    --input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
    --output_tensor 1597 \
    --architecture x86_64-linux-clang \
    --input_list /local/mnt/workspace/models/vit/list.txt \
    --default_verifier CosineSimilarity \
    --offline_prepare \
    --debugging_algorithm cumulative-layerwise \
    --engine QNN \
    --verbose

Output

The program creates framework_runner, cumulative_layerwise_snooping, and wrapper output directories as below:

../_static/resources/cumulative_layerwise_work_dir.png
  • framework_runner – Contains a timestamped directory with the intermediate layer outputs stored in .raw format, as described in the Framework Runner step.

  • cumulative_layerwise_snooping – Contains the intermediate outputs obtained from the inference engine step, stored in separate directories named after the respective layers. It also contains the final report, cumulative_layerwise.csv, which lists verifier scores for each layer; layers with the most deviating scores can be identified as problematic nodes.

  • wrapper – Contains log.txt with the entire log for the run.

../_static/resources/cumulative_layerwise_report.png

Understanding the cumulative-layerwise report

At the end of a cumulative-layerwise run, the tool generates a .csv file with the following information for each layer:

  • O/P Name: Output name of the current layer.

  • Status: If empty, indicates normal execution. Other possible values:
    • skip - This layer was not debugged, as requested by the user.
    • part - Due to the mismatch at this layer, the model was partitioned after this layer.
    • err_part - An error occurred while partitioning the model at this layer.
    • err_con - A converter error occurred at this layer.
    • err_lib - A lib-generator error occurred at this layer.
    • err_cntx - A context-bin-generator error occurred at this layer.
    • err-exec - Failed to execute the compiled model at this layer.
    • err-compare - Failed to compare the backend output of this layer with the reference.

  • Layer Type: Type of the current layer.

  • Shape: Shape of this layer's output.

  • Activations: The min, max, and median of the outputs at this layer, taken from the reference execution.

  • <Verifier name>: Absolute verifier value of the current layer compared to the reference platform.

  • Orig outputs: The original-outputs verifier score observed when the model was run with the current layer output enabled, starting from the last partitioned layer.

  • Info: Information about the output verifiers when the values are abnormal.

Command for Layerwise:

qnn-accuracy-debugger \
    --framework onnx \
    --runtime dspv73 \
    --model_path /local/mnt/workspace/models/vit/vit_base_16_224.onnx \
    --input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/models/vit/000000039769_1_3_224_224.raw \
    --output_tensor 1597 \
    --architecture x86_64-linux-clang \
    --input_list /local/mnt/workspace/models/vit/list.txt \
    --default_verifier CosineSimilarity \
    --offline_prepare \
    --debugging_algorithm layerwise \
    --quantization_overrides /local/mnt/workspace/layer_output_dump/vit_base_16_224.encodings \
    --engine QNN \
    --verbose

Output

The program creates layerwise_snooping and wrapper output directories, as well as framework_runner if a golden reference is not provided (as described for cumulative-layerwise).

  • layerwise_snooping directory – Contains the outputs of each single-layer model obtained from the inference engine stage, stored in separate directories, and the final report named layerwise.csv, which contains verifier scores for each layer model. Users can identify layers with the most deviating scores as problematic nodes.

  • wrapper directory – Contains log.txt which stores the full logs for the run.

  • The output .csv is similar to the cumulative-layerwise output, but the original outputs column is not present in layerwise snooping, since it does not deal with the final outputs of the model.

Debugging accuracy issues with quantized models using Cumulative Layerwise Snooping

  • With quantized models, some mismatch is expected at the most data-intensive layers, arising from quantization error.

  • The debugger can be used to identify the most sensitive operators (those with high verifier scores) and run them at higher precision to improve overall accuracy.

  • Sensitivity is determined by the verifier score seen at that layer with respect to the reference platform (such as ONNX Runtime).

  • Note that Cumulative-layerwise debugging takes considerable time, since the partitioned model must be quantized and compiled at every layer that does not have a 100% match with the reference.

  • Below is one strategy to debug larger models:

    • Run Oneshot-layerwise on the model which helps to identify the starting point of sensitivity in the model.

    • Run Cumulative-layerwise on different parts of the model using the start-layer and end-layer options (if the model has 100 nodes, use the starting node identified by the Oneshot-layerwise run as the start layer and the 25th node as the end layer for run 1, the 26th and 50th nodes for run 2, the 51st and 75th nodes for run 3, and so on). The final reports of all runs help identify the most sensitive layers in the model. Say nodes A, B, and C have high verifier scores, which indicates high sensitivity.

      • Run the original model with those specific layers (A/B/C - one at a time or in combinations) in FP16 and observe the improvement in accuracy.

Debugging accuracy issues for models exhibiting an accuracy discrepancy between a golden reference (e.g., AIMET/framework runtime output) and target output using Layerwise Snooping

  • One of the popular use cases for layerwise snooping is debugging accuracy differences between AIMET and the target.
    • Although tools like AIMET create a close simulation of the hardware, a very small mismatch is still expected due to environment differences: the simulation executes on GPU FP32 kernels and simulates quantization noise, whereas hardware execution runs on actual integer kernels.

    • If there is a larger deviation between simulation and hardware, layerwise snooping can be used to point to the nodes with the highest deviations. The nodes showing higher deviation in layerwise.csv can be identified as the erroneous nodes.

  • Other use cases include debugging deviations between a framework runtime's FP32 output and the target INT16 output.

Binary Snooping

The binary snooping tool debugs a given ONNX graph in a binary-search fashion.

For the graph under analysis, it quantizes half of the graph and lets the other half run in fp16/32. The final model output is used to measure the quantization effect of the subgraph. If the subgraph has a high effect on the final model output due to quantization (its verifier score is greater than 60% of the sum of the two subgraphs' scores), the process repeats on that subgraph until the subgraph size is less than min_graph_size or the subgraph cannot be divided again. If both subgraphs have similar scores (each greater than 40% of the sum of the two subgraphs' scores), both subgraphs are investigated further.
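
For illustration with made-up scores: if the two halves of a subgraph score 0.7 and 0.2 with an error-style verifier such as MSE, the first half contributes 0.7 / 0.9 ≈ 78% of the total (above the 60% threshold), so only that half is split and investigated further; if the halves score 0.5 and 0.45, each is above 40% of the 0.95 total, so both halves are investigated further.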

Usage

usage: qnn-accuracy-debugger --binary_snooping \
                           -m MODEL_PATH \
                           -l INPUT_LIST \
                           -i INPUT_TENSOR \
                           -f FRAMEWORK \
                           -o OUTPUT_TENSOR \
                           -e ENGINE_NAME \
                           -qo QUANTIZATION_OVERRIDES \
                           [--verifier VERIFIER] \
                           [-a {x86_64-linux-clang,aarch64-android,aarch64-qnx,wos-remote,x86_64-windows-msvc,wos}] \
                           [--host_device {x86,x86_64-windows-msvc,wos}] \
                           [-r {cpu,gpu,dsp,dspv68,dspv69,dspv73,dspv75,dspv79,aic,htp}] \
                           [--deviceId DEVICEID] \
                           [--golden_output_reference_directory GOLDEN_OUTPUT_REFERENCE_DIRECTORY] \
                           [--bias_bitwidth BIAS_BITWIDTH] \
                           [--use_per_channel_quantization USE_PER_CHANNEL_QUANTIZATION] \
                           [--weights_bitwidth WEIGHTS_BITWIDTH] \
                           [--act_bitwidth {8,16}] [-fbw {16,32}] \
                           [-rqs RESTRICT_QUANTIZATION_STEPS] \
                           [-w WORKING_DIR] \
                           [--output_dirname OUTPUT_DIRNAME] \
                           [-p ENGINE_PATH] \
                           [--min_graph_size MIN_GRAPH_SIZE] \
                           [--extra_converter_args EXTRA_CONVERTER_ARGS] \
                           [--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}] \
                           [--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}] \
                           [--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}] \
                           [--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}] \
                           [--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE] \
                           [--param_quantizer {tf,enhanced,adjusted,symmetric}] \
                           [--act_quantizer {tf,enhanced,adjusted,symmetric}] \
                           [--per_channel_quantization] \
                           [--algorithms ALGORITHMS] \
                           [--verifier_config VERIFIER_CONFIG] \
                           [--start_layer START_LAYER] \
                           [--end_layer END_LAYER] [--precision {int8,fp16}] \
                           [--compiler_config COMPILER_CONFIG] \
                           [--ignore_encodings] \
                           [--extra_runtime_args EXTRA_RUNTIME_ARGS] \
                           [--add_layer_outputs ADD_LAYER_OUTPUTS] \
                           [--add_layer_types ADD_LAYER_TYPES] \
                           [--skip_layer_types SKIP_LAYER_TYPES] \
                           [--skip_layer_outputs SKIP_LAYER_OUTPUTS] \
                           [--remote_server REMOTE_SERVER] \
                           [--remote_username REMOTE_USERNAME] \
                           [--remote_password REMOTE_PASSWORD] [-nif] [-nof]

Sample Commands

Sample command to run binary snooping on mv2 large model

qnn-accuracy-debugger \
  --binary_snooping \
  --framework onnx \
  --model_path models/mv2/mobilenet-v2.onnx \
  --architecture aarch64-android \
  --input_list models/mv2/inputs/input_list_1.txt \
  --input_tensor "input.1" 1,3,224,224 /local/mnt/workspace/harsraj/models/mv2/inputs/data1.raw \
  --output_tensor "473" \
  --engine_path $QNN_SDK_ROOT \
  --working_dir working_directory/QNN/BINARY_MV2_DSP \
  --runtime dspv75 \
  --engine QNN \
  --verifier mse \
  --extra_converter_args "float_bitwidth=32;preserve_io=layout" \
  --quantization_overrides /local/mnt/workspace/harsraj/models/mv2/quantized_encoding.json \
  --min_graph_size 16

Outputs

The algorithm provides two JSON files:

  1. graph_result.json (for each subgraph) - Contains verifier scores for two child subgraphs; for example 318_473 has child subgraphs 318_392 and 393_473.

  2. subgraph_result.json (for each subgraph) - Contains the corresponding and sorted verifier scores.

Keys in both files look like “subgraph_start_node_activation_name” + _ + “subgraph_end_node_activation_name”.

For example, 318_473 means a subgraph starts at node activation 318 and ends at node activation 473. Only the subgraph from 318 to 473 is quantized while the rest of the model runs in fp16/32.

Debugging accuracy issues with binary snooping results

Subgraphs with the maximum verifier scores in subgraph_result.json are the culprit subgraphs.

One subgraph can be a subset of another subgraph. In this case prioritize a subgraph size you are comfortable debugging. The details of a subset can be found in graph_result.json.

Quantization Checker

The quantization checker analyzes activations, weights, and biases of a given model. It provides:

  1. Comparison between quantized and unquantized weights and biases

  2. Analysis of unquantized weights, biases, and activations

  3. Results in CSV, HTML, or plot form

  4. Identification of problematic weights and biases for a given quantization bitwidth

Usage

usage: qnn-accuracy-debugger --quant_checker [-h] \
                            --model_path \
                            --input_tensor \
                            --config_file \
                            --framework \
                            --input_list \
                            --output_tensor \
                            [--engine_path] \
                            [--working_dir] \
                            [--quantization_overrides] \
                            [--extra_converter_args] \
                            [--bias_width] \
                            [--weights_width] \
                            [--host_device] \
                            [--deviceId] \
                            [--generate_csv] \
                            [--generate_plots] \
                            [--per_channel_plots] \
                            [--golden_output_reference_directory] \
                            [--output_dirname]
                            [--verbose]

Sample quant_checker_config_file

{
    "WEIGHT_COMPARISON_ALGORITHMS": [
        {"algo_name": "minmax", "threshold": "10"},
        {"algo_name": "maxdiff", "threshold": "10"},
        {"algo_name": "sqnr", "threshold": "26"},
        {"algo_name": "stats", "threshold": "2"},
        {"algo_name": "data_range_analyzer"},
        {"algo_name": "data_distribution_analyzer", "threshold": "0.6"}
    ],
    "BIAS_COMPARISON_ALGORITHMS": [
        {"algo_name": "minmax", "threshold": "10"},
        {"algo_name": "maxdiff", "threshold": "10"},
        {"algo_name": "sqnr", "threshold": "26"},
        {"algo_name": "stats", "threshold": "2"},
        {"algo_name": "data_range_analyzer"},
        {"algo_name": "data_distribution_analyzer", "threshold": "0.6"}
    ],
    "ACT_COMPARISON_ALGORITHMS": [
        {"algo_name": "minmax", "threshold": "10"},
        {"algo_name": "data_range_analyzer"}
    ],
    "INPUT_DATA_ANALYSIS_ALGORITHMS": [
        {"algo_name": "stats", "threshold": "2"}
    ],
    "QUANTIZATION_ALGORITHMS": ["cle", "None"],
    "QUANTIZATION_VARIATIONS": ["tf", "enhanced", "symmetric", "asymmetric"]
}
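
The usage synopsis above does not include a worked example, so the following invocation is only a representative sketch; all paths, the tensor name, and the dimensions are placeholders, and only options listed in the synopsis are used:

qnn-accuracy-debugger --quant_checker \
    --framework onnx \
    --model_path /path/to/model.onnx \
    --input_list /path/to/input_list.txt \
    --input_tensor "input.1" 1,3,224,224 /path/to/input.raw \
    --output_tensor output \
    --config_file quant_checker_config.json \
    --generate_csv \
    --generate_plots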

Output

Outputs are available in <working-directory>/results, which looks like the following:

../_static/resources/quant_checker_acc_debug_output_dir_struct.png

Results are provided in:

  1. HTML

  2. CSV

  3. Histogram

A log is provided in the <working-directory>/quant_checker directory.

HTML

Each HTML file contains a summary of the results for each quantization option and for each input file provided.

The following example provides additional guidance on the contents of the HTML files.

../_static/resources/qnn_quantatization_checker_html_sample.png

CSV Results Files

Each CSV file contains detailed computation results for a specific node type (activation/weight/bias) and quantization option. Each row in the CSV file displays the op name, node name, whether accuracy passes (True/False), the computation result (accuracy differences), the threshold used for each algorithm, and the algorithm name. The format of the computation result (accuracy differences) differs according to the algorithm/metric used.
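
As an illustration only (the actual column headers, ordering, and values may differ between SDK versions), a row in one of the weights CSV files could carry the fields described above as follows:

Op Name,Node Name,Passes Accuracy,Accuracy Difference,Threshold Used,Algorithm Used
conv_tanh_comp1_conv0,conv_tanh_comp1_conv0_weight,False,"min: 0.12 max: 0.34",10,minmax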

The following notes describe the different algorithms and the information in each CSV row.

  • minmax
    • Indicates the difference between the unquantized minimum and the dequantized minimum value and, correspondingly, the same difference for the maximum unquantized and dequantized values.
    • Computation result: "min: #VALUE max: #VALUE"

  • maxdiff
    • Calculates the absolute difference between the unquantized and dequantized data for all data points and displays the maximum value of the result.
    • Computation result: "#VALUE"

  • sqnr
    • Calculates the signal-to-quantization-noise ratio between the two tensors of unquantized and dequantized data.
    • Computation result: "#VALUE"

  • data_range_analyzer
    • Calculates the difference between the maximum and minimum values in a tensor and compares that to the maximum value supported by the bit-width used, to determine whether the range of values can be reasonably represented by the selected quantization bit width.
    • Computation result: "unique dec places: #INT_VALUE data range: #VALUE". The computation results field includes how many unique decimal places are needed to express the unquantized data in quantized format and what the actual data range is.

  • data_distribution_analyzer
    • Calculates the clustering of the data to find whether a large number of unique unquantized values are quantized to the same value.
    • Computation result: "Distribution of pixels above threshold: #VALUE"

  • stats
    • Calculates basic statistics on the received data such as the min, max, median, variance, standard deviation, mode, and skew. The skew indicates how symmetric the data is.
    • Computation result: "skew: #VALUE min: #VALUE max: #VALUE median: #VALUE variance: #VALUE stdDev: #VALUE mode: #VALUE"

The following CSV example shows weight data for one of the quantization options.

../_static/resources/qnn_quantatization_checker_csv_weights.png

Separate .csv files are available for activations, weights, and biases for each quantization option. The activation-related results also include an analysis for each input file provided.

Histogram

A histogram is generated for each quantization variation and for each weight and bias tensor in the model. The following example illustrates the generated histograms.

../_static/resources/quant_checker_hist.png

Logs

The log files contain the following information.

  • The commands executed as part of the script's run, including different runs of the converter tool with different quantization options

  • Analysis failures for activations, weights, and biases

The following example shows a sample log output.

<====ACTIVATIONS ANALYSIS FAILURES====>

Results for the enhanced quantization:

| Op Name | Activation Node | Passes Accuracy | Accuracy Difference | Threshold Used | Algorithm Used |
| conv_tanh_comp1_conv0 | ReLU_6919 | False | minabs_diff: 0.59 maxabs_diff: 17.16 | 0.05 | minmax |

where,

  1. Op Name : Op name as expressed in corresponding qnn artifacts

  2. Activation Node : Activation node name in the operation

  3. Passes Accuracy : True if the quantized activation (or weight or bias) meets threshold when compared with values from float32 graph; false otherwise

  4. Accuracy Difference : Details about the accuracy per the algorithm used

  5. Threshold Used : The threshold used to influence the result of “Passes Accuracy” column

  6. Algorithm Used : Metric used to compare actual quantized activations/weights/biases against unquantized float data or analyze the quality of unquantized float data. Metrics can be minmax, maxdiff, sqnr, stats, data_range_analyzer, data_distribution_analyzer.

qairt-accuracy-debugger (Beta)

Dependencies

The Accuracy Debugger depends on the setup outlined in Setup. In particular, the following are required:

  1. Platform dependencies need to be met as described in Platform Dependencies

  2. The desired ML frameworks need to be installed. The Accuracy Debugger is verified to work with the ML framework versions mentioned in Environment Setup

Supported models

The qairt-accuracy-debugger currently supports ONNX models.

Overview

The Accuracy Debugger tool finds inaccuracies in a neural network at the layer level. The primary functionality of this tool is to compare the golden outputs produced by running a model through an ML framework with the results produced by running the same model on target devices (HTP, CPU, GPU, etc.).

The following components are available in the Accuracy Debugger. Each component can be run with its corresponding subcommand, for example, qairt-accuracy-debugger {component}.

  1. qairt-accuracy-debugger framework_runner uses an ML framework, e.g., ONNX, to run the model and collect intermediate outputs.

  2. qairt-accuracy-debugger inference_engine uses inference engine to run a model on the target device to retrieve intermediate outputs.

  3. qairt-accuracy-debugger verification compares the output generated by the framework runner and inference engine features using verifiers such as CosineSimilarity, RtolAtol, etc.

  4. qairt-accuracy-debugger compare_encodings compares target encodings with the AIMET encodings, and outputs an Excel sheet highlighting mismatches.

  5. qairt-accuracy-debugger tensor_visualizer compares given target outputs with golden outputs.

  6. qairt-accuracy-debugger snooping runs chosen snooping algorithm to investigate accuracy issues.

Tip:
  • You can use --help with a component name to see the options (required or optional) available for that component.

Below are the instructions for running various components available in Accuracy Debugger:

Framework Runner

The Framework Runner component is designed to run models with different machine learning frameworks (e.g., TensorFlow, ONNX, TFLite). A given model is run with a specific ML framework, and golden outputs are produced for later comparison with inference results from the Inference Engine step.

Usage

usage: qairt-accuracy-debugger framework_runner [-h] -m INPUT_MODEL --input_sample INPUT_SAMPLE [INPUT_SAMPLE ...] [--working_directory WORKING_DIRECTORY]
                                      [-o OUTPUT_TENSOR] [--log_level {info,debug,warning,error}]

options:
  -h, --help            show this help message and exit

required arguments:
  -m INPUT_MODEL, --input_model INPUT_MODEL
                        path to the model file
  --input_sample INPUT_SAMPLE [INPUT_SAMPLE ...]
                        Path to text file containing input sample. Refer to qnn-net-run input_list for format of input_sample file.
  --onnx_define_symbol SYMBOL VALUE
                    Option to override specific input dimension symbols.

optional arguments:
  --working_directory WORKING_DIRECTORY
                        Path to working directory. If not specified a directory with name working_directory will be created in the current directory.
  -o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
                        Name of the graph's specified output tensor(s).
  --log_level {info,debug,warning,error}
                        Log level. Default is info.

Sample Commands

qairt-accuracy-debugger framework_runner \
                          --input_model dlv3onnx/dlv3plus_mbnet_513-513_op9_mod_basic.onnx \
                          --input_sample input_sample.txt \
                          --output_tensor Output
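
The input_sample.txt referenced above follows the qnn-net-run input-list format. For a single-input model, a minimal file simply lists the raw input file for each inference, one line per inference (the filename below is a placeholder):

input_1.raw
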
TIP:
  • A working_directory, if not otherwise specified, is created in the directory from which the script is called

Outputs

Once the Framework Runner has finished running, it will store the outputs in the specified working directory. It creates an output directory with a timestamp of the format YYYY-MM-DD_HH:mm:ss in working_directory/framework_runner. The following figure shows a sample output folder from a Framework Runner run using an Onnx model.

working_directory
└── framework_runner
    ├── 2025-07-07_22-01-02
    │    ├── mobilenetv20_features_batchnorm0_fwd.raw
    │    ├──         .
    │    ├──         .
    │    └── profile_info.json

The output directory contains the outputs of each layer in the model, saved as .raw files. Each .raw file is the output of an operation in the model.

The intermediate outputs produced by the Framework Runner step provide precise reference/golden material for the Verification component to diagnose the accuracy of the network outputs generated by the Inference Engine.

Inference Engine

The Inference Engine component is designed to dump intermediate outputs of the model when run on target devices such as CPU, DSP, and GPU. The output produced by this step can be compared with the golden outputs produced by the Framework Runner step.

Usage

usage: qairt-accuracy-debugger inference_engine [-h] --input_model INPUT_MODEL
                                        [--desired_input_shape DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...]]
                                        [--output_tensor OUTPUT_TENSOR]
                                        [--converter_float_bitwidth {32,16}]
                                        [--float_bias_bitwidth {32,16}]
                                        [--quantization_overrides QUANTIZATION_OVERRIDES]
                                        [--onnx_define_symbol SYMBOL VALUE]
                                        [--onnx_defer_loading]
                                        [--enable_framework_trace]
                                        [--op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]]
                                        [--converter_op_package_lib CONVERTER_OP_PACKAGE_LIB]
                                        [--package_name PACKAGE_NAME]
                                        [--calibration_input_list CALIBRATION_INPUT_LIST]
                                        [--bias_bitwidth {8,32}]
                                        [--act_bitwidth {8,16}]
                                        [--weights_bitwidth {8,4}]
                                        [--quantizer_float_bitwidth {32,16}]
                                        [--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                                        [--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                                        [--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                                        [--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}]
                                        [--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
                                        [--use_per_channel_quantization]
                                        [--use_per_row_quantization]
                                        [--float_fallback]
                                        [--quantization_algorithms QUANTIZATION_ALGORITHMS [QUANTIZATION_ALGORITHMS ...]]
                                        [--restrict_quantization_steps RESTRICT_QUANTIZATION_STEPS]
                                        [--dump_encodings_json]
                                        [--ignore_encodings]
                                        [--op_package_lib OP_PACKAGE_LIB]
                                        [--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}]
                                        [--profiling_level PROFILING_LEVEL]
                                        [--input_list INPUT_LIST]
                                        [--netrun_backend_extension_config NETRUN_BACKEND_EXTENSION_CONFIG]
                                        [--offline_prepare_backend_extension_config OFFLINE_PREPARE_BACKEND_EXTENSION_CONFIG]
                                        [--backend {CPU,GPU,HTP,}]
                                        [--platform {aarch64-android,linux-embedded,qnx,wos,x86_64-linux-clang,x86_64-windows-msvc}]
                                        [--offline_prepare]
                                        [--working_directory WORKING_DIRECTORY]
                                        [--deviceId DEVICEID]
                                        [--log_level {ERROR,WARN,INFO,DEBUG,VERBOSE}]
                                        [--op_packages OP_PACKAGES]


Script to run inference engine.

options:
  -h, --help            show this help message and exit

  required arguments:
    --input_model INPUT_MODEL
                          Path to the source model/dlc/bin file

  optional arguments:
    --backend {CPU,GPU,HTP,}
                          Backend type for inference to be run
    --platform {aarch64-android,linux-embedded,qnx,wos,x86_64-linux-clang,x86_64-windows-msvc}
                          The type of device platform to be used for inference
    --offline_prepare     Boolean to indicate offline prepare of the graph
    --working_directory WORKING_DIRECTORY
                          Path to the directory to store the output result
    --deviceId DEVICEID   The serial number of the device to use. If not available, the first in a list of queried devices will be used for inference.
    --log_level {ERROR,WARN,INFO,DEBUG,VERBOSE}
                          Enable verbose logging.
    --op_packages OP_PACKAGES
                          Provide a comma separated list of op package and interface providers to register during graph preparation.Usage: op_package_path:interface_provider[,op_package_path:interface_provider...]

  converter arguments:
    --desired_input_shape DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...], --input_tensor DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...]
                          The name, dimension, datatype, and layout of all the input buffers to the network, specified in the format [input_name comma-separated-dimensions data-type layout]. Dimension, datatype, and layout are optional. For example: 'data' 1,224,224,3. Note that the
                          quotes should always be included in order to handle special characters, spaces, etc. For multiple inputs, specify multiple --desired_input_shape on the command line like: --desired_input_shape "data1" 1,224,224,3 float32 --desired_input_shape "data2"
                          1,50,100,3 int64
    --output_tensor OUTPUT_TENSOR
                          Name of the graph's specified output tensor(s).
    --converter_float_bitwidth {32,16}
                          Use this option to convert the graph to the specified float bitwidth, either 32 (default) or 16.
    --float_bias_bitwidth {32,16}
                          Option to select the bitwidth to use for float bias tensor, either 32(default) or 16
    --quantization_overrides QUANTIZATION_OVERRIDES
                          Path to quantization overrides json file.
    --onnx_define_symbol SYMBOL VALUE
                          Option to override specific input dimension symbols.
    --onnx_defer_loading  Option to have the model not load weights. If False, the model will be loaded eagerly.
    --enable_framework_trace
                          Use this option to enable converter to trace the o/p tensor change information.
    --op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]
                          Absolute paths to Qnn Op Package XML configuration file that contains user defined custom operations.Note: Only one of: {'op_package_config', 'package_name'} can be specified.
    --converter_op_package_lib CONVERTER_OP_PACKAGE_LIB
                          Absolute path to converter op package library compiled by the OpPackage generator. Must be separated by a comma for multiple package libraries. Note: Libraries must follow the same order as the xml files. E.g.1: --converter_op_package_lib
                          absolute_path_to/libExample.so E.g.2: --converter_op_package_lib absolute_path_to/libExample1.so,absolute_path_to/libExample2.so
    --package_name PACKAGE_NAME
                          A global package name to be used for each node in the Model.cpp file. Defaults to Qnn header defined package name. Note: Only one of: {'op_package_config', 'package_name'} can be specified.

  quantizer_arguments:
    --calibration_input_list CALIBRATION_INPUT_LIST
                          Path to the inputs list text file to run quantization(used with qairt-quantizer)
    --bias_bitwidth {8,32}
                          Option to select the bitwidth to use when quantizing the bias. default 8
    --act_bitwidth {8,16}
                          Option to select the bitwidth to use when quantizing the activations. default 8
    --weights_bitwidth {8,4}
                          Option to select the bitwidth to use when quantizing the weights. default 8
    --quantizer_float_bitwidth {32,16}
                          Use this option to select the bitwidth to use for float tensors, either 32 (default) or 16.
    --act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                          Specify which quantization calibration method to use for activations. Supported values: min-max (default), sqnr, entropy, mse, percentile. This option can be paired with --act_quantizer_schema to override the quantization schema to use for activations,
                          otherwise the default schema (asymmetric) will be used.
    --param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                          Specify which quantization calibration method to use for parameters. Supported values: min-max (default), sqnr, entropy, mse, percentile. This option can be paired with --param_quantizer_schema to override the quantization schema to use for parameters,
                          otherwise the default schema (asymmetric) will be used.
    --act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                          Specify which quantization schema to use for activations. Note: Default is asymmetric.
    --param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                          Specify which quantization schema to use for parameters. Note: Default is asymmetric.
    --percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
                          Value must lie between 90 and 100. Default is 99.99
    --use_per_channel_quantization
                          Use per-channel quantization for convolution-based op weights. Note: This will replace built-in model QAT encodings when used for a given weight.
    --use_per_row_quantization
                          Use this option to enable rowwise quantization of Matmul and FullyConnected ops.
    --float_fallback      Use this option to enable fallback to floating point (FP) instead of fixed point. This option can be paired with --quantizer_float_bitwidth to indicate the bitwidth for FP (by default 32). If this option is enabled, then an input list must not be provided and
                          --ignore_encodings must not be provided. The external quantization encodings (encoding file/FakeQuant encodings) might be missing quantization parameters for some interim tensors. First it will try to fill the gaps by propagating across math-invariant
                          functions. If the quantization parameters are still missing, then it will fall back to floating point for those nodes.
    --quantization_algorithms QUANTIZATION_ALGORITHMS [QUANTIZATION_ALGORITHMS ...]
                          Use this option to select quantization algorithms. Usage is: --quantization_algorithms <algo_name1> ...
    --restrict_quantization_steps RESTRICT_QUANTIZATION_STEPS
                          Specifies the number of steps to use for computing quantization encodings. E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8-bit range.
    --dump_encodings_json
                          Dump encoding of all the tensors in a json file
    --ignore_encodings    Use only quantizer generated encodings, ignoring any user or model provided encodings.
    --op_package_lib OP_PACKAGE_LIB
                          Use this argument to pass an op package library for quantization. Must be in the form <op_package_lib_path:interfaceProviderName> and be separated by a comma for multiple package libs

  netrun arguments:
    --perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
                          Specifies the perf profile to set. Valid settings are "low_balanced", "balanced", "default", "high_performance", "sustained_high_performance", "burst", "low_power_saver", "power_saver", "high_power_saver", "extreme_power_saver", and "system_settings". Note:
                          the perf_profile argument is now deprecated for the HTP backend; users can specify the performance profile through the backend extension config instead.
    --profiling_level PROFILING_LEVEL
                          Enables profiling and sets its level. For the QNN executor, valid settings are "basic", "detailed" and "client". Default is detailed.
    --input_list INPUT_LIST
                          Path to the input list text file to run inference(used with net-run).
    --netrun_backend_extension_config NETRUN_BACKEND_EXTENSION_CONFIG
                          Path to config to be used with qnn-net-run

  offline prepare arguments:
    --offline_prepare_backend_extension_config OFFLINE_PREPARE_BACKEND_EXTENSION_CONFIG
                          Path to config to be used with qnn-context-binary-generator.

Sample Commands

# Example for running on Linux host's CPU without quantization encodings
qairt-accuracy-debugger inference_engine \
                          --backend cpu \
                          --platform x86_64-linux-clang \
                          --input_model source_model/mobilenet.onnx \
                          --input_list inputs/input_list.txt \
                          --calibration_input_list inputs/calibration_list.txt \
                          --param_quantizer_schema symmetric \
                          --act_quantizer_schema asymmetric \
                          --param_quantizer_calibration sqnr \
                          --act_quantizer_calibration percentile \
                          --percentile_calibration_value 99.995 \
                          --bias_bitwidth 32

# Example for running on Android DSP target
qairt-accuracy-debugger inference_engine \
                          --backend htp \
                          --platform aarch64-android \
                          --device_id 357415c4 \
                          --input_model source_model/mobilenet.onnx \
                          --input_list inputs/input_list.txt \
                          --quantization_overrides AIMET_quantization_encodings.json

# Example for running on a WoS HTP target
qairt-accuracy-debugger inference_engine ^
                          --backend htp ^
                          --platform wos ^
                          --input_model source_model/mobilenet.onnx ^
                          --input_list inputs/input_list.txt ^
                          --quantization_overrides AIMET_quantization_encodings.json

# Example for running on a WoS CPU target
qairt-accuracy-debugger inference_engine ^
                          --backend cpu ^
                          --platform wos ^
                          --input_model source_model/mobilenet.onnx ^
                          --input_list inputs/input_list.txt ^
                          --calibration_input_list inputs/calib_list.txt

# Example for running on Android GPU target with fp16 precision
qairt-accuracy-debugger inference_engine \
                          --backend gpu \
                          --platform aarch64-android \
                          --input_model mobilenet.onnx \
                          --input_tensor "data" 1,3,224,224 inputs/data.raw \
                          --output_tensor mobilenetv20_output_flatten0_reshape0 \
                          --input_list inputs/input_list.txt \
                          --converter_float_bitwidth 16
Tip:
  • Although the tool can quantize the given model using data provided through the --calibration_input_list argument, it is recommended to pass quantization encodings through the --quantization_overrides argument to speed up execution.
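
For reference, an input list (such as input_list.txt or a calibration list) is a plain text file with one line per inference, where each line points to the raw input file(s) for that run. The exact syntax is described in the qnn-net-run documentation; the sketch below is only an illustration for a single-input model with hypothetical file names.

inputs/chair.raw
inputs/dog.raw
inputs/cat.raw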

More example commands with different stage configurations:

Sample Commands

# source stage: same as examples from above section

# Running from converted stage (Android DSP):
qairt-accuracy-debugger inference_engine \
                          --input_model converted_model.dlc \
                          --backend htp \
                          --device_id f366ce60 \
                          --platform aarch64-android \
                          --input_list inputs/input_list.txt \
                          --quantization_overrides AIMET_quantization_encodings.json

# Running from quantized stage (x86 CPU):
qairt-accuracy-debugger inference_engine \
                          --input_model quantized_model.dlc \
                          --backend cpu \
                          --platform x86_64-linux-clang \
                          --input_list inputs/input_list.txt

Outputs

Once the Inference Engine has finished running, it stores the outputs in the specified working directory (by default, working_directory/inference_engine in the current working directory). It creates an output directory with a timestamp of the format YYYY-MM-DD_HH-mm-ss in working_directory/inference_engine. Below is the output directory structure:

working_directory
├── inference_engine
│   └── 2025-07-07_22-05-54
│       ├── base.dlc
│       ├── base_quantized.dlc
│       └── Output
│           └── Result_0
│               ├── data_0231.raw
│               ├──      .
│               ├──      .

The “Output” directory contains raw files. Each raw file is an output of an operation in the network.
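
Each raw file is a plain binary dump of tensor data. Assuming float32 data and a known tensor shape (both depend on the model and run configuration, so treat them as assumptions), a dumped tensor can be inspected with a short Python sketch such as the following.

```python
# Illustrative only: load one dumped tensor, assuming float32 data and a known shape.
import numpy as np

# Hypothetical path; substitute the actual Result_N file from your run.
raw_path = "working_directory/inference_engine/2025-07-07_22-05-54/Output/Result_0/data_0231.raw"

tensor = np.fromfile(raw_path, dtype=np.float32)
tensor = tensor.reshape(1, 224, 224, 3)  # example shape; must match the tensor produced by the model

print(tensor.min(), tensor.max(), tensor.mean())
```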

The base_quantized_encoding.json contains quantization encodings used by the model.

Verification

The Verification step compares the output (from the intermediate tensors of a given model) produced by the framework runner step with the output produced by the inference engine step. Once the comparison is complete, the verification results are compiled and displayed visually in a format that can be easily interpreted by the user.

There are different types of verifiers, e.g., l1_norm, rtol_atol, etc. To see the available verifiers, use the --help option (qairt-accuracy-debugger verification --help). Each verifier compares the Framework Runner and Inference Engine output using an error metric. It also prepares reports and/or visualizations to help the user analyze the network's error data.

Usage

usage: qairt-accuracy-debugger verification [-h] --inference_tensor INFERENCE_TENSOR --reference_tensor REFERENCE_TENSOR
                                      [--comparators {l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} [{l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} ...]]
                                      [--reference_dtype REFERENCE_DTYPE] [--inference_dtype INFERENCE_DTYPE] [--dlc_file DLC_FILE] [--graph_info GRAPH_INFO]
                                      [--is_qnn_golden_reference] [--working_directory WORKING_DIRECTORY] [--log_level {info,debug,warning,error}]

options:
  -h, --help            show this help message and exit

required arguments:
  --inference_tensor INFERENCE_TENSOR
                        Directory path of inference tensor files.
  --reference_tensor REFERENCE_TENSOR
                        Directory path of reference tensor files.

optional arguments:
  --comparators {l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} [{l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} ...]
                        Comparator to use to compare tensors. For multiple comparators, specify as follows: --comparator mse std. Default comparator is mse
  --reference_dtype REFERENCE_DTYPE
                        Data type of reference tensor files.
  --inference_dtype INFERENCE_DTYPE
                        Data type of inference tensor files.
  --dlc_file DLC_FILE   Path to dlc file.
  --graph_info GRAPH_INFO
                        Path to json file containing graph information such as tensor mapping, graph structure, and layout information, in the following format:
                        {'tensor_mapping':{}, graph_structure:{}, layout_info:{}}
  --is_qnn_golden_reference
                        Specifies that outputs passed with --reference_tensor are dumped by QNN.
  --working_directory WORKING_DIRECTORY
                        Path to working directory. If not specified a directory with name working_directory will be created in the current directory.
  --log_level {info,debug,warning,error}
                        Log level. Default is info.

Sample Commands

# Compare output of framework runner with inference engine

qairt-accuracy-debugger verification \
                          --comparators cosine mse \
                          --reference_tensor working_directory/framework_runner_output/ \
                          --inference_tensor working_directory/inference_engine_output/ \
                          --graph_info working_directory/graph_info.json \
                          --dlc_file working_directory/inference_engine/base.dlc
# Compare outputs of two different inference engine runs:

qairt-accuracy-debugger verification \
                         --comparators mse \
                         --reference_tensor working_directory/inference_engine_output1/ \
                         --inference_tensor working_directory/inference_engine_output2/
Tip:
  • If you passed multiple images in the image_list.txt when running the inference engine diagnosis, you will receive multiple output/Result_x directories. Choose the result that matches the input you used for the framework runner. For example, if the framework runner used chair.raw and chair.raw was the first item in image_list.txt, choose output/Result_0; if chair.raw was the second item, choose output/Result_1.

  • It is recommended to always supply dlc_file or graph_info to the command, as it is used to line up the report and find the corresponding files for comparison.

  • If the target and golden output names match exactly, you do not need to provide a tensor_mapping file.

Tensor Mapping:

Tensor mapping is a JSON file that maps inference tensor names (keys) to framework tensor names (values). If the tensor mapping is not provided, the tool generates it from dlc_file. If dlc_file is not provided, it assumes the inference and golden tensor names are identical.

Tensor Mapping File

```json
{
    "Postprocessor/BatchMultiClassNonMaxSuppression_boxes": "detection_boxes:0",
    "Postprocessor/BatchMultiClassNonMaxSuppression_scores": "detection_scores:0"
}
```

Outputs

Once Verification has finished running, it stores the outputs in the specified working directory (by default, working_directory/verification in the current working directory). It creates an output directory with a timestamp of the format YYYY-MM-DD_HH-mm-ss in working_directory/verification.

Below is the output directory structure:

working_directory
└── verification
    ├── 2025-07-07_22-10-10
         └── verification.csv

The verifier generates a summary CSV file that consolidates the data from all verifiers and their corresponding tensors. The following figure shows a sample summary generated in the verification step. Each row in this summary corresponds to one tensor name identified by the framework runner and inference engine steps. The final column shows the cosine similarity score, which ranges from 0 to 1 (the range may differ for other verifiers). Higher scores denote similarity, while lower scores indicate deviation. The developer can then further investigate those specific tensors. Tensors should be inspected in top-to-bottom order: if a tensor is broken at an earlier node, anything generated after that node is unreliable until the earlier node is fixed.

../_static/resources/verification_results.png
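
To make the verifier scores concrete, the sketch below shows how a cosine similarity and MSE comparison between one golden tensor and one target tensor could be recomputed offline with Python. This is not the verifier's implementation; the file paths and float32 dtype are assumptions for illustration.

```python
# Illustrative re-computation of cosine similarity and MSE for one tensor pair.
import numpy as np

golden = np.fromfile("framework_runner_output/conv1_out.raw", dtype=np.float32)  # hypothetical paths
target = np.fromfile("inference_engine_output/conv1_out.raw", dtype=np.float32)

mse = np.mean((golden - target) ** 2)
cosine = np.dot(golden, target) / (np.linalg.norm(golden) * np.linalg.norm(target) + 1e-12)

# A cosine similarity close to 1.0 indicates the two tensors are very similar.
print(f"MSE: {mse:.6f}, cosine similarity: {cosine:.6f}")
```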

Compare Encodings

The Compare Encodings feature is designed to compare two encoding files. It provides the following features:
  1. One-to-Many Mapping: It maps a given tensor in an encoding file to all tensors in the other encoding file with which it shares similar encodings (encodings which can be algebraically converted to each other), and vice versa.

  2. Unmapped Tensors: It provides the list of tensors in an encoding file that do not have similar encodings with any of the tensors in the other encoding file. These appear in the .csv file with the “Status” field set to “UNMAPPED”.

  3. Incorrect consumption of AIMET encodings by QAIRT: It gives the list of AIMET tensors whose encodings were not consumed by QAIRT. These appear in the .csv file with the “Status” field set to “ERROR”.

  4. Supergroup Mapping: It helps in identifying fusions.

It supports the following for encoding comparisons:
  1. QAIRT vs QAIRT

  2. QAIRT vs AIMET

  3. AIMET vs AIMET

It supports the following encoding schema versions:
  1. LEGACY AIMET encoding format (an illustrative sketch of this format is shown below)

  2. “1.0.0” AIMET encoding format
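
For orientation, a legacy-format AIMET encodings file is a JSON document with activation and parameter encodings keyed by tensor name. The snippet below is only an illustrative sketch with hypothetical tensor names and values; the exact set of fields can vary between AIMET versions.

```json
{
    "activation_encodings": {
        "conv1_output": [
            {"bitwidth": 8, "dtype": "int", "is_symmetric": "False",
             "min": -1.0, "max": 1.0, "scale": 0.00784313, "offset": -128}
        ]
    },
    "param_encodings": {
        "conv1.weight": [
            {"bitwidth": 8, "dtype": "int", "is_symmetric": "True",
             "min": -0.5, "max": 0.5, "scale": 0.00393700, "offset": 0}
        ]
    }
}
```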

Usage

usage: qairt-accuracy-debugger compare_encodings [-h] --encoding1_file_path ENCODING1_FILE_PATH --encoding2_file_path ENCODING2_FILE_PATH
                                            [--quantized_dlc1_path QUANTIZED_DLC1_PATH] [--quantized_dlc2_path QUANTIZED_DLC2_PATH]
                                            [--framework_model_path FRAMEWORK_MODEL_PATH] [--scale_threshold SCALE_THRESHOLD] [--working_directory WORKING_DIRECTORY]
                                            [--log_level {info,debug,warning,error}]

options:
  -h, --help            show this help message and exit

required arguments:
  --encoding1_file_path ENCODING1_FILE_PATH
                        Path to either QAIRT or AIMET encodings file
  --encoding2_file_path ENCODING2_FILE_PATH
                        Path to either QAIRT or AIMET encodings file

optional arguments:
  --quantized_dlc1_path QUANTIZED_DLC1_PATH
                        Path to the quantized dlc file related to encoding_file1. If passed alongside the framework model for either encoding_config, it
                        performs the following operations on the qairt encodings file: 1. Propagates convert_ops encodings to their parent op, provided the
                        parent op exists in the framework model. 2. Resolves any activation name changes, e.g. matmul+add in the framework model becomes fc in
                        the dlc graph and the tensor name gets an _fc suffix. It also performs supergroup mapping.
  --quantized_dlc2_path QUANTIZED_DLC2_PATH
                        Path to the quantized dlc file related to encoding_file2. If passed alongside the framework model for either encoding_config, it
                        performs the following operations on the qairt encodings file: 1. Propagates convert_ops encodings to their parent op, provided the
                        parent op exists in the framework model. 2. Resolves any activation name changes, e.g. matmul+add in the framework model becomes fc in
                        the dlc graph and the tensor name gets an _fc suffix. It also performs supergroup mapping.
  --framework_model_path FRAMEWORK_MODEL_PATH
                        Path to the framework model. If passed alongside a quantized dlc for either encoding_config, it performs the following operations on
                        the qairt encodings file: 1. Propagates convert_ops encodings to their parent op, provided the parent op exists in the framework
                        model. 2. Resolves any activation name changes, e.g. matmul+add in the framework model becomes fc in the dlc graph and the tensor name
                        gets an _fc suffix. It also performs supergroup mapping.
  --scale_threshold SCALE_THRESHOLD
                        Threshold for the scale comparison of two encodings. For example, scale1=0.5, scale2=0.01. We compare scale1 and scale2 as:
                        abs(scale1-scale2) < (min(scale1, scale2)*scale_threshold). This ensures that the bound is maintained by the lowest scale value among
                        the given two scales.
  --working_directory WORKING_DIRECTORY
                        Path to working directory. Default: working_directory
  --log_level {info,debug,warning,error}
                        Log level. Default is info
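
As a concrete reading of the --scale_threshold rule above, the sketch below evaluates the documented condition abs(scale1 - scale2) < min(scale1, scale2) * scale_threshold for a pair of hypothetical scale values.

```python
# Illustrative evaluation of the scale comparison rule described for --scale_threshold.
def scales_match(scale1: float, scale2: float, scale_threshold: float) -> bool:
    # The bound is derived from the smaller of the two scales, as stated in the help text.
    return abs(scale1 - scale2) < min(scale1, scale2) * scale_threshold

print(scales_match(0.50, 0.51, 0.05))  # True: |0.50 - 0.51| = 0.01 < 0.50 * 0.05 = 0.025
print(scales_match(0.50, 0.01, 0.05))  # False: |0.50 - 0.01| = 0.49 >= 0.01 * 0.05 = 0.0005
```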

Sample Commands

# Comparing two encodings with no dlc file
qairt-accuracy-debugger compare_encodings    \
          --encoding1_file_path encoding1.json    \
          --encoding2_file_path encoding2.json

# Comparing two encodings with quantized_dlc being passed for encoding1
qairt-accuracy-debugger compare_encodings    \
          --encoding1_file_path encoding1.json    \
          --quantized_dlc1_path quantized_dlc.dlc \
          --encoding2_file_path encoding2.json \
          --framework_model_path framework_model.onnx

Tip

A working_directory is generated from wherever this script is called from unless otherwise specified.

Outputs

Once Compare Encodings has finished running, it stores the outputs in the specified working directory (by default, working_directory/compare_encodings in the current working directory). It creates a directory named latest in working_directory/compare_encodings that is symbolically linked to the most recent run (YYYY-MM-DD_HH-mm-ss). Users may override the directory name by passing it to --output_dirname (e.g. --output_dirname myTest1).

The analysis report consists of .csv and .json files.

CSV Files:

The tool produces two .csv files. Each file has 10 columns:

  • Tensor Name (name of encoding file1) – A tensor name in encoding file1.

  • Tensor Name (name of encoding file2) – A tensor name in encoding file2.

  • Status – One of “UNMAPPED”, “SUCCESS”, “WARNING”, “ERROR”:

    • UNMAPPED: A tensor in one encoding file is not mapped to any of the tensors in the other encoding file.

    • SUCCESS: A tensor in one encoding file is mapped to one or more tensors in the other encoding file.

    • WARNING: A tensor in one encoding file is mapped to one or more tensors in the other encoding file but does not have the exact same bitwidth or is_symm value.

    • ERROR: A tensor with the same name in both encoding files does not have the same encoding (scale, offset, channels, dtype).

  • dtype, is_symm, bitwidth, channels, scale, offset – Each is one of “SAME”, “NOT_COMPARED”, or other info:

    • SAME: The value is the same in both encoding files.

    • NOT_COMPARED: The field is not compared between the pair of tensors from the two encoding files, for example because the tensors are not mapped, the field is not present in the encoding, or the channels are not the same (in which case there is no need to compare scale and offset).

    • Any other info indicates that the comparison failed due to a mismatch.

  • Total Mappings – Number of tensors in the other file with which the given tensor shares its encoding.

It produces two .csv files:

  1. param_comparison.csv

../_static/resources/param_comparison.png
  2. activation_comparison.csv

../_static/resources/activation_comparison.png

JSON Files

  1. Encoding comparison .json files

Because the tool produces a “one-to-many” map of tensors sharing the same encodings between two files, CSV is not a conclusive format for representing all of the data. CSV gives overall information at a glance, while JSON provides in-depth details of the “one-to-many” maps. This is useful when “Total Mappings” for a tensor is >1 and that was not expected.

For example: for one tensor the info looks something like this:

../_static/resources/one2many.png

For the tensor name /rms_norm_0/Cast_1_output_0, we have:

  1. “compare_info”, which contains all the tensor names in the other encoding file along with their comparison info

  2. “Status”, which contains the status of the comparison with tensor names in the other encoding file

  3. “Mapping”, a list of tensor names in the other encoding file that are mapped to this tensor (a hypothetical sketch of this structure is shown after this list)
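
The sketch below illustrates how such a one-to-many entry might be structured, using the keys named above (“compare_info”, “Status”, “Mapping”); the tensor names and the exact nesting are hypothetical and only serve to show the idea.

```json
{
    "/rms_norm_0/Cast_1_output_0": {
        "Mapping": ["tensor_a", "tensor_b"],
        "Status": "SUCCESS",
        "compare_info": {
            "tensor_a": {"scale": "SAME", "offset": "SAME", "bitwidth": "SAME"},
            "tensor_b": {"scale": "SAME", "offset": "SAME", "bitwidth": "SAME"}
        }
    }
}
```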

It generates 4 .json files:

  1. <encoding1 file name>_param.json: comparison of params in encoding file1 against the params in encoding file2

  2. <encoding1 file name>_activation.json: comparison of activations in encoding file1 against the activations in encoding file2

  3. <encoding2 file name>_param.json: comparison of params in encoding file2 against the params in encoding file1

  4. <encoding2 file name>_activation.json: comparison of activations in encoding file2 against the activations in encoding file1

  2. Supergroup Info .json files:

When quantized_dlc_path along with framework_model_path is provided for either encoding_config, the tool dumps a supergroup mapping. For example, if quantized_dlc_path is provided for encoding_config1 together with framework_model_path, then each activation tensor in encoding_config2 is mapped to a supergroup in the dlc file belonging to encoding_config1.

A sample mapping is shown below:

../_static/resources/supergroup_mapping.png

Keys in the .json file are the activation names in encoding_config2, and each mapping represents a supergroup's info (inputs, outputs, and tensors) in the dlc file belonging to encoding_config1. When quantized_dlc along with the framework model is provided for both encoding_configs, two such supergroup mappings are generated.

Tensor visualizer

Tensor visualizer compares the given reference and target output tensors and plots various statistics representing the differences between them.

The Tensor visualizer feature can:

  1. Plot histograms for golden and target tensors

  2. Plot a graph indicating deviation between golden and target tensors

  3. Plot a cumulative distribution graph (CDF) for golden vs. target tensors

Note

Only data with matching target/golden filenames is inspected; other data is ignored.
This feature expects the golden and target tensors to have the same dimensions, datatypes, and layouts.

Usage

usage: qairt-accuracy-debugger tensor_visualizer [-h] --target_tensors TARGET_TENSORS --golden_tensors GOLDEN_TENSORS [-dt DATA_TYPE] [-wd WORKING_DIRECTORY] [--log_level {info,debug,warning,error}]

options:
  -h, --help            show this help message and exit

required arguments:
  --target_tensors TARGET_TENSORS
                        Directory path to Target tensor files
  --golden_tensors GOLDEN_TENSORS
                        Directory path to Golden tensor files

optional arguments:
  -dt DATA_TYPE, --data_type DATA_TYPE
                        Data type to load the tensor file in. Default: float32
  -wd WORKING_DIRECTORY, --working_directory WORKING_DIRECTORY
                        Path to output directory. Default: tensor_visualizer_output_dir
  --log_level {info,debug,warning,error}
                        Log level. Default is info

Sample Commands

# Basic run
qairt-accuracy-debugger tensor_visualizer \
                         --golden_tensors golden_tensors_dir \
                         --target_tensors target_tensors_dir

Tip

A working_directory is generated from wherever this script is called from unless otherwise specified.

Outputs

Once the Tensor Visualizer has finished running, it stores the outputs in the specified working directory. It creates an output directory with a timestamp of the format YYYY-MM-DD_HH-mm-ss in working_directory/tensor_visualizer.

Below is the output directory structure:

working_directory
├── tensor_visualizer
│   └── 2025-07-07_08-18-24
│       ├── mobilenetv20_features_batchnorm0_fwd
│           ├── CDF_plots.jpeg
│           ├── Diff_plots.jpeg
│           └── Histograms.jpeg

The following details what each file contains.

  • Each tensor will have its own directory; the directory name matches the tensor name.

    • Histograms.jpeg – Golden and target histograms

    • CDF_plots.jpeg – Golden vs. target CDF graph

    • Diff_plots.jpeg – Golden and target deviation graph

Histogram Plots

  1. Comparison: We compare histograms for both the golden data and the target data.

  2. Overlay: To enhance clarity, we overlay the histograms bin by bin.

  3. Binned Ranges: Each bin represents a value range, showing the frequency of occurrence.

  4. Visual Insight: Overlapping histograms reveal differences or similarities between the datasets.

Cumulative Distribution Function (CDF) Plots

  1. Overview: CDF plots display the cumulative probability distribution.

  2. Overlay: We superimpose CDF plots for golden and target data.

  3. Percentiles: These plots illustrate data distribution across different percentiles.

Tensor Difference Plots

  1. Inspection: We generate plots highlighting differences between golden and target data tensors.

  2. Scatter and Line: Scatter plots represent tensor values, while line plots show differences at each index.
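
For ad-hoc inspection outside the tool, similar overlays can be reproduced with numpy and matplotlib. The sketch below is not the visualizer's implementation; the file paths and float32 dtype are assumptions.

```python
# Illustrative overlay of golden vs. target histograms and empirical CDFs for one tensor.
import numpy as np
import matplotlib.pyplot as plt

golden = np.fromfile("golden_tensors_dir/tensor.raw", dtype=np.float32)  # hypothetical paths
target = np.fromfile("target_tensors_dir/tensor.raw", dtype=np.float32)

fig, (ax_hist, ax_cdf) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram overlay over a shared bin range.
bins = np.linspace(min(golden.min(), target.min()), max(golden.max(), target.max()), 50)
ax_hist.hist(golden, bins=bins, alpha=0.5, label="golden")
ax_hist.hist(target, bins=bins, alpha=0.5, label="target")
ax_hist.set_title("Histograms")
ax_hist.legend()

# Empirical CDF overlay.
for data, label in ((golden, "golden"), (target, "target")):
    xs = np.sort(data)
    ys = np.arange(1, xs.size + 1) / xs.size
    ax_cdf.plot(xs, ys, label=label)
ax_cdf.set_title("CDF")
ax_cdf.legend()

plt.savefig("tensor_comparison.png")
```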

Snooping

Snooping algorithms help find inaccuracies in a neural network at the layer level. The following snooping options are available:

  1. oneshot-layerwise

  2. cumulative-layerwise

  3. layerwise

Usage

usage: qairt-accuracy-debugger snooping [-h] [--algorithm {oneshot,layerwise,cumulative_layerwise}] [--desired_input_shape DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...]] [--converter_float_bitwidth {32,16}] [--float_bias_bitwidth {32,16}]
                                  [--quantization_overrides QUANTIZATION_OVERRIDES] [--onnx_define_symbol SYMBOL VALUE] [--onnx_defer_loading] [--enable_framework_trace] [--calibration_input_list CALIBRATION_INPUT_LIST] [--bias_bitwidth {8,32}] [--act_bitwidth {8,16}]
                                  [--weights_bitwidth {8,4}] [--quantizer_float_bitwidth {32,16}] [--act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}] [--param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}]
                                  [--act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}] [--param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}] [--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE] [--use_per_channel_quantization]
                                  [--use_per_row_quantization] [--float_fallback] [--quantization_algorithms QUANTIZATION_ALGORITHMS [QUANTIZATION_ALGORITHMS ...]] [--restrict_quantization_steps RESTRICT_QUANTIZATION_STEPS] [--dump_encodings_json] [--ignore_encodings]
                                  [--perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}] [--profiling_level PROFILING_LEVEL]
                                  [--netrun_backend_extension_config NETRUN_BACKEND_EXTENSION_CONFIG] [--offline_prepare_backend_extension_config OFFLINE_PREPARE_BACKEND_EXTENSION_CONFIG] [--device_id DEVICE_ID] [--soc_model SOC_MODEL] -m INPUT_MODEL --input_sample INPUT_SAMPLE
                                  [INPUT_SAMPLE ...] [--working_directory WORKING_DIRECTORY] [-o OUTPUT_TENSOR] [--log_level {info,debug,warning,error}] --backend {HTP,CPU,GPU} --platform {aarch64-android,x86_64-linux-clang,wos} [--golden_reference GOLDEN_REFERENCE]
                                  [--is_qnn_golden_reference] [--retain_compilation_artifacts]
                                  [--comparator {l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} [{l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} ...]]
                                  [--offline_prepare]
                                  [--debug_subgraph_inputs DEBUG_SUBGRAPH_INPUTS]
                                  [--debug_subgraph_outputs DEBUG_SUBGRAPH_OUTPUTS]

options:
  -h, --help            show this help message and exit

required arguments:
  -m INPUT_MODEL, --input_model INPUT_MODEL
                        path to the model file
  --input_sample INPUT_SAMPLE [INPUT_SAMPLE ...]
                        Path to text file containing input sample. Refer to qnn-net-run input_list for format of input_sample file.
  --backend {HTP,CPU,GPU}
                        Backend type for inference to be run
  --platform {aarch64-android,x86_64-linux-clang,wos}
                        The type of device platform to be used for inference

optional arguments:
  --algorithm {oneshot,layerwise,cumulative_layerwise}
                        Algorithm to use to debug the model.
  --device_id DEVICE_ID
                        The serial number of the device to use. If not available, the first in a list of queried devices will be used for inference.
  --soc_model SOC_MODEL
                        Option to specify the SOC on which the model needs to run. This can be found from SOC info of the device and it starts with strings such as SDM, SM, QCS, IPQ, SA, QC, SC, SXR, SSG, STP, or QRB.
  --working_directory WORKING_DIRECTORY
                        Path to working directory. If not specified a directory with name working_directory will be created in the current directory.
  -o OUTPUT_TENSOR, --output_tensor OUTPUT_TENSOR
                        Name of the graph's specified output tensor(s).
  --log_level {info,debug,warning,error}
                        Log level. Default is info
  --golden_reference GOLDEN_REFERENCE
                        The path of directory where golden reference tensor files are saved.
  --is_qnn_golden_reference
                        Specifies that outputs passed with --golden_reference are dumped by QNN. This option should be used only when --golden_reference is supplied.
  --retain_compilation_artifacts
                        Flag to retain compilation artifacts.
  --comparator {l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} [{l1_norm,l2_norm,average,cosine,standard_deviation,mse,snr,kl_divergence,rtol_atol,l1_error,mse_rel,topk,adjusted_rtol_atol,mae} ...]
                        Comparator to use to compare tensors. For multiple comparators, specify as follows: --comparator mse std
  --offline_prepare     Boolean to indicate offline prepare of the graph
  --debug_subgraph_inputs DEBUG_SUBGRAPH_INPUTS
                        Provide comma-separated inputs for the subgraph to be debugged. Currently, only layerwise and cumulative algorithms are supported.
  --debug_subgraph_outputs DEBUG_SUBGRAPH_OUTPUTS
                        Provide comma-separated outputs for the subgraph to be debugged. Currently, only layerwise and cumulative algorithms are supported.

converter arguments:
  --desired_input_shape DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...], --input_tensor DESIRED_INPUT_SHAPE [DESIRED_INPUT_SHAPE ...]
                        The name, dimension, datatype and layout of all the input buffers to the network, specified in the format [input_name comma-separated-dimensions data-type layout]. Dimension, datatype and layout are optional. For example: 'data' 1,224,224,3. Note that the
                        quotes should always be included in order to handle special characters, spaces, etc. For multiple inputs, specify multiple --desired_input_shape options on the command line, for example: --desired_input_shape "data1" 1,224,224,3 float32 --desired_input_shape "data2"
                        1,50,100,3 int64
  --converter_float_bitwidth {32,16}
                        Use this option to convert the graph to the specified float bitwidth, either 32 (default) or 16.
  --float_bias_bitwidth {32,16}
                        Option to select the bitwidth to use for float bias tensor, either 32(default) or 16
  --quantization_overrides QUANTIZATION_OVERRIDES
                        Path to quantization overrides json file.
  --onnx_define_symbol SYMBOL VALUE
                        Option to override specific input dimension symbols.
  --onnx_defer_loading  Option to have the model not load weights. If False, the model will be loaded eagerly.
  --enable_framework_trace
                        Use this option to enable converter to trace the o/p tensor change information.

quantizer_arguments:
  --calibration_input_list CALIBRATION_INPUT_LIST
                        Path to the inputs list text file to run quantization(used with qairt-quantizer)
  --bias_bitwidth {8,32}
                        Option to select the bitwidth to use when quantizing the bias. default 8
  --act_bitwidth {8,16}
                        Option to select the bitwidth to use when quantizing the activations. default 8
  --weights_bitwidth {8,4}
                        Option to select the bitwidth to use when quantizing the weights. default 8
  --quantizer_float_bitwidth {32,16}
                        Use this option to select the bitwidth to use for float tensors, either 32 (default) or 16.
  --act_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                        Specify which quantization calibration method to use for activations. Supported values: min-max (default), sqnr, entropy, mse, percentile. This option can be paired with --act_quantizer_schema to override the quantization schema to use for activations,
                        otherwise the default schema (asymmetric) will be used.
  --param_quantizer_calibration {min-max,sqnr,entropy,mse,percentile}
                        Specify which quantization calibration method to use for parameters. Supported values: min-max (default), sqnr, entropy, mse, percentile. This option can be paired with --param_quantizer_schema to override the quantization schema to use for parameters,
                        otherwise the default schema (asymmetric) will be used.
  --act_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                        Specify which quantization schema to use for activations. Note: Default is asymmetric.
  --param_quantizer_schema {asymmetric,symmetric,unsignedsymmetric}
                        Specify which quantization schema to use for parameters. Note: Default is asymmetric.
  --percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
                        Value must lie between 90 and 100. Default is 99.99
  --use_per_channel_quantization
                        Use per-channel quantization for convolution-based op weights. Note: This will replace built-in model QAT encodings when used for a given weight.
  --use_per_row_quantization
                        Use this option to enable rowwise quantization of Matmul and FullyConnected ops.
  --float_fallback      Use this option to enable fallback to floating point (FP) instead of fixed point. This option can be paired with --quantizer_float_bitwidth to indicate the bitwidth for FP (by default 32). If this option is enabled, then an input list must not be provided and
                        --ignore_encodings must not be provided. The external quantization encodings (encoding file/FakeQuant encodings) might be missing quantization parameters for some interim tensors. First it will try to fill the gaps by propagating across math-invariant
                        functions. If the quantization parameters are still missing, then it will fall back to floating point for those nodes.
  --quantization_algorithms QUANTIZATION_ALGORITHMS [QUANTIZATION_ALGORITHMS ...]
                        Use this option to select quantization algorithms. Usage is: --quantization_algorithms <algo_name1> ...
  --restrict_quantization_steps RESTRICT_QUANTIZATION_STEPS
                        Specifies the number of steps to use for computing quantization encodings. E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8-bit range.
  --dump_encodings_json
                        Dump encoding of all the tensors in a json file
  --ignore_encodings    Use only quantizer generated encodings, ignoring any user or model provided encodings.

netrun arguments:
  --perf_profile {low_balanced,balanced,default,high_performance,sustained_high_performance,burst,low_power_saver,power_saver,high_power_saver,extreme_power_saver,system_settings}
                        Specifies the performance profile to set. Valid settings are "low_balanced", "balanced", "default", "high_performance", "sustained_high_performance", "burst", "low_power_saver", "power_saver", "high_power_saver", "extreme_power_saver", and "system_settings". Note:
                        the perf_profile argument is now deprecated for the HTP backend; users can specify the performance profile through the backend extension config instead.
  --profiling_level PROFILING_LEVEL
                        Enables profiling and sets its level. For the QNN executor, valid settings are "basic", "detailed" and "client". Default is detailed.
  --netrun_backend_extension_config NETRUN_BACKEND_EXTENSION_CONFIG
                        Path to config to be used with qnn-net-run

offline prepare arguments:
  --offline_prepare_backend_extension_config OFFLINE_PREPARE_BACKEND_EXTENSION_CONFIG
                        Path to config to be used with qnn-context-binary-generator.

oneshot-layerwise Snooping

This algorithm is designed to debug all layers of the model at once by performing the steps below:

  1. Execute framework runner to collect reference outputs from all intermediate tensors of a model in fp32 precision

  2. Execute inference engine to collect target outputs from all intermediate tensors of a model in provided target precision

  3. Execute verification for comparison of intermediate outputs from the above two steps

This algorithm can be used for a quick analysis of whether layers in the model are quantization sensitive.

../_static/resources/oneshot_diagram.png

Sample Commands

# Example for executing the oneshot algorithm on an Android HTP device hosted on a Linux machine:
qairt-accuracy-debugger snooping \
                        --algorithm oneshot \
                        --backend htp \
                        --platform aarch64-android \
                        --input_model artifacts/mobilenet-v2.onnx \
                        --input_sample input_sample.txt \
                        --comparator mse \
                        --quantization_overrides artifacts/quantized_encoding.json

# Example for executing oneshot snooping on a WoS HTP target:
qairt-accuracy-debugger snooping ^
                        --algorithm oneshot ^
                        --backend htp ^
                        --platform wos ^
                        --input_model artifacts/mobilenet-v2.onnx ^
                        --input_sample input_sample.txt ^
                        --comparator mse ^
                        --calibration_input_list calib_list.txt

# Example for using external golden outputs dumped by a framework such as ONNX:
qairt-accuracy-debugger snooping \
                        --algorithm oneshot \
                        --backend htp \
                        --platform aarch64-android \
                        --input_model artifacts/mobilenet-v2.onnx \
                        --input_sample input_sample.txt \
                        --comparator mse \
                        --quantization_overrides artifacts/quantized_encoding.json \
                        --golden_reference /path/to/goldens

# Example for using external golden outputs dumped by QNN:
qairt-accuracy-debugger snooping \
                        --algorithm oneshot \
                        --backend htp \
                        --platform aarch64-android \
                        --input_model artifacts/mobilenet-v2.onnx \
                        --input_sample input_sample.txt \
                        --comparator mse \
                        --quantization_overrides artifacts/quantized_encoding.json \
                        --golden_reference /path/to/goldens \
                        --is_qnn_golden_reference

Tip

Refer to inference-engine sample commands to understand usage of different platforms/backends

Output

Below is the output directory structure:

working_directory
└── oneshot_snooping
    ├── 2025-07-02_11-02-58
    │   ├── inference_engine
    │   ├── oneshot_layerwise.csv
    │   ├── plots
    │   └── reference_output
Once oneshot snooping is completed, a timestamped directory is generated under working_directory/oneshot_snooping containing:
  • inference_engine directory contains intermediate layer outputs generated by QNN, stored in .raw format.

  • reference_output directory contains intermediate layer outputs generated by the framework, stored in .raw format.

  • oneshot_layerwise.csv is a report of the verification results for each layer output.

  • plots directory contains html plots of the verification results for each layer output.

Snapshot of the oneshot_layerwise.csv file:

../_static/resources/oneshot_summary.png

Understanding the oneshot-layerwise summary report:

  • Source Name – Output name of the current layer in the framework graph.

  • Target Name – Output name of the current layer in the target graph.

  • Layer type – Type of the current layer.

  • Shape – Shape of this layer’s output.

  • <Verifier name> – Verifier value of the current layer output compared to the reference output.
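
To quickly surface the most sensitive layers from this summary, the report can be sorted by its verifier column. The sketch below assumes a verifier column literally named "mse" and uses the timestamped run directory shown above; the actual column name matches the verifier(s) chosen for the run.

```python
# Illustrative post-processing of the oneshot-layerwise summary report.
import pandas as pd

report = pd.read_csv("working_directory/oneshot_snooping/2025-07-02_11-02-58/oneshot_layerwise.csv")

# Sort by the verifier column (assumed here to be "mse") and show the most deviating layers.
worst = report.sort_values("mse", ascending=False).head(10)
print(worst[["Source Name", "Target Name", "mse"]])
```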

cumulative-layerwise Snooping

This algorithm is designed to debug one layer at a time by performing the steps below:

  1. Execute framework runner to collect reference outputs from all intermediate tensors of a model in fp32 precision

  2. Execute inference engine and verification steps in an iterative manner to perform the operations below:

    - Collect target outputs in target precision for each layer while removing the effect of its preceding layers on the final output

    - Compare intermediate outputs from framework runner and inference engine

It provides a deeper analysis to identify the layers causing accuracy deviation and can be used to measure the quantization sensitivity of each layer/op in the model with respect to the final output of the model.

../_static/resources/cumulative_diagram.png

Debugging Accuracy Issues with a Quantized Model using Cumulative Layerwise Snooping

  • With quantized models, some mismatch is expected at the most data-intensive layers, arising from quantization error.

  • The debugger can be used to identify the operators that are most sensitive (high verifier score) and run those at higher precision to improve overall accuracy.

  • The sensitivity is determined by the verifier score seen at that layer with respect to the reference platform (such as ONNXRT).

  • Note that Cumulative-layerwise debugging takes considerable time, as the partitioned model must be quantized and compiled at every layer that does not have a 100% match with the reference.

  • Below is one strategy to debug larger models:

    • Run Oneshot-layerwise on the model, which helps to identify the starting point of sensitivity in the model.

    • Run Cumulative-layerwise at different parts of the model using the start-layer and end-layer options (if the model has 100 nodes, use the starting node identified by the Oneshot-layerwise run as the start layer and the 25th node as the end layer for run 1, the 26th and 50th nodes for run 2, the 51st and 75th nodes for run 3, and so on). The final reports of all runs help to identify the most sensitive layers in the model. Say nodes A, B, and C have high verifier scores, which indicates high sensitivity.

      • Run the original model with those specific layers (A/B/C, one at a time or in combinations) in FP16 and observe the improvement in accuracy.

Sample Commands

# Example for executing cumulative-layerwise snooping on an HTP Android device hosted on a Linux machine:
qairt-accuracy-debugger snooping \
          --algorithm cumulative_layerwise \
          --backend htp \
          --platform aarch64-android \
          --input_model artifacts/mobilenet-v2.onnx \
          --calibration_input_list artifacts/list.txt \
          --input_sample input_sample.txt \
          --output_tensor "473" \
          --comparator mse \
          --quantization_overrides artifacts/quantized_encoding.json

# Example for executing cumulative-layerwise snooping on a WoS HTP target:
qairt-accuracy-debugger snooping ^
          --algorithm cumulative_layerwise ^
          --backend htp ^
          --platform wos ^
          --input_model artifacts/mobilenet-v2.onnx ^
          --input_sample input_sample.txt ^
          --comparator mse ^
          --calibration_input_list calib_list.txt

# Example for using external golden outputs dumped by frameworks like ONNX:
qairt-accuracy-debugger snooping \
          --algorithm cumulative_layerwise \
          --backend htp \
          --platform aarch64-android \
          --input_model artifacts/mobilenet-v2.onnx \
          --calibration_input_list artifacts/list.txt \
          --input_sample input_sample.txt \
          --output_tensor "473" \
          --comparator mse \
          --quantization_overrides artifacts/quantized_encoding.json \
          --golden_reference /path/to/goldens

# Example for using external golden outputs dumped by QNN:
qairt-accuracy-debugger snooping \
          --algorithm cumulative_layerwise \
          --backend htp \
          --platform aarch64-android \
          --input_model artifacts/mobilenet-v2.onnx \
          --calibration_input_list artifacts/list.txt \
          --input_sample input_sample.txt \
          --output_tensor "473" \
          --comparator mse \
          --quantization_overrides artifacts/quantized_encoding.json \
          --golden_reference /path/to/goldens \
          --is_qnn_golden_reference

Tip

Refer to inference-engine sample commands to understand usage of different platforms/backends

Output

Below is the output directory structure:

working_directory
└── cumulative_layerwise_snooping
     └── 2025-07-07_06-00-17
         ├── all_subgraphs.json
         ├── cumulative_layerwise.csv
         ├── encodings_converter
         ├── inference_engine
         ├── plots
         ├── reference_output
         └── sub_graph_node_precision_files
  • inference_engine directory contains intermediate outputs obtained from the inference engine step, stored in separate directories with the respective layer names. The final report, cumulative_layerwise.csv, contains verifier scores for each layer; users can identify the layers with the most deviating scores as problematic nodes.

  • reference_output directory contains a timestamped directory with the intermediate layer outputs stored in .raw format, as described in the Framework Runner step.

  • plots directory contains html plots of the verification results for each layer output.

Snapshot of cumulative_layerwise.csv:

../_static/resources/cumulative_layerwise_report.png

Understanding the cumulative-layerwise report:

  • Source Name – Output name of the current layer in the framework graph.

  • Target Name – Output name of the current layer in the target graph.

  • Status – One of the following values:

    • SKIP - This layer was not debugged because it was either math-invariant or a binary op with one constant tensor.

    • SUCCESS - Layer debugging completed successfully.

    • CONVERTER_FAILURE - The converter failed at this layer.

    • QUANTIZER_FAILURE - The quantizer failed at this layer.

    • SNPE_DLC_GRAPH_PREPARE_FAILURE - An snpe-dlc-graph-prepare error occurred at this layer.

    • QNN_CONTEXT_BINARY_GENERATOR_FAILURE - A qnn-context-binary-generator error occurred at this layer.

    • SNPE_NET_RUN_FAILURE - An snpe-net-run failure occurred at this layer.

    • QNN_NET_RUN_FAILURE - A qnn-net-run failure occurred at this layer.

  • Layer Type – Type of the current layer.

  • Framework Shape – Shape of this framework layer’s output.

  • Target Shape – Shape of this target layer’s output.

  • Framework (Min, Max, Median) – The min, max, and median of the outputs at this layer taken from the reference execution.

  • Target (Min, Max, Median) – The min, max, and median of the outputs at this layer taken from the target execution.

  • <Verifier name> (current_layer) – Absolute verifier value of the current layer compared to the reference platform.

  • <Verifier name> (original model output name) – For each original model output, the absolute verifier value of that output compared to the reference platform.

  • Info – Displays information for the output verifiers if the values are abnormal.

layerwise Snooping

This algorithm is designed to debug the model one layer at a time by performing the following steps:

  1. Get golden reference per layer outputs from an external tool or, if a golden reference is not given, run framework runner to collect reference outputs from all intermediate tensors of a model in fp32 precision

  2. Iteratively execute inference engine and verification to:

    - Collect target outputs in target precision for the layer under investigation and the final model output, by quantizing the specific subgraph and running the rest of the model in floating point

    - Compare the intermediate output from the golden reference with the target execution

Layer-wise snooping provides deeper analysis to identify all model layers causing accuracy deviation on hardware with respect to framework/simulation outputs. This algorithm can be used to identify kernel issues for layers/ops present in the model and for sensitivity analysis.

../_static/resources/layerwise_diagram.png

Note

Currently, this algorithm is supported only for ONNX and TensorFlow models.

Debugging accuracy issues for models exhibiting a discrepancy between the golden reference (e.g., AIMET/framework runtime output) and the target output using Layerwise Snooping

  • One of the popular use cases for layerwise snooping is debugging the accuracy difference between AIMET and the target.
    • Although tools like AIMET create a close simulation of the hardware, a very small mismatch is still expected due to environment differences: the simulation runs on GPU FP32 kernels and simulates quantization noise, rather than actually executing on integer kernels as happens on hardware.

    • If there is a higher deviation between simulation and hardware, layerwise snooping can be used to point out the nodes with higher deviations. The nodes showing higher deviation in layerwise.csv can be identified as the erroneous nodes.

  • Other use cases include debugging deviations between the framework runtime’s FP32 output and the target’s INT16 output.

Sample Commands

# Example for executing layerwise snooping on an HTP Android device hosted on a Linux machine:
qairt-accuracy-debugger snooping \
          --algorithm layerwise \
          --backend htp \
          --platform aarch64-android \
          --input_model artifacts/mobilenet-v2.onnx \
          --calibration_input_list artifacts/list.txt \
          --input_sample input_sample.txt \
          --output_tensor "473" \
          --comparator mse \
          --quantization_overrides artifacts/quantized_encoding.json

# Example for executing layerwise snooping on a WoS HTP target:
qairt-accuracy-debugger snooping ^
          --algorithm layerwise ^
          --backend htp ^
          --platform wos ^
          --input_model artifacts/mobilenet-v2.onnx ^
          --input_sample input_sample.txt ^
          --comparator mse ^
          --calibration_input_list calib_list.txt

# Example for using external golden outputs dumped by any frameworks like ONNX, TF:
qairt-accuracy-debugger snooping \
          --algorithm layerwise \
          --backend htp \
          --platform aarch64-android \
          --input_model artifacts/mobilenet-v2.onnx \
          --calibration_input_list artifacts/list.txt \
          --input_sample input_sample.txt \
          --output_tensor "473" \
          --comparator mse \
          --quantization_overrides artifacts/quantized_encoding.json \
          --golden_reference /path/to/goldens

# Example for using external golden outputs dumped by QNN:
qairt-accuracy-debugger snooping \
          --algorithm layerwise \
          --backend htp \
          --platform aarch64-android \
          --input_model artifacts/mobilenet-v2.onnx \
          --calibration_input_list artifacts/list.txt \
          --input_sample input_sample.txt \
          --output_tensor "473" \
          --comparator mse \
          --quantization_overrides artifacts/quantized_encoding.json \
          --golden_reference /path/to/goldens \
          --is_qnn_golden_reference

Tip

Refer to inference-engine sample commands to understand usage of different runtimes/backends

Output

Below is the output directory structure:

working_directory
└── layerwise_snooping
    └──2025-07-07_05-58-26
       ├── all_subgraphs.json
       ├── encodings_converter
       ├── inference_engine
       ├── layerwise.csv
       ├── plots
       ├── reference_output
       └── sub_graph_node_precision_files
  • reference_output directory contains a timestamped directory with the intermediate layer outputs stored in .raw format, as described in the Framework Runner step.

  • snooping contains the outputs of each single-layer model obtained from the inference engine stage, stored in separate directories, along with the final report named layerwise.csv, which contains verifier scores for each layer model. Users can identify the layers with the most deviating scores as problematic nodes.

  • layerwise.csv is similar to the cumulative-layerwise report (cumulative_layerwise.csv), except that the original model output columns are not present in layerwise snooping. Please refer to the cumulative-layerwise report for more details.

  • plots directory contains HTML plots of the verification results for each layer output.

Snapshot of layerwise.csv:

../_static/resources/layerwise_report.png

Understanding the layerwise report:

Column

Description

Source Name

Output name of the current layer in the framework graph.

Target Name

Output name of the current layer in the target graph.

Status

There are following possible values:
  • SKIP - This layer was not debugged because it is either MATH_INVARIANT or a binary op with one constant tensor.

  • SUCCESS - Layer debugging completed successfully.

  • CONVERTER_FAILURE - The converter failed at this layer.

  • QUANTIZER_FAILURE - The quantizer failed at this layer.

  • SNPE_DLC_GRAPH_PREPARE_FAILURE - An snpe-dlc-graph-prepare error occurred at this layer.

  • QNN_CONTEXT_BINARY_GENERATOR_FAILURE - A context-bin-generator error occurred at this layer.

  • SNPE_NET_RUN_FAILURE - An snpe-net-run failure occurred at this layer.

  • QNN_NET_RUN_FAILURE - A qnn-net-run failure occurred at this layer.

Layer Type

Type of the current layer.

Framework Shape

Shape of this framework layer’s output.

Target Shape

Shape of this target layer’s output.

Framework(Min, Max, Median)

The Min, Max and Median of the outputs at this layer taken from reference execution.

Target(Min, Max, Median)

The Min, Max and Median of the outputs at this layer taken from target execution.

<Verifier name>(current_layer)

Absolute verifier value of the current layer compared to reference platform.

<Verifier name>(original model output name)

For each original model output, absolute verifier value of the original model output compared to reference platform.

Info

Displays information for the output verifiers, if the values are abnormal.

qnn-platform-validator

qnn-platform-validator checks the QNN compatibility and capability of a device. The output is saved as a CSV file in the “output” directory, and basic logs are also displayed on the console.

DESCRIPTION:
------------
Helper script to set up the environment for and launch the qnn-platform-
validator executable.

REQUIRED ARGUMENTS:
-------------------
--backend            <BACKEND>          Specify the backend to validate: <gpu>, <dsp>
                                        <all>.

--directory          <DIR>              Path to the root of the unpacked SDK directory containing
                                        the executable and library files

--dsp_type           <DSP_VERSION>      Specify DSP variant: v66 or v68

OPTIONAL ARGUMENTS:
--------------------
--buildVariant       <TOOLCHAIN>        Specify the build variant
                                        aarch64-android or aarch64-windows-msvc to be validated.
                                        Default: aarch64-android

--testBackend                           Runs a small program on the runtime and checks whether QNN is supported for
                                        the backend.

--deviceId           <DEVICE_ID>        Uses the device for running the adb command.
                                        Defaults to the first device in the adb devices list.

--coreVersion                           Outputs the version of the runtime that is present on the target.

--libVersion                            Outputs the library version of the runtime that is present on the target.

--targetPath          <DIR>             The path to be used on the device.
                                        Defaults to /data/local/tmp/platformValidator

--remoteHost         <REMOTEHOST>       Run on remote host through remote adb server.
                                        Defaults to localhost.

--debug                                 Set to turn on Debug log
Additional details:
  • The following files need to be pushed to the device for the DSP to pass the validator test.
    Note that the stub and skel libraries are specific to the DSP architecture version (e.g., v73):
    // Android
    bin/aarch64-android/qnn-platform-validator
    lib/aarch64-android/libQnnHtpV73CalculatorStub.so
    lib/hexagon-${DSP_ARCH}/unsigned/libCalculator_skel.so
    
    // Windows
    bin/aarch64-windows-msvc/qnn-platform-validator.exe
    lib/aarch64-windows-msvc/QnnHtpV73CalculatorStub.dll
    lib/hexagon-${DSP_ARCH}/unsigned/libCalculator_skel.so
    
  • The following example pushes the aarch64-android variant to /data/local/tmp/platformValidator

    adb push $QNN_SDK_ROOT/bin/aarch64-android/qnn-platform-validator /data/local/tmp/platformValidator/bin/qnn-platform-validator
    adb push $QNN_SDK_ROOT/lib/aarch64-android/ /data/local/tmp/platformValidator/lib
    adb push $QNN_SDK_ROOT/lib/hexagon-${DSP_ARCH}/unsigned /data/local/tmp/platformValidator/dsp
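
  • A minimal host-side invocation of the helper script might look like the following (a hedged sketch; the device ID is a placeholder and --dsp_type should match your target DSP):

    qnn-platform-validator --backend dsp --dsp_type v68 --directory $QNN_SDK_ROOT --buildVariant aarch64-android --deviceId <device_id> --testBackend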
    

qnn-profile-viewer

The qnn-profile-viewer tool parses profiling data generated by qnn-net-run. The parsed data can also be saved to a CSV file.

usage: qnn-profile-viewer --input_log PROFILING_LOG [--help] [--output=CSV_FILE] [--extract_opaque_objects] [--reader=CUSTOM_READER_SHARED_LIB] [--schematic=SCHEMATIC_BINARY] [--standardized_json_output]

Reads profiling logs and outputs the contents to stdout

Note: The IPS calculation takes the following into account: graph execute time, tensor file IO time, and misc. time for quantization, callbacks, etc.

required arguments:
  --input_log                     PROFILING_LOG1,PROFILING_LOG2
                                  Provides a comma-separated list of Profiling log files

optional arguments:
  --output                        PATH
                                  Output file with processed profiling data. File formats vary depending upon the reader used
                                  (see --reader). If not provided, no output file is created.

  --help                          Displays this help message.

  --reader                        CUSTOM_READER_SHARED_LIB
                                  Path to a reader library. If not specified, the default reader outputs a CSV file.

  --schematic                     SCHEMATIC_BINARY
                                  Path to the schematic binary file.
                                  Please note that this option is specific to the QnnHtpOptraceProfilingReader library.

  --config                        CONFIG_JSON_FILE
                                  Path to the config json file.
                                  Please note that this option is specific to the QnnHtpOptraceProfilingReader library.

  --dlc                           DLC_FILE
                                  Path to the dlc file.
                                  Please note that this option is specific to the QnnHtpOptraceProfilingReader library.

  --zoom_start                    PROFILE_SUBMODULE_START_NODE
                                  Name of starting node for a profile submodule optrace. If you specify this option you must also specify --zoom_end.
                                  Please note that this option is specific to the QnnHtpOptraceProfilingReader library.

  --zoom_end                      PROFILE_SUBMODULE_END_NODE
                                  Name of ending node for a profile submodule optrace. If you specify this option you must also specify --zoom_start.
                                  Please note that this option is specific to the QnnHtpOptraceProfilingReader library.

  --version                       Displays version information.

  --extract_opaque_objects        Specifies that the opaque objects will be dumped to output files

  --standardized_json_output      Specifies that the JSON output will be standardized for consumption by other tools within the SDK ecosystem.
                                  Please note that this option is specific to the QnnJsonProfilingReader library.
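
A minimal sketch of invoking the default CSV reader is shown below (the log file name is an assumption; use whichever profiling log qnn-net-run produced):

qnn-profile-viewer --input_log output/qnn-profiling-data_0.log --output profiling_summary.csv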

Warning

qnn-netron Deprecation Notice: qnn-netron has been deprecated and will be removed in 2.40.

qnn-netron (Beta)

Overview

The QNN Netron tool makes model debugging and visualization less daunting. qnn-netron is an extension of the Netron graph tool, providing easier graph debugging and convenient runtime information. There are currently two key functionalities of the tool:

  1. The Visualize section allows customers to view their desired models after using the QNN Converter by importing the JSON representation of the model

  2. The Diff section allows customers to run networks of their choosing on different runtimes in order to compare network accuracy and performance

Launching Tool

Dependencies

The QNN Netron tool leverages the Electron JS framework for its GUI frontend and requires npm/Node.js to be available on the system. Additionally, Python libraries for accuracy analysis are required by the tool’s backend. Convenience scripts are available in the QNN SDK to download the dependencies needed for building and running the tool.

# Note: following command should be run as administrator/root to be able to install system libraries
$ sudo bash ${QNN_SDK_ROOT}/bin/check-linux-dependency.sh
$ ${QNN_SDK_ROOT}/bin/check-python-dependency

Launching Application

The qnn-netron script is used to build and launch the QNN Netron application. This script:

  1. Clones the vanilla Netron git project

  2. Applies custom patches to enable Netron for QNN

  3. Builds the npm project

  4. Launches the application

$ qnn-netron -h
usage: qnn-netron [-h] [-w <working_dir>]
Script to build and launch QNN Netron tool for visualizing and running analysis on Qnn Models.

Optional argument(s):
 -w <working_dir>                      Location for building QNN Netron tool. Default: current_dir


# To build and run application use
$ qnn-netron -w <my_working_dir>

QNN Netron Visualize Deep Dive

First, the user is prompted to open a JSON file that represents their converted model. This JSON comes from the converter tool. Please refer to this Overview for more details.

../_static/resources/landing_page_netron.jpg

Once the file is loaded into the tool, the graph should be displayed in the UI as shown below:

After loading the model, the user can click on any node, and a side pop-up section will display node information such as the type and name, as well as input and output details (datatypes, encodings, and shapes).

../_static/resources/netron_detailed_nodes_visualization.jpg

Netron Diff Customization Deep Dive

Limitations

  1. Diff Tool comparison against source framework goldens only works for goldens that use spatial-first axis order (NHWC).

  2. For use cases where a source framework golden is used for comparison, the Diff Tool is tested only with TensorFlow and TensorFlow-variant frameworks.

To open the Diff Customization tool, the user can either click File and then “Open Diff…”, or click “Diff…” on tool startup, as shown below:

../_static/resources/netron_diff_ui_opening.jpg
../_static/resources/open_diff_tool_netron.png

Upon launch of the Diff Customization tool, at the top, the user is prompted to select a use case for the tool. There are 3 options to choose from:

../_static/resources/use_case_netron.png

For the purposes of this documentation, only Inference vs Inference is detailed; the setup procedure for the other use cases is similar. The three use cases are:

  1. Golden vs Inference: Used to test inference run using goldens from a particular ML framework and comparing against the output of a QNN backend

  2. Output vs Output: Used to test existing inference results against ML framework goldens OR used to test differences between two existing inference results

  3. Inference Vs Inference: Used to test inference between two converted QNN models or the same QNN model on different QNN backends

Inference vs Inference

If this use case is selected, the user is presented with various form fields for the purposes of running two jobs asynchronously with the option of choosing different runtimes for each QNN network being run.

qnn-netron

A more detailed view of what the user is prompted for is displayed below:

qnn-netron

In order to execute the networks, the user has two options:

Running on Host machine

When the Target Device is selected as “host”, the user can only use the CPU as a runtime. In addition, the user can only select “x86_64-linux-clang” as the architecture in this use case.

qnn-netron

Running On-Device

When the Target Device is selected as “on-device”, a Device ID is required to connect to the device via adb. Thereafter, the user can select any of the three available QNN backend runtimes (CPU, GPU, or DSP v[68, 69, 73]) and the “aarch64-android” architecture.

qnn-netron

After choosing the desired target device and runtime configurations, the rest of the fields are explained in detail below:


Note

Users can click again to change the location of any of the path fields.


Setup Parameters

Configurations to Select

The options for the verifier to run on the model outputs are listed below (see the note after the table for custom accuracy and performance verifier thresholds, and the table below it for custom accuracy verifier hyperparameters):

RtolAtol, AdjustedRtolAtol, TopK, MeanIOU, L1Error, CosineSimilarity, MSE, SQNR

Model JSON

Upload the <model>_net.json file output by the QNN converters.

Model Cpp

Upload the <model>.cpp file output by the QNN converters.

Model Bin

Upload the <model>.bin file output by the QNN converters.

NDK Path

Provide the path to your Android NDK.

Devices Engine Path

Provide the path to the top level of the unzipped QNN SDK.

Input List

Provide the path to the input list file for the model.

Save Run Configurations

Provide a location where the inference and runtime results from the Diff Customization tool will be stored.

Note

Users have the option of providing a custom accuracy and performance verifier threshold when running diff. A custom accuracy verifier threshold can be provided for any of the accuracy verifiers. By default the verifier thresholds are 0.01. The custom thresholds can be provided in the text boxes labelled “Accuracy Threshold” and “Perf Threshold”.

Users now have the option to enter accuracy verifier specific hyperparameters inside textboxes. The Default Values are displayed inside the text-boxes and can be customized as per user needs. The table below highlights the hyperparameters for each verifier that can be customized.

Verifier

Hyperparameters

AdjustedRtolAtol

Number of Levels

RtolAtol

Rtol Margin, Atol Margin

Topk

K, Ordered

MeanIOU

Background Classification

L1Error

Multiplier, Scale

CosineSimilarity

Multiplier, Scale

MSE (Mean Square Error)

N/A

SQNR (Signal-To-Noise Ratio)

N/A

Below is an example of what the fields should look like once filled to completion:

qnn-netron

After running the Diff Customization tool, the output directories/files should be present in the working directory file path provided in the last field

qnn-netron

Results and Outputs:

After pressing the Run button as mentioned above, the visualization of the network should pop-up. Nodes will be highlighted if there are any accuracy and/or performance variations. Clicking on each node will show more information about the accuracy and performance diff information as shown below.

qnn-netron

Performance and Accuracy Diff Visualizations:

qnn-netron

As seen above, the performance and accuracy diff information is shown under the Diff section of any given node. The color of the node boundary in the viewer represents whether a performance or accuracy error (above the default verifier threshold of 0.01) was reported. For example, in the Conv2d node shown below, there are two boundaries of orange and red indicating that this node has both an accuracy and performance difference across the runs. The FullyConnected node shown only has a yellow boundary indicating that only a performance difference was found.

qnn-netron
qnn-netron

QNN Netron Diff Navigation

QNN Netron has the ability to locate the first node in the graph with any performance or accuracy diffs. When the user clicks on the next and previous arrows, the visualization of the graph zooms to the first node with a performance or accuracy difference. This makes model debugging much easier for larger models, as the user does not have to search the graph manually to find where the network’s performance and accuracy start to diverge.

qnn-netron

qnn-context-binary-utility

The qnn-context-binary-utility tool validates the metadata of a context binary and serializes it into a JSON file. This JSON file can then be used to inspect the context binary, aiding debugging. A QNN context can be serialized to binary using the QNN APIs or the qnn-context-binary-generator tool.

usage: qnn-context-binary-utility --context_binary CONTEXT_BINARY_FILE --json_file JSON_FILE_NAME [--help] [--version]

Reads a serialized context binary and validates its metadata.
If --json_file is provided, it outputs the metadata to a json file

required arguments:
  --context_binary  CONTEXT_BINARY_FILE
                    Path to cached context binary from which the binary info will be extracted
                    and written to json.

  --json_file       JSON_FILE_NAME
                    Provide path along with the file name <DIR>/<FILE_NAME> to serialize
                    context binary info into json.
                    The directory path must exist. File with the FILE_NAME will be created at DIR.

optional arguments:
  --help          Displays this help message.

  --version       Displays version information.
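
A minimal sketch of extracting the metadata of a generated context binary into a JSON file (the file names below are placeholders):

qnn-context-binary-utility --context_binary output/model.serialized.bin --json_file output/model_context_info.json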

Additional explanation

Accessing Graph Blob Info V2 Struct

The Graph Blob Info V2 struct is present in the serialized binary immediately after the V1 struct (in context binaries prepared with QNN SDK 2.37 or later) and can be accessed as shown below:

// Interpret the opaque blob as a byte array, then step past the V1 struct to reach V2.
uint8_t* array = static_cast<uint8_t*>(graphBlobInfo);
QnnHtpSystemContext_GraphBlobInfoV2_t* v2 =
    reinterpret_cast<QnnHtpSystemContext_GraphBlobInfoV2_t*>(array + sizeof(QnnHtpSystemContext_GraphBlobInfo_t));
Note: Users must add a null-pointer check before dereferencing v2.
Parameters Description

Below is a table representing the meanings of various parameters.

Parameters

Description

nativeKChannelSize

The nativeK channel tile size used by each of the graphs

nativeVChannelSize

The nativeV channel tile size used by each of the graphs

isSafeShareIO

It is safe to share the buffer between inputs and outputs, 1: True, 0: False
Client is responsible for ensuring no clash between input and output when flag is set

graphIOTensorSize

Graph input/output tensor size (bytes)

DDRTensorSize

Size of DDR tensors (bytes)

OpDataSize

Memory size including op data such as runlists (bytes)

constSize

Size of const data in the graph (bytes)

SharedWeightSize

Shared weights size (bytes)

spillFillBufferSize

The spill-fill buffer size used by each of the graphs

vtcmSize

HTP vtcm size (MB)

optimizationLevel

Optimization level

htpDlbc

Htp Dlbc

numHvxThreads

Number of HVX Threads to reserve

Memory Usage Scenarios

Use Case 1: Single Model Inference

Total RAM = OpDataSize + constSize + DDRTensorSize + spillFillBufferSize + graphIOTensorSize + vtcmSize

Use Case 2: Large Language Model (LLM) with Weight Sharing

Total RAM = (OpDataSize₁ + constSize₁ + DDRTensorSize₁ + spillFillBufferSize₁ + graphIOTensorSize₁ + vtcmSize₁) + Shared Weights + (OpDataSize₂ + constSize₂ + DDRTensorSize₂ + spillFillBufferSize₂ + graphIOTensorSize₂ + vtcmSize₂)
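
As a purely hypothetical worked example for Use Case 1: if a graph reports OpDataSize = 10 MB, constSize = 40 MB, DDRTensorSize = 5 MB, spillFillBufferSize = 8 MB, graphIOTensorSize = 2 MB, and vtcmSize = 8 MB, the estimated total RAM is 10 + 40 + 5 + 8 + 2 + 8 = 73 MB. Note that the byte-valued fields reported by the tool must be converted to a common unit before summing, since vtcmSize is reported in MB.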

Accuracy Evaluator plugins

File-based plugins

This section lists the built-in file-based plugins.

Dataset plugins

create_squad_examples - Extracts examples from the given SQuAD dataset file and saves them to a file.

Parameters

Description

Type

Default

squad_version

Squad version 1 or 2

Integer

1

filter_dataset - Filters the dataset including the input list, calibration and annotation files.

Parameters

Description

Type

Default

max_inputs

Maximum number of inputs in inputlist to be considered for execution

Integer

Mandatory

max_calib

Maximum number of inputs in calibration to be considered for execution

Integer

Mandatory

random

Shuffles the inputlist and calibration files

Boolean

False

gpt2_tokenizer - Tokenizes data from files using GPT2TokenizerFast.

Parameters

Description

Type

Default

vocab_file

Path to the vocabulary file

String

Mandatory

merges_file

Path to the merges file

String

Mandatory

seq_length

Sequence length for the generated model inputs

Integer

Mandatory

past_seq_length

Sequence length for the “past” inputs

Integer

Mandatory

past_shape

Shape of the ‘past’ inputs

List

num_past

Number of ‘past’ inputs

Integer

0

split_txt_data - Saves individual text files for each line present in the given input text file.

Preprocessing plugins

centernet_preproc - Performs preprocessing on CenterNet dataset examples.

Parameters

Description

Type

Default

dims

Height and width; comma delimited, e.g., 416,416

String

Mandatory

scale

Scale factor for image

Float

1.0

fix_res

Resolution of the image

Boolean

True

pad

Image padding

Integer

0

convert_nchw - Transposes WHC to CHW or CHW to WHC and adds an extra N dimension.

Parameters

Description

Type

Default

expand-dims

Add the Nth dimension

Boolean

True

create_batch - Concatenates raw input files into a single file using numpy.

Parameters

Description

Type

Default

delete_prior

To delete prior unbatched data to save space

Boolean

True

truncate

Whether to truncate the leftover inputs in the last batch when the number of inputs is not a multiple of the batch size

Boolean

False

crop - Center crops an image to the given dimensions using numpy or torchvision based on the library parameter.

Parameters

Description

Type

Default

dims

Height and width; comma delimited, e.g., 640,640

String

Mandatory

library

Python library used to crop the given input; valid values are: numpy | torchvision

String

numpy

typecasting_required

To convert final output to numpy or not. Note: This option is specific to torchvision library

Boolean

True

expand_dims - Adds the N dimension for images, e.g., HWC to NHWC.

image_transformers_input - Creates input files with image and/or text for image transformer models like ViT and CLIP.

Parameters

Description

Type

Default

dims

Expected processed output dimension in CHW format

String

Mandatory

num_base_class

Number of base classes in classification; used in the scenario where text input is also provided

Integer

Total classes available

num_prompt

Number of prompts for text classes; used in the scenario where text input is also provided

Integer

Total classes available

image_only

Data type of raw data

Boolean

False

normalize - Normalizes input per the given scheme; data must be of NHWC format.

Parameters

Description

Type

Default

library

Python library used to normalize the given input; valid values are: numpy | torchvision

String

numpy

norm

Normalization factor, all values divided by norm

float32

255

means

Dictionary of means to be subtracted, e.g., {“R”:0.485, “G”:0.456, “B”:0.406}

RGB dictionary

{“R”:0, “G”:0, “B”:0}

std

Dictionary of std-dev for rescaling the values, e.g., {“R”:0.229, “G”:0.224, “B”:0.225}

RGB dictionary

{“R”:1, “G”:1, “B”:1}

channel_order

Channel order to specify means and std values per channel - RGB | BGR

String

RGB

normalize_first

To perform normalization before or after mean subtraction and standard deviation.
normalize_first=True means perform normalization before.
Note: torchvision library does not use this option

Boolean

True

typecasting_required

To convert final output to numpy or not. Note: This option is specific to the Torchvision library

Boolean

True

pil_to_tensor_input

To convert input to tensor before normalization. Note: This option is specific to the Torchvision library

Boolean

True
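
As a hedged illustration of how these normalize parameters combine (assuming the conventional ordering implied by the parameter names): with normalize_first=True, each pixel value x in channel c is transformed as (x / norm - means[c]) / std[c]. For example, with the defaults norm=255, means={“R”:0, “G”:0, “B”:0}, and std={“R”:1, “G”:1, “B”:1}, a red-channel pixel value of 128 maps to (128/255 - 0)/1 ≈ 0.502.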

onmt_preprocess - Performs preprocessing on WMT dataset for FasterTransformer OpenNMT model

Parameters

Description

Type

Default

vocab_path

Path to OpenNMT model vocabulary file (pickle file)

String

Mandatory

src_seq_len

The maximum total input sequence length

Integer

128

skip_sentencepiece

Skip sentencepiece encoding

Boolean

True

sentencepiece_model_path

Path to sentencepiece model for WMT dataset (mandatory when “skip_sentencepiece” is False)

String

None

pad - Image padding with constant pad size or based on target dimensions

Parameters

Description

Type

Default

type

Type of padding. Valid options:
  • constant: Add constant padding on all 4 sides (pad_size must be provided)

  • target_dims: Add padding based on difference in image size and target size (dims param must be provided)

String

Mandatory

dims

Height and width comma delimited, e.g., 416,416 for ‘target-dims’ type of padding

String

Mandatory

pad_size

Size of padding for ‘constant’ type of padding

Integer

None

img_position

Parameter to specify position of image, either ‘center’ or ‘corner’ (top-left). Padding is added accordingly. Currently used for ‘target_dims’ type padding

String

center

color

Padding value for all planes

Integer

114

resize - Resizes an image using the specified library parameter: cv2(Default), pillow or torchvision

Parameters

Description

Type

Default

dims

Height and width; comma delimited, e.g., 640,640

String

Mandatory

library

Python library to be used for resizing a given input; valid values are: opencv | pillow | torchvision

String

opencv

channel_order

Convert image to specified channel order. At present this parameter only takes the ‘RGB’ value

String

RGB

interp

Interpolation Type. Options:
  • bilinear (supported by opencv, Torchvision, pillow)

  • area (supported by opencv only)

  • nearest (supported by opencv, Torchvision, pillow)

  • bicubic (supported by Torchvision, pillow)

  • box (supported by pillow only)

  • hamming (supported by pillow only)

  • lanczos (supported by pillow only)

String

For opencv and torchvision: bilinear
For pillow: bicubic

type

Type of resize to be done. Note: Torchvision does not use this option. Options:

  • letterbox : Used for YOLO models.

  • imagenet : Scale followed by resize.

  • aspect_ratio : Resize while keeping aspect ratio.

  • None : The default behavior is to auto-resize the image to the target dims.

String

auto-resize

resize_before_typecast

To resize before or after conversion to target datatype e.g., fp32

Boolean

True

typecasting_required

To convert final output to numpy or not. Note: This option is specific to the Torchvision library

Boolean

True

mean

Dictionary of means to be subtracted, e.g., {“R”:0.485, “G”:0.456, “B”:0.406}. Note: This option is specific to the Tensorflow library

RGB dictionary

{“R”:0, “G”:0, “B”:0}

std

Dictionary of std-dev for rescaling the values, e.g., {“R”:0.229, “G”:0.224, “B”:0.225}. Note: This option is specific to the Tensorflow library

RGB dictionary

{“R”:0, “G”:0, “B”:0}

normalize_before_resize

Whether to perform normalization (mean subtraction and standard-deviation scaling) before resizing. Note: This option is specific to the Tensorflow library

Boolean

False

crop_before_resize

To perform cropping before resize. Note: This option is specific to the Tensorflow library

Boolean

False

squad_read - Reads the SQuAD dataset JSON file. Preprocesses the question-context pairs into features for language models like BERT-Large

Parameters

Description

Type

Default

vocab_path

Path for local directory containing vocabulary files

String

Mandatory

max_seq_length

The maximum total input sequence length after WordPiece tokenization. Sequences longer than this will be truncated, and sequences shorter than this will be padded

Integer

384

max_query_length

The maximum number of tokens for the question. Questions longer than this will be truncated to this length

Integer

64

doc_stride

When splitting up a long document into chunks, how much stride to take between chunks

Integer

128

packing_strategy

Set this flag when using packing strategy for bert based models

Boolean

False

max_sequence_per_pack

The maximum number of sequences which can be packed together

Integer

3

mask_type

This can take any of three values - ‘None’, ‘Boolean’ or ‘Compressed’ - depending on the masking to be applied to input_mask

String

None

compressed_mask_length

Set this value if mask_type is set to compressed

Integer

None

Postprocessing plugins

bert_predict - Predicts answers for a SQuAD dataset given start and end logits.

Parameters

Description

Type

Default

vocab_path

Path for a local directory containing vocabulary files

String

Mandatory

max_seq_length

The maximum total input sequence length after WordPiece tokenization. Sequences longer than this will be truncated, and sequences shorter than this will be padded (optional if preprocessing is run)

Integer

384

doc_stride

When splitting up a long document into chunks, how much stride to take between chunks (optional if preprocessing is run)

Integer

128

max_query_length

The maximum number of tokens for the question. Questions longer than this will be truncated to this length (optional if preprocessing is run)

Integer

64

n_best_size

The total number of n-best predictions to generate in the post.json output file

Integer

20

max_answer_length

The maximum length of an answer that can be generated. This is needed because the start and end predictions are not conditioned on one another

Integer

30

packing_strategy

This flag is set to True if using packing strategy

Boolean

False

centerface_postproc - Processes the inference outputs to parse detections and generates a detections file for the metric evaluator. Used for processing CenterFace face detector.

Parameters

Description

Type

Default

dims

Height and width; comma delimited, e.g., 640,640

String

Mandatory

dtypes

List of datatypes to be used for bounding boxes, scores, and labels (in order), e.g., [float32, float32, int64]. Defaults to the datatypes fetched from the ‘outputs_info’ for the model’s config.yaml

List

Datatypes from the outputs_info section of the model config.yaml

heatmap_threshold

User input for heatmap threshold

Float

0.05

nms_threshold

User input for nms threshold

Float

0.3

centernet_postprocess - Processes the inference outputs to parse detections and generate a detections file for the metric evaluator. Used for processing CenterNet detector.

Parameters

Description

Type

Default

dtypes

List of datatypes (at least 3) to be used to infer outputs

String

Mandatory

output_dims

Height and width; comma delimited, e.g., 640,640

String

Mandatory

top_k

Top K proposals are given from the postprocess plugin

Integer

100

num_classes

Number of classes

Integer

1

score

Threshold to purify the detections

Integer

1

lprnet_predict - Used for LPRNET license plate prediction.

object_detection - Processes the inference outputs to parse detections and generate a detections file for metric evaluator

Parameters

Description

Type

Default

dims

Height and width; comma delimited, e.g., 640,640

String

Mandatory

type

Type of post-processing (e.g., letterbox, stretch)

String

None

label_offset

Offset for the labels information

Integer

0

score_threshold

Threshold limit for the detection scores

Float

0.001

xywh_to_xyxy

Convert bounding box format from box center (xywh) to box corner (xyxy) format

Boolean

False

xy_swap

Swap the X and Y coordinates of bbox

Boolean

False

dtypes

List of datatypes used for bounding boxes, scores, and labels in order, e.g., [float32, float32, int64]. Defaults to the datatypes fetched from the ‘outputs_info’ for the model’s config.yaml.

List

Datatypes from the outputs_info section of the model config.yaml

mask

Do postprocessing on mask

Boolean

False

mask_dims

Output dims of model. Provide this only if mask = True. E.g., 100,80,28,28

String

None

padded_outputs

Pad the outputs

Boolean

False

scale

Comma separated scale values

String

‘1’

skip_padding

Skip padding while rescaling to original image shape

Boolean

False

onmt_postprocess - Performs preprocessing for OpenNMT model outputs

Parameters

Description

Type

Default

sentencepiece_model_path

Path to sentencepiece model for WMT dataset

String

Mandatory

unrolled_count

Upper limit on the unrolls required for the output (no. of output tokens to be considered for metric)

Integer

26

vocab_path

Path to OpenNMT model vocabulary file (pickle file), optional if preprocessing is run

String

None

skip_sentencepiece

Skip sentencepiece encoding, optional if preprocessing is run

Boolean

None

Metric plugins

bleu - Evaluates bleu score using sacrebleu library

Parameters

Description

Type

Default

round

Number of decimal places to round the result to

Integer

1

map_coco - Evaluates the mAP@0.50 and mAP@[0.50:0.05:0.95] scores for the COCO dataset

Parameters

Description

Type

Default

map_80_to_90

Mapping of classes in range 0-80 to 0-90

Boolean

False

segm

Flag to calculate mAP for mask

Boolean

False

keypoint_map

Flag to calculate mAP for keypoint

Boolean

False

perplexity - Calculates the perplexity metric. Model outputs are expected to be the logits of proper shape. Ground truth data is expected to be in tokenized format and in the form of token IDs. The ground truth will be automatically generated, if using the “gpt2_tokenizer” dataset plugin.

Parameters

Description

Type

Default

logits_index

Index of the logits output if the model has multiple outputs

Integer

0

precision - Calculates the precision metric, i.e., (correct predictions / total predictions). Ground truth data is expected in the format “filename <space> correct_text”. The postprocessed model outputs are expected to be text files with just the “predicted_text”.

Parameters

Description

Type

Default

round

Number of decimal places to round the result to

Integer

7

input_image_index

For multi input models, the index of image file in input file list csv

Integer

0

squad_em - Calculates the exact match for SQuAD v1.1 dataset predictions and ground truth.

squad_f1 - Calculates F1 score for SQuAD v1.1 dataset predictions and ground truth.

topk - Evaluates topk value by comparing results and annotations.

Parameters

Description

Type

Default

kval

Top k values, e.g., 1,5 evaluates top1 and top5

String

5

softmax_index

Index of the softmax output in the results file list

Integer

0

label_offset

Offset required in the labels’ scores, e.g., if the output shape is 1x1001, then label_offset=1

Integer

0

round

Number of decimal places to round the result to

Integer

3

input_image_index

For multi input models, the index of image file in input file list csv

Integer

0

widerface_AP - Computes average precision for easy, medium, and hard cases.

Parameters

Description

Type

Default

IoU_threshold

User input for IoU threshold

Float

0.4

Memory-based plugins

This section lists the built-in memory-based plugins.

Dataset plugins

SQUADDataset - The Stanford Question Answering Dataset (SQuAD) is a widely used benchmark dataset for question-answering tasks, featuring over 100,000 questions annotated on more than 500 Wikipedia articles. This dataset allows us to load and extract examples from a specified SQuAD dataset file.

Parameters

Description

Type

Default

tokenizer_model_name_or_path

The name or path to the model used for tokenization. Can be any one of the below:
  • A string, the model ID of a predefined tokenizer hosted inside a model repo on huggingface.co.

  • A string, the model ID of a predefined tokenizer from huggingface.co (user-uploaded) and cache (e.g., “deepset/roberta-base-squad2”)

  • A path to a directory containing vocabulary files required by the tokenizer, for instance saved using the save_pretrained() method, e.g., ./my_model_directory/.

os.PathLike | str

Mandatory

annotation_path

Path to the SQUAD annotation file.

Optional[os.PathLike | str]

None

calibration_path

Path to the SQUAD calibration file.

Optional[os.PathLike | str]

None

max_samples

The maximum number of samples to load.

Optional[int]

None

use_calibration

Whether to use calibration data or not.

Optional[bool]

False

max_seq_length

The maximum sequence length.

int

384

max_query_length

The maximum query length.

int

64

doc_stride

The document stride.

int

128

threads

The number of threads to use.

int

8

do_lower_case

Whether to perform lower-casing on the data.

bool

True

model_inputs_count

The number of input fields in the PackedInputs tuple.

int

2

use_packing_strategy

Whether to pack features or not.

bool

False

max_sequence_per_pack

The maximum number of sequences per pack.

int

3

mask_type

The type of mask to use.

Optional[Literal[‘boolean’, ‘compressed’]]

None

compressed_mask_length

The length of the compressed mask.

Optional[int]

None

squad_version

The version of the SQUAD dataset.

int

1

WikiText2Dataset - The WikiText-2 dataset is a curated collection of verified Wikipedia articles used to evaluate text generation and language modeling systems. This dataset allows us to tokenize the WikiText-2 data from files into model inputs.

Parameters

Description

Type

Default

tokenizer_model_name_or_path

The name or path to the model used for tokenization. Can be any one of the below:
  • A string, the model ID of a predefined tokenizer hosted inside a model repo on huggingface.co.

  • A string, the model ID of a predefined tokenizer from huggingface.co (user-uploaded) and cache (e.g., “deepset/roberta-base-squad2”)

  • A path to a directory containing vocabulary files required by the tokenizer, for instance saved using the save_pretrained() method, e.g., ./my_model_directory/.

os.PathLike | str

Mandatory

input_list_path

Path to the file containing text files.

os.PathLike | str

Mandatory

sequence_length

Length of each sequence.

int

Mandatory

past_shape

Shape of past sequences.

List[int]

None

calibration_indices

List containing the indices from input list to be used as calibration data.

Optional[List[int]]

None

max_samples

Maximum number of samples to be loaded.

Optional[int]

None

use_calibration

Flag to choose whether to use calibration data.

bool

False

past_sequence_length

Length of past sequences.

int

0

num_past

Number of past sequences.

int

0

position_id_required

Whether position IDs are required.

bool

True

mask_dtype

Data type for masks.

Literal[“int64”, “float32”]

“float32”

ImagenetDataset - The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset, commonly referred to as the ImageNet dataset, is a vast and influential collection of over 14 million annotated images, making it one of the largest and most widely used benchmark datasets for computer vision research.

COCO2017Dataset - The Common Objects in Context (COCO) 2017 Dataset is a large-scale, fine-grained image dataset containing over 120,000 images and 2 million object instances from various categories, including animals, vehicles, furniture, and man-made objects, annotated with precise pixel-level masks.

Parameters

Description

Type

Default

inputlist_path

The path to the input list file. Paths in the file can be either relative to the file list’s location or an absolute path.

Optional[os.PathLike| str]

Mandatory

annotation_path

The path to the annotation file. If set to None, no annotations will be used. Annotation must be provided if metrics are to be computed.

Optional[os.PathLike| str]

None

calibration_path

The path to the calibration file. If set to None, no calibration data will be used.

Optional[os.PathLike | str]

None

calibration_indices

A list of indices from the input dataset that will be utilized for calibration purposes. User can provide a file containing comma-separated values representing the selected indices.

Optional[list[int] | str]

None

use_calibration

Flag to determine whether to use calibration data or not.

bool

False

image_backend

Image Backend to be used for loading images from disk.

Literal[‘opencv’,’pillow’]

‘opencv’

max_samples

Maximum number of samples to be loaded.

Optional[int]

None

SYN_CHINESE_LP_Dataset - The SYN_CHINESE_LP dataset is a synthetic collection of Chinese license plate images with varying levels of quality, noise, and distortion, designed to simulate real-world challenges in automatic license plate recognition (ALPR) tasks for traffic management applications.

Parameters

Description

Type

Default

inputlist_path

The path to the input list file. Paths in the file can be either relative to the file list’s location or an absolute path.

Optional[os.PathLike| str]

Mandatory

annotation_path

The path to the annotation file. If set to None, no annotations will be used. Annotation must be provided if metrics are to be computed.

Optional[os.PathLike| str]

None

calibration_path

The path to the calibration file. If set to None, no calibration data will be used.

Optional[os.PathLike | str]

None

calibration_indices

A list of indices from the input dataset that will be utilized for calibration purposes. User can provide a file containing comma-separated values representing the selected indices.

Optional[list[int] | str]

None

use_calibration

Flag to determine whether to use calibration data or not.

bool

False

image_backend

Image Backend to be used for loading images from disk.

Literal[‘opencv’,’pillow’]

‘opencv’

max_samples

Maximum number of samples to be loaded.

Optional[int]

None

WIDERFaceDataset - The WIDERFace dataset is a large-scale face detection benchmark with a high degree of variability in scale, pose, and occlusion, making it one of the most comprehensive and challenging datasets for face localization tasks.

Parameters

Description

Type

Default

inputlist_path

The path to the input list file. Paths in the file can be either relative to the file list’s location or an absolute path.

Optional[os.PathLike| str]

Mandatory

annotation_path

The path to the folder containing annotation *.mat files. If set to None, no annotations will be used. Annotation must be provided if metrics are to be computed.

Optional[DirectoryPath]

None

calibration_path

The path to the calibration file. If set to None, no calibration data will be used.

Optional[os.PathLike | str]

None

calibration_indices

A list of indices from the input dataset that will be utilized for calibration purposes. User can provide a file containing comma-separated values representing the selected indices.

Optional[list[int] | str]

None

use_calibration

Flag to determine whether to use calibration data or not.

bool

False

image_backend

Image Backend to be used for loading images from disk.

Literal[‘opencv’,’pillow’]

‘opencv’

max_samples

Maximum number of samples to be loaded.

Optional[int]

None

WMT20Dataset - The WMT20 dataset is a collection of machine translation benchmarks, consisting of parallel corpora in 46 language pairs with millions of sentence pairs, used to evaluate and improve the performance of machine translation systems for multilingual applications.

Parameters

Description

Type

Default

inputlist_path

The path to the input list file. Paths in the file can be either relative to the file list’s location or an absolute path.

Optional[os.PathLike| str]

Mandatory

annotation_path

The path to the annotation file. If set to None, no annotations will be used. Annotation must be provided if metrics are to be computed.

Optional[os.PathLike| str]

None

calibration_indices

A list of indices from the input dataset that will be utilized for calibration purposes. User can provide a file containing comma-separated values representing the selected indices.

Optional[list[int] | str]

None

use_calibration

Flag to determine whether to use calibration data or not.

bool

False

max_samples

Maximum number of samples to be loaded.

Optional[int]

None

Preprocessing memory plugins

CenternetPreprocessor - Performs preprocessing on CenterNet dataset examples.

Parameters

Description

Type

Default

output_dimensions

Output dimensions of the processed image output. Height and width; e.g., [640 , 640]

list[int]

Mandatory

scale

Scale factor for image

Float

1.0

ConvertNCHW - Transposes WHC to CHW or CHW to WHC and adds an extra N dimension.

Parameters

Description

Type

Default

expand_dims

Add the Nth dimension

Boolean

True

CropImage - Center crops an image to the given dimensions using numpy or torchvision based on the library parameter.

Parameters

Description

Type

Default

image_dimensions

Output dimensions of the processed image output. Height and width; e.g., [640 , 640]

list[int]

Mandatory

library

Python library used to crop the given input; valid values are: numpy | torchvision

String

numpy

typecasting_required

To convert final output to numpy or not. Note: This option is specific to torchvision library

Boolean

True

ExpandDimensions - Adds a new dimension for images at the given axis, e.g., HWC to NHWC.

Parameters

Description

Type

Default

axis

The index of the axis to expand

Integer

0

FlipImage - Flips the input image horizontally or vertically based on given axis.

Parameters

Description

Type

Default

axis

The axis along which the image is flipped. Default: 3, indicating a horizontal flip for RGB images.

Integer

3

MlCommonsRetinaNetPreprocessor - Preprocessor for the RetinaNet model. Normalize image based on mean and standard deviation and interpolate to provided image_size.

Parameters

Description

Type

Default

image_size

Expected size to which images should be resized in [Height, Width] format; e.g., [299 299]

list[int, int]

(800, 800)

mean

The mean values for normalization.

list[float]

[0.485, 0.456, 0.406]

std

The standard deviation values for normalization.

list[float]

[0.229, 0.224, 0.225]

OpenNMTPreprocessor - A preprocessor for OpenNMT that reads text data and applies required preprocessing for ONMT models.

Parameters

Description

Type

Default

vocab_path

The path to the vocabulary file to be used for processing.

os.PathLike

Mandatory

src_seq_len

The source sequence length.

Integer

128

CLIPPreprocessor - Creates input files with image and/or text for image transformer models like ViT and CLIP. (Note: This plugin requires Pillow package version:10.0.0)

Parameters

Description

Type

Default

image_dimensions

Expected processed output dimension in [Height, Width] format; e.g., [299 299]

list[int]

Mandatory

image_only

Whether to process only image tokens

Boolean

True

image_input_index

Index of the input image data in the input provided

Integer

0

NormalizeImage - Normalizes input per the given scheme; data must be of NHWC format.

Parameters

Description

Type

Default

library

Python library used to normalize the given input; valid values are: numpy | torchvision

String

numpy

norm

Normalization factor, all values divided by norm

float32

255.0

means

Dictionary of means to be subtracted, e.g., {“R”:0.485, “G”:0.456, “B”:0.406}

RGB dictionary

{“R”:0, “G”:0, “B”:0}

std

Dictionary of std-dev for rescaling the values, e.g., {“R”:0.229, “G”:0.224, “B”:0.225}

RGB dictionary

{“R”:1, “G”:1, “B”:1}

channel_order

Channel order to specify means and std values per channel - RGB | BGR

String

‘RGB’

normalize_first

To perform normalization before or after mean subtraction and standard deviation.
normalize_first=True means perform normalization before.
Note: torchvision library does not use this option

Boolean

True

typecasting_required

To convert final output to numpy or not. Note: This option is specific to the Torchvision library

Boolean

True

PadImage - Image padding with constant pad size or based on target dimensions

Parameters

Description

Type

Default

target_dimensions

Height and width of the processed image output. e.g., [640 , 640] for ‘target-dims’ type of padding

list[int]

Mandatory

pad_type

Type of padding. Valid options:
  • constant: Add constant padding on all 4 sides (pad_size must be provided)

  • target_dims: Add padding based on difference in image size and target size (dims param must be provided)

String

Mandatory

constant_pad_size

Size of padding for ‘constant’ type of padding

Integer

None

image_position

Parameter to specify position of image, either ‘center’ or ‘corner’ (top-left). Padding is added accordingly. Currently used for ‘target_dims’ type padding

String

‘center’

color_value

Padding value for all planes

Integer

114

ResizeImage - Resizes an image using the specified library parameter: cv2(Default), pillow or torchvision

Parameters

Description

Type

Default

image_dimensions

Height and width of the processed image output. e.g., [640 , 640]

list[int]

Mandatory

library

Python library to be used for resizing a given input; valid values are: opencv | pillow | torchvision

String

opencv

channel_order

Convert image to specified channel order. At present this parameter only takes the ‘RGB’ value

String

RGB

interpolation_method

Interpolation Type. Options:
  • bilinear (supported by opencv, Torchvision, pillow)

  • area (supported by opencv only)

  • nearest (supported by opencv, Torchvision, pillow)

  • bicubic (supported by Torchvision, pillow)

  • box (supported by pillow only)

  • hamming (supported by pillow only)

  • lanczos (supported by pillow only)

String

For opencv and torchvision: bilinear
For pillow: bicubic

resize_type

Type of resize to be done. Note: Torchvision does not use this option. Options:
  • letterbox : Used for YOLO models.

  • imagenet : Scale followed by resize.

  • aspect_ratio : Resize while keeping aspect ratio.

  • None : The default behavior is to auto-resize the image to the target dims.

String

None

resize_before_typecast

To resize before or after conversion to target datatype e.g., fp32

Boolean

True

typecasting_required

To convert final output to numpy or not. Note: This option is specific to the Torchvision library

Boolean

True

mean

Dictionary of means to be subtracted, e.g., {“R”:0.485, “G”:0.456, “B”:0.406}. Note: This option is specific to the Tensorflow library

RGB dictionary

{“R”:0, “G”:0, “B”:0}

std

Dictionary of std-dev for rescaling the values, e.g., {“R”:0.229, “G”:0.224, “B”:0.225}. Note: This option is specific to the Tensorflow library

RGB dictionary

{“R”:0, “G”:0, “B”:0}

normalize_before_resize

Whether to perform normalization (mean subtraction and standard-deviation scaling) before resizing. Note: This option is specific to the Tensorflow library

Boolean

False

norm

Normalization factor, all values divided by norm

float32

255.0

normalize_first

To perform normalization before or after mean subtraction and standard deviation.
normalize_first=True means perform normalization before.
Note: torchvision library does not use this option

Boolean

True

Adapter

ClassificationOutputAdapter - Transforms the output of a classification model into a single output (softmax only), assuming the model provides a list of outputs. Used along with TopKMetric for classification models.

Parameters

Description

Type

Default

softmax_index

The index of the softmax output in the model’s outputs.

Integer

0

BoundingBoxOutputAdapter - Transforms the bounding box output of an object detection model based on the user’s inputs. It allows conversion from (x, y, w, h) format to (x1, y1, x2, y2) format and swapping of the X and Y coordinates. Used along with ObjectDetectionPostProcessor.

Parameters

Description

Type

Default

xywh_to_xyxy

Whether to convert output from box center (xywh) to box corner (xyxy) format.

Boolean

False

xy_swap

Whether to swap X and Y coordinates of bounding boxes.

Boolean

False

Postprocessing memory plugins

SquadPostProcessor - Predicts answers for a SQuAD dataset for the given start and end scores.

Parameters

Description

Type

Default

do_unpacking

This flag is set to True if using packing strategy

Boolean

False

CenterFacePostProcessor - Processes the inference outputs to parse detections and generates detections for the metric evaluation. Used for processing CenterFace face detector.

Parameters

Description

Type

Default

image_dimensions

Output dimensions of the model. Height and width; e.g., [640 , 640]

list[int]

Mandatory

heatmap_threshold

User input for minimum confidence score to consider a detection as valid.

Float

0.05

nms_threshold

User input for the non-maximum suppression threshold used to suppress multiple detections of the same object.

Float

0.3

CenterNetPostProcessor - Processes the inference outputs to parse detections and generate detections for metric evaluation. Used for processing CenterNet detector.

Parameters

Description

Type

Default

output_dimensions

Output dimensions of the model. Height and width; e.g., [640 , 640]

list[int]

Mandatory

top_k

Top K proposals are given from the postprocess plugin

Integer

100

num_classes

Number of classes

Integer

1

score_threshold

Threshold to purify the detections

Integer

1

LPRNETPostProcessor - Used for LPRNET license plate prediction.

Parameters

Description

Type

Default

class_axis

Axis along which the model output is expected.

Integer

-1

ObjectDetectionPostProcessor - Parses the inference outputs and generates detections for metric evaluation.

Parameters:
  image_dimensions - Output dimensions of the model (height and width), e.g., [640, 640]. (Type: list[int], mandatory)
  type - Type of post-processing, e.g., 'letterbox'. (Type: Literal['letterbox', 'stretch', 'aspect_ratio', 'orgimage'], default: None)
  label_offset - The offset to apply to the label indices. (Type: Integer, default: 0)
  score_threshold - Threshold for the detection scores. (Type: Float, default: 0.001)
  mask - Whether to run postprocessing on the mask output. (Type: Boolean, default: False)
  mask_dims - Output mask dimensions of the model; provide this only if mask = True, e.g., 100,80,28,28. (Type: String, default: None)
  scale - Comma-separated scale values. (Type: String, default: '1')
  skip_padding - Skip padding while rescaling to the original image shape. (Type: Boolean, default: False)

OpenNMTPostprocessor - Postprocessor for the OpenNMT model with the WMT20 test dataset.

Parameters:
  sentencepiece_model_path - The path to the SentencePiece model. (Type: str | os.PathLike, mandatory)
  unrolled_count - The count for unfolding. (Type: Optional[Integer], default: 26)

MlCommonsRetinaNetPostProcessor - Postprocessor for the MLCommons RetinaNet model.

Parameters:
  image_dimensions - Output dimensions of the model (height and width), e.g., [1200, 1200]. (Type: list[int], mandatory)
  prior_boxes_file_path - Path to the file containing prior boxes. (Type: os.PathLike, mandatory)
  score_threshold - Threshold for the detection scores. (Type: Float, mandatory)
  nms_threshold - Non-maximum suppression threshold. (Type: Float, mandatory)
  max_detections_per_image - Maximum number of detections to keep per image. (Type: Integer, mandatory)
  num_classes_in_dataset - Number of classes in the dataset. (Type: Integer, mandatory)
  feature_map_dimensions - Dimensions of the feature maps from the FPN. (Type: list[int], mandatory)

Metric memory plugins

MAP_COCOMetric - Evaluates the mAP@0.5 and mAP@[0.5:0.05:0.95] scores for the COCO dataset.

Parameters:
  map_80_to_90 - Map class indices from the 0-80 range to the 0-90 range. (Type: Boolean, default: False)
  seg_map - Flag to calculate mAP for masks. (Type: Boolean, default: False)
  keypoint_map - Flag to calculate mAP for keypoints. (Type: Boolean, default: False)
  dataset_type - Dataset used for evaluation; must be one of 'openimages' or 'coco'. (Type: String, default: 'coco')

perplexity - Calculates the perplexity metric (see the sketch below). Model outputs are expected to be logits of the proper shape. Ground-truth data is expected to be in tokenized form, i.e., token IDs; the ground truth is generated automatically when the "gpt2_tokenizer" dataset plugin is used.

Parameters:
  logits_index - Index of the logits output if the model has multiple outputs. (Type: Integer, default: 0)
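
As a reference, the following minimal sketch shows how perplexity is typically computed from logits and ground-truth token IDs; it is illustrative only, not the plugin's actual implementation.

import numpy as np

def perplexity(logits, token_ids):
    # logits: (seq_len, vocab_size); token_ids: (seq_len,)
    # Perplexity is exp of the mean negative log-likelihood of the ground-truth tokens.
    logits = logits - logits.max(axis=-1, keepdims=True)                     # stable softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))  # log-softmax
    nll = -log_probs[np.arange(len(token_ids)), token_ids]
    return float(np.exp(nll.mean()))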

precision - Calculates the precision metric, i.e., (correct predictions / total predictions).

Parameters:
  round - Number of decimal places to round the result to. (Type: Integer, default: 7)
  output_index - Index of the output to be used from the data provided. (Type: Integer, default: 0)

SquadEvaluation - Calculates F1 and exact-match scores for the SQuAD dataset based on predictions and ground truth (see the sketch below).

Parameters:
  tokenizer_model_name_or_path - The name or path of the model used for tokenization. Can be any one of:
      • a string, the model ID of a predefined tokenizer hosted inside a model repo on huggingface.co;
      • a string, the model ID of a predefined tokenizer from huggingface.co (user-uploaded) and cache, e.g., "deepset/roberta-base-squad2";
      • a path to a directory containing the vocabulary files required by the tokenizer, for instance saved using the save_pretrained() method, e.g., ./my_model_directory/.
    (Type: os.PathLike | str, mandatory)
  max_answer_length - The maximum length of an answer after tokenization. In SQuAD v2 this was set to 30 tokens; in SQuAD v1 it was not specified, so a default value of 30 is used. (Type: Integer, default: 30)
  n_best_size - How many of the possible answers to return for a given question, along with the corresponding confidence scores. (Type: Integer, default: 20)
  do_lower_case - Whether or not to lowercase all text before processing. (Type: Bool, default: False)
  squad_version - Indicates which version of SQuAD-style questions and answers is being used ("v1" or "v2"). (Type: Integer, default: 1)
  decimal_places - Number of decimal places to round the result to. (Type: Integer, default: 6)
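
For reference, the following minimal sketch shows how SQuAD-style exact match and token-overlap F1 are typically computed for a single prediction; the real evaluation also normalizes articles and punctuation, so this is illustrative only.

from collections import Counter

def exact_match(prediction, ground_truth):
    return int(prediction.strip().lower() == ground_truth.strip().lower())

def f1_score(prediction, ground_truth):
    pred_tokens = prediction.lower().split()
    gt_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)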

TopKMetric - Calculates how often the correct label is among the top k predicted labels (see the sketch below).

Parameters:
  k - Top-k values, e.g., 1,5 evaluates top-1 and top-5. (Type: list[int], default: [1, 5])
  label_offset - Offset to apply to the labels' scores, e.g., if the output shape is 1x1001, then label_offset=1. (Type: Integer, default: 0)
  decimal_places - Number of decimal places to round the result to. (Type: Integer, default: 7)
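
The following minimal sketch shows how top-k accuracy is typically computed from raw scores and integer labels; it is illustrative only, not the metric's actual code.

import numpy as np

def top_k_accuracy(scores, labels, k=(1, 5)):
    # scores: (N, num_classes); labels: (N,) integer class indices
    results = {}
    for kk in k:
        topk = np.argsort(scores, axis=1)[:, -kk:]        # indices of the kk highest scores
        hits = (topk == labels[:, None]).any(axis=1)      # is the true label among the top kk?
        results["top%d" % kk] = float(hits.mean())
    return results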

WiderFaceAPMetric - Computes average precision for the easy, medium, and hard cases.

Parameters:
  iou_threshold - IoU threshold to be used for evaluation. (Type: Float, default: 0.4)

SDK Compatibility Verification

A model generated by the converter should be run with the net-run tools from the same SDK release as the converter. The SDK information embedded in model.cpp/model.so can be checked quickly with the following strings/grep commands:

strings model.cpp | grep qaisw
strings libqnn_model.so | grep qaisw