snpe-dlc-graph-prepare

snpe-dlc-graph-prepare performs offline graph preparation on quantized DLCs so they can run on the DSP/HTP runtimes.

Command Line Options:
  [ -h, --help ]        Displays this help message.
  [ --version ]         Displays version information.
  [ --verbose ]         Enable verbose user messages.
  [ --quiet ]           Disables some user messages.
  [ --silent ]          Disables all but fatal user messages.
  [ --debug=<val> ]     Sets the debug log level.
  [ --debug1 ]          Enables level 1 debug messages.
  [ --debug2 ]          Enables level 2 debug messages.
  [ --debug3 ]          Enables level 3 debug messages.
  [ --log-mask=<val> ]  Sets the debug log mask to set the log level for one or more areas.
                        Example: ".*=USER_ERROR, .*=INFO, NDK=DEBUG2, NCC=DEBUG3"
  [ --log-file=<val> ]  Overrides the default name for the debug log file.
  [ --log-dir=<val> ]   Overrides the default directory path where debug log files are written.
  [ --log-file-include-hostname ]
                        Appends the name of this host to the log file name.
  --input_dlc=<val>     Path to the dlc container containing the model for which graph cache
                        should be generated. This argument is required.
  [ --output_dlc=<val> ]
                        Path at which the model container, including the cached data, should be written.
                        If this argument is omitted, the output model will be written to
                        <input_model_name>_cached.dlc.
  [ --set_output_tensors=<val> ]
                        Specifies a comma-separated list (without whitespace) of tensors to be
                        output after execution.
  [ --set_output_layers=<val> ]
                        Specifies a comma-separated list (without whitespace) of layers whose
                        output buffers should be output after execution.
  [ --input_list=<val> ]
                        Path to a file specifying input images, as passed to snpe-net-run. Only
                        the graph output buffer information specified in the input list (lines starting
                        with # or %, if any) is used; paths to the input images are ignored.
  [ --htp_socs=<val> ]  Specify the SoC(s) to generate HTP Offline Caches for. SoCs are specified with an
                        ASIC identifier, in a comma-separated list without whitespace.
                        For example, --htp_socs sm8350,sm8450,sm8550,sm8650,qcs6490,qcs8550.
                        This flag and --htp_archs are mutually exclusive.
                        Default ASIC identifier: sm8650
  [ --htp_archs=<val> ]
                        Specify the DSP architecture(s) to generate general HTP Offline Caches for.
                        Architectures are specified with an ASIC identifier, in a comma-separated list
                        without whitespace. For example, --htp_archs v68,v73. This flag cannot be
                        combined with --htp_socs or --vtcm_override.
  [ --vtcm_override=<val> ]
                        Specify a single value representing the VTCM size in MB for the generated HTP Offline Caches.
                        For example, --vtcm_override 4. When set to 0, the SoC maximum VTCM size is used; if cache
                        compatibility mode is set to STRICT, the maximum value is checked. This flag can be used with
                        --htp_socs to override the default SoC VTCM size setting.
  [ --optimization_level=<val> ]
                        Specify an optimization level. Valid values are 1, 2 and 3. Default is 2. Higher optimization levels incur
                        longer offline prepare time but yield a more optimal graph, and hence faster execution time, for most graphs.
  [ --optimization_preset=<val> ]
                        Specify an optimization preset. Valid values are any integer value greater than or equal to zero. Default is 0.
                        These are experimental HTP graph compiler settings that typically affect latency and DRAM bandwidth.
                        These presets are intended for use with optimization_level=3. Unlike optimization levels, preset values do not follow
                        a consistent performance pattern. Results may vary depending on the network architecture and software release.
  [ --buffer_data_type=<val> ]
                        Sets the data type of I/O buffers during prepare. The data type can be one of:
                        float32, fixedPoint8, fixedPoint16. Arguments should be formatted as follows:
                        --buffer_data_type buffer_name1=buffer_name1_data_type
                        --buffer_data_type buffer_name2=buffer_name2_data_type
                        (Note: deprecated)
  [ --overwrite_cache_records ]
                        Allow this tool to overwrite any cache record that exactly matches the requested SoC(s).
                        The default behavior is to skip (re)generating cache records when a matching cache already exists.
  [ --use_float_io ]    Prepare quantized HTP Graph to operate with floating point inputs/outputs (Note: deprecated)
  [ --htp_dlbc=<val> ]  Specify Deep Learning Bandwidth Compression (DLBC) for this HTP graph. The default setting is OFF.
                        To turn on, specify it as --htp_dlbc=true
  [ --num_hvx_threads=<val> ]
                        Specify the number of HVX threads to reserve for this HTP graph. Must be greater than 0.
  [ --input_name=<val> ]
                        Specifies the name of the input for which dimensions are specified,
                        e.g. --input_name=<input name>
                        If the DLC has multiple graphs, graph names are required,
                        and quotes are required when graph names are specified. Graph names can be found via snpe-dlc-info.
                        e.g. --input_name="<graph name> <input name>"
  [ --input_dimensions=<val> ]
                        Specifies new dimensions for the input whose name is specified in input_name,
                        e.g. --input_dimensions=1,224,224,3
                        For multiple inputs, specify --input_name=<input name> and --input_dimensions=<input dimensions> multiple times.
                        If the DLC has multiple graphs, graph names are required,
                        and quotes are required when graph names are specified. Graph names can be found via snpe-dlc-info.
                        e.g. --input_dimensions="<graph name> 1,224,224,3"
  [ --memorymapped_buffer_hint=<val> ]
                        Specifies memory-mapped buffers hint. The default setting is OFF.
                        To turn on, specify it as --memorymapped_buffer_hint=true
  [ --udo_package_path=<val> ]
                        Specifies the path to the Registration Library for UDO package(s). Usage is:
                        --udo_package_path=<path_to_reg_lib>
                        Optionally, you can provide multiple packages as a comma-separated list.
                        This option must be specified for networks with UDO. All UDOs in the network must have a host-executable CPU implementation.

For detailed information on how to use the tool, refer to Offline Graph Caching for DSP Runtime on HTP.
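For illustration, a typical invocation might look like the sketch below. The model file names, SoC list, and optimization level are placeholder assumptions for this example, not values from the source documentation; the guard simply prints the command when the tool is not on the PATH.

```shell
# Hypothetical quantized model; substitute your own DLC paths and SoCs.
prepare_cmd="snpe-dlc-graph-prepare \
  --input_dlc=model_quantized.dlc \
  --output_dlc=model_cached.dlc \
  --htp_socs=sm8550,sm8650 \
  --optimization_level=3"

if command -v snpe-dlc-graph-prepare >/dev/null 2>&1; then
    eval "$prepare_cmd"
else
    # Tool not installed here; show the command that would run.
    printf 'would run: %s\n' "$prepare_cmd"
fi
```

Because --htp_socs and --htp_archs are mutually exclusive, pick one or the other; per-SoC caches (--htp_socs) are generally preferable when the target devices are known.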

snpe-dlc-quant

snpe-dlc-quant converts non-quantized DLC models into quantized DLC models.

Command Line Options:
    [ -h,--help ]         Displays this help message.
    [ --version ]         Displays version information.
    [ --verbose ]         Enable verbose user messages.
    [ --quiet ]           Disables some user messages.
    [ --silent ]          Disables all but fatal user messages.
    [ --debug=<val> ]     Sets the debug log level.
    [ --debug1 ]          Enables level 1 debug messages.
    [ --debug2 ]          Enables level 2 debug messages.
    [ --debug3 ]          Enables level 3 debug messages.
    [ --log-mask=<val> ]  Sets the debug log mask to set the log level for one or more areas.
                        Example: ".*=USER_ERROR, .*=INFO, NDK=DEBUG2, NCC=DEBUG3"
    [ --log-file=<val> ]  Overrides the default name for the debug log file.
    [ --log-dir=<val> ]   Overrides the default directory path where debug log files are written.
    [ --log-file-include-hostname ]
                        Appends the name of this host to the log file name.
    [ --input_dlc=<val> ]
                        Path to the dlc container containing the model for which fixed-point encoding
                        metadata should be generated. This argument is required.
    [ --input_list=<val> ]
                        Path to a file specifying the trial inputs. This file should be a plain text file,
                        containing one or more absolute file paths per line. These files will be taken to constitute
                        the trial set. Each path is expected to point to a binary file containing one trial input
                        in the 'raw' format, ready to be consumed by the tool without any further modifications.
                        This is similar to how input is provided to the snpe-net-run application.
    [ --no_weight_quantization ]
                        Note: Deprecated.
    [ --output_dlc=<val> ]
                        Path at which the metadata-included quantized model container should be written.
                        If this argument is omitted, the quantized model will be written at <unquantized_model_name>_quantized.dlc.
    [ --use_enhanced_quantizer ]
                        Note: Deprecated; use --param_quantizer and/or --act_quantizer.
                        Use the enhanced quantizer feature when quantizing the model.  Regular quantization determines the range using the actual
                        values of min and max of the data being quantized.  Enhanced quantization uses an algorithm to determine optimal range.  It can be
                        useful for quantizing models that have long tails in the distribution of the data being quantized.
    [ --use_adjusted_weights_quantizer ]
                        Note: Deprecated; use --param_quantizer.
                        Use the adjusted tf quantizer for quantizing the weights only. This might be helpful for improving the accuracy of some models,
                        such as the denoise model on which it was tested. This option is only used when quantizing the weights with 8 bits.
    [ --optimizations=<val> ]
                        Note: Deprecated; use --algorithms.
                        Enables new optimization algorithms. Usage is:
                            --optimizations <algo_name1> --optimizations <algo_name2>
                        Available optimization algorithms are:
                        "cle" - Cross layer equalization includes a number of methods for equalizing weights
                        and biases across layers in order to rectify imbalances that cause quantization errors.
    [ --algorithms=<val> ]
                        Enables new optimization algorithms. Usage is:
                            --algorithms <algo_name1> --algorithms <algo_name2>
                        Available optimization algorithms are:
                        "cle" - Cross layer equalization includes a number of methods for equalizing weights
                        and biases across layers in order to rectify imbalances that cause quantization errors.
    [ --override_params ]
                        Use this option to override quantization parameters when quantization was provided by the original source framework (e.g., TF fake quantization).
                        Note: Quantizer throws an error if overridden encodings contain unsupported bitwidths.
    [ --use_encoding_optimizations ]
                        Note: Deprecated.
    [ --udo_package_path=<val> ]
                        Specifies the path to the registration library for UDO package(s). Usage is:
                            --udo_package_path=<path_to_reg_lib>
                        You can (optionally) provide multiple packages as a comma-separated list.
                        This option must be specified for networks with UDO. All UDO's in a network must have a host-executable CPU implementation.
    [ --use_symmetric_quantize_weights ]
                        Note: Deprecated, use --param_quantizer.
                        Use the symmetric quantizer feature when quantizing the weights of the model. It makes sure min and max have the
                        same absolute values about zero. Symmetrically quantized data will also be stored as int#_t data such that the offset is always 0.
    [ --use_native_dtype ]
                        Note: Deprecated; use --use_native_input_files instead.
                        Use this option to indicate how to read input files:
                           1. float (default): reads inputs as floats and quantizes if necessary based on quantization parameters in the model.
                           2. native:          reads inputs assuming the data type to be native to the model, e.g., uint8_t.
    [ --use_native_input_files ]
                        Use this option to indicate how to read input files:
                           1. float (default): reads inputs as floats and quantizes if necessary based on quantization parameters in the model.
                           2. native:          reads inputs assuming the data type to be native to the model, e.g., uint8_t.
    [ --use_native_output_files ]
                        Use this option to indicate the data type of the output files:
                           1. float (default): generates the output file as float data.
                           2. native:          generates the output file as the datatype native to the source model, e.g., uint8_t.
    [ --float_fallback ]
                        Enables fallback to floating point (FP) instead of fixed point.
                        This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
                        If this option is enabled, then an input list must not be provided and --override_params must be provided.
                        The external quantization file (encoding file) might be missing quantization parameters for some interim tensors.
                        The tool first tries to fill the gaps by propagating encodings across math-invariant functions; if the quantization
                        parameters are still missing, the affected nodes fall back to floating point.
    [ --param_quantizer=<val> ]
                        Indicates the weight/bias quantizer to use. Optional and must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "symmetric": Ensures min and max have the same absolute values about zero. Data will be stored as int#_t data such that the offset is always 0.
    [ --act_quantizer=<val> ]
                        Indicates the activation quantizer to use. Optional and must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "symmetric": Ensures min and max have the same absolute values about zero. Data will be stored as int#_t data such that the offset is always 0.
    [ --bitwidth=<val> ]
                        Note: Deprecated.
                        Selects the bitwidth to use when quantizing the weights/activations/biases; 8 (default) or 16.
                        Cannot be mixed with --weights_bitwidth or --act_bitwidth or --bias_bitwidth.
    [ --weights_bitwidth=<val> ]
                        Selects the bitwidth to use when quantizing the weights; either 4, 8 (default) or 16.
                        8w/16a is only supported by HTA currently.
                        Cannot be mixed with --bitwidth.
    [ --act_bitwidth=<val> ]
                        Selects the bitwidth to use when quantizing the activations; either 8 (default) or 16.
                        8w/16a is only supported by HTA currently.
                        Cannot be mixed with --bitwidth.
    [ --float_bitwidth=<val> ]
                        Selects the bitwidth to use for float parameters (weights/biases) and activations, either for
                        all ops or for specific ops selected via encodings; either 32 (default) or 16.
    [ --bias_bitwidth=<val> ]
                        Selects the bitwidth to use when quantizing the biases; either 8 (default) or 32.
                        Using 32-bit biases may sometimes provide a small improvement in accuracy.
                        Cannot be mixed with --bitwidth.
    [ --float_bias_bitwidth=<val> ]
                        Specifies the bitwidth for float bias tensors; either 32 or 16.
                        If not provided and bias is overridden to float in the quantizer, the overriding float tensor's bitwidth will be used.
    [ --axis_quant ]    Note: Deprecated; use --use_per_channel_quantization.
                        Selects per-axis-element quantization for the weights and biases of certain layer types.
                        Only Convolution, Deconvolution, and FullyConnected are supported.
    [ --use_per_channel_quantization ]
                        Selects per-axis-element quantization for the weights and biases of certain layer types.
                        Only Convolution, Deconvolution, and FullyConnected are supported.
    [ --use_per_row_quantization ]
                        Enables row-wise quantization of Matmul and FullyConnected ops.
    [ --enable_per_row_quantized_bias ]
                        Enables row-wise quantization of the bias for the FullyConnected op, when weights are per-row quantized.
    [ --restrict_quantization_steps=<val> ]
                        Specifies the number of steps to use for computing quantization encodings such that scale = (max - min) / number of quantization steps.
                        The option should be passed as a comma-separated pair of hexadecimal string minimum and maximum values,
                        i.e., --restrict_quantization_steps "MIN,MAX".
                        Note that these are hexadecimal string literals and not signed integers; to supply a negative value, an explicit minus sign is required,
                        e.g., --restrict_quantization_steps "-0x80,0x7F" indicates an example 8-bit range.
                              --restrict_quantization_steps "-0x8000,0x7F7F" indicates an example 16-bit range.
                        This option only applies to symmetric parameter quantization.
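As a quick sanity check of the scale formula above, the pair "-0x80,0x7F" spans 255 steps. The float data range [-1.0, 1.0] below is only an assumed example range for illustration, not something the tool reports:

```shell
# Steps implied by --restrict_quantization_steps "-0x80,0x7F": 0x7F - (-0x80)
steps=$(( 0x7F - -0x80 ))
echo "steps=$steps"                # steps=255
# scale = (max - min) / steps, for a hypothetical data range [-1.0, 1.0]
awk -v n="$steps" 'BEGIN { printf "scale=%.6f\n", (1.0 - (-1.0)) / n }'
```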


Description:
Generates 8- or 16-bit TensorFlow-style fixed-point weight and activation encodings for a floating-point DLC.

Additional details:
  • For specifying input_list, refer to the input_list argument of snpe-net-run for supported input formats. (To calculate output activation encoding information for all layers, do not include the line that specifies desired outputs.)

  • The tool requires the batch dimension of the DLC input file to be set to 1 during the original model conversion step.

  • An example of quantization using snpe-dlc-quant can be found in the C/C++ Tutorial section: Running the Inception v3 Model. For details on quantization see Quantized vs Non-Quantized Models.

  • Outputs can be specified for snpe-dlc-quant by modifying the input_list in the following ways:

      #<output_layer_name>[<space><output_layer_name>]
      %<output_tensor_name>[<space><output_tensor_name>]
      <input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]
      …

    Note: Output tensors and layers can be specified individually, but when specifying both, the order shown above must be used.

  • When using the Qualcomm® Neural Processing SDK API:

    • Any output layers specified when snpe-dlc-quant was called need to be specified using the Snpe_SNPEBuilder_SetOutputLayers() function.

    • Any output tensors specified when snpe-dlc-quant was called need to be specified using the Snpe_SNPEBuilder_SetOutputTensors() function.
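Putting the pieces together, the sketch below writes a small input list using hypothetical layer/tensor names and trial-input paths (the # and % lines come first, in the order described above) and then shows a corresponding invocation. Nothing here refers to a real model; the guard prints the command when the tool is not on the PATH.

```shell
# Hypothetical output layer (#), output tensor (%), and raw trial inputs.
cat > input_list.txt <<'EOF'
#final_conv
%logits
/data/trials/img0001.raw
/data/trials/img0002.raw
EOF

quant_cmd="snpe-dlc-quant \
  --input_dlc=model.dlc \
  --input_list=input_list.txt \
  --output_dlc=model_quantized.dlc"

if command -v snpe-dlc-quant >/dev/null 2>&1; then
    eval "$quant_cmd"
else
    printf 'would run: %s\n' "$quant_cmd"
fi
```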


snpe-dlc-quantize

snpe-dlc-quantize converts non-quantized DLC models into quantized DLC models.

Command Line Options:
    [ -h,--help ]         Displays this help message.
    [ --version ]         Displays version information.
    [ --verbose ]         Enable verbose user messages.
    [ --quiet ]           Disables some user messages.
    [ --silent ]          Disables all but fatal user messages.
    [ --debug=<val> ]     Sets the debug log level.
    [ --debug1 ]          Enables level 1 debug messages.
    [ --debug2 ]          Enables level 2 debug messages.
    [ --debug3 ]          Enables level 3 debug messages.
    [ --log-mask=<val> ]  Sets the debug log mask to set the log level for one or more areas.
                        Example: ".*=USER_ERROR, .*=INFO, NDK=DEBUG2, NCC=DEBUG3"
    [ --log-file=<val> ]  Overrides the default name for the debug log file.
    [ --log-dir=<val> ]   Overrides the default directory path where debug log files are written.
    [ --log-file-include-hostname ]
                        Appends the name of this host to the log file name.
    [ --input_dlc=<val> ]
                        Path to the dlc container containing the model for which fixed-point encoding
                        metadata should be generated. This argument is required.
    [ --input_list=<val> ]
                        Path to a file specifying the trial inputs. This file should be a plain text file,
                        containing one or more absolute file paths per line. These files will be taken to constitute
                        the trial set. Each path is expected to point to a binary file containing one trial input
                        in the 'raw' format, ready to be consumed by the tool without any further modifications.
                        This is similar to how input is provided to the snpe-net-run application.
    [ --no_weight_quantization ]
                        Note: Deprecated.
    [ --output_dlc=<val> ]
                        Path at which the metadata-included quantized model container should be written.
                        If this argument is omitted, the quantized model will be written at <unquantized_model_name>_quantized.dlc.
    [ --enable_htp ]      Pack HTP information in quantized DLC.
    [ --htp_socs=<val> ]  Specify SoC to generate HTP Offline Cache for.
                        SoCs are specified with an ASIC identifier, in a comma separated list.
                        For example, --htp_socs sm8650
    [ --overwrite_cache_records ]
                        Overwrite HTP cache records present in the DLC.
    [ --use_float_io ]
                        Prepare the quantized HTP graph to operate with floating point inputs/outputs (Note: deprecated).
    [ --use_enhanced_quantizer ]
                        Note: Deprecated; use --param_quantizer and/or --act_quantizer.
                        Use the enhanced quantizer feature when quantizing the model.  Regular quantization determines the range using the actual
                        values of min and max of the data being quantized.  Enhanced quantization uses an algorithm to determine optimal range.  It can be
                        useful for quantizing models that have long tails in the distribution of the data being quantized.
    [ --use_adjusted_weights_quantizer ]
                        Note: Deprecated; use --param_quantizer.
                        Use the adjusted tf quantizer for quantizing the weights only. This might be helpful for improving the accuracy of some models,
                        such as the denoise model on which it was tested. This option is only used when quantizing the weights with 8 bits.
    [ --optimizations=<val> ]
                        Note: Deprecated; use --algorithms.
                        Enables new optimization algorithms. Usage is:
                            --optimizations <algo_name1> --optimizations <algo_name2>
                        Available optimization algorithms are:
                        "cle" - Cross layer equalization includes a number of methods for equalizing weights
                        and biases across layers in order to rectify imbalances that cause quantization errors.
    [ --algorithms=<val> ]
                        Enables new optimization algorithms. Usage is:
                            --algorithms <algo_name1> --algorithms <algo_name2>
                        Available optimization algorithms are:
                        "cle" - Cross layer equalization includes a number of methods for equalizing weights
                        and biases across layers in order to rectify imbalances that cause quantization errors.
    [ --override_params ]
                        Use this option to override quantization parameters when quantization was provided by the original source framework (e.g., TF fake quantization).
                        Note: Quantizer throws an error if overridden encodings contain unsupported bitwidths.
    [ --use_encoding_optimizations ]
                        Note: Deprecated.
                        Use this option to enable quantization encoding optimizations. This can reduce requantization in the graph and may improve accuracy for some models.
    [ --udo_package_path=<val> ]
                        Specifies the path to the registration library for UDO package(s). Usage is:
                            --udo_package_path=<path_to_reg_lib>
                        You can (optionally) provide multiple packages as a comma-separated list.
                        This option must be specified for networks with UDO. All UDO's in a network must have a host-executable CPU implementation.
    [ --use_symmetric_quantize_weights ]
                        Note: Deprecated, use --param_quantizer.
                        Use the symmetric quantizer feature when quantizing the weights of the model. It makes sure min and max have the
                        same absolute values about zero. Symmetrically quantized data will also be stored as int#_t data such that the offset is always 0.
    [ --use_native_dtype ]
                        Note: Deprecated; use --use_native_input_files instead.
                        Use this option to indicate how to read input files:
                           1. float (default): reads inputs as floats and quantizes if necessary based on quantization parameters in the model.
                           2. native:          reads inputs assuming the data type to be native to the model, e.g., uint8_t.
    [ --use_native_input_files ]
                        Use this option to indicate how to read input files:
                           1. float (default): reads inputs as floats and quantizes if necessary based on quantization parameters in the model.
                           2. native:          reads inputs assuming the data type to be native to the model, e.g., uint8_t.
    [ --use_native_output_files ]
                        Use this option to indicate the data type of the output files:
                           1. float (default): generates the output file as float data.
                           2. native:          generates the output file as the datatype native to the source model, e.g., uint8_t.
    [ --float_fallback ]
                        Enables fallback to floating point (FP) instead of fixed point.
                        This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
                        If this option is enabled, then an input list must not be provided and --override_params must be provided.
                        The external quantization file (encoding file) might be missing quantization parameters for some interim tensors.
                        The tool first tries to fill the gaps by propagating encodings across math-invariant functions; if the quantization
                        parameters are still missing, the affected nodes fall back to floating point.
    [ --param_quantizer=<val> ]
                        Indicates the weight/bias quantizer to use. Optional and must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "symmetric": Ensures min and max have the same absolute values about zero. Data will be stored as int#_t data such that the offset is always 0.
    [ --act_quantizer=<val> ]
                        Indicates the activation quantizer to use. Optional and must be followed by one of the following options:
                        "tf": Uses the real min/max of the data and specified bitwidth (default).
                        "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
                        "symmetric": Ensures min and max have the same absolute values about zero. Data will be stored as int#_t data such that the offset is always 0.
    [ --bitwidth=<val> ]
                        Note: Deprecated.
                        Selects the bitwidth to use when quantizing the weights/activations/biases; 8 (default) or 16.
                        Cannot be mixed with --weights_bitwidth or --act_bitwidth or --bias_bitwidth.
    [ --weights_bitwidth=<val> ]
                        Selects the bitwidth to use when quantizing the weights; either 4, 8 (default) or 16.
                        8w/16a is only supported by HTA currently.
                        Cannot be mixed with --bitwidth.
    [ --act_bitwidth=<val> ]
                        Selects the bitwidth to use when quantizing the activations; either 8 (default) or 16.
                        8w/16a is only supported by HTA currently.
                        Cannot be mixed with --bitwidth.
    [ --float_bitwidth=<val> ]
                        Selects the bitwidth to use for float parameters (weights/biases) and activations, either for
                        all ops or for specific ops selected via encodings; either 32 (default) or 16.
    [ --bias_bitwidth=<val> ]
                        Selects the bitwidth to use when quantizing the biases; either 8 (default) or 32.
                        Using 32-bit biases may sometimes provide a small improvement in accuracy.
                        Cannot be mixed with --bitwidth.
    [ --float_bias_bitwidth=<val> ]
                        Specifies the bitwidth for float bias tensors; either 32 or 16.
                        If not provided and bias is overridden to float in the quantizer, the overriding float tensor's bitwidth will be used.
    [ --axis_quant ]    Note: Deprecated; use --use_per_channel_quantization.
                        Selects per-axis-element quantization for the weights and biases of certain layer types.
                        Only Convolution, Deconvolution, and FullyConnected are supported.
    [ --use_per_channel_quantization ]
                        Selects per-axis-element quantization for the weights and biases of certain layer types.
                        Only Convolution, Deconvolution, and FullyConnected are supported.
    [ --use_per_row_quantization ]
                        Enables row wise quantization of Matmul and FullyConnected ops.
    [ --enable_per_row_quantized_bias ]
                        Enables row-wise quantization of bias for the FullyConnected op, when weights are per-row quantized.
    [ --restrict_quantization_steps=<val> ]
                        Specifies the number of steps to use for computing quantization encodings such that scale = (max - min) / number of quantization steps.
                        The option should be passed as a comma separated pair of hexadecimal string minimum and maximum values,
                        i.e., --restrict_quantization_steps "MIN,MAX".
                        Note that these are hexadecimal string literals, not signed integers; to supply a negative value, an explicit minus sign is required,
                        e.g., --restrict_quantization_steps "-0x80,0x7F" indicates an example 8-bit range.
                              --restrict_quantization_steps "-0x8000,0x7F7F" indicates an example 16-bit range.
                        This option only applies to symmetric parameter quantization.


Description:
Generates 8- or 16-bit TensorFlow-style fixed-point weight and activation encodings for a floating-point DLC model.

Additional details:
  • For specifying input_list, refer to the input_list argument of snpe-net-run for supported input formats. (To calculate output activation encoding information for all layers, do not include the line that specifies desired outputs.)

  • The tool requires the batch dimension of the DLC input file to be set to 1 during the original model conversion step.

  • An example of quantization using snpe-dlc-quantize can be found in the C/C++ Tutorial section: Running the Inception v3 Model. For details on quantization see Quantized vs Non-Quantized Models.

  • Using snpe-dlc-quantize is mandatory for running on HTA.

  • Using snpe-dlc-quantize is mandatory for running on the DSP runtime on Snapdragon 865. It is recommended that offline cache generation be used; it is enabled via the --enable_htp option of snpe-dlc-quantize.

  • When using offline cache generation for HTP, the same input and output tensors or layers must be specified both when running snpe-dlc-quantize and when running inference on the model using the Qualcomm® Neural Processing SDK APIs or snpe-net-run. Not doing so will cause the cache to be invalidated, and graph initialization will take longer.

  • Outputs can be specified for snpe-dlc-quantize by modifying the input_list in the following ways:

    #<output_layer_name>[<space><output_layer_name>]
    %<output_tensor_name>[<space><output_tensor_name>]
    <input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]
    …
    

    Note: Output tensors and layers can be specified individually, but when both are specified, they must appear in the order shown above.

  • When running a model with an offline generated cache using snpe-net-run:

    • Any output layers specified when snpe-dlc-quantize was called need to be specified in the input list, as shown in the input_list argument to snpe-net-run.

    • Any output tensors specified when snpe-dlc-quantize was called need to be specified using the --set_output_tensors argument to snpe-net-run. Refer to snpe-net-run for documentation.

  • When using the Qualcomm® Neural Processing SDK API:

    • Any output layers specified when snpe-dlc-quantize was called need to be specified using the Snpe_SNPEBuilder_SetOutputLayers() function.

    • Any output tensors specified when snpe-dlc-quantize was called need to be specified using the Snpe_SNPEBuilder_SetOutputTensors() function.
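
  • As an illustration of the output-specification syntax above, a hypothetical input_list might contain the following (layer, tensor, and file names here are placeholders, not defaults):

    ```
    #conv2d_1 conv2d_2
    %softmax:0
    input:=images/img_0001.raw
    input:=images/img_0002.raw
    ```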


snpe-udo-package-generator

DESCRIPTION:
------------
This tool generates a UDO (User Defined Operation) package using a
user provided config file.

USAGE:
------------
snpe-udo-package-generator [-h] --config_path CONFIG_PATH [--debug]
                           [--output_path OUTPUT_PATH] [-f]
OPTIONAL ARGUMENTS:
-------------------
    -h, --help            show this help message and exit
    --debug               Returns debugging information from generating the package
    --output_path OUTPUT_PATH, -o OUTPUT_PATH
                        Path where the package should be saved
    -f, --force-generation
                        This option will delete the existing package before
                        regenerating it. Note: appropriate file permissions
                        must be set to use this option.

REQUIRED_ARGUMENTS:
-------------------
    --config_path CONFIG_PATH, -p CONFIG_PATH
                        The path to a config file that defines a UDO.
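
A typical invocation might look like the following (the config file name and output path are hypothetical examples, not defaults):

```
snpe-udo-package-generator --config_path MyCustomOp.json \
                           --output_path ./udo_packages \
                           --force-generation
```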

qairt-quantizer

The qairt-quantizer tool converts non-quantized DLC models into quantized DLC models.

Basic command line usage looks like:

usage: qairt-quantizer --input_dlc INPUT_DLC [--output_dlc OUTPUT_DLC] [--input_list INPUT_LIST]
                       [--enable_float_fallback] [--apply_algorithms ALGORITHMS [ALGORITHMS ...]]
                       [--bias_bitwidth BIAS_BITWIDTH] [--act_bitwidth ACT_BITWIDTH]
                       [--weights_bitwidth WEIGHTS_BITWIDTH] [--float_bitwidth FLOAT_BITWIDTH]
                       [--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_quantization_overrides]
                       [--use_per_channel_quantization] [--use_per_row_quantization]
                       [--enable_per_row_quantized_bias]
                       [--preserve_io_datatype [PRESERVE_IO_DATATYPE ...]]
                       [--use_native_input_files] [--use_native_output_files]
                       [--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
                       [--keep_weights_quantized] [--adjust_bias_encoding]
                       [--act_quantizer_calibration ACT_QUANTIZER_CALIBRATION]
                       [--param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION]
                       [--act_quantizer_schema ACT_QUANTIZER_SCHEMA]
                       [--param_quantizer_schema PARAM_QUANTIZER_SCHEMA]
                       [--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
                       [--use_aimet_quantizer] [--op_package_lib OP_PACKAGE_LIB]
                       [--dump_encoding_json] [--config CONFIG_FILE] [--export_stripped_dlc] [-h]
                       [--target_backend BACKEND] [--target_soc_model SOC_MODEL] [--debug [DEBUG]]

required arguments:
  --input_dlc INPUT_DLC, -i INPUT_DLC
                        Path to the dlc container containing the model for which fixed-point
                        encoding metadata should be generated. This argument is required.

optional arguments:
  --output_dlc OUTPUT_DLC, -o OUTPUT_DLC
                        Path at which the metadata-included quantized model container should be
                        written. If this argument is omitted, the quantized model will be written at
                        <unquantized_model_name>_quantized.dlc.
  --input_list INPUT_LIST, -l INPUT_LIST
                        Path to a file specifying the input data. This file should be a plain text
                        file, containing one or more absolute file paths per line. Each path is
                        expected to point to a binary file containing one input in the "raw" format,
                        ready to be consumed by the quantizer without any further preprocessing.
                        Multiple files per line separated by spaces indicate multiple inputs to the
                        network. See documentation for more details. Must be specified for
                        quantization. All subsequent quantization options are ignored when this is
                        not provided.
  --enable_float_fallback, -f
                        Use this option to enable fallback to floating point (FP) instead of fixed
                        point.
                        This option can be paired with --float_bitwidth to indicate the bitwidth for
                        FP (by default 32).
                        If this option is enabled, then input list must not be provided and
                        --ignore_quantization_overrides must not be provided.
                        The external quantization encodings (encoding file/FakeQuant encodings)
                        might be missing quantization parameters for some interim tensors.
                        The quantizer first tries to fill the gaps by propagating encodings across
                        math-invariant functions; if quantization parameters are still missing,
                        the affected nodes fall back to floating point.
  --apply_algorithms ALGORITHMS [ALGORITHMS ...]
                        Use this option to enable new optimization algorithms. Usage is:
                        --apply_algorithms <algo_name1> ... The available optimization algorithms
                        are: "cle" - cross-layer equalization, which includes a number of methods
                        for equalizing weights and biases across layers in order to rectify
                        imbalances that cause quantization errors.
  --bias_bitwidth BIAS_BITWIDTH
                        Use the --bias_bitwidth option to select the bitwidth to use when quantizing
                        the biases, either 8 (default) or 32.
  --act_bitwidth ACT_BITWIDTH
                        Use the --act_bitwidth option to select the bitwidth to use when quantizing
                        the activations, either 8 (default) or 16.
  --weights_bitwidth WEIGHTS_BITWIDTH
                        Use the --weights_bitwidth option to select the bitwidth to use when
                        quantizing the weights, either 4, 8 (default) or 16.
  --float_bitwidth FLOAT_BITWIDTH
                        Use the --float_bitwidth option to select the bitwidth to use for float
                        tensors, either 32 (default) or 16.
  --float_bias_bitwidth FLOAT_BIAS_BITWIDTH
                        Use the --float_bias_bitwidth option to select the bitwidth to use when
                        biases are in float, either 32 or 16 (default '0' if not provided).
  --ignore_quantization_overrides
                        Use only quantizer generated encodings, ignoring any user or model provided
                        encodings.
                        Note: Cannot use --ignore_quantization_overrides with
                        --quantization_overrides (argument of Qairt Converter)
  --use_per_channel_quantization
                        Use this option to enable per-channel quantization for convolution-based op
                        weights.
                        Note: This will only be used if built-in model Quantization-Aware Trained
                        (QAT) encodings are not present for a given weight.
  --use_per_row_quantization
                        Use this option to enable rowwise quantization of Matmul and FullyConnected
                        ops.
  --enable_per_row_quantized_bias
                        Enables row-wise quantization of bias for the FullyConnected op,
                        when weights are per-row quantized.
  --preserve_io_datatype [PRESERVE_IO_DATATYPE ...]
                        Use this option to preserve IO datatype. The different ways of using this
                        option are as follows:
                            --preserve_io_datatype <space-separated list of names of inputs and
                        outputs of the graph>
                        e.g.
                           --preserve_io_datatype input1 input2 output1
                        To preserve the datatype for all the inputs and outputs of the graph,
                        pass the option without arguments:
                            --preserve_io_datatype
  --use_native_input_files
                        Boolean flag to indicate how to read input files.
                        If not provided, reads inputs as floats and quantizes if necessary based on
                        quantization parameters in the model. (default)
                        If provided, reads inputs assuming the data type to be native to the model,
                        e.g., uint8_t.
  --use_native_output_files
                        Boolean flag to indicate the data type of the output files.
                        If not provided, outputs the files as floats. (default)
                        If provided, outputs files in the data type native to the model, e.g., uint8_t.
  --restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
                        Specifies the number of steps to use for computing quantization encodings
                        such that scale = (max - min) / number of quantization steps.
                        The option should be passed as a space-separated pair of hexadecimal string
                        minimum and maximum values, i.e., --restrict_quantization_steps "MIN MAX".
                        Please note that these are hexadecimal string literals, not signed
                        integers; to supply a negative value, an explicit minus sign is required.
                        E.g., --restrict_quantization_steps "-0x80 0x7F" indicates an example 8-bit
                        range,
                             --restrict_quantization_steps "-0x8000 0x7F7F" indicates an example
                        16-bit range.
                        This argument is required for 16-bit Matmul operations.
  --keep_weights_quantized
                        Use this option to keep the weights quantized even when the output of the op
                        is in floating point. Bias will be converted to floating point as per the
                        output of the op. Required to enable wFxp_actFP configurations according to
                        the provided bitwidth for weights and activations.
                        Note: these modes are not supported by all runtimes. Please check the
                        corresponding backend OpDef supplement to confirm they are supported.
  --adjust_bias_encoding
                        Use --adjust_bias_encoding option to modify bias encoding and weight
                        encoding to ensure that the bias value is in the range of the bias encoding.
                        This option is only applicable for per-channel quantized weights.
                        NOTE: This may result in clipping of the weight values
  --act_quantizer_calibration ACT_QUANTIZER_CALIBRATION
                        Specify which quantization calibration method to use for activations.
                        Supported values: min-max (default), sqnr, entropy, mse, percentile.
                        This option can be paired with --act_quantizer_schema to override the
                        quantization schema to use for activations; otherwise the default schema
                        (asymmetric) will be used.
  --param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION
                        Specify which quantization calibration method to use for parameters.
                        Supported values: min-max (default), sqnr, entropy, mse, percentile.
                        This option can be paired with --param_quantizer_schema to override the
                        quantization schema to use for parameters; otherwise the default schema
                        (asymmetric) will be used.
  --act_quantizer_schema ACT_QUANTIZER_SCHEMA
                        Specify which quantization schema to use for activations.
                        Supported values: asymmetric (default), symmetric, unsignedsymmetric.
  --param_quantizer_schema PARAM_QUANTIZER_SCHEMA
                        Specify which quantization schema to use for parameters.
                        Supported values: asymmetric (default), symmetric, unsignedsymmetric.
  --percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
                        Specify the percentile value to be used with the percentile calibration
                        method. The specified float value must lie between 90 and 100; default:
                        99.99.
  --use_aimet_quantizer
                        Use AIMET for quantization instead of the QNN IR quantizer.
  --op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
                        Use this argument to pass an op package library for quantization. Must be in
                        the form <op_package_lib_path:interfaceProviderName>; separate multiple
                        package libs with commas.
  --dump_encoding_json  Use this argument to dump the encodings of all tensors to a JSON file.
  --config CONFIG_FILE, -c CONFIG_FILE
                        Use this argument to pass the path of the config YAML file with quantizer
                        options
  --export_stripped_dlc
                        Use this argument to export a DLC which strips out data not needed for graph
                        composition
  -h, --help            show this help message and exit
  --debug [DEBUG]       Run the quantizer in debug mode.
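
The interaction between the calibration and schema options above can be sketched with the conventional min-max formulas (an illustrative sketch only; the quantizer's exact rounding and clamping behavior may differ, and the function names below are hypothetical):

```python
def asymmetric_encoding(t_min, t_max, bitwidth=8):
    """Asymmetric schema: [t_min, t_max] is mapped onto the full unsigned
    integer range [0, 2**bitwidth - 1] via a scale and an integer offset."""
    steps = (1 << bitwidth) - 1
    scale = (t_max - t_min) / steps
    offset = round(t_min / scale)  # zero-point; typically a negative integer
    return scale, offset

def symmetric_encoding(t_min, t_max, bitwidth=8):
    """Symmetric schema: the offset is fixed at 0, so only the largest
    absolute calibrated value determines the scale."""
    abs_max = max(abs(t_min), abs(t_max))
    return abs_max / ((1 << (bitwidth - 1)) - 1), 0

# A calibrated activation range [-0.5, 1.5] at the default 8-bit act_bitwidth:
a_scale, a_offset = asymmetric_encoding(-0.5, 1.5)  # scale = 2/255, offset = -64
s_scale, s_offset = symmetric_encoding(-0.5, 1.5)   # scale = 1.5/127, offset = 0
```

The example shows why asymmetric is the default: for a skewed range like [-0.5, 1.5], the symmetric schema wastes part of its integer range on values that never occur.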

Backend Options:
  --target_backend BACKEND
                        Use this option to specify the backend on which the model needs to run.
                        Providing this option will generate a graph optimized for the given backend
                        and this graph may not run on other backends. The default backend is HTP.
                        Supported backends are CPU, GPU, DSP, HTP, HTA, and LPAI.
  --target_soc_model SOC_MODEL
                        Use this option to specify the SOC on which the model needs to run.
                        This can be found in the SOC info of the device; it starts with a string
                        such as SDM, SM, QCS, IPQ, SA, QC, SC, SXR, SSG, STP, QRB, or AIC.
                        NOTE: --target_backend option must be provided to use --target_soc_model
                        option.
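
Putting the options together, a representative invocation might look like the following (all file names are hypothetical placeholders; each flag shown is documented above):

```
qairt-quantizer --input_dlc model.dlc \
                --input_list calibration_inputs.txt \
                --act_bitwidth 16 \
                --use_per_channel_quantization \
                --output_dlc model_quantized.dlc
```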