snpe-dlc-graph-prepare¶
snpe-dlc-graph-prepare performs offline graph preparation on quantized DLCs so they can run on the DSP/HTP runtimes.
Command Line Options:
[ -h, --help ] Displays this help message.
[ --version ] Displays version information.
[ --verbose ] Enable verbose user messages.
[ --quiet ] Disables some user messages.
[ --silent ] Disables all but fatal user messages.
[ --debug=<val> ] Sets the debug log level.
[ --debug1 ] Enables level 1 debug messages.
[ --debug2 ] Enables level 2 debug messages.
[ --debug3 ] Enables level 3 debug messages.
[ --log-mask=<val> ] Sets the debug log mask to set the log level for one or more areas.
Example: ".*=USER_ERROR, .*=INFO, NDK=DEBUG2, NCC=DEBUG3"
[ --log-file=<val> ] Overrides the default name for the debug log file.
[ --log-dir=<val> ] Overrides the default directory path where debug log files are written.
[ --log-file-include-hostname ]
Appends the name of this host to the log file name.
--input_dlc=<val> Path to the dlc container containing the model for which graph cache
should be generated. This argument is required.
[ --output_dlc=<val> ]
Path at which the model container, including the cached data, should be written.
If this argument is omitted, the quantized model will be written at
<input_model_name>_cached.dlc.
[ --set_output_tensors=<val> ]
Specifies a comma-separated list of tensors, without whitespace, to be output
after execution.
[ --set_output_layers=<val> ]
Specifies a comma-separated list of layers, without whitespace, whose output
buffers should be output after execution.
[ --input_list=<val> ]
Path to a file specifying input images as passed to snpe-net-run. Only
the graph output buffers information specified in the input list (line starting
with # or %, if any) will be used. Paths to the input images will be ignored.
[ --htp_socs=<val> ] Specify SoC(s) to generate HTP Offline Cache for. SoCs are specified with an
ASIC identifier, in a comma-separated list without whitespace.
For example --htp_socs sm8350,sm8450,sm8550,sm8650,qcs6490,qcs8550.
This flag and --htp_archs are mutually exclusive.
Default ASIC identifier: sm8650
[ --htp_archs=<val> ]
Specify DSP Architecture(s) to generate general HTP Offline Cache for.
Architectures are specified with an ASIC identifier, in a comma-separated list
without whitespace. For example, --htp_archs v68,v73. This flag cannot be
coupled with --htp_socs or --vtcm_override.
[ --vtcm_override=<val> ]
Specify a single value representing the VTCM size in MB for the generated HTP Offline Caches.
For example, --vtcm_override 4. When set to 0, the SoC maximum VTCM size is used and if cache
compatibility mode is set to STRICT the maximum value is checked. This flag can be used with
--htp_socs to override the default SoC VTCM size setting.
[ --optimization_level=<val> ]
Specify an optimization level. Valid values are 1, 2 and 3. Default is 2. Higher optimization levels incur
longer offline prepare time but yield a more optimal graph, and hence faster execution time, for most graphs.
[ --optimization_preset=<val> ]
Specify an optimization preset. Valid values are any integer value greater than or equal to zero. Default is 0.
These are experimental HTP graph compiler settings that typically affect latency and DRAM bandwidth.
These presets are intended for use with optimization_level=3. Unlike optimization levels, preset values do not follow
a consistent performance pattern. Results may vary depending on the network architecture and software release.
[ --buffer_data_type=<val> ]
Sets data type of IO buffers during prepare. Data Type can be the following:
float32, fixedPoint8, fixedPoint16. Arguments should be formatted as follows:
--buffer_data_type buffer_name1=buffer_name1_data_type
--buffer_data_type buffer_name2=buffer_name2_data_type
(Note: deprecated)
[ --overwrite_cache_records ]
Allow this tool to overwrite any cache record that exactly matches the requested SoC(s).
The default behavior is to skip (re)generating cache records when a matching cache already exists.
[ --use_float_io ] Prepare quantized HTP Graph to operate with floating point inputs/outputs (Note: deprecated)
[ --htp_dlbc=<val> ] Specify Deep Learning Bandwidth Compression (DLBC) for this HTP graph. The default setting is OFF.
To turn on, specify it as --htp_dlbc=true
[ --num_hvx_threads=<val> ]
Specify the number of HVX threads to reserve for this HTP graph. Must be greater than 0.
[ --input_name=<val> ]
Specifies the name of input for which dimensions are specified
e.g. --input_name=<input name>
If the DLC has multiple graphs, graph names are required.
Quotes are required if graph names are specified. Graph names can be found in snpe-dlc-info.
e.g. --input_name="<graph name> <input name>"
[ --input_dimensions=<val> ]
Specifies new dimensions for input whose name is specified in input_name.
e.g. --input_dimensions=1,224,224,3
For multiple inputs, specify --input_name=<input name> and --input_dimensions=<input dimensions> multiple times.
If the DLC has multiple graphs, graph names are required.
Quotes are required if graph names are specified. Graph names can be found in snpe-dlc-info.
e.g. --input_dimensions="<graph name> 1,224,224,3"
[ --memorymapped_buffer_hint=<val> ]
Specifies memory-mapped buffers hint. The default setting is OFF.
To turn on, specify it as --memorymapped_buffer_hint=true
[ --udo_package_path=<val> ]
Use this option to specify the path to the Registration Library for UDO Package(s). Usage is:
--udo_package_path=<path_to_reg_lib>
Optionally, you can provide multiple packages as a comma-separated list.
This option must be specified for networks with UDO. All UDOs in the network must have a host-executable CPU implementation.
For detailed information on how to use the tool, please refer to Offline Graph Caching for DSP Runtime on HTP.
snpe-dlc-quant¶
snpe-dlc-quant converts non-quantized DLC models into quantized DLC models.
Command Line Options:
[ -h,--help ] Displays this help message.
[ --version ] Displays version information.
[ --verbose ] Enable verbose user messages.
[ --quiet ] Disables some user messages.
[ --silent ] Disables all but fatal user messages.
[ --debug=<val> ] Sets the debug log level.
[ --debug1 ] Enables level 1 debug messages.
[ --debug2 ] Enables level 2 debug messages.
[ --debug3 ] Enables level 3 debug messages.
[ --log-mask=<val> ] Sets the debug log mask to set the log level for one or more areas.
Example: ".*=USER_ERROR, .*=INFO, NDK=DEBUG2, NCC=DEBUG3"
[ --log-file=<val> ] Overrides the default name for the debug log file.
[ --log-dir=<val> ] Overrides the default directory path where debug log files are written.
[ --log-file-include-hostname ]
Appends the name of this host to the log file name.
[ --input_dlc=<val> ]
Path to the dlc container containing the model for which fixed-point encoding
metadata should be generated. This argument is required.
[ --input_list=<val> ]
Path to a file specifying the trial inputs. This file should be a plain text file,
containing one or more absolute file paths per line. These files will be taken to constitute
the trial set. Each path is expected to point to a binary file containing one trial input
in the 'raw' format, ready to be consumed by the tool without any further modifications.
This is similar to how input is provided to the snpe-net-run application.
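The raw trial-input format and the input list can be sketched as follows. The file names and tensor values here are hypothetical placeholders, not part of the SDK:

```python
import os
import struct
import tempfile

# Write one hypothetical trial input in 'raw' format: a flat buffer of
# little-endian float32 values, ready to be consumed without further
# modification.
tmpdir = tempfile.mkdtemp()
raw_path = os.path.join(tmpdir, "trial_0001.raw")
values = [0.1, 0.5, 0.9, 0.2]  # placeholder tensor data
with open(raw_path, "wb") as f:
    f.write(struct.pack("<%df" % len(values), *values))

# The input list is a plain text file with one absolute path per line.
list_path = os.path.join(tmpdir, "input_list.txt")
with open(list_path, "w") as f:
    f.write(raw_path + "\n")
```

Each listed file contributes one trial input to the calibration set.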
[ --no_weight_quantization ]
Note: Deprecated.
[ --output_dlc=<val> ]
Path at which the metadata-included quantized model container should be written.
If this argument is omitted, the quantized model will be written at <unquantized_model_name>_quantized.dlc.
[ --use_enhanced_quantizer ]
Note: Deprecated; use --param_quantizer and/or --act_quantizer.
Use the enhanced quantizer feature when quantizing the model. Regular quantization determines the range using the actual
values of min and max of the data being quantized. Enhanced quantization uses an algorithm to determine optimal range. It can be
useful for quantizing models that have long tails in the distribution of the data being quantized.
[ --use_adjusted_weights_quantizer ]
Note: Deprecated; use --param_quantizer.
Use the adjusted tf quantizer for quantizing the weights only. This may help improve the accuracy of some models,
such as denoising models. This option is only used when quantizing the weights with 8 bits.
[ --optimizations=<val> ]
Note: Deprecated; use --algorithms.
Enables new optimization algorithms. Usage is:
--optimizations <algo_name1> --optimizations <algo_name2>
Available optimization algorithms are:
"cle" - Cross layer equalization includes a number of methods for equalizing weights
and biases across layers in order to rectify imbalances that cause quantization errors.
[ --algorithms=<val> ]
Enables new optimization algorithms. Usage is:
--algorithms <algo_name1> --algorithms <algo_name2>
Available optimization algorithms are:
"cle" - Cross layer equalization includes a number of methods for equalizing weights
and biases across layers in order to rectify imbalances that cause quantization errors.
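The idea behind "cle" can be illustrated with a toy sketch: per-channel weight ranges of two consecutive layers are rebalanced so both quantize with less error. The weight values and the scaling rule below are illustrative only, not the SDK's actual algorithm:

```python
import math

# Toy cross-layer equalization: shrink an oversized channel in layer 1
# and grow the matching input channel in layer 2 to compensate, leaving
# the network function unchanged (bias handling omitted for brevity).
w1 = [[2.0, -8.0], [0.5, 0.25]]   # layer 1 weights: rows = output channels
w2 = [[0.5, 4.0], [1.0, 8.0]]     # layer 2 weights: columns = input channels

for i in range(len(w1)):
    r1 = max(abs(v) for v in w1[i])                  # range of channel i in layer 1
    r2 = max(abs(w2[j][i]) for j in range(len(w2)))  # same channel as layer-2 input
    s = math.sqrt(r1 / r2)
    w1[i] = [v / s for v in w1[i]]                   # scale channel down
    for j in range(len(w2)):
        w2[j][i] *= s                                # scale next layer up

# After equalization, both layers see the same per-channel range
# (sqrt(r1 * r2)), so one quantization grid fits both better.
```

With equalized ranges, a single per-tensor scale wastes far fewer quantization steps on outlier channels.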
[ --override_params ]
Use this option to override quantization parameters when quantization was provided from the original source framework (e.g., TF fake quantization).
Note: Quantizer throws an error if overridden encodings contain unsupported bitwidths.
[ --use_encoding_optimizations ]
Note: Deprecated.
[ --udo_package_path=<val> ]
Specifies the path to the registration library for UDO package(s). Usage is:
--udo_package_path=<path_to_reg_lib>
You can (optionally) provide multiple packages as a comma-separated list.
This option must be specified for networks with UDO. All UDOs in a network must have a host-executable CPU implementation.
[ --use_symmetric_quantize_weights ]
Note: Deprecated, use --param_quantizer.
Use the symmetric quantizer feature when quantizing the weights of the model. It makes sure min and max have the
same absolute values about zero. Symmetrically quantized data will also be stored as int#_t data such that the offset is always 0.
[ --use_native_dtype ]
Note: This option is deprecated; use the --use_native_input_files option instead.
Use this option to indicate how to read input files,
1. float (default): reads inputs as floats and quantizes if necessary based on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the model, e.g., uint8_t.
[ --use_native_input_files ]
Use this option to indicate how to read input files,
1. float (default): reads inputs as floats and quantizes if necessary based on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the model, e.g., uint8_t.
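The float/native distinction can be sketched as two readings of the same on-disk tensor. The byte contents below are illustrative placeholders:

```python
import struct

# The same 4-element tensor stored two ways on disk:
# "float" mode expects little-endian float32; "native" mode expects the
# model's own data type, e.g. uint8_t.
float_bytes = struct.pack("<4f", 0.0, 0.5, 1.0, 0.25)  # 16-byte float32 file
native_bytes = bytes([0, 128, 255, 64])                # 4-byte uint8_t file

as_float = struct.unpack("<4f", float_bytes)  # default: read as floats
as_native = list(native_bytes)                # native: raw uint8 values
```

In float mode the tool quantizes the values itself if the model requires it; in native mode the values are taken as already being in the model's data type.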
[ --use_native_output_files ]
Use this option to indicate the data type of the output files,
1. float (default): generates the output file as float data.
2. native: generates the output file as the data type native to the source model, e.g., uint8_t.
[ --float_fallback ]
Enables fallback to floating point (FP) instead of fixed point.
This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
If this option is enabled, an input list must not be provided and --override_params must be provided.
The external quantization file (encoding file) might be missing quantization parameters for some interim tensors.
The quantizer first tries to fill the gaps by propagating encodings across math-invariant functions. If the
quantization parameters are still missing, the affected nodes fall back to floating point.
[ --param_quantizer=<val> ]
Indicates the weight/bias quantizer to use. Optional and must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"symmetric": Ensures min and max have the same absolute values about zero. Data will be stored as int#_t data such that the offset is always 0.
[ --act_quantizer=<val> ]
Indicates the activation quantizer to use. Optional and must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"symmetric": Ensures min and max have the same absolute values about zero. Data will be stored as int#_t data such that the offset is always 0.
[ --bitwidth=<val> ]
Note: Deprecated.
Selects the bitwidth to use when quantizing the weights/activations/biases; 8 (default) or 16.
Cannot be mixed with --weights_bitwidth or --act_bitwidth or --bias_bitwidth.
[ --weights_bitwidth=<val> ]
Selects the bitwidth to use when quantizing the weights; either 4, 8 (default) or 16.
8w/16a is only supported by HTA currently.
Cannot be mixed with --bitwidth.
[ --act_bitwidth=<val> ]
Selects the bitwidth to use when quantizing the activations; either 8 (default) or 16.
8w/16a is only supported by HTA currently.
Cannot be mixed with --bitwidth.
[ --float_bitwidth=<val> ]
Selects the bitwidth to use when using float for parameters (weights/biases) and activations, for
all ops or for specific ops selected through encodings; either 32 (default) or 16.
[ --bias_bitwidth=<val> ]
Selects the bitwidth to use when quantizing the biases; either 8 (default) or 32.
Using 32-bit biases may sometimes provide a small improvement in accuracy.
Cannot be mixed with --bitwidth.
[ --float_bias_bitwidth=<val> ]
Specifies the bitwidth for float bias tensors; either 32 or 16.
If not provided and bias is overridden to float in the quantizer, the overriding float tensor's bitwidth will be used.
[ --axis_quant ] Note: Deprecated; use --use_per_channel_quantization.
Selects per-axis-element quantization for the weights and biases of certain layer types.
Only Convolution, Deconvolution, and FullyConnected are supported.
[ --use_per_channel_quantization ]
Selects per-axis-element quantization for the weights and biases of certain layer types.
Only Convolution, Deconvolution, and FullyConnected are supported.
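Why per-channel scales help can be sketched numerically. The channel values are hypothetical and this is a conceptual illustration, not the quantizer's implementation:

```python
# Per-channel (per-axis-element) weight quantization: each output channel
# of a convolution gets its own scale, instead of one scale for the
# whole weight tensor.
channels = [[0.01, -0.02, 0.015],   # channel 0: small-magnitude weights
            [1.5, -2.0, 0.75]]      # channel 1: large-magnitude weights
bitwidth = 8
qmax = 2 ** (bitwidth - 1) - 1      # 127 for symmetric int8

per_channel_scales = [max(abs(v) for v in ch) / qmax for ch in channels]
per_tensor_scale = max(abs(v) for ch in channels for v in ch) / qmax

# Channel 0's own scale is 100x finer than the shared per-tensor scale,
# so its small weights keep far more precision.
```

With a single per-tensor scale, channel 0's weights would all collapse into a handful of quantization steps.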
[ --use_per_row_quantization ]
Enables row-wise quantization of Matmul and FullyConnected ops.
[ --enable_per_row_quantized_bias ]
Enables row-wise quantization of bias for the FullyConnected op, when weights are per-row quantized.
[ --restrict_quantization_steps=<val> ]
Specifies the number of steps to use for computing quantization encodings such that scale = (max - min) / number of quantization steps.
The option should be passed as a comma separated pair of hexadecimal string minimum and maximum values,
i.e., --restrict_quantization_steps "MIN,MAX".
Note that this is a hexadecimal string literal and not a signed integer; to supply a negative value, an explicit minus sign is required,
e.g., --restrict_quantization_steps "-0x80,0x7F" indicates an example 8-bit range.
--restrict_quantization_steps "-0x8000,0x7F7F" indicates an example 16-bit range.
This option only applies to symmetric parameter quantization.
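The "MIN,MAX" hexadecimal pair and the scale formula above can be sketched as follows. The tensor range is a hypothetical example; the tool's actual parsing may differ:

```python
# Parse a hexadecimal "MIN,MAX" step range and derive the scale via
# scale = (max - min) / number of quantization steps.
arg = "-0x80,0x7F"                      # the example 8-bit range above
lo_s, hi_s = arg.split(",")
lo, hi = int(lo_s, 16), int(hi_s, 16)   # hex string literals, sign allowed
steps = hi - lo                         # 255 quantization steps

data_min, data_max = -1.0, 1.0          # hypothetical symmetric tensor range
scale = (data_max - data_min) / steps
```

Restricting the step range this way narrows the integer grid the symmetric parameter quantizer may use.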
Description:
Generate 8 or 16 bit TensorFlow style fixed point weight and activations encodings for a floating point DLC.
For specifying input_list, refer to input_list argument in snpe-net-run for supported input formats (in order to calculate output activation encoding information for all layers, do not include the line which specifies desired outputs).
The tool requires the batch dimension of the DLC input file to be set to 1 during the original model conversion step.
An example of quantization using snpe-dlc-quant can be found in the C/C++ Tutorial section: Running the Inception v3 Model. For details on quantization see Quantized vs Non-Quantized Models.
Outputs can be specified for snpe-dlc-quant by modifying the input_list in the following ways:
#<output_layer_name>[<space><output_layer_name>] %<output_tensor_name>[<space><output_tensor_name>] <input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>] …
Note: Output tensors and layers can be specified individually, but when specifying both, the order shown must be used to specify each.
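Parsing the '#' (output layers), '%' (output tensors), and `name:=path` entries can be sketched as follows. The layer, tensor, and path names are hypothetical, and this shows one possible layout of the lines, not the tool's parser:

```python
# Minimal reader for the output-spec markers used in an input_list.
lines = [
    "#conv1 relu1",               # '#': output layers, space separated
    "%probs:0",                   # '%': output tensors
    "input:=/data/img_0001.raw",  # input name := raw file path
]

output_layers, output_tensors, inputs = [], [], {}
for line in lines:
    if line.startswith("#"):
        output_layers = line[1:].split()
    elif line.startswith("%"):
        output_tensors = line[1:].split()
    else:
        for pair in line.split():
            name, path = pair.split(":=")
            inputs[name] = path
```

The same layer and tensor names must then be repeated at inference time (see the SDK API notes below) or the cache is not reused.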
When using the Qualcomm® Neural Processing SDK API:
Any output layers specified when snpe-dlc-quant was called need to be specified using the Snpe_SNPEBuilder_SetOutputLayers() function.
Any output tensors specified when snpe-dlc-quant was called need to be specified using the Snpe_SNPEBuilder_SetOutputTensors() function.
snpe-dlc-quantize¶
snpe-dlc-quantize converts non-quantized DLC models into quantized DLC models.
Command Line Options:
[ -h,--help ] Displays this help message.
[ --version ] Displays version information.
[ --verbose ] Enable verbose user messages.
[ --quiet ] Disables some user messages.
[ --silent ] Disables all but fatal user messages.
[ --debug=<val> ] Sets the debug log level.
[ --debug1 ] Enables level 1 debug messages.
[ --debug2 ] Enables level 2 debug messages.
[ --debug3 ] Enables level 3 debug messages.
[ --log-mask=<val> ] Sets the debug log mask to set the log level for one or more areas.
Example: ".*=USER_ERROR, .*=INFO, NDK=DEBUG2, NCC=DEBUG3"
[ --log-file=<val> ] Overrides the default name for the debug log file.
[ --log-dir=<val> ] Overrides the default directory path where debug log files are written.
[ --log-file-include-hostname ]
Appends the name of this host to the log file name.
[ --input_dlc=<val> ]
Path to the dlc container containing the model for which fixed-point encoding
metadata should be generated. This argument is required.
[ --input_list=<val> ]
Path to a file specifying the trial inputs. This file should be a plain text file,
containing one or more absolute file paths per line. These files will be taken to constitute
the trial set. Each path is expected to point to a binary file containing one trial input
in the 'raw' format, ready to be consumed by the tool without any further modifications.
This is similar to how input is provided to the snpe-net-run application.
[ --no_weight_quantization ]
Note: Deprecated.
[ --output_dlc=<val> ]
Path at which the metadata-included quantized model container should be written.
If this argument is omitted, the quantized model will be written at <unquantized_model_name>_quantized.dlc.
[ --enable_htp ] Pack HTP information in quantized DLC.
[ --htp_socs=<val> ] Specify SoC to generate HTP Offline Cache for.
SoCs are specified with an ASIC identifier, in a comma separated list.
For example, --htp_socs sm8650
[ --overwrite_cache_records ]
Overwrite HTP cache records present in the DLC.
[ --use_float_io ]
Prepare quantized HTP graph to operate with floating point inputs/outputs (Note: deprecated).
[ --use_enhanced_quantizer ]
Note: Deprecated; use --param_quantizer and/or --act_quantizer.
Use the enhanced quantizer feature when quantizing the model. Regular quantization determines the range using the actual
values of min and max of the data being quantized. Enhanced quantization uses an algorithm to determine optimal range. It can be
useful for quantizing models that have long tails in the distribution of the data being quantized.
[ --use_adjusted_weights_quantizer ]
Note: Deprecated; use --param_quantizer.
Use the adjusted tf quantizer for quantizing the weights only. This may help improve the accuracy of some models,
such as denoising models. This option is only used when quantizing the weights with 8 bits.
[ --optimizations=<val> ]
Note: Deprecated; use --algorithms.
Enables new optimization algorithms. Usage is:
--optimizations <algo_name1> --optimizations <algo_name2>
Available optimization algorithms are:
"cle" - Cross layer equalization includes a number of methods for equalizing weights
and biases across layers in order to rectify imbalances that cause quantization errors.
[ --algorithms=<val> ]
Enables new optimization algorithms. Usage is:
--algorithms <algo_name1> --algorithms <algo_name2>
Available optimization algorithms are:
"cle" - Cross layer equalization includes a number of methods for equalizing weights
and biases across layers in order to rectify imbalances that cause quantization errors.
[ --override_params ]
Use this option to override quantization parameters when quantization was provided from the original source framework (e.g., TF fake quantization).
Note: Quantizer throws an error if overridden encodings contain unsupported bitwidths.
[ --use_encoding_optimizations ]
Note: Deprecated.
Use this option to enable quantization encoding optimizations. This can reduce requantization in the graph and may improve accuracy for some models.
[ --udo_package_path=<val> ]
Specifies the path to the registration library for UDO package(s). Usage is:
--udo_package_path=<path_to_reg_lib>
You can (optionally) provide multiple packages as a comma-separated list.
This option must be specified for networks with UDO. All UDOs in a network must have a host-executable CPU implementation.
[ --use_symmetric_quantize_weights ]
Note: Deprecated, use --param_quantizer.
Use the symmetric quantizer feature when quantizing the weights of the model. It makes sure min and max have the
same absolute values about zero. Symmetrically quantized data will also be stored as int#_t data such that the offset is always 0.
[ --use_native_dtype ]
Note: This option is deprecated; use the --use_native_input_files option instead.
Use this option to indicate how to read input files,
1. float (default): reads inputs as floats and quantizes if necessary based on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the model, e.g., uint8_t.
[ --use_native_input_files ]
Use this option to indicate how to read input files,
1. float (default): reads inputs as floats and quantizes if necessary based on quantization parameters in the model.
2. native: reads inputs assuming the data type to be native to the model, e.g., uint8_t.
[ --use_native_output_files ]
Use this option to indicate the data type of the output files,
1. float (default): generates the output file as float data.
2. native: generates the output file as the data type native to the source model, e.g., uint8_t.
[ --float_fallback ]
Enables fallback to floating point (FP) instead of fixed point.
This option can be paired with --float_bitwidth to indicate the bitwidth for FP (by default 32).
If this option is enabled, an input list must not be provided and --override_params must be provided.
The external quantization file (encoding file) might be missing quantization parameters for some interim tensors.
The quantizer first tries to fill the gaps by propagating encodings across math-invariant functions. If the
quantization parameters are still missing, the affected nodes fall back to floating point.
[ --param_quantizer=<val> ]
Indicates the weight/bias quantizer to use. Optional and must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"symmetric": Ensures min and max have the same absolute values about zero. Data will be stored as int#_t data such that the offset is always 0.
[ --act_quantizer=<val> ]
Indicates the activation quantizer to use. Optional and must be followed by one of the following options:
"tf": Uses the real min/max of the data and specified bitwidth (default).
"enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution.
"symmetric": Ensures min and max have the same absolute values about zero. Data will be stored as int#_t data such that the offset is always 0.
[ --bitwidth=<val> ]
Note: Deprecated.
Selects the bitwidth to use when quantizing the weights/activations/biases; 8 (default) or 16.
Cannot be mixed with --weights_bitwidth or --act_bitwidth or --bias_bitwidth.
[ --weights_bitwidth=<val> ]
Selects the bitwidth to use when quantizing the weights; either 4, 8 (default) or 16.
8w/16a is only supported by HTA currently.
Cannot be mixed with --bitwidth.
[ --act_bitwidth=<val> ]
Selects the bitwidth to use when quantizing the activations; either 8 (default) or 16.
8w/16a is only supported by HTA currently.
Cannot be mixed with --bitwidth.
[ --float_bitwidth=<val> ]
Selects the bitwidth to use when using float for parameters (weights/biases) and activations, for
all ops or for specific ops selected through encodings; either 32 (default) or 16.
[ --bias_bitwidth=<val> ]
Selects the bitwidth to use when quantizing the biases; either 8 (default) or 32.
Using 32-bit biases may sometimes provide a small improvement in accuracy.
Cannot be mixed with --bitwidth.
[ --float_bias_bitwidth=<val> ]
Specifies the bitwidth for float bias tensors; either 32 or 16.
If not provided and bias is overridden to float in the quantizer, the overriding float tensor's bitwidth will be used.
[ --axis_quant ] Note: Deprecated; use --use_per_channel_quantization.
Selects per-axis-element quantization for the weights and biases of certain layer types.
Only Convolution, Deconvolution, and FullyConnected are supported.
[ --use_per_channel_quantization ]
Selects per-axis-element quantization for the weights and biases of certain layer types.
Only Convolution, Deconvolution, and FullyConnected are supported.
[ --use_per_row_quantization ]
Enables row-wise quantization of Matmul and FullyConnected ops.
[ --enable_per_row_quantized_bias ]
Enables row-wise quantization of bias for the FullyConnected op, when weights are per-row quantized.
[ --restrict_quantization_steps=<val> ]
Specifies the number of steps to use for computing quantization encodings such that scale = (max - min) / number of quantization steps.
The option should be passed as a comma separated pair of hexadecimal string minimum and maximum values,
i.e., --restrict_quantization_steps "MIN,MAX".
Note that this is a hexadecimal string literal and not a signed integer; to supply a negative value, an explicit minus sign is required,
e.g., --restrict_quantization_steps "-0x80,0x7F" indicates an example 8-bit range.
--restrict_quantization_steps "-0x8000,0x7F7F" indicates an example 16-bit range.
This option only applies to symmetric parameter quantization.
Description:
Generate 8 or 16 bit TensorFlow style fixed point weight and activations encodings for a floating point DLC model.
For specifying input_list, refer to input_list argument in snpe-net-run for supported input formats (in order to calculate output activation encoding information for all layers, do not include the line which specifies desired outputs).
The tool requires the batch dimension of the DLC input file to be set to 1 during the original model conversion step.
An example of quantization using snpe-dlc-quantize can be found in the C/C++ Tutorial section: Running the Inception v3 Model. For details on quantization see Quantized vs Non-Quantized Models.
Using snpe-dlc-quantize is mandatory for running on HTA.
Using snpe-dlc-quantize is mandatory for running on the DSP runtime on Snapdragon 865. It is recommended that offline cache generation be used; it is specified by using the --enable_htp option for snpe-dlc-quantize.
When using offline cache generation for HTP, the same input tensors or layers and output tensors or layers should be specified both when running snpe-dlc-quantize and when running inference on the model using Qualcomm® Neural Processing SDK APIs or snpe-net-run. Not doing so will invalidate the cache, and graph initialization will take longer.
Outputs can be specified for snpe-dlc-quantize by modifying the input_list in the following ways:
#<output_layer_name>[<space><output_layer_name>] %<output_tensor_name>[<space><output_tensor_name>] <input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>] …
Note: Output tensors and layers can be specified individually, but when specifying both, the order shown must be used to specify each.
When running a model with an offline generated cache using snpe-net-run:
Any output layers specified when snpe-dlc-quantize was called need to be specified using the input list, as shown in the input_list argument to snpe-net-run.
Any output tensors specified when snpe-dlc-quantize was called need to be specified using the --set_output_tensors argument to snpe-net-run. Refer to snpe-net-run for documentation.
When using the Qualcomm® Neural Processing SDK API:
Any output layers specified when snpe-dlc-quantize was called need to be specified using the Snpe_SNPEBuilder_SetOutputLayers() function.
Any output tensors specified when snpe-dlc-quantize was called need to be specified using the Snpe_SNPEBuilder_SetOutputTensors() function.
snpe-udo-package-generator¶
DESCRIPTION:
------------
This tool generates a UDO (User Defined Operation) package using a
user provided config file.
USAGE:
------------
snpe-udo-package-generator [-h] --config_path CONFIG_PATH [--debug]
[--output_path OUTPUT_PATH] [-f]
OPTIONAL ARGUMENTS:
-------------------
-h, --help show this help message and exit
--debug Returns debugging information from generating the package
--output_path OUTPUT_PATH, -o OUTPUT_PATH
Path where the package should be saved
-f, --force-generation
This option will delete the existing package.
Note: appropriate file permissions must be set to use
this option.
REQUIRED_ARGUMENTS:
-------------------
--config_path CONFIG_PATH, -p CONFIG_PATH
The path to a config file that defines a UDO.
qairt-quantizer¶
The qairt-quantizer tool converts non-quantized DLC models into quantized DLC models.
Basic command line usage looks like:
usage: qairt-quantizer --input_dlc INPUT_DLC [--output_dlc OUTPUT_DLC] [--input_list INPUT_LIST]
[--enable_float_fallback] [--apply_algorithms ALGORITHMS [ALGORITHMS ...]]
[--bias_bitwidth BIAS_BITWIDTH] [--act_bitwidth ACT_BITWIDTH]
[--weights_bitwidth WEIGHTS_BITWIDTH] [--float_bitwidth FLOAT_BITWIDTH]
[--float_bias_bitwidth FLOAT_BIAS_BITWIDTH] [--ignore_quantization_overrides]
[--use_per_channel_quantization] [--use_per_row_quantization]
[--enable_per_row_quantized_bias]
[--preserve_io_datatype [PRESERVE_IO_DATATYPE ...]]
[--use_native_input_files] [--use_native_output_files]
[--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX]
[--keep_weights_quantized] [--adjust_bias_encoding]
[--act_quantizer_calibration ACT_QUANTIZER_CALIBRATION]
[--param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION]
[--act_quantizer_schema ACT_QUANTIZER_SCHEMA]
[--param_quantizer_schema PARAM_QUANTIZER_SCHEMA]
[--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE]
[--use_aimet_quantizer] [--op_package_lib OP_PACKAGE_LIB]
[--dump_encoding_json] [--config CONFIG_FILE] [--export_stripped_dlc] [-h]
[--target_backend BACKEND] [--target_soc_model SOC_MODEL] [--debug [DEBUG]]
required arguments:
--input_dlc INPUT_DLC, -i INPUT_DLC
Path to the dlc container containing the model for which fixed-point
encoding metadata should be generated. This argument is required
optional arguments:
--output_dlc OUTPUT_DLC, -o OUTPUT_DLC
Path at which the metadata-included quantized model container should be
written. If this argument is omitted, the quantized model will be written at
<unquantized_model_name>_quantized.dlc
--input_list INPUT_LIST, -l INPUT_LIST
Path to a file specifying the input data. This file should be a plain text
file, containing one or more absolute file paths per line. Each path is
expected to point to a binary file containing one input in the "raw" format,
ready to be consumed by the quantizer without any further preprocessing.
Multiple files per line separated by spaces indicate multiple inputs to the
network. See documentation for more details. Must be specified for
quantization. All subsequent quantization options are ignored when this is
not provided.
--enable_float_fallback, -f
Use this option to enable fallback to floating point (FP) instead of fixed
point.
This option can be paired with --float_bitwidth to indicate the bitwidth for
FP (by default 32).
If this option is enabled, an input list must not be provided, and
--ignore_quantization_overrides must not be provided.
External quantization encodings (an encodings file or FakeQuant encodings)
may be missing quantization parameters for some intermediate tensors.
The quantizer first tries to fill the gaps by propagating encodings across
math-invariant functions; any nodes whose parameters are still missing
then fall back to floating point.
--apply_algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable additional optimization algorithms. Usage:
--apply_algorithms <algo_name1> ... The available optimization algorithms
are: "cle" (cross-layer equalization), which includes a number of methods
for equalizing weights and biases across layers in order to rectify
imbalances that cause quantization errors.
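As a rough illustration of the cross-layer equalization idea (a simplified sketch, not SNPE's implementation), a per-channel scale can be chosen so that the weight ranges of two consecutive layers become equal:

```python
import math

def equalize_channel(w1_row, w2_col):
    """Sketch of cross-layer equalization for one channel: scale the
    layer-1 output weights down and the layer-2 input weights up so
    both ends of the channel share the same dynamic range."""
    r1 = max(abs(w) for w in w1_row)   # range of this channel in layer 1
    r2 = max(abs(w) for w in w2_col)   # range of the same channel in layer 2
    s = r1 / math.sqrt(r1 * r2)        # equalizing scale factor
    w1_scaled = [w / s for w in w1_row]
    w2_scaled = [w * s for w in w2_col]
    return w1_scaled, w2_scaled

w1, w2 = equalize_channel([4.0, -2.0], [1.0, 0.5])
# both sides now span the same range: max|w| == 2.0 in each layer
```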
--bias_bitwidth BIAS_BITWIDTH
Use the --bias_bitwidth option to select the bitwidth to use when quantizing
the biases, either 8 (default) or 32.
--act_bitwidth ACT_BITWIDTH
Use the --act_bitwidth option to select the bitwidth to use when quantizing
the activations, either 8 (default) or 16.
--weights_bitwidth WEIGHTS_BITWIDTH
Use the --weights_bitwidth option to select the bitwidth to use when
quantizing the weights, either 4, 8 (default) or 16.
--float_bitwidth FLOAT_BITWIDTH
Use the --float_bitwidth option to select the bitwidth to use for float
tensors, either 32 (default) or 16.
--float_bias_bitwidth FLOAT_BIAS_BITWIDTH
Use the --float_bias_bitwidth option to select the bitwidth to use when
biases are in float, either 32 or 16 (default '0' if not provided).
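The bitwidth determines how finely the observed value range is divided. A minimal sketch of the usual asymmetric encoding computation (illustrative, not the exact SNPE formula):

```python
def asymmetric_encoding(x_min, x_max, bitwidth):
    """Compute scale/offset for an asymmetric fixed-point encoding:
    the [x_min, x_max] range is split into 2^bitwidth - 1 steps."""
    steps = (1 << bitwidth) - 1
    scale = (x_max - x_min) / steps
    offset = round(-x_min / scale)  # zero-point in quantized units
    return scale, offset

print(asymmetric_encoding(0.0, 2.55, 8))   # scale ~= 0.01, offset 0
print(asymmetric_encoding(-1.0, 1.0, 16))  # 16 bits: much finer scale
```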
--ignore_quantization_overrides
Use only quantizer generated encodings, ignoring any user or model provided
encodings.
Note: Cannot use --ignore_quantization_overrides with
--quantization_overrides (argument of Qairt Converter)
--use_per_channel_quantization
Use this option to enable per-channel quantization for convolution-based op
weights.
Note: This will only be used if built-in model Quantization-Aware Trained
(QAT) encodings are not present for a given weight.
--use_per_row_quantization
Use this option to enable rowwise quantization of Matmul and FullyConnected
ops.
--enable_per_row_quantized_bias
Enables row-wise quantization of bias for the FullyConnected op
when weights are per-row quantized.
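The difference between per-tensor and per-channel quantization can be sketched as follows (an illustrative toy example, not the tool's code): per-channel keeps one scale per output channel, so a small-range channel is not crushed by a large-range one.

```python
def per_tensor_scale(weights, bitwidth=8):
    # one symmetric scale for the whole weight tensor
    r = max(abs(w) for row in weights for w in row)
    return r / ((1 << (bitwidth - 1)) - 1)

def per_channel_scales(weights, bitwidth=8):
    # one symmetric scale per output channel (per row here)
    return [max(abs(w) for w in row) / ((1 << (bitwidth - 1)) - 1)
            for row in weights]

w = [[10.0, -5.0],    # channel 0: large range
     [0.1, 0.05]]     # channel 1: tiny range
print(per_tensor_scale(w))     # single coarse scale driven by channel 0
print(per_channel_scales(w))   # channel 1 gets a much finer scale
```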
--preserve_io_datatype [PRESERVE_IO_DATATYPE ...]
Use this option to preserve IO datatype. The different ways of using this
option are as follows:
--preserve_io_datatype <space separated list of names of inputs and
outputs of the graph>
e.g.
--preserve_io_datatype input1 input2 output1
The user may choose to preserve the datatype for all the inputs and outputs
of the graph.
--preserve_io_datatype
--use_native_input_files
Boolean flag to indicate how to read input files.
If not provided, reads inputs as floats and quantizes if necessary based on
quantization parameters in the model. (default)
If provided, reads inputs assuming the data type to be native to the model,
e.g. uint8_t.
--use_native_output_files
Boolean flag to indicate the data type of the output files.
If not provided, outputs the files as floats. (default)
If provided, outputs files in the data type native to the model, e.g.
uint8_t.
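As an illustrative sketch of the two conventions, the same values can be stored in a raw file as float32 (the default expectation, quantized by the tool) or already in a model-native type such as uint8:

```python
import struct

values = [0, 128, 255]  # example pixel values

# default convention: raw file of float32 values
float_bytes = struct.pack("<%df" % len(values), *map(float, values))

# --use_native_input_files convention: raw uint8, already native
native_bytes = struct.pack("%dB" % len(values), *values)

print(len(float_bytes), len(native_bytes))  # 12 vs 3 bytes for 3 values
```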
--restrict_quantization_steps ENCODING_MIN, ENCODING_MAX
Specifies the number of steps to use for computing quantization encodings
such that scale = (max - min) / number of quantization steps.
The option should be passed as a space-separated pair of hexadecimal string
minimum and maximum values, i.e. --restrict_quantization_steps "MIN MAX".
Please note that these are hexadecimal string literals, not signed
integers; to supply a negative value an explicit minus sign is required.
E.g. --restrict_quantization_steps "-0x80 0x7F" indicates an example 8-bit
range, and --restrict_quantization_steps "-0x8000 0x7F7F" indicates an
example 16-bit range.
This argument is required for 16-bit Matmul operations.
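The hexadecimal pair simply fixes the step count. A sketch of how such a pair could be interpreted (assumed behavior for illustration, not the tool's source):

```python
def step_count(arg):
    """Parse a '--restrict_quantization_steps "MIN MAX"' pair and
    return the number of quantization steps (MAX - MIN)."""
    lo_s, hi_s = arg.split()
    lo, hi = int(lo_s, 16), int(hi_s, 16)  # hex literals, sign allowed
    return hi - lo

print(step_count("-0x80 0x7F"))      # 255 steps: an 8-bit range
print(step_count("-0x8000 0x7F7F"))  # 65407 steps: a 16-bit example
```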
--keep_weights_quantized
Use this option to keep the weights quantized even when the output of the op
is in floating point. The bias will be converted to floating point to match
the output of the op. Required to enable wFxp_actFP configurations with
the provided bitwidths for weights and activations.
Note: These modes are not supported by all runtimes. Please check the
corresponding Backend OpDef supplement to see whether they are supported.
--adjust_bias_encoding
Use --adjust_bias_encoding option to modify bias encoding and weight
encoding to ensure that the bias value is in the range of the bias encoding.
This option is only applicable for per-channel quantized weights.
NOTE: This may result in clipping of the weight values.
--act_quantizer_calibration ACT_QUANTIZER_CALIBRATION
Specify which quantization calibration method to use for activations.
Supported values: min-max (default), sqnr, entropy, mse, percentile.
This option can be paired with --act_quantizer_schema to override the
quantization schema used for activations; otherwise the default schema
(asymmetric) is used.
--param_quantizer_calibration PARAM_QUANTIZER_CALIBRATION
Specify which quantization calibration method to use for parameters.
Supported values: min-max (default), sqnr, entropy, mse, percentile.
This option can be paired with --param_quantizer_schema to override the
quantization schema used for parameters; otherwise the default schema
(asymmetric) is used.
--act_quantizer_schema ACT_QUANTIZER_SCHEMA
Specify which quantization schema to use for activations.
Supported values: asymmetric (default), symmetric, unsignedsymmetric.
--param_quantizer_schema PARAM_QUANTIZER_SCHEMA
Specify which quantization schema to use for parameters.
Supported values: asymmetric (default), symmetric, unsignedsymmetric.
--percentile_calibration_value PERCENTILE_CALIBRATION_VALUE
Specify the percentile value to be used with the percentile calibration
method. The specified float value must lie between 90 and 100; default: 99.99
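To illustrate what percentile calibration does (a simplified sketch, not AIMET's estimator): instead of the absolute min/max, the encoding range is clipped to the requested two-sided percentile, discarding extreme outliers.

```python
def percentile_range(values, percentile=99.99):
    # keep the central `percentile` percent of the distribution,
    # cutting an equal tail from each side
    s = sorted(values)
    n = len(s)
    tail = (100.0 - percentile) / 200.0
    lo = s[int(tail * (n - 1))]
    hi = s[int((1.0 - tail) * (n - 1))]
    return lo, hi

acts = list(range(100)) + [1000.0]   # one extreme outlier
print((min(acts), max(acts)))        # min-max keeps the outlier
print(percentile_range(acts, 99.0))  # percentile clips it away
```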
--use_aimet_quantizer
Use AIMET for quantization instead of the QNN IR quantizer.
--op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
Use this argument to pass an op package library for quantization. Must be in
the form <op_package_lib_path:interfaceProviderName>; separate multiple
package libs with commas.
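For illustration, the argument might be passed as follows (the library and provider names here are hypothetical):

```text
--op_package_lib libMyOpPackage.so:MyOpPackageInterfaceProvider
--op_package_lib libPkgA.so:PkgAProvider,libPkgB.so:PkgBProvider
```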
--dump_encoding_json Use this argument to dump the encodings of all tensors to a JSON file.
--config CONFIG_FILE, -c CONFIG_FILE
Use this argument to pass the path of the config YAML file with quantizer
options
--export_stripped_dlc
Use this argument to export a DLC which strips out data not needed for graph
composition.
-h, --help show this help message and exit
--debug [DEBUG] Run the quantizer in debug mode.
Backend Options:
--target_backend BACKEND
Use this option to specify the backend on which the model needs to run.
Providing this option will generate a graph optimized for the given backend
and this graph may not run on other backends. The default backend is HTP.
Supported backends are CPU, GPU, DSP, HTP, HTA, LPAI.
--target_soc_model SOC_MODEL
Use this option to specify the SOC on which the model needs to run.
This can be found in the SoC info of the device; it starts with a prefix
such as SDM, SM, QCS, IPQ, SA, QC, SC, SXR, SSG, STP, QRB, or AIC.
NOTE: --target_backend option must be provided to use --target_soc_model
option.