Qairt Quantizer¶
The qairt-converter tool converts a non-quantized model into either a non-quantized or a quantized DLC file, depending on the overrides provided during the conversion step. qairt-quantizer can then be used to quantize any tensors that are still missing encodings after the qairt-converter step (fill in the gaps), or to calibrate the provided encodings using a list of images.
The qairt-quantizer tool quantizes the model to one of the supported fixed-point formats.
For example, the following command will convert an Inception v3 DLC file into a quantized Inception v3 DLC file.
$ qairt-quantizer --input_dlc inception_v3.dlc \
--input_list image_file_list.txt \
--output_dlc inception_v3_quantized.dlc
To properly calculate the ranges for the quantization parameters, a representative set of input data must be supplied to
qairt-quantizer through the --input_list parameter.
The --input_list argument specifies paths to raw image files to be used for calibration during quantization.
For supported input formats, refer to the --input_list argument of qnn-net-run
(to calculate output activation encoding information for all layers, do not include the line
that specifies desired outputs).
The tool requires the batch dimension of the input DLC file to be set to 1 during model conversion. The batch dimension can be changed to a different value for inference by resizing the network during initialization.
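Conceptually, calibration observes the range of values each tensor takes over the representative inputs and derives fixed-point encodings from that range. The following standalone sketch is illustrative only (it is not the QAIRT implementation or API); it shows how a min/max observed over a calibration set maps to an 8-bit scale and offset:

```python
# Standalone illustration of min/max calibration -> 8-bit encoding.
# This mirrors the idea behind --input_list calibration; it is not QAIRT API.

def calibrate_encoding(calibration_batches, num_bits=8):
    """Derive (scale, offset) from the min/max seen across all batches."""
    observed_min = min(min(batch) for batch in calibration_batches)
    observed_max = max(max(batch) for batch in calibration_batches)
    # Extend the range to cover zero so that 0.0 is exactly representable.
    observed_min = min(observed_min, 0.0)
    observed_max = max(observed_max, 0.0)
    num_steps = (1 << num_bits) - 1          # 255 for 8-bit
    scale = (observed_max - observed_min) / num_steps
    offset = round(observed_min / scale) if scale else 0
    return scale, offset

def quantize(value, scale, offset, num_bits=8):
    """Map a float to its fixed-point representation and back (dequantized)."""
    q = round(value / scale) - offset
    q = max(0, min((1 << num_bits) - 1, q))  # clamp to the 8-bit range
    return (q + offset) * scale              # dequantized value

# Two hypothetical calibration batches of activation values.
scale, offset = calibrate_encoding([[0.0, 1.2, 3.1], [0.5, 2.55]])
```

Values inside the observed range round-trip with an error below one quantization step; values outside it are clamped, which is why the calibration set must be representative.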
Additional details¶
qairt-quantizer is largely similar to snpe-dlc-quant, with the following differences:

- qairt-quantizer can now be used to generate encodings, using a calibration dataset provided via the --input_list flag, for tensors in the following scenarios:
  - Fill in the gaps: a tensor is missing an encoding after the qairt-converter step, i.e. no override was specified for it in --quantization_overrides or in the source model encodings (QAT).
  - Encodings are not specified for all of the tensors via overrides or QAT encodings.
- HTP is set as the default backend in the QAIRT quantizer, which may enable certain HTP-specific behaviors that would not be triggered by default in legacy quantizers, where the backend is left empty. This difference can affect how some backend-dependent features behave during conversion/quantization. For example, during quantization, an optimization called IntBiasUpdates is applied to the FullyConnected op only if the backend is set to HTP in SNPE, whereas it is always applied in QAIRT.
- External overrides and source model encodings (QAT) are now applied during the qairt-converter stage by default, so the quantizer options to ignore them, --ignore_encodings (legacy) and --ignore_quantization_overrides, are now no-ops.
  - One alternative is the --export_format=DLC_STRIP_QUANT flag of qairt-converter; when specified, the converter ignores and removes all encodings in the source model and outputs a float model, which can then be recalibrated using qairt-quantizer with the --input_list flag.
  - Another alternative is to use the qairt-quantizer options --input_list and --ignore_quantization_overrides in combination, which signals the quantizer to ignore all encodings applied during conversion and to generate encodings from the calibration dataset provided via --input_list.
- The float fallback feature, controlled via the command-line option --enable_float_fallback (present as --float_fallback in legacy quantizers), is also a no-op for qairt-quantizer and can be skipped. Float fallback was added to produce a fully quantized or mixed-precision graph by applying encoding overrides or source model encodings, propagating encodings across data-invariant ops, and falling back tensors with missing encodings to a float datatype. To simplify the steps, this is now taken care of during qairt-converter: the converter applies the overrides and encodings, and tensors that are missing encodings fall back to the default float datatype.

To summarize, the qairt-quantizer command-line arguments --ignore_quantization_overrides and --enable_float_fallback are now no-ops; their behavior is applied by default during the qairt-converter step itself.

Note
--enable_float_fallback and --input_list are mutually exclusive options. One of them is a mandatory argument for the quantizer.
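As a sketch of the DLC_STRIP_QUANT alternative above, the two-step recalibration flow might look like the following (the model and file names, and the converter's --input_network/--output_path arguments, are illustrative assumptions):

$ qairt-converter --input_network model.onnx \
    --export_format DLC_STRIP_QUANT \
    --output_path model_float.dlc
$ qairt-quantizer --input_dlc model_float.dlc \
    --input_list image_file_list.txt \
    --output_dlc model_float_quantized.dlc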
Outputs can be specified for qairt-quantizer by modifying the input_list in the following ways:
#<output_layer_name>[<space><output_layer_name>]
%<output_tensor_name>[<space><output_tensor_name>]
<input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]
Note: Output tensors and layers can be specified individually, but when specifying both, the order shown must be used to specify each.
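For illustration, an input_list that requests specific output layers and tensors during calibration might look like the following (the layer, tensor, and file names are hypothetical):

#conv_out softmax_out
%probabilities
input:=inputs/image_0001.raw
input:=inputs/image_0002.raw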
qairt-quantizer also supports quantization using AIMET, in place of the default quantizer, when the --use_aimet_quantizer command-line option is provided. To use the AIMET Quantizer, run the setup script to create an AIMET-specific environment by executing the following command:

$ source {SNPE_ROOT}/bin/aimet_env_setup.sh --env_path <path where AIMET venv needs to be created> \
    --aimet_sdk_tar <AIMET Torch SDK tarball>

Advanced AIMET algorithms, AdaRound and AMP, are also supported in qairt-quantizer. The user needs to provide a YAML config file through the command-line option --config and specify the algorithm "adaround" or "amp" through --apply_algorithms, along with the --use_aimet_quantizer flag.

The template of the YAML file for AMP is shown below:
aimet_quantizer:
  datasets:
    <dataset_name>:
      dataloader_callback: '<path/to/labeled/dataloader/callback/function>'
      dataloader_kwargs: {arg1: val, arg2: val2}
  amp:
    dataset: <dataset_name>
    candidates: [[[8, 'int'], [16, 'int']], [[16, 'float'], [16, 'float']]]
    allowed_accuracy_drop: 0.02
    eval_callback_for_phase2: '<path/to/evaluator/callback/function>'
- dataloader_callback: path of a callback function that returns a labeled dataloader of type torch.DataLoader. The data should be in the source network input format.
- dataloader_kwargs: optional dictionary through which the user can provide keyword arguments for the callback function defined above.
- dataset: name of a dataset defined above.
- candidates: a list of lists of all possible bitwidth values for activations and parameters.
- allowed_accuracy_drop: the maximum allowed drop in accuracy from the FP32 baseline. The Pareto front curve is plotted only up to the point where the allowed accuracy drop is met.
- eval_callback_for_phase2: path of an evaluator function that takes a batch of predicted values as its first argument and a batch of ground-truth values as its second argument, and returns the calculated metric as a float.
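As an illustration of the eval_callback_for_phase2 contract described above, a minimal evaluator for a classification metric might look like this (the function name and metric choice are assumptions; in practice the batches would be in the source network's input format):

```python
# Hypothetical evaluator callback for AMP phase 2: receives a batch of
# predicted values and a batch of ground-truth values, and returns the
# computed metric as a float (here, top-1 accuracy).
def evaluate_top1(predicted_batch, ground_truth_batch):
    correct = sum(1 for p, g in zip(predicted_batch, ground_truth_batch) if p == g)
    return correct / len(ground_truth_batch)
```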
The template of the YAML file for AdaRound is shown below:
aimet_quantizer:
  datasets:
    <dataset_name>:
      dataloader_callback: '<path/to/unlabeled/dataloader/callback/function>'
      dataloader_kwargs: {arg1: val, arg2: val2}
  adaround:
    dataset: <dataset_name>
    num_batches: 1
- dataloader_callback: path of a callback function that returns an unlabeled dataloader of type torch.DataLoader. The data should be in the source network input format.
- dataloader_kwargs: optional dictionary through which the user can provide keyword arguments for the callback function defined above.
- dataset: name of a dataset defined above.
- num_batches: number of batches to be used for the AdaRound iterations.
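Putting the pieces together, an AdaRound run with a YAML config like the one above might be invoked as follows (the DLC and config file names are hypothetical, and the exact set of required arguments may vary):

$ qairt-quantizer --input_dlc model.dlc \
    --output_dlc model_adaround.dlc \
    --use_aimet_quantizer \
    --apply_algorithms adaround \
    --config adaround_config.yaml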
AdaRound can also run in a default mode, without a config file, by simply passing "adaround" to the command-line option --apply_algorithms along with the --use_aimet_quantizer flag. This flow uses the data provided through the input_list option to make rounding decisions.

Note:
- The AIMET Torch tarball naming convention should be as follows: aimetpro-release-<VERSION (optionally with build ID)>.torch-<cpu/gpu>-.*.tar.gz. For example, aimetpro-release-x.xx.x.torch-xxx-release.tar.gz.
- Once the setup script has run, ensure that the AIMET_ENV_PYTHON environment variable is set to <AIMET virtual environment path>/bin/python.
- The minimum supported AIMET version is AIMET-1.33.0.