Qairt Quantizer

The qairt-converter tool converts non-quantized models into a non-quantized or quantized DLC file, depending on the overrides provided during the converter step. qairt-quantizer quantizes the model to one of the supported fixed-point formats. It can be used either to quantize all tensors that are missing encodings after the qairt-converter step (filling in the gaps) or to calibrate the provided encodings using a list of images.

For example, the following command will convert an Inception v3 DLC file into a quantized Inception v3 DLC file.

$ qairt-quantizer --input_dlc inception_v3.dlc \
                  --input_list image_file_list.txt \
                  --output_dlc inception_v3_quantized.dlc

To properly calculate the ranges for the quantization parameters, a representative set of input data must be passed to qairt-quantizer using the --input_list parameter. The --input_list file specifies paths to raw image files to be used for calibration during quantization. For supported input formats, refer to the --input_list argument of qnn-net-run. (To calculate output activation encoding information for all layers, do not include the line that specifies desired outputs.)
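For illustration, a minimal input list might look like the following, with one calibration input per line (the file names here are hypothetical):

    images/img_0001.raw
    images/img_0002.raw
    images/img_0003.raw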

The tool requires the batch dimension of the input DLC file to be set to 1 during model conversion. The batch dimension can be changed to a different value for inference by resizing the network during initialization.

Additional details

  • qairt-quantizer is largely similar to snpe-dlc-quant, with the following differences:

    • qairt-quantizer can now be used to generate encodings for tensors using the calibration dataset provided via the --input_list flag in the following scenarios:

      • Fill in the gaps: a tensor is missing an encoding after the qairt-converter step, i.e. no override was specified for it in --quantization_overrides and no source model encoding (QAT) exists for it.

      • Encodings are not specified for all tensors via overrides or QAT encodings.

    • HTP is set as the default backend in the QAIRT quantizer, which may enable certain HTP-specific behaviors that wouldn’t be triggered by default in legacy quantizers where the backend is left empty. This difference can affect how some backend-dependent features behave during conversion/quantization.

      • For example, during quantization, an optimization called IntBiasUpdates is applied to the FullyConnected op if the backend is set to HTP in SNPE, whereas it is always applied in QAIRT.

    • The external overrides and source model encodings (QAT) are now applied during the qairt-converter stage by default. As a result, the quantizer options to ignore the overrides and source model encodings, --ignore_encodings (legacy) and --ignore_quantization_overrides, are now no-ops.

    • An alternative is the --export_format=DLC_STRIP_QUANT flag of qairt-converter. When specified, the converter ignores and removes all encodings in the source model and outputs a float model, which can then be recalibrated using qairt-quantizer with the --input_list flag.

    • Another alternative is to use the qairt-quantizer options --input_list and --ignore_quantization_overrides in combination, which signals the quantizer to ignore all encodings applied during conversion and generate encodings using the calibration dataset provided via --input_list.
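      For example, the two recalibration alternatives might be invoked as follows (the model and file names are illustrative, and the exact converter option spellings should be checked against your SDK version):

        $ qairt-converter --input_network model.onnx \
                          --export_format=DLC_STRIP_QUANT \
                          --output_path model_float.dlc
        $ qairt-quantizer --input_dlc model_float.dlc \
                          --input_list image_file_list.txt \
                          --output_dlc model_quantized.dlc

        $ qairt-quantizer --input_dlc model.dlc \
                          --input_list image_file_list.txt \
                          --ignore_quantization_overrides \
                          --output_dlc model_quantized.dlc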

    • The float fallback feature, controlled via the command-line option --enable_float_fallback (present as --float_fallback in legacy quantizers), is also a no-op for qairt-quantizer and can be skipped. Float fallback was added to produce a fully quantized or mixed-precision graph by applying encoding overrides or source model encodings, propagating encodings across data-invariant ops, and falling back tensors with missing encodings to a float data type. To simplify the workflow, this is now handled during the qairt-converter step: qairt-converter applies the overrides and encodings, and tensors that are missing encodings fall back to the default float data type.

    • To summarize, the qairt-quantizer command-line arguments --ignore_quantization_overrides and --enable_float_fallback are now no-ops; their behavior is applied by default during the qairt-converter step itself.

      Note

      --enable_float_fallback and --input_list are mutually exclusive options. One of them is a mandatory argument for the quantizer.

  • Outputs can be specified for qairt-quantizer by modifying the input_list in the following ways:

    #<output_layer_name>[<space><output_layer_name>]
    %<output_tensor_name>[<space><output_tensor_name>]
    <input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]
    

    Note: Output tensors and layers can be specified individually, but when specifying both, the order shown above must be followed.
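    For example, an input list that requests two layer outputs and one tensor output might look like this (the layer, tensor, and file names are hypothetical):

    #conv_out softmax_out
    %probs
    input:=images/img_0001.raw
    input:=images/img_0002.raw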

  • qairt-quantizer also supports quantization using AIMET, in place of the default quantizer, when the --use_aimet_quantizer command-line option is provided. To use the AIMET quantizer, run the setup script to create an AIMET-specific environment by executing the following command:

    $ source {SNPE_ROOT}/bin/aimet_env_setup.sh --env_path <path where AIMET venv needs to be created> \
                                                --aimet_sdk_tar <AIMET Torch SDK tarball>
    
  • The advanced AIMET algorithms AdaRound and AMP are also supported in qairt-quantizer. The user needs to provide a YAML config file through the command-line option --config and specify the algorithm (“adaround” or “amp”) through --apply_algorithms, along with the --use_aimet_quantizer flag.
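    For example, an AMP run might be invoked as follows (the model, list, and config file names are illustrative):

    $ qairt-quantizer --input_dlc model.dlc \
                      --input_list image_file_list.txt \
                      --use_aimet_quantizer \
                      --apply_algorithms amp \
                      --config amp_config.yaml \
                      --output_dlc model_quantized.dlc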

  • The template of the YAML file for AMP is shown below:

    aimet_quantizer:
       datasets:
           <dataset_name>:
               dataloader_callback: '<path/to/labeled/dataloader/callback/function>'
               dataloader_kwargs: {arg1: val, arg2: val2}
    
       amp:
           dataset: <dataset_name>
           candidates: [[[8, 'int'], [16, 'int']], [[16, 'float'], [16, 'float']]]
           allowed_accuracy_drop: 0.02
           eval_callback_for_phase2: '<path/to/evaluator/callback/function>'
    

dataloader_callback sets the path of a callback function that returns a labeled dataloader of type torch.utils.data.DataLoader; the data should be in the source network's input format. dataloader_kwargs is an optional dictionary through which the user can provide keyword arguments for the callback function defined above. dataset specifies the name of the dataset defined above. candidates is a list of lists of all possible bitwidth and datatype values for activations and parameters. allowed_accuracy_drop specifies the maximum allowed drop in accuracy from the FP32 baseline; the Pareto front curve is plotted only up to the point where the allowed accuracy drop is met. eval_callback_for_phase2 sets the path of an evaluator function that takes a batch of predicted values as its first argument and a batch of ground truth as its second argument, and returns the calculated metric as a float.
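As a sketch of what such a callback might look like, the following hypothetical module returns a labeled torch.utils.data.DataLoader over random stand-in data. The function name, tensor shapes, and batch size are assumptions for illustration; in practice the tensors would come from a real calibration set in the source network's input format.

```python
# Hypothetical dataloader callback module for the AMP config above.
# All names, shapes, and values here are illustrative stand-ins.
import torch
from torch.utils.data import DataLoader, TensorDataset

def get_labeled_dataloader(batch_size=4, num_samples=16):
    # Random NCHW tensors stand in for real calibration images.
    images = torch.randn(num_samples, 3, 224, 224)
    # Random class indices stand in for real ground-truth labels.
    labels = torch.randint(0, 1000, (num_samples,))
    return DataLoader(TensorDataset(images, labels), batch_size=batch_size)
```

The dataloader_callback entry would then point at this function, and dataloader_kwargs could pass keyword arguments such as a different batch_size.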

  • The template of the YAML file for AdaRound is shown below:

    aimet_quantizer:
        datasets:
            <dataset_name>:
                dataloader_callback: '<path/to/unlabeled/dataloader/callback/function>'
                dataloader_kwargs: {arg1: val, arg2: val2}
    
        adaround:
            dataset: <dataset_name>
            num_batches: 1
    

dataloader_callback sets the path of a callback function that returns an unlabeled dataloader of type torch.utils.data.DataLoader; the data should be in the source network's input format. dataloader_kwargs is an optional dictionary through which the user can provide keyword arguments for the callback function defined above. dataset specifies the name of the dataset defined above. num_batches specifies the number of batches to be used for the AdaRound iterations.

  • AdaRound can also run in default mode, without a config file, by just passing “adaround” to the command-line option --apply_algorithms along with the --use_aimet_quantizer flag. This flow uses the data provided through the --input_list option to make rounding decisions.
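    For example, the default AdaRound flow might be invoked as follows (the file names are illustrative):

    $ qairt-quantizer --input_dlc model.dlc \
                      --input_list image_file_list.txt \
                      --use_aimet_quantizer \
                      --apply_algorithms adaround \
                      --output_dlc model_quantized.dlc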

    Note:
    1. The AIMET Torch tarball must follow this naming convention: aimetpro-release-<VERSION (optionally with build ID)>.torch-<cpu/gpu>-.*.tar.gz. For example, aimetpro-release-x.xx.x.torch-xxx-release.tar.gz.

    2. Once the setup script has run, ensure that the AIMET_ENV_PYTHON environment variable is set to <AIMET virtual environment path>/bin/python.

    3. The minimum supported AIMET version is AIMET-1.33.0.