Qairt Converter

The 2.21 release introduces a new conversion tool with the prefix qairt, for Qualcomm AI Runtime. The prefix communicates that this converter can be used with both the Qualcomm Neural Processing SDK API and the Qualcomm AI Engine Direct API.

Note

This tool is still in a Beta release status.

The qairt-converter tool converts a model from the ONNX, TensorFlow, TFLite, or PyTorch framework to a DLC file. The DLC contains the model in a Qualcomm graph format to support inference on Qualcomm hardware. The converter automatically detects the framework from the source model's file extension.

Supported frameworks and file types are:

Framework     File Type
----------    ---------
ONNX          *.onnx
TensorFlow    *.pb
TFLite        *.tflite
PyTorch       *.pt

Basic Conversion

Basic conversion has only one required argument, --input_network. Some frameworks may require additional arguments that are otherwise listed as optional; check the help text for more details.

  1. ONNX Conversion

$ qairt-converter --input_network model.onnx
  2. TensorFlow Conversion

TensorFlow conversion additionally requires --desired_input_shape and --out_tensor_node.

$ qairt-converter \
     --input_network inception_v3_2016_08_28_frozen.pb \
     --desired_input_shape input 1,299,299,3 \
     --out_tensor_node InceptionV3/Predictions/Reshape_1

Input/Output Layouts

The default input and output layouts in the converted graph are the same as in the source model. This behavior differs from the legacy converters, which would modify the input and (optionally) the output layout to the spatial-first format. An example single-layer ONNX model (spatial-last) is shown below.

[Figure: input/output layout comparison between the source model and the converted graph (qairt-conversion-layout-comparison.png)]

Input/Output Customization using YAML

Note

This feature allows users to specify the desired input/output tensor layout for the converted model.

Users can provide a YAML configuration file to simplify specifying input and output configurations that would otherwise be unwieldy on the command line. All configurations in the YAML file are optional. If an option is provided in the YAML configuration and an equivalent option is provided on the command line, the command-line option takes priority. The YAML configuration schema is shown below.

Input Tensor Configuration:
  # Input 1
  - Name:
    Src Model Parameters:
        DataType:
        Layout:
    Desired Model Parameters:
        DataType:
        Layout:
        Shape:
        Color Conversion:
        QuantParams:
          Scale:
          Offset:

Output Tensor Configuration:
  # Output 1
  - Name:
    Src Model Parameters:
        DataType:
        Layout:
    Desired Model Parameters:
        DataType:
        Layout:
        QuantParams:
          Scale:
          Offset:
  • Name: Name of the input or output tensor present in the model that needs to be customized

  • Src Model Parameters

    These are mandatory when an equivalent desired configuration is specified.

    • DataType: Data type of the tensor in the source model.

    • Layout: Tensor layout in the source model. Valid values are:

      • NCDHW

      • NDHWC

      • NCHW

      • NHWC

      • NFC

      • NCF

      • NTF

      • TNF

      • NF

      • NC

      • F

      where

      • N = Batch

      • C = Channels

      • D = Depth

      • H = Height

      • W = Width

      • F = Feature

      • T = Time

  • Desired Model Parameters

    • DataType: Desired data type of the tensor in the converted model. Valid values are float32, float16, uint8, and int8.

    • Layout: Desired tensor layout of the converted model. Valid values are the same as for the source layout.

    • Shape: Tensor shape/dimensions in the converted model. Valid values are comma-separated dimension values, e.g., a,b,c,d.

    • Color Conversion: Tensor color encoding in the converted model. Valid values are BGR, RGB, RGBA, ARGB32, NV21, and NV12.

    • QuantParams: Required when the desired model data type is a quantized data type. Has two sub fields: Scale and Offset.

      • Scale: Scale of the buffer as a float value.

      • Offset: Offset value as an integer.
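Scale and Offset relate real values to quantized values. A minimal sketch of how such a pair could be derived from a tensor's float range is shown below; the convention assumed here (real = (quantized + offset) * scale, with a non-positive integer offset) and the rounding are assumptions, so verify hand-written values against what the quantizer actually emits.

```python
# Sketch: derive Scale/Offset for a uint8 tensor from its float range.
# Assumes the asymmetric convention real = (quantized + offset) * scale
# with a non-positive integer offset; confirm against your quantizer.

def quant_params(real_min: float, real_max: float, bitwidth: int = 8):
    # Make sure zero is exactly representable, as quantizers typically require.
    real_min = min(real_min, 0.0)
    real_max = max(real_max, 0.0)
    num_steps = (1 << bitwidth) - 1        # 255 for uint8
    scale = (real_max - real_min) / num_steps
    offset = round(real_min / scale)       # non-positive zero point
    return scale, offset

print(quant_params(0.0, 1.0))   # scale = 1/255, offset = 0
```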

Use the --dump_io_config_template option of qairt-converter to save an IO configuration template to the specified location, which the user can then update.

$ qairt-converter \
      --input_network model.onnx \
      --dump_io_config_template <output_folder>/io_config.yaml
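As an illustration, a filled-in configuration for a single NCHW float input that should become an NHWC quantized uint8 input might look like the following (the tensor name and values are hypothetical, and unused fields can be omitted):

```yaml
Input Tensor Configuration:
  - Name: input
    Src Model Parameters:
        DataType: float32
        Layout: NCHW
    Desired Model Parameters:
        DataType: uint8
        Layout: NHWC
        Shape: 1,299,299,3
        QuantParams:
          Scale: 0.00784
          Offset: -128
```

The completed file can then be supplied back to qairt-converter via its IO configuration option (--io_config, if your release provides it; check the help text).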

QAT encodings

QAT (quantization-aware training) encodings are quantization encodings already present in the source graph. They can appear in the following forms:

  • FakeQuant Nodes: There can be FakeQuant nodes in the source network.

  • Tensor output encodings: Quantization overrides can be associated with the output tensors in the source network.

  • Quant-Dequant Nodes: There can be Quant-Dequant nodes present in the source network.

In all of the above cases, the FakeQuant and Quant-Dequant nodes are removed and the quantization overrides are cached in the float DLC generated by the qairt-converter tool. These can then be used with the qairt-quantizer tool.
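The end-to-end flow can be sketched as follows. The file names are hypothetical, and the option names shown for qairt-quantizer and the output path should be checked against each tool's help text:

```shell
# 1. Convert: QAT nodes are stripped and their encodings
#    are cached in the float DLC.
$ qairt-converter --input_network model_qat.onnx --output_path model_fp32.dlc

# 2. Quantize using the cached QAT encodings.
$ qairt-quantizer --input_dlc model_fp32.dlc --output_dlc model_quant.dlc
```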

Note

  • Quantizer throws an error if QAT encodings contain bitwidths other than 8 or 16.

  • Inference fails for CPU and DSP runtimes if QAT encodings contain 16-bit encodings.

Quantization Overrides

Provide quantization overrides to qairt-converter as a JSON file containing the parameters to use for quantization, via the --quantization_overrides option (e.g., --quantization_overrides <overrides.json>). These will be cached with the float DLC generated by qairt-converter and can be used with the qairt-quantizer tool.

These overrides take precedence over any quantization data carried over from conversion (e.g., TF fake quantization) or calculated during the normal quantization process. For more details, refer to Quantized vs Non-Quantized Models.
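An overrides file typically follows the AIMET-style encodings schema, with per-tensor entries under activation_encodings and param_encodings. A minimal hypothetical example (tensor names and values are illustrative, and the exact schema accepted by your release should be verified):

```json
{
  "activation_encodings": {
    "conv1_out": [
      {"bitwidth": 8, "dtype": "int", "min": -2.0, "max": 2.0,
       "scale": 0.0157, "offset": -128, "is_symmetric": "False"}
    ]
  },
  "param_encodings": {
    "conv1.weight": [
      {"bitwidth": 8, "dtype": "int", "min": -0.5, "max": 0.5,
       "scale": 0.0039, "offset": -128, "is_symmetric": "True"}
    ]
  }
}
```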

Note

  • Quantizer throws an error if overridden encodings contain bitwidths other than 8 or 16.

  • Inference fails for CPU and DSP runtimes if overridden encodings contain 16-bit encodings.

FP16 Conversion

Users also have the ability to generate a float16 graph where all float32 tensors are converted to float16 by passing the --float_bitwidth 16 flag to the qairt-converter tool. To generate a float16 graph with the bias still in float32, an additional --float_bias_bitwidth 32 flag can be passed.
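For example, combining both flags (the model file name is illustrative):

```shell
$ qairt-converter \
      --input_network model.onnx \
      --float_bitwidth 16 \
      --float_bias_bitwidth 32
```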

DryRun

Use the --dry_run option to evaluate the model without actually converting any ops. This returns unsupported ops/attributes and unused inputs/outputs.
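For example (model file name illustrative):

```shell
$ qairt-converter --input_network model.onnx --dry_run
```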

FAQs

  • How is QAIRT Converter different from Legacy Converters?

    • Single converter vs independent framework converters

      The qairt-converter is a single tool that supports conversion for all supported frameworks, selected by the model's file extension, whereas the legacy converters were separate framework-specific tools.

    • Changed some optional arguments as default behavior

      The default input and output layouts in the converted graph will be the same as in the source graph. The legacy ONNX and PyTorch converters may not always retain the input and output layouts from the source graph.

    • Removed deprecated arguments

      Deprecated arguments on the legacy converters are not enabled on the new converter.

    • Renamed some arguments for clarity

      The --input_encoding argument is renamed to --input_color_encoding. Framework-specific arguments now include the framework name, e.g., --define_symbol is renamed to --onnx_define_symbol, --show_unconsumed_nodes to --tf_show_unconsumed_nodes, and --signature_name to --tflite_signature_name.

    • DLC as the Converter output file format

      The QAIRT Converter uses DLC as its export format, similar to SNPE.

    • What about the changes to Quantization?

      qairt-quantizer is a standalone tool for quantization like snpe-dlc-quant. Please refer to qairt-quantizer for more information and usage details.

  • Will the converted model be any different with the QAIRT converter compared to the legacy converters?

    • The result of the QAIRT Converter will be different from the result of Legacy Converters in terms of the input/output layout.

    • Legacy converters by default modify the input tensors to a spatial-first (e.g., NHWC) layout. This means that for frameworks like ONNX, where the predominant layout is spatial-last (e.g., NCHW), the input/output layout differs between the source model and the converted model.

    • Since the QAIRT Converter preserves the source layouts by default, QAIRT-converted graphs for many ONNX/PyTorch models will differ from the legacy-converted graphs.

    • The QAIRT Converter will be enhanced in a future release to support the same layouts as the legacy converters.