Converters¶
This page describes the general conversion process, the expected inputs and generated outputs, and provides examples of usage.
Overview¶
Qualcomm® AI Engine Direct currently supports converters for four frameworks: Tensorflow, TFLite, PyTorch, and Onnx. Each converter, at a minimum, requires the original framework model as input to generate a Qualcomm® AI Engine Direct Model. For additional required inputs please refer to the framework specific sections below.
The flow for each converter is the same:
Converter Workflow
There are four main parts to each converter:
The front end translation, which handles converting the original framework model into the common intermediate representation (IR).
The common IR code which contains graph and IR operation definitions as well as various graph optimizations that can be applied to translated graphs.
Quantizer, which is optionally invoked to quantize the model prior to the final lowering to QNN. See Quantization for more information.
The Qnn converter backend which is responsible for lowering the IR into the final QnnModel API calls.
All the converters share the same IR code and QNN converter backend. The output of each converter is also the same: a model.cpp, or a model.cpp/model.bin pair, which contains the final converted QNN graph. The converted model.cpp contains two functions: QnnModel_composeGraphs and QnnModel_freeGraphsInfo. These two functions leverage the Tools Utility API described below. Additionally, model_net.json is saved, which is a JSON-format variant of model.cpp.
QNN Model JSON Format
Note
All QNN enum/macro values are resolved in fields.
All input/output tensors are stored in “tensors” config section and the tensor names are later used for defining a node inputs/outputs. The only tensor defined in the node config is a tensor parameter.
Static input tensor data is not stored in the JSON.
{
"model.cpp": "<CPP filename goes here>",
"model.bin": "<BIN filename goes here if applicable else NA>",
"coverter_command": "<command line used goes here>",
"copyright_str": "<copyright str goes here if applicable else "">",
"op_types": ["list of unique op types found in graph"],
"Total parameters": "total parameter count in graph (value in MB assuming single precision float)",
"Total MACs per inference": "total multiply and accumulates in graph (count in M)",
"graph": {
"tensors": {
"<tensor_name>": {
"id": <generated_id>,
"type": <tensor_type>,
"dataFormat": <tensor_memory_layout>,
"data_type": <tensor_data_type>,
"quant_params": {
"definition": <enum_value>,
"encoding": <enum_value>,
"scale_offset": {
"offset": <val>,
"scale": <val>
}
},
"current_dims": <list_val>,
"max_dims": <list_val>,
"params_count": <val> ("parameter count for node, along with value/total percentage. (only where applicable)")
},
"<tensor_name_with_axis_scale_offset_variant>": {
"id": <generated_id>,
"type": <tensor_type>,
"dataFormat": <tensor_memory_layout>,
"data_type": <tensor_data_type>,
"quant_params": {
"definition": <enum_value>,
"encoding": <enum_value>,
"axis_scale_offset": {
"axis": <val>,
"num_scale_offsets": <val>,
"scale_offsets": [
{
"scale": <val>,
"offset": <val>
},
...
]
}
},
"current_dims": <list_val>,
"max_dims": <list_val>
},
...
},
"nodes": {
"<node_name>": {
"package": <str_val>,
"type": <str_val>,
"tensor_params": {
"<param_name>": {
"<tensor_name_*>": {
"id": <generated_id>,
"type": <tensor_type>,
"dataFormat": <tensor_memory_layout>,
"data_type": <tensor_data_type>,
"quant_params": {
"definition": <enum_value>,
"encoding": <enum_value>,
"scale_offset": {
"offset": <val>,
"scale": <val>
}
},
"current_dims": <list_val>,
"max_dims": <list_val>,
"data": <list_val>
}
}
...
},
"scalar_params": {
"param_name": {
"param_data_type": <val>
}
...
},
"input_names": <list_str_val>,
"output_names": <list_str_val>,
"macs_per_inference": <val> ("multiply and accumulate value for node, along with value/total percentage. (only where applicable)")
}
...
}
}
}
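Because model_net.json is plain JSON, it can be inspected with a few lines of script. The following is a minimal sketch, assuming the file follows the key names shown in the format above ("op_types", "graph", "tensors", "nodes", "input_names", "output_names"). The function name and the sample content are hypothetical and only stand in for a real converter output.

```python
import json


def summarize_model_json(model):
    """Collect headline facts from a parsed model_net.json dictionary."""
    graph = model.get("graph", {})
    summary = {
        "op_types": sorted(model.get("op_types", [])),
        "num_tensors": len(graph.get("tensors", {})),
        "nodes": {},
    }
    for name, node in graph.get("nodes", {}).items():
        summary["nodes"][name] = {
            "type": node.get("type"),
            "inputs": node.get("input_names", []),
            "outputs": node.get("output_names", []),
        }
    return summary


# Hand-written stand-in for a real model_net.json (hypothetical content).
EXAMPLE_JSON = """
{
  "model.cpp": "model.cpp",
  "op_types": ["Relu", "Conv2d"],
  "graph": {
    "tensors": {"input": {"id": 0}, "conv_out": {"id": 1}, "relu_out": {"id": 2}},
    "nodes": {
      "conv0": {"package": "qti.aisw", "type": "Conv2d",
                "input_names": ["input"], "output_names": ["conv_out"]},
      "relu0": {"package": "qti.aisw", "type": "Relu",
                "input_names": ["conv_out"], "output_names": ["relu_out"]}
    }
  }
}
"""

summary = summarize_model_json(json.loads(EXAMPLE_JSON))
for node_name, info in summary["nodes"].items():
    print(node_name, info["type"], info["inputs"], "->", info["outputs"])
```

For a real file, the dictionary would come from json.load on the saved model_net.json instead of the inline string.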
Tools Utility API¶
The Tools Utility API contains helper modules to generate QNN API calls. The APIs are lightweight wrappers on top of the core QNN API and are intended to mitigate repetitive steps for creating QNN graphs.
Tools Utility C++ API:
QNN Core C API Reference: C
QNN Model Classes
QnnModel: This class is analogous to a QnnGraph and its tensors inside a given context. The context shall be provided at initialization and a new QnnGraph will be created within it. For more details on these class APIs please see QnnModel.hpp, QnnWrapperUtils.hpp
GraphConfigInfo: This structure is used to pass a list of QNN graph configurations (if applicable) from the client. Refer to the QnnGraph API for details on available graph config options.
GraphInfo: This structure is used to communicate constructed graph along with its input and output tensors to the client.
QnnModel_composeGraphs: is responsible for constructing the QNN graph on the provided QNN backend using the QnnModel class. It will return the constructed graph via graphsInfo.
QnnModel_freeGraphsInfo: should only be called once the graph is no longer being used.
For more information on integrating the model into an application see Integration workflow
Tensorflow Conversion¶
QNN, like many other neural network runtime engines, supports both low level operations (like an elementwise multiply) as well as high level operations (like Prelu). TensorFlow on the other hand, generally supports high level operations by representing them as subgraphs of low level operations. To reconcile these differences the converter must sometimes pattern match subgraphs of small operations into larger “layer-like” operations that can be leveraged in QNN.
Pattern Matching¶
The following are a few examples of pattern matching that occurs in the QNN Tensorflow converter. In each case the pattern generally consists of any operations that fall in between the layer input and output, with additional parameters like weights and biases being absorbed into the final IR op.
Convolution example:
Prelu example:
The important thing to remember is that these patterns are hard coded in the converter. Changes to the model that affect the connectivity or order of the operations in these patterns are likely to break the conversion, as the converter will not be able to identify and map the subgraph to the appropriate layer.
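To make the idea of a "subgraph of low level operations" concrete, the sketch below writes Prelu in two ways: as a single fused operation and as a chain of elementwise operations. The decomposition used here, Relu(x) + alpha * 0.5 * (x - abs(x)), is one common way frameworks express Prelu and is assumed purely for illustration. It is not taken from the converter's pattern list, and the function names are hypothetical.

```python
import math


def prelu_fused(x, alpha):
    """Prelu as a single high level operation."""
    return x if x >= 0 else alpha * x


def prelu_subgraph(x, alpha):
    """The same function built from low level elementwise operations."""
    positive_part = max(x, 0.0)                # Relu
    negative_part = (x - abs(x)) * 0.5         # Sub, Abs, Mul: equals min(x, 0)
    return positive_part + alpha * negative_part  # Mul, Add


samples = [-2.0, -0.5, 0.0, 1.5]
alpha = 0.25
results = [(prelu_fused(x, alpha), prelu_subgraph(x, alpha)) for x in samples]
for fused, expanded in results:
    print(fused, expanded)
```

A pattern matcher has to recognize the whole chain in the second function and replace it with the single operation in the first, which is why reordering or rewiring any step inside the chain can defeat the match.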
The TF converter also supports propagating quantization aware trained (QAT) model parameters to the final QNN model. This happens automatically during conversion when quantization is invoked. Note that the placement of quantization nodes also determines whether or not they will be propagated. Inserting quantization nodes inside a pattern will cause the pattern matching to break and conversion to fail. The safe place to insert nodes is after “layer-like” layers to capture activation information for a layer. In addition, quantization nodes inserted after weights and biases can capture the quantization information for static parameters.
An example of inserting a quantization node after a Convolution:
See Quantization for more information on initiating quantization as part of the conversion process.
Additional Required Parameters¶
As Tensorflow graphs often include extraneous nodes that are not required for general inference it is required that the input nodes and dimensions be provided along with the final output nodes required for inference. The converter will then prune unnecessary nodes from the graph ensuring a more compact and efficient graph.
To specify graph’s inputs to the converter pass the following on the command line:
--input_dim <input_name> <comma separated dims>
To specify the graph’s output nodes simply pass:
--out_node <output_name>
Tensorflow also has multiple input formats, but only frozen graphs (.pb files) or .meta files are supported. Saved training sessions are not supported by the converter.
Notes on Tensorflow 2.x Support¶
The qnn-tensorflow-converter has been updated to support conversion of Tensorflow 2.3 models. Note that while some TF 1.x models may convert using Tensorflow 2.3 as the conversion framework it is generally recommended to use the same TF version for conversion as was used for training the model. Some older 1.x models may not convert at all using TF 2.3 and a TF 1.x instance may be required for successful conversion.
Note that some options have been updated or added to support Tensorflow 2.x models. The first is a change to support the SavedModel format. Users can provide the directory to the SavedModel files by passing it to the same input_network option:
--input_network <SavedModel path>
Users can optionally pass saved_model_tag to indicate the tag and associated MetaGraph from the SavedModel. The default is “serve”.
--saved_model_tag <tag>
Lastly, a user can select the input and output of the model by using the signature key. The default value is ‘serving_default’.
--saved_model_signature_key <signature_key>
Example¶
The following is an example of an SSD model which requires one image input, but has 4 output nodes.
qnn-tensorflow-converter --input_network frozen_graph.pb --input_dim Preprocessor/sub 1,300,300,3 --output_path ssd_model.cpp --out_node detection_scores --out_node detection_boxes --out_node detection_classes --out_node Postprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayStack_2/TensorArrayGatherV3 -p "qti.aisw"
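When a model has several inputs or outputs, the repeated --input_dim and --out_node arguments are easy to get wrong by hand. The sketch below assembles the same kind of command as an argument list. It uses only the options that appear in the example above; the helper name is hypothetical, and the list is printed rather than executed.

```python
import shlex


def build_tf_converter_command(input_network, inputs, out_nodes, output_path):
    """Build an argument list for qnn-tensorflow-converter.

    inputs maps each input name to its dimensions; out_nodes lists output node names.
    """
    cmd = ["qnn-tensorflow-converter", "--input_network", input_network]
    for name, dims in inputs.items():
        # Dimensions are passed as a single comma separated token.
        cmd += ["--input_dim", name, ",".join(str(d) for d in dims)]
    for node in out_nodes:
        # One --out_node argument per output.
        cmd += ["--out_node", node]
    cmd += ["--output_path", output_path]
    return cmd


cmd = build_tf_converter_command(
    "frozen_graph.pb",
    {"Preprocessor/sub": (1, 300, 300, 3)},
    ["detection_scores", "detection_boxes", "detection_classes"],
    "ssd_model.cpp",
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run in an environment where the converter is on the PATH.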
TFLite Conversion¶
The qnn-tflite-converter converts a TFLite model to an equivalent QNN representation. It takes as input a .tflite model.
Additional Required Parameters¶
The TFLite converter needs the names and dimensions of the input nodes to be provided on the command line for the conversion. Each input must be passed individually using the same argument.
To specify graph’s inputs to the converter pass the following on the command line:
--input_dim <input_name_1> <comma separated dims> --input_dim <input_name_2> <comma separated dims>
PyTorch Conversion¶
The qnn-pytorch-converter converts a PyTorch model to an equivalent QNN representation. It takes as input a TorchScript model (.pt).
Additional Required Parameters¶
The PyTorch converter needs the names and dimensions of the input nodes to be provided on the command line for the conversion. Each input must be passed individually using the same argument.
To specify graph’s inputs to the converter pass the following on the command line:
--input_dim <input_name_1> <comma separated dims> --input_dim <input_name_2> <comma separated dims>
Onnx Conversion¶
The qnn-onnx-converter converts a serialized ONNX model to an equivalent QNN representation. By default, it also runs onnx-simplifier if it is available in the user environment (see Setup). However, onnx-simplifier is only run by default if the user has not provided quantization overrides or custom ops, because the simplification process could squash layers and prevent the custom ops or quantization overrides from being used. If the model contains ONNX functions, the converter always inlines the function nodes.
Note: If conversion fails, the ONNX converter supports an additional option, --dry_run, which dumps detailed information about unsupported ops and associated parameters.
Current ONNX Conversion supports up to ONNX Opset 21.
Custom Operation Output Shape and Datatype Inference¶
The QNN converter requires output shapes and datatypes for all operations to be present in the model for successful conversion. Output shapes and datatypes for custom operations can be taken from the model if present there, or inferred using the framework’s shape inference script. When the output shapes and datatypes of a custom operation are not present in the model and cannot be inferred from the framework’s shape inference script, the logic to infer them can be provided to the converter through a shared library compiled with Converter Op Package Generation. The compiled library can be provided with the --converter_op_package_lib or -cpl option followed by the absolute path to the compiled library. The converter uses the library to infer the output shapes and datatypes of the custom operations needed for successful model conversion. Multiple libraries must be comma separated.
Note
--converter_op_package_lib or -cpl is an optional argument and should be used when the output shapes and/or output datatypes for custom operations
are not present in the model or cannot be inferred from the framework’s shape inference script.
Note
When the output datatypes are present in the model and the --converter_op_package_lib with the logic to populate the output datatypes is passed,
output datatypes inferred from the library will be given priority and override the output datatypes inferred from the model.
Example¶
qnn-onnx-converter --input_network model.onnx --converter_op_package_lib libExampleLibrary.so
Note
See Converter Op Package Generation for library generation and compilation instructions.
Custom operation output shape inference is only supported for ONNX and PyTorch converters.
Tensorflow and TFLite converters do not support custom operation output shape inference.
Custom I/O¶
Introduction¶
The Custom I/O feature allows users to provide the desired layout and datatype for the inputs and outputs while loading a network. Instead of compiling the network for the inputs and outputs specified in the model, the network is compiled for the inputs and outputs described in the custom configuration. This feature is used when the user intends to pre-process the input data (on GPU/CDSP or by any other method) or process it offline (as allowed by MLCommons) and thereby avoid some steps in the input processing. Users can avoid redundant transposes and datatype conversions if they have knowledge of the input pre-processing steps. Similarly, on the post-processing side, if the model output is to be fed to the next stage in a pipeline, the desired format and type can be configured as the output of the current stage.
In this section, the term “Model I/O” refers to the input and output datatypes and formats of the original model. The term “Custom I/O” refers to the input and output datatypes and formats desired by the user.
Custom I/O Configuration File¶
Custom I/O can be applied using a configuration yaml file that contains the following fields for each input and output that needs to be modified.
IOName: Name of the input or output present in the model that needs to be loaded as per the custom requirement.
Layout: Layout field (optional) has two sub fields: Model and Custom. Model and Custom fields support valid QNN Layout. Accepted values are: NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL, where, N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
Model: Specify the layout of the buffer in the original model. This is equivalent to the --input_layout option and both cannot be used together.
Custom: Specify the custom layout desired for the buffer. This field needs to be filled by the user.
Datatype: Datatype field (optional) supports float32, float16 and uint8 datatypes.
QuantParam: QuantParam field (optional) has three sub fields: Type, Scale and Offset.
Type: Set to QNN_DEFINITION_DEFINED (default) if the scale and offset are provided by the user else set to QNN_DEFINITION_UNDEFINED.
Scale: Float value for the scale of the buffer as desired by the user.
Offset: Integer value for the offset as desired by the user.
Example¶
Consider an ONNX model with the original model I/O and custom I/O configuration as shown in the table below:
| Input/Output Name | Model I/O | Custom I/O |
|---|---|---|
| ‘input_0’ | float NCHW | int8 NHWC |
| ‘output_0’ | float NHWC | float NCHW |
Then, the content of custom I/O configuration yaml file that should be provided is
- IOName: input_0
  Layout:
    Model: NCHW
    Custom: NHWC
  Datatype: uint8
  QuantParam:
    Type: QNN_DEFINITION_DEFINED
    Scale: 0.12
    Offset: 2
- IOName: output_0
  Layout:
    Model: NHWC
    Custom: NCHW
Note:
If no change is required for an input or output, it can be skipped in the configuration file.
Datatype can be modified using custom I/O feature only if the model input or output datatype is float, float16, int8 or uint8. For other datatypes, ‘Datatype’ field should be skipped in the configuration file.
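For models with many inputs and outputs, the configuration file can be generated rather than typed. The following is a minimal sketch that renders the fields described above (IOName, Layout, Datatype, QuantParam) as YAML text using only the standard library. The CustomIOEntry class, the render function, and the tensor names "image" and "scores" are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CustomIOEntry:
    io_name: str
    model_layout: Optional[str] = None
    custom_layout: Optional[str] = None
    datatype: Optional[str] = None
    scale: Optional[float] = None
    offset: Optional[int] = None


def render_custom_io_yaml(entries):
    """Render entries as YAML text; optional fields are skipped when unset."""
    lines = []
    for entry in entries:
        lines.append(f"- IOName: {entry.io_name}")
        if entry.model_layout and entry.custom_layout:
            lines.append("  Layout:")
            lines.append(f"    Model: {entry.model_layout}")
            lines.append(f"    Custom: {entry.custom_layout}")
        if entry.datatype:
            lines.append(f"  Datatype: {entry.datatype}")
        if entry.scale is not None and entry.offset is not None:
            # Scale and offset supplied by the user, so the type is DEFINED.
            lines.append("  QuantParam:")
            lines.append("    Type: QNN_DEFINITION_DEFINED")
            lines.append(f"    Scale: {entry.scale}")
            lines.append(f"    Offset: {entry.offset}")
    return "\n".join(lines) + "\n"


yaml_text = render_custom_io_yaml([
    CustomIOEntry("image", "NCHW", "NHWC", "uint8", scale=0.05, offset=-128),
    CustomIOEntry("scores", model_layout="NHWC", custom_layout="NCHW"),
])
print(yaml_text)
```

Inputs or outputs that need no change are simply left out of the entry list, matching the note above.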
Usage¶
The custom IO config YAML file can be provided using the --custom_io option of qnn-onnx-converter. Sample usage is as follows:
$ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-onnx-converter \
--custom_io <path/to/YAML/file> ....
Custom IO Config Template File¶
The Custom IO Configuration file filled with default values can be obtained using the --dump_custom_io_config_template option of qnn-onnx-converter.
$ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-onnx-converter \
--input_network ${QNN_SDK_ROOT}/examples/Models/InceptionV3/tensorflow/model.onnx \
--dump_custom_io_config_template <output_folder>/config.yaml
The dumped template file has an entry for each input and output of the model provided. Each field in the template file is filled with the default value obtained from the model for that particular input or output. The template file also has comments describing each field for the user.
Supported Use Cases¶
Layout conversions of the input and output buffers of the model. Valid layout conversions are inter-conversions between:
NCDHW and NDHWC
NHWC and NCHW
NFC and NCF
NTF and TNF
Passing quantized inputs of datatype uint8 or int8 to a non-quantized model. In this case, users must provide the scale and offset for the quantized inputs.
Users can provide custom scale and offset for the inputs and outputs of a quantized model. The scale and offset generated by the quantizer are overridden by those provided by the user in the YAML file.
The user may use the --input_data_type and --output_data_type options of qnn-net-run to provide float or uint8_t type data to model inputs/outputs. Users may pass and get int8/uint8 data to the model using the native option. By default, qnn-net-run assumes the data to be of type float32 and performs quantization at the input and dequantization at the output in the case of quantized models.
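The layout conversions listed above are axis permutations. As a small illustration, the sketch below derives the permutation between two layout strings and applies it to a shape. The helper names are hypothetical, and the code only reorders a shape list; it does not move tensor data.

```python
def layout_permutation(src, dst):
    """Return, for each axis of dst, the index of that axis in src."""
    if sorted(src) != sorted(dst):
        raise ValueError(f"{src} and {dst} do not contain the same axes")
    return [src.index(axis) for axis in dst]


def convert_shape(shape, src, dst):
    """Reorder a shape given in layout src into layout dst."""
    return [shape[i] for i in layout_permutation(src, dst)]


print(layout_permutation("NCHW", "NHWC"))
print(convert_shape([1, 3, 224, 224], "NCHW", "NHWC"))
print(convert_shape([1, 3, 8, 224, 224], "NCDHW", "NDHWC"))
```

The same permutation list is what a transpose operation would take if the buffer itself had to be rearranged before being fed to the model.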
Limitations¶
Custom IO only supports providing the following datatypes: float32, float16, uint8, int8.
If the user needs to pass quantized inputs (i.e. of type int8 or uint8) to a non-quantized model, the scale and offset must be provided by the user in the YAML file. Not providing the scale and offset in this case would throw an error.
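The role of the scale and offset can be seen in a short worked example. The sketch below assumes the affine convention float = (quantized + offset) * scale for a uint8 buffer; that convention, the helper names, and the sample values (scale 0.05, offset -128) are assumptions made for illustration and should be checked against the QNN type definitions before relying on them.

```python
def quantize(value, scale, offset, qmin=0, qmax=255):
    """Map a float to a uint8 code, assuming float = (q + offset) * scale."""
    q = round(value / scale) - offset
    return max(qmin, min(qmax, q))  # clamp to the uint8 range


def dequantize(q, scale, offset):
    """Map a uint8 code back to a float under the same assumed convention."""
    return (q + offset) * scale


SCALE, OFFSET = 0.05, -128  # hypothetical values a user might put in the YAML file

code = quantize(1.0, SCALE, OFFSET)
restored = dequantize(code, SCALE, OFFSET)
print(code, restored)
print(quantize(-100.0, SCALE, OFFSET))  # far out of range, so it clamps
```

Without a scale and offset there is no way to interpret the integer codes, which is why the converter reports an error when they are missing for quantized inputs to a non-quantized model.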
Preserve I/O¶
Introduction¶
The Preserve I/O feature allows users to retain the layout and datatype of the inputs and outputs as present in the original ONNX model. By default, QNN converters may change the layout and datatype at the input and output of the model; preserving them lets the user avoid the pre- or post-processing steps that would otherwise be needed to transform the data.
Usage¶
The different ways of using this option are as follows:
The user may choose to preserve layouts and datatypes for all IO tensors by just passing the --preserve_io option as follows:
$ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-onnx-converter \
--preserve_io ....
The user may choose to preserve only the layout or only the datatype for all the inputs and outputs of the graph as follows:
$ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-onnx-converter \
--preserve_io layout ....
or,
$ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-onnx-converter \
--preserve_io datatype....
The user may choose to preserve the layout or datatype for only a few inputs and outputs of the graph as follows:
$ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-onnx-converter \
--preserve_io layout <space separated list of names of inputs and outputs of the graph>....
or,
$ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-onnx-converter \
--preserve_io datatype <space separated list of names of inputs and outputs of the graph>....
The user can pass a combination of --preserve_io layout and --preserve_io datatype as follows:
$ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-onnx-converter \
--preserve_io layout <space separated list of names of inputs and outputs of the graph> \
--preserve_io datatype <space separated list of names of inputs and outputs of the graph> ....
Passing just --preserve_io layout and --preserve_io datatype together is valid and equivalent to passing --preserve_io only.
Usage in point 3 cannot be combined with usage in point 1 or point 2 and will result in an error if used together.
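The combination rules above can be captured in a small argument builder. The following sketch encodes them under one reading of the text: "all" stands for the forms without tensor names, a list stands for the named form, and mixing the two raises an error. The function name and the "all" marker are hypothetical conveniences, not converter options.

```python
ALL = "all"  # hypothetical marker meaning "every IO tensor"


def preserve_io_args(layout=None, datatype=None):
    """Build --preserve_io arguments.

    Each parameter is None (not requested), ALL, or a list of tensor names.
    """
    values = [v for v in (layout, datatype) if v is not None]
    has_named = any(v != ALL for v in values)
    has_all = any(v == ALL for v in values)
    if has_named and has_all:
        raise ValueError("named tensor lists cannot be combined with the all-tensor forms")
    if layout == ALL and datatype == ALL:
        # Documented as equivalent to passing --preserve_io only.
        return ["--preserve_io"]
    args = []
    for kind, value in (("layout", layout), ("datatype", datatype)):
        if value is None:
            continue
        names = [] if value == ALL else list(value)
        args += ["--preserve_io", kind, *names]
    return args


print(preserve_io_args(ALL, ALL))
print(preserve_io_args(layout=["input_0"], datatype=["output_0"]))
```

Rejecting the mixed case up front gives a clearer message than letting the converter fail later in the run.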
Usage in qnn-pytorch-converter¶
PyTorch models may not carry tensor names. Input tensors are named through the -d option, but output tensor names are assigned by the converter's internal logic. To preserve the layout or datatype for only specific output tensors, proceed as follows:
Run a first pass of the converter and use the generated CPP/JSON file to find the names of the APP_READ type tensors.
Run the converter a second time, preserving the layout or datatype for only the specified IO tensors by name:
$ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-pytorch-converter \
--preserve_io layout <space separated list of names of inputs and outputs of the graph>....
or,
$ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-pytorch-converter \
--preserve_io datatype <space separated list of names of inputs and outputs of the graph>....
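The first pass of this two-pass flow amounts to reading the generated JSON and collecting the output tensor names. The sketch below does this for a parsed model_net.json dictionary. It assumes that the tensor "type" field holds either the resolved enum value 1 or the string QNN_TENSOR_TYPE_APP_READ; both markers are assumptions to verify against an actual converter output, and the function name and sample data are hypothetical.

```python
# Assumed markers for APP_READ tensors; check them against a real model_net.json.
APP_READ_MARKERS = (1, "QNN_TENSOR_TYPE_APP_READ")


def app_read_tensor_names(model, markers=APP_READ_MARKERS):
    """Return the names of tensors whose type matches one of the markers."""
    tensors = model.get("graph", {}).get("tensors", {})
    return [name for name, tensor in tensors.items() if tensor.get("type") in markers]


# Hypothetical first-pass output with one input, one intermediate and one output tensor.
first_pass = {
    "graph": {
        "tensors": {
            "x": {"id": 0, "type": 0},
            "hidden_3": {"id": 1, "type": 3},
            "output_17": {"id": 2, "type": 1},
        }
    }
}

names = app_read_tensor_names(first_pass)
print(" ".join(names))
```

The printed names are what would be passed after --preserve_io layout or --preserve_io datatype in the second pass.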
Usage with other converter options¶
--keep_int64_inputs need not be passed if preserve IO is used to preserve the datatype of such inputs.
--use_native_input_files is set to True in case of quantization if preserve IO is used to preserve the datatypes.
The layout specified using --input_layout is honored.
Using --input_dtype with preserve IO may result in an error in case of datatype mismatch for any IO tensor.
The layouts and datatypes specified using --custom_io get higher precedence over --preserve_io.
Since preserve IO retains the datatypes of IO tensors in the original model, the user must use --use_native_input_files or --native_input_tensor_names with qnn-net-run.
Common Parameters¶
There are a number of common parameters that can be passed to all the converters. These are described here: Tools
In addition, quantization parameters are also specified at conversion time. For more information refer to the tools document above and to: Quantization
Qairt Converter¶
The 2.21 release introduces a new conversion tool, using a new prefix qairt for Qualcomm AI Runtime. This new prefix
communicates that this converter can be used with both the Qualcomm Neural Processing SDK API as well as the Qualcomm AI Engine Direct
API.
Note
This tool is still in a Beta release status.
The qairt-converter tool converts a model from one of the ONNX, TensorFlow, TFLite, or PyTorch frameworks to a DLC.
The DLC contains the model in a Qualcomm graph format to support inference on Qualcomm HW.
The converter automatically detects the proper framework based on the source model extension.
Supported frameworks and file types are:
| Framework | File Type |
|---|---|
| Onnx | *.onnx |
| TensorFlow | *.pb |
| TFLite | *.tflite |
| PyTorch | *.pt |
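Extension-based detection can be pictured as a simple lookup. The sketch below mirrors the table above; it is an illustration of the idea only and makes no claim about how qairt-converter implements the check. The mapping and function names are hypothetical.

```python
from pathlib import Path

# Extension to framework mapping, copied from the table of supported file types.
FRAMEWORK_BY_EXTENSION = {
    ".onnx": "Onnx",
    ".pb": "TensorFlow",
    ".tflite": "TFLite",
    ".pt": "PyTorch",
}


def detect_framework(model_path):
    """Return the framework name implied by the file extension."""
    extension = Path(model_path).suffix.lower()
    if extension not in FRAMEWORK_BY_EXTENSION:
        raise ValueError(f"unsupported model extension: {extension!r}")
    return FRAMEWORK_BY_EXTENSION[extension]


print(detect_framework("inception_v3_2016_08_28_frozen.pb"))
print(detect_framework("model.onnx"))
```

A practical consequence is that a model saved with a non-standard extension may need to be renamed before conversion.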
Basic Conversion¶
Basic conversion has only one required argument --input_network. Some frameworks may require additional arguments
that are otherwise listed as optional. Please check the help text for more details.
Onnx Conversion
Current ONNX Conversion supports up to ONNX Opset 21.
$ qairt-converter --input_network model.onnx
Tensorflow Conversion
Tensorflow requires --desired_input_shape and --out_tensor_node.
$ qairt-converter \
--input_network inception_v3_2016_08_28_frozen.pb \
--desired_input_shape input 1,299,299,3 \
--out_tensor_node InceptionV3/Predictions/Reshape_1
Input/Output Layouts¶
The default input and output layouts in the converted graph are the same as per the source model. This behavior differs from the legacy converter which would modify the input and (optionally) the output layout to the spatial first format. An example single layer Onnx model (spatial last) is shown below.
Input/Output Customization using YAML¶
Note
This feature allows user to specify their desired input/output tensor layout for the converted model.
Users can provide a yaml configuration file to simplify using different input and output configurations over the command line. All configurations in the yaml are optional. If an option is provided in the yaml configuration and an equivalent option is provided on the command line, the command line option takes priority. The YAML configuration schema is shown below.
Input Tensor Configuration:
  # Input 1
  - Name:
    Src Model Parameters:
      DataType:
      Layout:
    Desired Model Parameters:
      DataType:
      Layout:
      Shape:
      Color Conversion:
      QuantParams:
        Scale:
        Offset:
Output Tensor Configuration:
  # Output 1
  - Name:
    Src Model Parameters:
      DataType:
      Layout:
    Desired Model Parameters:
      DType:
      Layout:
      QuantParams:
        Scale:
        Offset:
Name: Name of the input or output tensor present in the model that needs to be customized.
Src Model Parameters: These are mandatory if a certain equivalent desired configuration is specified.
DataType: Data type of the tensor in the source model.
Layout: Tensor layout in the source model. Valid values are: NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, where N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time.
Desired Model Parameters:
DataType: Desired data type of the tensor in the converted model. Valid values are float32, float16, uint8 and int8.
Layout: Desired tensor layout of the converted model. Same valid values as the source layout.
Shape: Tensor shape/dimension in the converted model. Valid values are comma separated dimension values, i.e., (a,b,c,d).
Color Conversion: Tensor color encoding in the converted model. Valid values are BGR, RGB, RGBA, ARGB32, NV21, and NV12.
QuantParams: Required when the desired model data type is a quantized data type. Has two sub fields: Scale and Offset.
Scale: Scale of the buffer as a float value.
Offset: Offset value as an integer.
Pass the --dump_io_config_template option to qairt-converter to save an IO configuration template file to the specified location for the user to update.
$ qairt-converter \
--input_network model.onnx \
--dump_io_config_template <output_folder>/io_config.yaml
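The precedence rule stated earlier in this section (a command line option takes priority over an equivalent YAML option) can be sketched as a dictionary merge. The option keys used below are hypothetical placeholders rather than real converter option names, and the function only illustrates the rule.

```python
def resolve_io_options(yaml_options, cli_options):
    """Merge options so that any value given on the command line wins."""
    resolved = dict(yaml_options)
    for key, value in cli_options.items():
        if value is not None:  # None means the option was not passed on the command line
            resolved[key] = value
    return resolved


# Hypothetical option names, used only to show the merge.
from_yaml = {"input_layout": "NCHW", "input_datatype": "float32"}
from_cli = {"input_layout": "NHWC", "input_datatype": None}

resolved = resolve_io_options(from_yaml, from_cli)
print(resolved)
```

Keeping the YAML file as the baseline and overriding single values on the command line makes it easy to try one variation without editing the file.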
QAT encodings¶
QAT encodings are quantization-aware training encodings which are present in the source graph. They can be present in the following form in the source graph.
FakeQuant Nodes: There can be FakeQuant nodes in the source network.
Tensor output encodings: Quantization overrides can be associated with the output tensors in the source network.
Quant-Dequant Nodes: There can be Quant-Dequant nodes present in the source network.
For all the above cases, the FakeQuant and Quant-Dequant nodes are removed and the quantization overrides are cached
in the float DLC generated from the qairt-converter tool. These can then be used with the qairt-quantizer tool.
Note
Quantizer throws an error if QAT encodings contain bitwidths other than 8 or 16.
Inference fails for CPU and DSP runtimes if QAT encodings contain 16-bit.
Quantization Overrides¶
Provide quantization overrides to qairt-converter with a JSON file containing the parameters to use for quantization by using the --quantization_overrides option, e.g., --quantization_overrides <overrides.json>. These will be cached with the float DLC generated by qairt-converter and can be used with the qairt-quantizer tool.
These will override any quantization data carried from conversion (e.g., TF fake quantization) or calculated during the normal quantization process. For more details refer to Quantization Overrides.
Note
Quantizer throws an error if overridden encodings contain bitwidths other than 8 or 16.
Inference fails for CPU and DSP runtimes if overridden encodings contain 16-bit.
FP16 Conversion¶
Users also have the ability to generate a float16 graph where all float32 tensors are converted to float16 by passing
the --float_bitwidth 16 flag to the qairt-converter tool.
To generate a float16 graph with the bias still in float32, an additional --float_bias_bitwidth 32 flag can be
passed.
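Converting float32 tensors to float16 trades precision for size. The sketch below shows the effect on individual values by round-tripping them through the IEEE 754 half precision format with the standard library struct module. It illustrates the number format only and says nothing about how the converter performs the cast; the helper name is hypothetical.

```python
import struct


def to_float16(value):
    """Round a Python float to the nearest float16 value and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


for original in (1.0, 0.1, 3.14159265):
    rounded = to_float16(original)
    print(original, rounded, abs(original - rounded))
```

Values such as 1.0 survive unchanged, while values such as 0.1 pick up a small rounding error, which is one reason a model may keep selected tensors such as the bias in float32.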
FAQs¶
How is QAIRT Converter different from Legacy Converters?
Single converter vs independent framework converters
The qairt-converter is a single converter tool supporting conversion for all supported frameworks based on the model extension while legacy converters had different framework specific tools.
Changed some optional arguments as default behavior
The default input and output layouts in the Converted graph will be same as in the Source graph. The legacy ONNX and Pytorch converters may not always retain the input and output layouts from Source graph.
Removed deprecated arguments
Deprecated arguments on the legacy converters are not enabled on the new converter.
Renamed some arguments for clarity
The --input_encoding argument is renamed to --input_color_encoding. Framework-specific arguments now include the framework name, e.g., --define_symbol is renamed to --onnx_define_symbol, --show_unconsumed_nodes is renamed to --tf_show_unconsumed_nodes, and --signature_name is renamed to --tflite_signature_name.
DLC as the Converter output file format
The QAIRT Converter uses DLC as its output format. The .cpp/.bin and .json formats used by the qnn-<framework>-converter tools are not supported by the QAIRT Converter. To generate the .cpp/.bin and .json output, continue to use the legacy converters.
Quantizer functionality is separated from Conversion functionality
qnn-<framework>-converter invokes the quantizer as part of the converter tool when --input_list or --float_fallback is passed. qairt-quantizer, however, is a standalone tool for quantization, like snpe-dlc-quant. Please refer to qairt-quantizer for more information and usage details.
Will the Converted model be any different with QAIRT converter compared to Legacy Converter?
The result of the QAIRT Converter will be different from the result of Legacy Converters in terms of the input/output layout.
Legacy converters will by default modify the input tensors to Spatial First (e.g. NHWC) layout. This means for Frameworks like ONNX, where the predominant layout is Spatial Last (e.g. NCHW), the input/output layout is different between the source model and the converted model.
Since the QAIRT Converter preserves the source layouts by default, the QAIRT-converted graphs for many ONNX/PyTorch models will be different from the Legacy-converted graphs.
The QAIRT Converter will be enhanced in a future release to support the same layouts as the legacy converters.