SNPE1 to SNPE2 Migration Guide¶
There are two primary changes that drove the major version update to SNPE2:
C API
SNPE2 introduces a C API.
This API will be the only way to interact with Qualcomm® Neural Processing SDK going forward.
The C API reduces or removes potential compatibility concerns that C++ interfaces can have (e.g., due to C++ templates).
To ease the transition, the C++ API will continue to be supported until early/mid 2023, at which point it will be retired.
For more information, refer to the “C tutorial” section of “Code Examples” and the C API sections.
DLC Content Changes
The contents of the DLC file are being updated.
These changes mean that DLC files are not compatible between SNPE1 and SNPE2.
All DLCs must be regenerated for SNPE2.
HTP Offline Cache Records are backward compatible within SNPE2: a cache generated with an older SNPE2 version will work with a later SNPE2 version.
Offline Caches in SNPE2 are compatible across SoCs:
For the same DSP Arch, a cache prepared for one SoC can run on another SoC if the VTCM size it was prepared with is less than or equal to the target SoC's VTCM size. For example, a cache prepared for sm7450 with 2MB VTCM is compatible with sm8450, which has 8MB VTCM, but a cache prepared for sm8450 with 4MB VTCM is not compatible with sm7450, which has only 2MB VTCM.
A cache generated for a newer DSP Arch cannot run on an SoC with a lower DSP Arch. A cache generated for sm8550 (v73) will not be compatible with sm8450 (v69), whereas a cache generated for sm8450 (v69) will be compatible with sm8550 (v73).
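The two compatibility rules above can be summarized as a small check. The following Python sketch is illustrative only; the helper function and its parameters are assumptions for this guide's examples, not part of the SDK:

```python
# Illustrative sketch of the HTP offline-cache compatibility rules above.
# The helper and its parameters are assumptions, not an SDK API.

def cache_compatible(prepared_arch: int, prepared_vtcm_mb: int,
                     target_arch: int, target_vtcm_mb: int) -> bool:
    """A cache runs on a target SoC if the target's DSP Arch is the same
    or newer, and the target has at least as much VTCM as the cache was
    prepared with."""
    return target_arch >= prepared_arch and target_vtcm_mb >= prepared_vtcm_mb

# The examples from this guide (v69 = sm7450/sm8450, v73 = sm8550):
print(cache_compatible(69, 2, 69, 8))  # sm7450/2MB cache on sm8450/8MB -> True
print(cache_compatible(69, 4, 69, 2))  # sm8450/4MB cache on sm7450/2MB -> False
print(cache_compatible(73, 8, 69, 8))  # v73 cache on a v69 SoC -> False
print(cache_compatible(69, 2, 73, 8))  # v69 cache on a v73 SoC -> True
```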
DLCs can now represent additional datatypes not supported by SNPE1: integer (cardinal) datatypes (int8, int32, uint32) and fp16.
These are in addition to the fp32 and quantized 8-bit types supported in SNPE1.
Any User Buffers will need to match the data types contained in the DLC.
snpe-dlc-info can be used to identify the proper types.
Additionally, layers and input/output tensors may be assigned different names than in SNPE1.
snpe-dlc-info can be used to identify and review what names the converters generate.
Inputs (generally referred to as “Data” layer types) and “Const” layer types are now folded into the consuming ops and are no longer considered standalone “layers”. As a result, they are no longer visible in the output generated by the snpe-diagview and snpe-dlc-info tools.
If client code uses APIs like SNPEBuilder.setOutputTensors() to request specific output tensors, this code may have to be modified as a result.
ArgMax now always outputs uint32, so models that used an int8 output in SNPE1 may see some performance impact due to the larger data size.
Previously, SNPE1 would use int8 if the maximum value was less than or equal to 255.
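The buffer-size impact of this change can be sketched as follows; the element widths and helper function are illustrative assumptions, not SDK code:

```python
# Sketch of the buffer-size impact of the ArgMax output type change above.
# Element sizes are the usual type widths; the helper is illustrative only.
ELEMENT_BYTES = {"int8": 1, "uint32": 4}

def argmax_output_bytes(num_outputs: int, dtype: str) -> int:
    """Bytes needed to hold num_outputs ArgMax indices of the given type."""
    return num_outputs * ELEMENT_BYTES[dtype]

# For 1000 ArgMax outputs, SNPE2's uint32 moves 4x the data of SNPE1's int8:
print(argmax_output_bytes(1000, "int8"))    # 1000
print(argmax_output_bytes(1000, "uint32"))  # 4000
```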
SDK Behavior/Content Changes:
Unsigned PD/Skels
SNPE2 defaults to using Unsigned PD.
SNPE2 delivers unsigned skel files.
In order to use a Signed PD, the skels must be signed by the customer, and the signed PD must be explicitly requested at runtime.
Runtime Changes:
The DSP runtime for SoCs with a v66 cDSP does not support running a float DLC. The DLC must be quantized to be used with the DSP runtime on these SoCs.
For the GPU runtime, some of the initialization time is recorded differently, so init times will appear longer. This is simply a difference in how the initialization is measured and captured.
Integer Input/Output Tensors
SNPE1 did not support integer (cardinal) tensor types. Indices or other integer data were handled as floating-point tensors or as Q8 input with a special hard-wired quantization encoding [offset=0, scale=1.0].
SNPE2 supports integer types directly. The converter/quantizer will represent integer input as a UIntN tensor, where N can be 8, 16, or 32.
Qualcomm® Neural Processing SDK cannot convert a Q8 input tensor to a UInt8 input tensor, so any existing code for such a network will have to be modified to use the new API to create a UInt8 UserBuffer input.
Qualcomm® Neural Processing SDK can convert a floating-point input tensor to a UInt8 input tensor, but this code path will be slower than providing the correct type.
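A minimal sketch of why the hard-wired Q8 encoding worked: using one common affine dequantization convention (an assumption here, real = scale * (q - offset)), the encoding [offset=0, scale=1.0] maps every quantized byte back to exactly its own integer value, so a Q8 tensor could carry raw indices:

```python
# Sketch of the hard-wired SNPE1 Q8 encoding described above, using one
# common affine dequantization convention: real = scale * (q - offset).
def dequantize(q: int, offset: int = 0, scale: float = 1.0) -> float:
    return scale * (q - offset)

# With [offset=0, scale=1.0] the mapping is the identity, so quantized
# bytes are numerically the same as the integers they represent:
print([dequantize(q) for q in (0, 7, 255)])  # [0.0, 7.0, 255.0]
```

SNPE2's native UIntN tensor types make this round trip unnecessary.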
Tool/Flag Changes:
The snpe-dlc-quantize functionality has been refactored into two separate tools:
snpe-dlc-quant: for generating quantized DLCs from a float DLC, with various quantization options.
snpe-dlc-graph-prepare: for performing offline graph preparation on quantized DLCs for HTP SoCs.
A shell script is still provided that supports the previous command-line interface and calls the new tools as appropriate.
This adds the flexibility to quickly re-prepare the graph from a quantized DLC if needed, without repeating both quantization and offline preparation, which generally takes more time.
Flag Changes:
The “bc” (bias correction) algorithm is deprecated and has no effect.
The “enable_hta” option is deprecated and has no effect.
Offline preparation for AIP is no longer supported, and all preparation is done at runtime.
snpe-net-run:
A new flag “--userbuffer_auto” has been added to automatically detect and create the right UserBuffer type based on the input and output tensor data types of the model.
SDK Content Changes:
The snpe-dlc-reorder tool has been removed, as it is no longer relevant (it was only used for HTA offline prepare).