SNPE1 to SNPE2 Migration Guide

There are two primary changes that drove the major version update to SNPE2:

  • C API

    • SNPE2 introduces a C API.

    • This API will be the only way to interact with Qualcomm® Neural Processing SDK going forward.

    • The C API reduces or removes compatibility concerns that the C++ API can introduce (e.g. ABI issues caused by C++ templates).

    • To ease the transition, the C++ API will continue to be supported until early/mid 2023, at which point it will be retired.

    • For more information, see the “C tutorial” section of “Code Examples” and the C API sections.

  • DLC Content Changes

    • The contents of the DLC file are being updated.

    • These changes mean that the DLC is not compatible between SNPE1 and SNPE2.

    • All DLCs must be regenerated for SNPE2.

    • HTP Offline Cache Records are backward compatible within SNPE2: a cache generated with an older SNPE2 version will work with a later SNPE2 version.

    • Offline Caches in SNPE2 are compatible across SoCs:

      • For the same DSP Arch, a cache prepared for one SoC can run on another SoC if the VTCM size the cache was prepared with is less than or equal to the target SoC’s VTCM size. For example, a cache prepared for sm7450 with 2MB VTCM is compatible with sm8450, which has 8MB VTCM, but a cache prepared for sm8450 with 4MB VTCM is not compatible with sm7450, which has only 2MB VTCM.

      • A cache generated for a newer DSP Arch cannot run on an SoC with a lower DSP Arch. A cache generated for sm8550 (v73) will not be compatible with sm8450 (v69), whereas a cache generated for sm8450 (v69) will be compatible with sm8550 (v73).
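The two compatibility rules above can be sketched as a single check. This is an illustrative model only, not SNPE code; the SoC-to-arch/VTCM table is assumed from the examples in this guide (the sm8550 VTCM size is an assumption for illustration), not queried from the SDK.

```python
# Hypothetical lookup table assumed from this guide's examples:
# SoC name -> (DSP arch version, VTCM size in MB)
SOC_INFO = {
    "sm7450": (69, 2),   # v69 DSP, 2 MB VTCM
    "sm8450": (69, 8),   # v69 DSP, 8 MB VTCM
    "sm8550": (73, 8),   # v73 DSP, VTCM size assumed for illustration
}

def cache_is_compatible(prepared_for: str, prepared_vtcm_mb: int, target: str) -> bool:
    """A cache runs on the target SoC only if the target's DSP arch is not
    older than the arch the cache was generated for, and the VTCM size the
    cache was prepared with fits within the target SoC's VTCM."""
    prep_arch, _ = SOC_INFO[prepared_for]
    tgt_arch, tgt_vtcm = SOC_INFO[target]
    return prep_arch <= tgt_arch and prepared_vtcm_mb <= tgt_vtcm

# The examples from this guide:
print(cache_is_compatible("sm7450", 2, "sm8450"))  # True: 2 MB fits in 8 MB, same arch
print(cache_is_compatible("sm8450", 4, "sm7450"))  # False: 4 MB > 2 MB
print(cache_is_compatible("sm8550", 2, "sm8450"))  # False: v73 cache on a v69 SoC
print(cache_is_compatible("sm8450", 2, "sm8550"))  # True: v69 cache on a v73 SoC
```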

    • DLCs can now represent additional datatypes not supported by SNPE1: cardinal (integer) datatypes (int8, int32, uint32) and fp16.

      • These are in addition to the fp32 and quantized 8-bit types supported in SNPE1.

      • Any User Buffers will need to match the data types contained in the DLC.

        • snpe-dlc-info can be used to identify the proper types.

    • Additionally, layers and input/output tensors may be assigned different names than in SNPE1.

      • snpe-dlc-info can be used to identify and review what names the converters generate.

      • Inputs (generally referred to as “Data” layer types) and “Const” layer types are now folded into the consuming ops and are no longer considered standalone “layers”. As a result, they are no longer visible in the output of the snpe-diagview and snpe-dlc-info tools.

      • If client code uses APIs like SNPEBuilder.setOutputTensors() to request specific output tensors, that code may need to be modified accordingly.
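For example, the names the SNPE2 converters generate can be reviewed with snpe-dlc-info before updating client code (“model.dlc” is a placeholder path for your converted DLC):

```shell
# Print the layer/tensor listing for a converted SNPE2 model; the tensor
# names shown are the ones to pass to APIs such as
# SNPEBuilder.setOutputTensors(). "model.dlc" is a placeholder path.
snpe-dlc-info -i model.dlc
```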

    • ArgMax now always outputs uint32, so there may be some performance impact due to the larger data size if the SNPE1-based model used an int8 output.

      • Previously, SNPE1 would use int8 if the maximum value was less than or equal to 255.
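A rough sketch of the memory impact (illustrative only, not SNPE code): each uint32 index occupies four bytes, whereas the SNPE1 int8 output occupied one byte per index, so an ArgMax output buffer grows by 4x.

```python
# Illustrative sketch: compare the size of an ArgMax output buffer when
# indices are stored as int8 (SNPE1, when the max value fit in [0, 255])
# versus uint32 (SNPE2, always).
from array import array

num_elements = 1000  # e.g. one index per output position

snpe1_int8_bytes = array("b", [0] * num_elements).itemsize * num_elements
snpe2_uint32_bytes = array("I", [0] * num_elements).itemsize * num_elements

print(snpe1_int8_bytes)    # 1000 bytes
print(snpe2_uint32_bytes)  # 4000 bytes: 4x the data to write and move
```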

SDK Behavior/Content Changes:

  • Unsigned PD/Skels

    • SNPE2 defaults to using Unsigned PD.

    • SNPE2 delivers unsigned skel files.

    • To use a Signed PD, the skels must be signed by the customer, and the Signed PD must be explicitly requested at runtime.

  • Runtime Changes:

    • The DSP runtime for SoCs with a V66 CDSP does not support running a float DLC. The DLC must be quantized to be used with the DSP runtime on these SoCs.

    • For the GPU runtime, some of the initialization time is recorded differently, so init times will appear longer. This is simply a difference in how the initialization is measured and captured.

  • Integer Input/Output Tensors

    • SNPE1 did not support integer (cardinal) tensor types. Indices or integer data were handled as floating-point tensors, or as Q8 input with a special hard-wired quantization encoding [offset=0, scale=1.0].

    • SNPE2 supports integer type directly. The converter/quantizer will represent integer input as a UIntN tensor, where N can be 8, 16, or 32.

    • Qualcomm® Neural Processing SDK cannot convert a Q8 input tensor to a UInt8 input tensor, so any existing code for such a network will have to be modified to use the new API to create a UInt8 UserBuffer input.

    • Qualcomm® Neural Processing SDK can convert a floating-point input tensor to a UInt8 input tensor, but this code path is slower than providing the correct type directly.
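The SNPE1-era workaround described above can be sketched as follows. This is an illustrative model only, assuming a common affine quantization convention (q = round(value/scale) + offset), not SNPE's actual implementation: with the hard-wired encoding [offset=0, scale=1.0], quantize/dequantize is the identity on integers in [0, 255], which is how integer data passed through a Q8 tensor.

```python
# Illustrative sketch (not SNPE code) of the hard-wired Q8 encoding
# [offset=0, scale=1.0] that SNPE1 used to carry integer data.
def quantize_q8(value: float, scale: float = 1.0, offset: int = 0) -> int:
    q = round(value / scale) + offset
    return max(0, min(255, q))          # clamp to the uint8 range

def dequantize_q8(q: int, scale: float = 1.0, offset: int = 0) -> float:
    return (q - offset) * scale

indices = [0, 7, 42, 255]
roundtrip = [dequantize_q8(quantize_q8(i)) for i in indices]
print(roundtrip)  # [0.0, 7.0, 42.0, 255.0] - integers pass through unchanged
```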

  • Tool/Flag Changes:

    • The snpe-dlc-quantize functionality has been refactored into two separate tools:

      • snpe-dlc-quant: generates quantized DLCs from a float DLC, with various quantization options.

      • snpe-dlc-graph-prepare: performs offline graph preparation on quantized DLCs for HTP SoCs.

        • This makes it quick to re-prepare the graph from a quantized DLC when needed, without repeating both the quantize and offline-prepare steps, which generally takes more time.

      • A shell script that supports the previous snpe-dlc-quantize command-line interface is still provided; it calls the new tools as appropriate.

      • Flag Changes:

        • The “bc” (bias correction) algorithm is deprecated and has no effect.

        • The “enable_hta” option is deprecated and has no effect.

          • Offline preparation for AIP is no longer supported; all preparation is done at runtime.
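The split workflow above can be sketched as a two-step command sequence. The flag names and file paths shown are assumptions modeled on the old snpe-dlc-quantize interface; confirm the exact options with each tool’s --help output.

```shell
# Step 1 (assumed flags): quantize the float DLC once.
snpe-dlc-quant --input_dlc model_fp32.dlc \
               --input_list calibration_inputs.txt \
               --output_dlc model_quant.dlc

# Step 2 (assumed flags): offline graph preparation for HTP SoCs.
# Re-run only this step when the target SoC changes, without re-quantizing.
snpe-dlc-graph-prepare --input_dlc model_quant.dlc \
                       --htp_socs sm8450 \
                       --output_dlc model_quant_htp.dlc
```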

    • snpe-net-run:

      • A new flag “--userbuffer_auto” has been added to automatically detect and create the correct UserBuffer type based on the model’s input and output tensor datatypes.
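A minimal invocation using the new flag might look like the following sketch (“model.dlc” and “inputs.txt” are placeholder paths):

```shell
# --userbuffer_auto picks the UserBuffer type from the tensor datatypes
# stored in the DLC, so the caller does not have to specify it manually.
# "model.dlc" and "inputs.txt" are placeholder paths.
snpe-net-run --container model.dlc \
             --input_list inputs.txt \
             --use_dsp \
             --userbuffer_auto
```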

SDK Content Changes:

  • The snpe-dlc-reorder tool has been removed, as it is no longer relevant (it was only used for HTA offline prepare).