UDO DSP tutorial for Quantized DLC
Overview
This tutorial describes the steps needed to create a UDO package for DSP runtime and execute the Inception V3 model using the package. The Softmax operation has been chosen in this tutorial to demonstrate the implementation of a UDO with Qualcomm® Neural Processing SDK. This tutorial also describes the offline cache generation steps for DSP V68.
The Qualcomm® Neural Processing SDK provides the resources for this example under
$SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Softmax
Information on UDO in general is available at UDO Overview. Information on running the Inception V3 network without UDO is available at Inception V3 Tutorial.
The artifacts necessary to run the Inception V3 network for CPU, GPU, and DSP runtime will be generated in this tutorial. The steps required to compile and execute the Inception V3 network for DSP runtime alone are outlined here. Information on running the Inception V3 network for CPU and GPU runtime is available at Inception V3 UDO Tutorial.
Prerequisites
This tutorial assumes that the general Qualcomm® Neural Processing SDK setup has been followed to support the SDK environment, the TensorFlow environment, and the desired platform dependencies. Additionally, an extracted Qualcomm® AI Direct SDK is needed for generating the skeleton code and building the libraries; the Qualcomm® AI Direct SDK itself does not need to be set up. For Qualcomm® AI Direct SDK details, refer to the documentation at $QNN_SDK_ROOT/docs/QNN/index.html, where QNN_SDK_ROOT is the location of the Qualcomm® AI Direct SDK installation. Set $QNN_SDK_ROOT to the unzipped Qualcomm® AI Direct SDK location after running the envsetup.sh script mentioned in SNPE Setup. The steps listed in this tutorial use the TensorFlow model in the form of inception_v3_2016_08_28_frozen.pb. For details on acquiring the Inception V3 model, visit Tutorials Setup.
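Before continuing, it can help to confirm that the environment variables used throughout this tutorial are set. The following is a minimal sketch only; the helper name require_vars is hypothetical and is not part of either SDK.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SDK): print every variable name from the
# argument list that is unset or empty, and return non-zero if any is missing.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name, POSIX-compatible.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Variables the tutorial refers to; the check only reports and does not abort.
require_vars SNPE_ROOT QNN_SDK_ROOT || echo "set the variables listed above before continuing"
```

The same helper could be extended with HEXAGON_SDK_ROOT or ANDROID_NDK_ROOT when those toolchains are needed.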
Introduction
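Here are the steps to develop and run a UDO: (1) package generation, (2) framework model conversion to a DLC, (3) package implementation, (4) package compilation, and (5) model execution.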
Steps 1-4 are run offline on the x86 host and are necessary for execution in step 5. Step 5 provides information on execution using the Qualcomm® Neural Processing SDK command-line executable snpe-net-run. Optionally, the user can perform steps 1-4 automatically using the provided setup script.
Step 1: Package Generation
Generating the SoftmaxUdoPackage requires the snpe-udo-package-generator tool and the provided UDO plugin: Softmax_Quant.json. The plugin is located under $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Softmax/config. More information about creating a UDO plugin can be found here.
Generate the SoftmaxUdoPackage using the following:
export SNPE_UDO_ROOT=$SNPE_ROOT/share/SNPE/SnpeUdo
export QNN_SDK_ROOT=<path to Qualcomm® AI Direct SDK>
snpe-udo-package-generator -p $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Softmax/config/Softmax_Quant.json -o $SNPE_ROOT/examples/Models/InceptionV3/
Similarly, for the DSP V68 example, the config is available at
$SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Softmax/config/Softmax_v68.json
This command creates the Softmax-based package at $SNPE_ROOT/examples/Models/InceptionV3/SoftmaxUdoPackage. For more information on the snpe-udo-package-generator tool visit here.
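The choice between the two config files can be sketched as a small lookup. This is an illustration under an assumption: that Softmax_Quant.json is the config for V65/V66 and Softmax_v68.json is the config for V68. The function name udo_config_for_arch is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a Hexagon DSP architecture to the example config file.
# Assumption: Softmax_Quant.json serves V65/V66 and Softmax_v68.json serves V68.
udo_config_for_arch() {
    case "$1" in
        v65|v66) echo "Softmax_Quant.json" ;;
        v68)     echo "Softmax_v68.json" ;;
        *)       echo "unsupported DSP architecture: $1" >&2; return 1 ;;
    esac
}

# Print the generator invocation instead of running it, so the sketch is safe to try.
config="$(udo_config_for_arch v68)"
echo "snpe-udo-package-generator -p \$SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Softmax/config/$config -o \$SNPE_ROOT/examples/Models/InceptionV3/"
```

Printing the command rather than executing it keeps the sketch independent of an installed SDK.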
Step 2: Framework Model Conversion to a DLC
Information for converting a model to a DLC is available at Inception V3 UDO Model Conversion. This will generate a DLC named inception_v3_udo.dlc containing the Softmax as UDO at $SNPE_ROOT/examples/Models/InceptionV3/dlc.
Step 3: Package Implementations
The generated package creates the skeleton of the operation implementation, which must be filled by the user to create a functional UDO. The rest of the code scaffolding for compatibility with Qualcomm® Neural Processing SDK is provided by the snpe-udo-package-generator. The UDO implementations for this tutorial are provided under $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Softmax/src.
DSP Implementations for V65 and V66
A registration library and an implementation library are required to run inference on a network with UDO layers on Qualcomm® Neural Processing SDK DSP. The registration library runs on the CPU and specifies the DSP implementation library of the UDO. Refer to Implementing a UDO for DSP V65 and V66 for more information on implementing UDO for DSP V65 and V66 runtimes.
The file in the package that needs to be implemented for DSP V65 and V66 is
SoftmaxUdoPackage/jni/src/DSP/Softmax.cpp
The provided example implementation is present at the location
$SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Softmax/src/DSP/Softmax.cpp
Copy the provided implementations to the package:
cp -f $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Softmax/src/DSP/Softmax.cpp $SNPE_ROOT/examples/Models/InceptionV3/SoftmaxUdoPackage/jni/src/DSP/src/ops
Optionally, the user can provide their own implementations in the package.
DSP Implementations for V68 or later
Similar to all other Qualcomm® Neural Processing SDK runtimes, a registration library and an implementation library are required to run inference on a network with UDO layers on Qualcomm® Neural Processing SDK DSP. The registration library runs on the CPU and specifies the DSP implementation library of the UDO. Refer to Implementing a UDO for DSP V68 or later for more information on implementing UDO for DSP V68 or later runtimes. The directory paths and locations in this example are specific to DSP V68. For later runtimes, replace DSP_V68 with the corresponding DSP architecture (for example, DSP_V69) in the paths.
The file in the package that needs to be implemented for DSP V68 and later is
SoftmaxUdoPackage/jni/src/DSP_V68/src/ops/Softmax.cpp
The provided example implementation is present at the location
$SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Softmax/src/HTP/Softmax.cpp
Copy the provided implementations to the package:
cp -f $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Softmax/src/HTP/Softmax.cpp $SNPE_ROOT/examples/Models/InceptionV3/SoftmaxUdoPackage/jni/src/DSP_V68/src/ops
Optionally, the user can provide their own implementations in the package.
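The path substitution described above can be sketched as a lookup from the DSP version to the example source subdirectory and the package destination directory. The helper name udo_impl_paths is hypothetical, and the destination directories are taken from the copy commands in this step rather than from any SDK tool.

```shell
#!/bin/sh
# Hypothetical helper: print "<example source subdir> <package ops dir>" for a DSP version.
# V65/V66 use the DSP sources; V68 and later use the HTP sources and a DSP_V<NN> directory.
udo_impl_paths() {
    version="$1"   # numeric Hexagon version, e.g. 66 or 68
    case "$version" in
        65|66)
            echo "src/DSP jni/src/DSP/src/ops" ;;
        ''|*[!0-9]*)
            echo "not a DSP version: $version" >&2; return 1 ;;
        *)
            if [ "$version" -ge 68 ]; then
                echo "src/HTP jni/src/DSP_V${version}/src/ops"
            else
                echo "unsupported DSP version: $version" >&2; return 1
            fi ;;
    esac
}

udo_impl_paths 69   # prints: src/HTP jni/src/DSP_V69/src/ops
```

The first field is relative to $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Softmax and the second to the generated SoftmaxUdoPackage directory.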
Step 4: Package Compilation
Hexagon DSP Runtime Compilation
Compilation for the DSP runtime makes use of the make system.
To build the implementation libraries for DSP V65 and V66 runtimes, Hexagon-SDK needs to be installed and set up. For details, follow the setup instructions at $HEXAGON_SDK_ROOT/docs/readme.html, where HEXAGON_SDK_ROOT is the location of your Hexagon-SDK installation. Information for compiling a UDO for DSP is available at Compiling UDO for DSP.
To build the implementation libraries for DSP V68 or later runtimes, Hexagon-SDK 4.0+ needs to be installed and set up. For Hexagon-SDK details, follow the setup instructions at $HEXAGON_SDK4_ROOT/docs/readme.html, where HEXAGON_SDK4_ROOT is the location of your Hexagon-SDK installation. An extracted Qualcomm® AI Direct SDK is also needed for building the libraries; the Qualcomm® AI Direct SDK itself does not need to be set up. For Qualcomm® AI Direct SDK details, refer to the documentation at $QNN_SDK_ROOT/docs/QNN/index.html, where QNN_SDK_ROOT is the location of the Qualcomm® AI Direct SDK installation. Set $QNN_SDK_ROOT to the unzipped Qualcomm® AI Direct SDK location. Information for compiling a UDO for DSP V68 or later is available at Compiling a UDO for DSP_V68 or later.
Compile for offline cache generation in case of DSP V68:
cd SoftmaxUdoPackage
make dsp_x86 X86_CXX=<path_to_x86_64_clang>
The expected artifact after compiling for offline cache generation is
The UDO DSP implementation library: SoftmaxUdoPackage/libs/x86-64_linux_clang/libUdoSoftmaxUdoPackageImplDsp.so
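After compiling, a quick existence check on the expected artifact can catch a failed build early. The following is a minimal sketch; check_artifacts is a hypothetical helper and is not provided by the SDK.

```shell
#!/bin/sh
# Hypothetical helper: report whether each expected build artifact exists.
# Returns non-zero if at least one path is missing.
check_artifacts() {
    status=0
    for path in "$@"; do
        if [ -f "$path" ]; then
            echo "found:   $path"
        else
            echo "missing: $path"
            status=1
        fi
    done
    return "$status"
}

# Run from the directory that contains SoftmaxUdoPackage; only reports, never aborts.
check_artifacts SoftmaxUdoPackage/libs/x86-64_linux_clang/libUdoSoftmaxUdoPackageImplDsp.so || true
```

Additional library paths can be passed as further arguments if other targets are built.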
Setup Script
The Qualcomm® Neural Processing SDK provides an option to automatically perform the package generation, DLC conversion, package implementation, and package compilation outlined in steps 1-4 above. The option is an extension of the Inception V3 setup script. To enable Inception V3 setup for UDO, run the script with the --udo or -u option.
usage: $SNPE_ROOT/models/examples/Models/InceptionV3/scripts/setup_inceptionv3_snpe.py [-h] -a ASSETS_DIR [-d] [-r RUNTIME] [-u] [-l [HTP_SOC]]
Prepares the InceptionV3 assets for tutorial examples.
required arguments:
  -a ASSETS_DIR, --assets_dir ASSETS_DIR
                        directory containing the InceptionV3 assets
optional arguments:
  -d, --download        Download InceptionV3 assets to InceptionV3 example
                        directory
  -r RUNTIME, --runtime RUNTIME
                        Choose a runtime to set up tutorial for. Choices: cpu,
                        gpu, dsp, aip, all. 'all' option is only supported
                        with --udo flag
  -u, --udo             Generate and compile a user-defined operation package
                        to be used with InceptionV3. Softmax is simulated as
                        a UDO for this script.
  -l [HTP_SOC], --htp_soc [HTP_SOC]
                        Specify SOC target for generating HTP Offline Cache.
                        For example: "--htp_soc sm8450" for Snapdragon 8 Gen 1,
                        default value is sm8750.
The --udo extension is compatible with options normally used by the setup_inceptionv3.py script. When the --udo option is enabled, the -r or --runtime option controls the runtime for the package implementation and compilation. Additionally, the --udo option supports use of an ‘all’ runtime option to create and compile the SoftmaxUdoPackage for the CPU, GPU, and DSP/AIP runtimes. Selecting the ‘aip’ or ‘dsp’ runtime options additionally compiles x86 libraries in order to quantize the model. Selecting the ‘cpu’ runtime option compiles for both x86 and Android targets. Compilation for the Android target will be skipped if ANDROID_NDK_ROOT is not set. If no runtime option is provided, the package is compiled for the CPU runtime. The -l or --htp_soc option will generate and compile the package for the HTP architecture of the SoC provided.
The command to use the setup script for UDO is:
python3 $SNPE_ROOT/examples/Models/InceptionV3/scripts/setup_inceptionv3.py -a ~/tmpdir -d -u -r <runtime_of_choice>
In case of DSP V68:
python3 $SNPE_ROOT/examples/Models/InceptionV3/scripts/setup_inceptionv3.py -a ~/tmpdir -d -u -r <runtime_of_choice> -l
This will populate the artifacts in Step 4.
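How the runtime and SoC options combine can be sketched with a small wrapper that only prints the resulting command. The function name udo_setup_cmd is hypothetical; the runtime choices and the sm8450 example value come from the usage text above.

```shell
#!/bin/sh
# Hypothetical wrapper: print the setup command for a runtime and an optional HTP SoC.
# It validates the runtime against the choices listed in the script's usage text.
udo_setup_cmd() {
    runtime="$1"
    soc="${2:-}"
    case "$runtime" in
        cpu|gpu|dsp|aip|all) ;;
        *) echo "unknown runtime: $runtime" >&2; return 1 ;;
    esac
    cmd="python3 \$SNPE_ROOT/examples/Models/InceptionV3/scripts/setup_inceptionv3.py -a ~/tmpdir -d -u -r $runtime"
    # Append the SoC only when one was given; a bare -l uses the script default.
    if [ -n "$soc" ]; then
        cmd="$cmd -l $soc"
    fi
    echo "$cmd"
}

udo_setup_cmd dsp sm8450
```

The printed line can be copied into a shell where SNPE_ROOT is set.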
Model Execution
Execution using snpe-net-run
Executing Inception V3 with UDO is largely the same as running snpe-net-run without UDO.
The Qualcomm® Neural Processing SDK provides Linux and Android binaries of snpe-net-run under
$SNPE_ROOT/bin/x86_64-linux-clang
$SNPE_ROOT/bin/aarch64-android
$SNPE_ROOT/bin/aarch64-oe-linux-gcc8.2
$SNPE_ROOT/bin/aarch64-oe-linux-gcc9.3
$SNPE_ROOT/bin/aarch64-ubuntu-gcc9.4
$SNPE_ROOT/bin/aarch64-oe-linux-gcc11.2
For UDO, snpe-net-run consumes the registration library through the --udo_package_path option. LD_LIBRARY_PATH must also be updated to include the runtime-specific artifacts generated from package compilation.
Android Target Execution
The tutorial for execution on Android targets will use the arm64-v8a architecture. Set SNPE_TARGET_DSPARCH to the DSP architecture of the target Android device.
# architecture: arm64-v8a - compiler: clang - STL: libc++
export SNPE_TARGET_ARCH=aarch64-android
export SNPE_TARGET_DSPARCH=hexagon-v68
Then, push Qualcomm® Neural Processing SDK binaries and libraries to the target device:
adb shell "mkdir -p /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin"
adb shell "mkdir -p /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib"
adb push $SNPE_ROOT/lib/$SNPE_TARGET_ARCH/*.so \
/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
adb push $SNPE_ROOT/bin/$SNPE_TARGET_ARCH/snpe-net-run \
/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
Next, update environment variables on the target device to include the Qualcomm® Neural Processing SDK libraries and binaries:
adb shell
export SNPE_TARGET_ARCH=aarch64-android
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
export PATH=$PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
Lastly, push the Inception V3 UDO model and input data to the device:
cd $SNPE_ROOT/examples/Models/InceptionV3
mkdir data/rawfiles && cp data/cropped/*.raw data/rawfiles/
adb shell "mkdir -p /data/local/tmp/inception_v3_udo"
adb push data/rawfiles /data/local/tmp/inception_v3_udo/cropped
adb push data/target_raw_list.txt /data/local/tmp/inception_v3_udo
adb push dlc/inception_v3_udo.dlc /data/local/tmp/inception_v3_udo
rm -rf data/rawfiles
Hexagon DSP Execution
The procedure for execution on device for DSP is largely the same as for CPU and GPU. However, the DSP runtime requires quantized network parameters. While the DSP runtime accepts unquantized DLCs, quantizing the DLC beforehand is generally recommended for improved performance. The tutorial will use a quantized DLC as an illustrative example. Quantizing the DLC requires the snpe-dlc-quantize tool.
Note: In the command below, use the input DLC generated at Model DLC Conversion. Also provide the path of the registration library generated by compiling for the x86 host through the udo_package_path argument. More information about compiling for x86 can be found here.
To quantize the DLC for use on DSP:
cd $SNPE_ROOT/examples/Models/InceptionV3/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SNPE_ROOT/examples/Models/InceptionV3/SoftmaxUdoPackage/libs/x86-64_linux_clang/
snpe-dlc-quantize --input_dlc dlc/inception_v3_udo.dlc --input_list data/cropped/raw_list.txt --udo_package_path SoftmaxUdoPackage/libs/x86-64_linux_clang/libUdoSoftmaxUdoPackageReg.so --output_dlc dlc/inception_v3_udo_quantized.dlc
To quantize the DLC with offline cache generation for use on DSP V68:
cd $SNPE_ROOT/examples/Models/InceptionV3/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:SoftmaxUdoPackage/libs/x86-64_linux_clang
snpe-dlc-quantize --input_dlc dlc/inception_v3_udo.dlc --input_list data/cropped/raw_list.txt --udo_package_path SoftmaxUdoPackage/libs/x86-64_linux_clang/libUdoSoftmaxUdoPackageReg.so --output_dlc dlc/inception_v3_udo_quantized.dlc --enable_htp --htp_socs sm8350
For more information on snpe-dlc-quantize visit quantization. For information on UDO-specific quantization visit Quantizing a DLC with UDO. For information on DSP/AIP runtime visit DSP Runtime or AIP Runtime.
Now push the quantized model to device:
adb push dlc/inception_v3_udo_quantized.dlc /data/local/tmp/inception_v3_udo
Before executing on the DSP, push the Qualcomm® Neural Processing SDK libraries for DSP to device:
adb shell "mkdir -p /data/local/tmp/snpeexample/dsp/lib"
adb push $SNPE_ROOT/lib/$SNPE_TARGET_DSPARCH/unsigned/*.so /data/local/tmp/snpeexample/dsp/lib
Now push the DSP-specific UDO libraries to the device. Depending on the DSP architecture specified in the config, the dsp_v68 directory may instead be named dsp_v60 or dsp (with older Qualcomm® Neural Processing SDKs).
cd $SNPE_ROOT/examples/Models/InceptionV3
adb shell "mkdir -p /data/local/tmp/inception_v3_udo/dsp"
adb push SoftmaxUdoPackage/libs/dsp_v68/*.so /data/local/tmp/inception_v3_udo/dsp
adb push SoftmaxUdoPackage/libs/arm64-v8a/libUdoSoftmaxUdoPackageReg.so /data/local/tmp/inception_v3_udo/dsp # Pushes reg lib
adb push SoftmaxUdoPackage/libs/arm64-v8a/libc++_shared.so /data/local/tmp/inception_v3_udo/dsp
Then set required environment variables and run snpe-net-run on device:
adb shell
cd /data/local/tmp/inception_v3_udo/
export SNPE_TARGET_ARCH=aarch64-android
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
export PATH=$PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
export LD_LIBRARY_PATH=/data/local/tmp/inception_v3_udo/dsp/:$LD_LIBRARY_PATH
export ADSP_LIBRARY_PATH="/data/local/tmp/inception_v3_udo/dsp/;/data/local/tmp/snpeexample/dsp/lib;/system/lib/rfsa/adsp;/system/vendor/lib/rfsa/adsp;/dsp"
snpe-net-run --container inception_v3_udo_quantized.dlc --input_list target_raw_list.txt --udo_package_path dsp/libUdoSoftmaxUdoPackageReg.so --use_dsp
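Note that ADSP_LIBRARY_PATH in the commands above separates directories with semicolons, whereas LD_LIBRARY_PATH uses colons. The following sketch builds the same value from a list of directories; join_adsp_path is a hypothetical helper and is not part of the SDK.

```shell
#!/bin/sh
# Hypothetical helper: join directories with ';', the separator used in
# ADSP_LIBRARY_PATH above (LD_LIBRARY_PATH, by contrast, uses ':').
join_adsp_path() {
    joined=""
    for dir in "$@"; do
        if [ -z "$joined" ]; then
            joined="$dir"
        else
            joined="$joined;$dir"
        fi
    done
    echo "$joined"
}

# Same directories, in the same order, as in the export command above.
ADSP_LIBRARY_PATH="$(join_adsp_path \
    /data/local/tmp/inception_v3_udo/dsp/ \
    /data/local/tmp/snpeexample/dsp/lib \
    /system/lib/rfsa/adsp \
    /system/vendor/lib/rfsa/adsp \
    /dsp)"
export ADSP_LIBRARY_PATH
echo "$ADSP_LIBRARY_PATH"
```

Quoting the value matters, because an unquoted semicolon would be read by the shell as a command separator.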
AIP Execution
Because UDOs are not supported on the HTA hardware, executing on the AIP runtime defaults to the DSP UDO implementations. HTA hardware runs exclusively on quantized models and therefore, as with the DSP runtime, a quantized model will be used.
Note: In the command below, use the input DLC generated at Model DLC Conversion. Also provide the path of the registration library generated by compiling for the x86 host through the udo_package_path argument. More information about compiling for x86 can be found here.
The command to quantize the DLC for AIP is:
cd $SNPE_ROOT/examples/Models/InceptionV3/
snpe-dlc-quantize --input_dlc dlc/inception_v3_udo.dlc --input_list data/cropped/raw_list.txt --udo_package_path SoftmaxUdoPackage/libs/x86-64_linux_clang/libUdoSoftmaxUdoPackageReg.so --output_dlc dlc/inception_v3_udo_quantized.dlc --enable_hta
Now push the quantized model to device:
adb push dlc/inception_v3_udo_quantized.dlc /data/local/tmp/inception_v3_udo
Before executing using the AIP runtime, push the Qualcomm® Neural Processing SDK libraries for DSP to device with these commands:
adb shell "mkdir -p /data/local/tmp/snpeexample/dsp/lib"
adb push $SNPE_ROOT/lib/$SNPE_TARGET_DSPARCH/unsigned/*.so /data/local/tmp/snpeexample/dsp/lib
Now push the DSP-specific UDO libraries to the device. Depending on the DSP architecture specified in the config, the dsp_v68 directory may instead be named dsp_v60 or dsp (with older Qualcomm® Neural Processing SDKs).
cd $SNPE_ROOT/examples/Models/InceptionV3
adb shell "mkdir -p /data/local/tmp/inception_v3_udo/dsp"
adb push SoftmaxUdoPackage/libs/dsp_v68/*.so /data/local/tmp/inception_v3_udo/dsp
adb push SoftmaxUdoPackage/libs/arm64-v8a/libUdoSoftmaxUdoPackageReg.so /data/local/tmp/inception_v3_udo/dsp # Pushes reg lib
adb push SoftmaxUdoPackage/libs/arm64-v8a/libc++_shared.so /data/local/tmp/inception_v3_udo/dsp
Then set required environment variables and run snpe-net-run on device:
adb shell
cd /data/local/tmp/inception_v3_udo/
export SNPE_TARGET_ARCH=aarch64-android
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
export PATH=$PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
export LD_LIBRARY_PATH=/data/local/tmp/inception_v3_udo/dsp/:$LD_LIBRARY_PATH
export ADSP_LIBRARY_PATH="/data/local/tmp/inception_v3_udo/dsp/;/data/local/tmp/snpeexample/dsp/lib;/system/lib/rfsa/adsp;/system/vendor/lib/rfsa/adsp;/dsp"
snpe-net-run --container inception_v3_udo_quantized.dlc --input_list target_raw_list.txt --udo_package_path dsp/libUdoSoftmaxUdoPackageReg.so --use_aip
Integration with Android APK
This portion of the tutorial outlines how to integrate Qualcomm® Neural Processing SDK UDO libraries and Java API for package registration into an Android application. Generally, for native shared libraries to be discoverable by the application they must be placed in the project under
<project>/app/src/main/jniLibs/<platform_abi>
Once the libraries are accessible by the application, the registration library can be registered using the provided Java API. This process will be replicated with the example Image Classifiers application. The following assumes that the rest of the example application setup has been followed. The tutorial will issue instructions for platforms with arm64-v8a ABI.
First, create the necessary directories to contain the UDO libraries. The following steps will populate all runtime implementation libraries.
mkdir app/src/main/jniLibs/
cp -a $SNPE_ROOT/examples/Models/InceptionV3/SoftmaxUdoPackage/libs/arm64-v8a/ app/src/main/jniLibs/
If DSP is to be used as the runtime, copy the implementation library with the following:
cp $SNPE_ROOT/examples/Models/InceptionV3/SoftmaxUdoPackage/libs/dsp_v68/*.so app/src/main/jniLibs/arm64-v8a/
If not already done, running setup_inceptionv3.sh will add the Inception V3 model enabled with UDO to the project.
bash ./setup_inceptionv3.sh
Now the package can be registered through the Java API. Edit the file $SNPE_ROOT/examples/SNPE/android/image-classifiers/app/src/main/java/com/qualcomm/qti/snpe/imageclassifiers/tasks/LoadNetworkTask.java
so that it contains the SNPE.addOpPackage line marked below:
@Override
protected NeuralNetwork doInBackground(File... params) {
    NeuralNetwork network = null;
    try {
        SNPE.addOpPackage(mApplication, "libUdoSoftmaxUdoPackageReg.so"); // Add this line to register package
        final SNPE.NeuralNetworkBuilder builder = new SNPE.NeuralNetworkBuilder(mApplication)
        ...
Now the APK can be built and exercised:
./gradlew assembleDebug