UDO Tutorial With Weights
Overview
This tutorial describes the steps needed to create a UDO package with weights and execute the VGG model using the package. The Convolution operation has been chosen in this tutorial to demonstrate the implementation of a UDO with weights.
The Qualcomm® Neural Processing SDK provides the resources for this example under
$SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D
Information on UDO in general is available at UDO Overview. Information on running the VGG network without UDO is available at VGG Tutorial. Information on creating a UDO package and executing the model using the package is available at UDO Tutorial.
Prerequisites
This tutorial assumes that the general Qualcomm® Neural Processing SDK setup has been completed to support the SDK environment, the ONNX environment, and the desired platform dependencies. Additionally, an extracted Qualcomm® AI Direct SDK is required for generating the skeleton code and building the libraries; running the Qualcomm® AI Direct SDK setup itself is not necessary. For details, refer to the Qualcomm® AI Direct SDK documentation at the $QNN_SDK_ROOT/docs/QNN/index.html page, where QNN_SDK_ROOT is the location of the Qualcomm® AI Direct SDK installation. Set $QNN_SDK_ROOT to the unzipped Qualcomm® AI Direct SDK location. This must be done after running the envsetup.sh script mentioned in SNPE Setup. The steps listed in this tutorial use the ONNX model in the form of vgg16.onnx. For details on acquiring the VGG model, visit Tutorials Setup.
Introduction
The steps to develop and run a UDO are:
1. Package Generation
2. Model Conversion to a DLC
3. Package Implementations
4. Package Compilation
5. Model Execution
Steps 1-4 are run offline on the x86 host and are necessary for execution in step 5. Step 5 provides information on execution using the Qualcomm® Neural Processing SDK command-line executable snpe-net-run.
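The five steps can be sketched as a single dry-run outline. The run helper and the abbreviated paths below are illustrative placeholders, not SDK commands or real locations; the full commands appear in the sections that follow.

```shell
# Dry-run outline of the tutorial flow. run() only echoes each command,
# so this sketch is safe to execute without the SDK installed.
run() { echo "+ $*"; }

run snpe-udo-package-generator -p Conv2D.json -o ConvUdoCpu   # Step 1: generate package
run snpe-onnx-to-dlc --input_network vgg16.onnx \
    --udo Conv2D.json --output_path vgg16_udo.dlc             # Step 2: convert model
echo "# Step 3: copy the provided Conv.cpp into the generated package"
run make cpu_x86                                              # Step 4: compile package
run snpe-net-run --container vgg16_udo.dlc \
    --udo_package_path libUdoConv2DPackageReg.so              # Step 5: execute model
```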
Step 1: Package Generation
Generating the Conv2DPackage requires the snpe-udo-package-generator tool and one of the provided UDO plugins: Conv2D.json, Conv2DQuant.json, or Conv2D_Htp.json, depending on your runtime requirement. Conv2D.json and Conv2DQuant.json provide skeleton code for the CPU (float) and DSP (uint8) implementations, respectively, while Conv2D_Htp.json targets DSP V68 and later. The plugins are located under $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/config. More information about creating a UDO plugin can be found here.
Generate the Conv2DPackage UDO package using the following:
export SNPE_UDO_ROOT=$SNPE_ROOT/share/SNPE/SnpeUdo
export QNN_SDK_ROOT=<path to Qualcomm® AI Direct SDK>
mkdir $SNPE_ROOT/examples/Models/VGG/ConvUdoCpu
snpe-udo-package-generator -p $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/config/Conv2D.json -o $SNPE_ROOT/examples/Models/VGG/ConvUdoCpu
or, for DSP versions earlier than V68:
export SNPE_UDO_ROOT=$SNPE_ROOT/share/SNPE/SnpeUdo
export QNN_SDK_ROOT=<path to Qualcomm® AI Direct SDK>
mkdir $SNPE_ROOT/examples/Models/VGG/ConvUdoDsp
snpe-udo-package-generator -p $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/config/Conv2DQuant.json -o $SNPE_ROOT/examples/Models/VGG/ConvUdoDsp
or, for DSP V68 and later:
export SNPE_UDO_ROOT=$SNPE_ROOT/share/SNPE/SnpeUdo
export QNN_SDK_ROOT=<path to Qualcomm® AI Direct SDK>
mkdir $SNPE_ROOT/examples/Models/VGG/ConvUdoDsp
snpe-udo-package-generator -p $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/config/Conv2D_Htp.json -o $SNPE_ROOT/examples/Models/VGG/ConvUdoDsp
This command creates the Convolution-based package at $SNPE_ROOT/examples/Models/VGG/ConvUdoCpu/Conv2DPackage or $SNPE_ROOT/examples/Models/VGG/ConvUdoDsp/Conv2DPackage.
For more information on the snpe-udo-package-generator tool, visit here.
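The choice among the three configs can be expressed as a small shell helper. select_conv2d_config is an illustrative sketch, not an SDK tool, and the runtime names it accepts are assumptions made for this example.

```shell
# Illustrative helper (not an SDK tool): map a target runtime name to the
# Conv2D plugin config used in the commands above.
select_conv2d_config() {
    case "$1" in
        cpu)             echo "Conv2D.json" ;;       # CPU, float implementation
        dsp_v65|dsp_v66) echo "Conv2DQuant.json" ;;  # DSP earlier than V68, uint8
        dsp_v68|htp)     echo "Conv2D_Htp.json" ;;   # DSP V68 and later
        *)               echo "unknown runtime: $1" >&2; return 1 ;;
    esac
}
```

It could then be used as, for example, `snpe-udo-package-generator -p $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/config/$(select_conv2d_config cpu) -o <output dir>`.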
Step 2: Framework Model Conversion to a DLC
Converting the ONNX VGG model to a DLC requires the snpe-onnx-to-dlc tool, which consumes the same Conv2D.json used in package generation via the --udo command-line option. In this step, <VGG_PATH> refers to the path to the vgg16.onnx file. For example, after running the setup_vgg.py script, <VGG_PATH> is $SNPE_ROOT/examples/Models/VGG/onnx.
Convert VGG with the following:
snpe-onnx-to-dlc --input_network <VGG_PATH>/vgg16.onnx --output_path $SNPE_ROOT/examples/Models/VGG/dlc/vgg16_udo.dlc --udo $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/config/Conv2D.json
This generates a DLC named vgg16_udo.dlc, containing the Convolution operation as a UDO, at $SNPE_ROOT/examples/Models/VGG/dlc.
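Conversion fails if either the model or the plugin config path is wrong, so a small pre-flight check can save a debugging round trip. check_inputs is a hypothetical helper sketched here, not an SDK tool.

```shell
# Hypothetical pre-flight helper: verify that every file passed as an
# argument exists before invoking snpe-onnx-to-dlc.
check_inputs() {
    for f in "$@"; do
        [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
    done
    echo "all inputs present"
}
# Example:
# check_inputs <VGG_PATH>/vgg16.onnx \
#     $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/config/Conv2D.json \
#     && snpe-onnx-to-dlc ...
```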
Step 3: Package Implementations
The generated package contains the skeleton of the operation implementation, which must be filled in by the user to create a functional UDO. The rest of the code scaffolding for compatibility with the Qualcomm® Neural Processing SDK is provided by the snpe-udo-package-generator. The UDO implementations for this tutorial are provided under $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/src.
CPU Implementations (Android and x86)
The file in the package that needs to be implemented for CPU is
ConvUdoCpu/Conv2DPackage/jni/src/CPU/src/ops/Conv.cpp
The provided example implementation is present at the location
$SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/src/CPU/Conv.cpp
Copy the provided implementation to the package:
cp -f $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/src/CPU/Conv.cpp $SNPE_ROOT/examples/Models/VGG/ConvUdoCpu/Conv2DPackage/jni/src/CPU/src/ops/
DSP Implementations (Android) for V65 and V66
Please note that only C files are supported for UDOs on the DSP V65 and V66 runtimes. Refer to Implementing a UDO for DSP V65 and V66 for more information on implementing a UDO for these runtimes. The example here executes the float implementation on the DSP runtime. Please refer to the UDO DSP for Quantized DLC tutorial for executing the quantized implementation on the DSP runtime.
The files in the package that need to be implemented for DSP V65 and V66 are
ConvUdoDsp/Conv2DPackage/jni/src/DSP/ConvolutionImplLibDsp.c
ConvUdoDsp/Conv2DPackage/include/ConvolutionImplLibDsp.h
The provided example implementations are present at the locations
$SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/src/DSP/Conv2DInt8Impl/ConvolutionImplLibDsp.c
$SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/src/DSP/Conv2DInt8Impl/ConvolutionImplLibDsp.h
Copy the provided implementations to the package:
cp -f $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/src/DSP/Conv2DInt8Impl/ConvolutionImplLibDsp.c $SNPE_ROOT/examples/Models/VGG/ConvUdoDsp/Conv2DPackage/jni/src/DSP/
cp -f $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/src/DSP/Conv2DInt8Impl/ConvolutionImplLibDsp.h $SNPE_ROOT/examples/Models/VGG/ConvUdoDsp/Conv2DPackage/include/
Optionally, the user can provide their own implementations in the package.
DSP Implementations for V68 and later
Please note that only C++ files are supported for UDOs on DSP V68 and later runtimes. Refer to Implementing a UDO for DSP V68 or later for more information on implementing a UDO for those runtimes. The directory paths and locations in this example are specific to the DSP V68 architecture. For runtimes later than DSP V68, please replace DSP_V68 with the corresponding DSP architecture.
The file in the package that needs to be implemented for DSP V68 and later is
ConvUdoDsp/Conv2DPackage/jni/src/DSP_V68/ConvImplLibDsp.cpp
The provided example implementation is present at the location
$SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/src/HTP/ConvImplLibDsp.cpp
Copy the provided implementations to the package:
cp -f $SNPE_ROOT/examples/SNPE/NativeCpp/UdoExample/Conv2D/src/HTP/ConvImplLibDsp.cpp $SNPE_ROOT/examples/Models/VGG/ConvUdoDsp/Conv2DPackage/jni/src/DSP_V68/
Optionally, the user can provide their own implementations in the package.
Step 4: Package Compilation
x86 Host Compilation
Compiling on x86 host uses the make build system. Compile the CPU implementations with the following:
cd $SNPE_ROOT/examples/Models/VGG/ConvUdoCpu/Conv2DPackage
make cpu_x86
The expected artifacts after compiling for CPU on x86 host are
ConvUdoCpu/Conv2DPackage/libs/x86-64_linux_clang/libUdoConv2DPackageImplCpu.so
ConvUdoCpu/Conv2DPackage/libs/x86-64_linux_clang/libUdoConv2DPackageReg.so
Android CPU Runtime Compilation
Compilation for the CPU runtime on Android uses Android NDK. The ANDROID_NDK_ROOT environment variable must be set to the directory containing ndk-build in order to compile the package.
export ANDROID_NDK_ROOT=<path_to_android_ndk>
It is suggested to add ANDROID_NDK_ROOT to the PATH environment variable to access ndk-build.
export PATH=$ANDROID_NDK_ROOT:$PATH
Once the ANDROID_NDK_ROOT is part of PATH, compile the package for Android CPU target:
cd $SNPE_ROOT/examples/Models/VGG/ConvUdoCpu/Conv2DPackage
make cpu_android
The expected artifacts after compiling for Android CPU are
ConvUdoCpu/Conv2DPackage/libs/arm64-v8a/libUdoConv2DPackageImplCpu.so
ConvUdoCpu/Conv2DPackage/libs/arm64-v8a/libUdoConv2DPackageReg.so
ConvUdoCpu/Conv2DPackage/libs/arm64-v8a/libc++_shared.so
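A quick way to confirm a successful build is to check that the libraries listed above were actually produced. check_artifacts is an illustrative helper sketched for this purpose, not part of the SDK.

```shell
# Illustrative helper: report any expected build artifact missing from the
# package's libs/ directory for a given target.
check_artifacts() {
    dir="$1"; shift
    missing=0
    for lib in "$@"; do
        [ -f "$dir/$lib" ] || { echo "missing: $dir/$lib"; missing=1; }
    done
    [ "$missing" -eq 0 ] && echo "all artifacts present"
}
# Example (Android CPU build):
# check_artifacts ConvUdoCpu/Conv2DPackage/libs/arm64-v8a \
#     libUdoConv2DPackageImplCpu.so libUdoConv2DPackageReg.so libc++_shared.so
```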
Hexagon DSP Runtime Compilation
Compilation for the DSP runtime also uses the make build system. To build the implementation libraries for the DSP V65 and V66 runtimes, the Hexagon-SDK must be installed and set up. For details, follow the setup instructions on the $HEXAGON_SDK_ROOT/docs/readme.html page, where HEXAGON_SDK_ROOT is the location of your Hexagon-SDK installation. Information on compiling a UDO for DSP is available at Compiling UDO for DSP.
Model Execution
Execution using snpe-net-run
Executing VGG with a UDO is largely the same as using snpe-net-run without a UDO.
The Qualcomm® Neural Processing SDK provides Linux and Android binaries of snpe-net-run under
$SNPE_ROOT/bin/x86_64-linux-clang
$SNPE_ROOT/bin/aarch64-android
$SNPE_ROOT/bin/aarch64-oe-linux-gcc8.2
$SNPE_ROOT/bin/aarch64-oe-linux-gcc9.3
For UDO, snpe-net-run consumes the registration library through the --udo_package_path option. LD_LIBRARY_PATH must also be updated to include the runtime-specific artifacts generated from package compilation.
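Because LD_LIBRARY_PATH is extended at several points in this tutorial, a small idempotent helper avoids accumulating duplicate entries across repeated shell sessions. prepend_ld_path is a hypothetical convenience sketched here, not an SDK requirement; prepending lets the UDO libraries take precedence over earlier entries.

```shell
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH only if it is
# not already present, so repeated sessions do not accumulate duplicates.
prepend_ld_path() {
    case ":$LD_LIBRARY_PATH:" in
        *":$1:"*) ;;  # already on the path, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}
# Example:
# prepend_ld_path "$SNPE_ROOT/examples/Models/VGG/ConvUdoCpu/Conv2DPackage/libs/x86-64_linux_clang"
```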
x86 Host Execution
To execute the network on x86 host, run:
cd $SNPE_ROOT/examples/Models/VGG
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SNPE_ROOT/examples/Models/VGG/ConvUdoCpu/Conv2DPackage/libs/x86-64_linux_clang/
snpe-net-run --container dlc/vgg16_udo.dlc --input_list data/cropped/raw_list.txt --udo_package_path ConvUdoCpu/Conv2DPackage/libs/x86-64_linux_clang/libUdoConv2DPackageReg.so
Android Target Execution
The tutorial for execution on Android targets will use the arm64-v8a architecture. This portion of the tutorial is generic to all runtimes (CPU, DSP). Set SNPE_TARGET_DSPARCH to the DSP architecture of the target Android device.
# architecture: arm64-v8a - compiler: clang - STL: libc++
export SNPE_TARGET_ARCH=aarch64-android
export SNPE_TARGET_DSPARCH=hexagon-v68
Then, push Qualcomm® Neural Processing SDK binaries and libraries to the target device:
adb shell "mkdir -p /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin"
adb shell "mkdir -p /data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib"
adb push $SNPE_ROOT/lib/$SNPE_TARGET_ARCH/*.so \
/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
adb push $SNPE_ROOT/bin/$SNPE_TARGET_ARCH/snpe-net-run \
/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
Next, update environment variables on the target device to include the Qualcomm® Neural Processing SDK libraries and binaries:
adb shell
export SNPE_TARGET_ARCH=aarch64-android
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
export PATH=$PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
Lastly, push the VGG UDO model and input data to the device:
cd $SNPE_ROOT/examples/Models/VGG
mkdir data/rawfiles && cp data/cropped/*.raw data/rawfiles/
adb shell "mkdir -p /data/local/tmp/vgg16_udo"
adb push data/rawfiles /data/local/tmp/vgg16_udo/cropped
adb push data/raw_list.txt /data/local/tmp/vgg16_udo
adb push dlc/vgg16_udo.dlc /data/local/tmp/vgg16_udo
rm -rf data/rawfiles
Android CPU Execution
Once the model and data are on the device, push the UDO libraries as well:
cd $SNPE_ROOT/examples/Models/VGG
adb shell "mkdir -p /data/local/tmp/vgg16_udo/cpu"
adb push ConvUdoCpu/Conv2DPackage/libs/arm64-v8a/libUdoConv2DPackageImplCpu.so /data/local/tmp/vgg16_udo/cpu
adb push ConvUdoCpu/Conv2DPackage/libs/arm64-v8a/libUdoConv2DPackageReg.so /data/local/tmp/vgg16_udo/cpu
adb push ConvUdoCpu/Conv2DPackage/libs/arm64-v8a/libc++_shared.so /data/local/tmp/vgg16_udo/cpu
Now set required environment variables and run snpe-net-run on device:
adb shell
cd /data/local/tmp/vgg16_udo/
export SNPE_TARGET_ARCH=aarch64-android
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
export PATH=$PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
export LD_LIBRARY_PATH=/data/local/tmp/vgg16_udo/cpu/:$LD_LIBRARY_PATH
snpe-net-run --container vgg16_udo.dlc --input_list raw_list.txt --udo_package_path cpu/libUdoConv2DPackageReg.so
Hexagon DSP Execution
The procedure for execution on device for the DSP is largely the same as for the CPU. However, the DSP runtime performs best with quantized network parameters. While the DSP runtime accepts unquantized DLCs, quantizing DLCs is generally recommended for improved performance, and this tutorial uses a quantized DLC as an illustrative example. Quantizing the DLC requires the snpe-dlc-quantize tool.
To quantize the DLC for use on DSP:
cd $SNPE_ROOT/examples/Models/VGG/
snpe-dlc-quantize --input_dlc dlc/vgg16_udo.dlc --input_list data/cropped/raw_list.txt --udo_package_path ConvUdoCpu/Conv2DPackage/libs/x86-64_linux_clang/libUdoConv2DPackageReg.so --output_dlc dlc/vgg16_udo_quantized.dlc
For more information on snpe-dlc-quantize visit quantization. For information on UDO-specific quantization visit Quantizing a DLC with UDO. For information on DSP runtime visit DSP Runtime.
Now push the quantized model to device:
adb push dlc/vgg16_udo_quantized.dlc /data/local/tmp/vgg16_udo
Note: Please refer to the UDO DSP tutorial for Quantized DLC for executing on the DSP runtime using a quantized DLC.
Before executing on the DSP, push the Qualcomm® Neural Processing SDK libraries for DSP to device:
adb shell "mkdir -p /data/local/tmp/snpeexample/dsp/lib"
adb push $SNPE_ROOT/lib/$SNPE_TARGET_DSPARCH/unsigned/*.so /data/local/tmp/snpeexample/dsp/lib
Now push the DSP-specific UDO libraries to the device. Depending on the DSP architecture specified in the config, the dsp_v68 directory may instead be dsp_v60, or dsp (with older versions of the Qualcomm® Neural Processing SDK).
cd $SNPE_ROOT/examples/Models/VGG
adb shell "mkdir -p /data/local/tmp/vgg16_udo/dsp"
adb push ConvUdoDsp/Conv2DPackage/libs/dsp_v68/*.so /data/local/tmp/vgg16_udo/dsp # For DSP V68 or later
adb push ConvUdoDsp/Conv2DPackage/libs/dsp_v60/*.so /data/local/tmp/vgg16_udo/dsp # For DSP versions less than v68
adb push ConvUdoDsp/Conv2DPackage/libs/arm64-v8a/libUdoConv2DPackageReg.so /data/local/tmp/vgg16_udo/dsp # Pushes reg lib
adb push ConvUdoDsp/Conv2DPackage/libs/arm64-v8a/libc++_shared.so /data/local/tmp/vgg16_udo/dsp
Then set required environment variables and run snpe-net-run on device. Note that Conv2DInt8Impl should be used for quantized DLCs:
adb shell
cd /data/local/tmp/vgg16_udo/
export SNPE_TARGET_ARCH=aarch64-android
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/lib
export PATH=$PATH:/data/local/tmp/snpeexample/$SNPE_TARGET_ARCH/bin
export LD_LIBRARY_PATH=/data/local/tmp/vgg16_udo/dsp/:$LD_LIBRARY_PATH
export ADSP_LIBRARY_PATH="/data/local/tmp/vgg16_udo/dsp/;/data/local/tmp/snpeexample/dsp/lib;/system/lib/rfsa/adsp;/system/vendor/lib/rfsa/adsp;/dsp"
snpe-net-run --container vgg16_udo_quantized.dlc --input_list raw_list.txt --udo_package_path dsp/libUdoConv2DPackageReg.so --use_dsp
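Note that ADSP_LIBRARY_PATH above is ';'-separated, unlike the ':'-separated LD_LIBRARY_PATH. As an illustrative sketch (join_adsp_path is not an SDK tool), the value can be assembled from its component directories:

```shell
# Illustrative helper: join directories into the ';'-separated format that
# ADSP_LIBRARY_PATH expects.
join_adsp_path() {
    out=""
    for p in "$@"; do
        out="${out:+$out;}$p"
    done
    echo "$out"
}
# Example:
# export ADSP_LIBRARY_PATH="$(join_adsp_path /data/local/tmp/vgg16_udo/dsp \
#     /data/local/tmp/snpeexample/dsp/lib /system/lib/rfsa/adsp \
#     /system/vendor/lib/rfsa/adsp /dsp)"
```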
To verify the classification results, run the following on your host machine:
cd $SNPE_ROOT/examples/Models/VGG
adb pull /data/local/tmp/vgg16_udo/output .
python3 $SNPE_ROOT/examples/Models/VGG/scripts/show_vgg_classifications.py -i data/cropped/raw_list.txt \
-o output/ \
-l data/synset.txt
The output should look like the following, showing classification results for all the images.
Classification results
probability=0.351832 ; class=n02123045 tabby, tabby cat
probability=0.315168 ; class=n02123159 tiger cat
probability=0.313084 ; class=n02124075 Egyptian cat
probability=0.012995 ; class=n02127052 lynx, catamount
probability=0.003528 ; class=n02129604 tiger, Panthera tigris