Tutorial: Utilizing Deep Learning Containers (DLCs) in Qualcomm® AI Engine Direct

This tutorial demonstrates how to use Deep Learning Containers (DLCs) in QNN. The DLC is Qualcomm’s serialized model format, introduced by the Qualcomm® Neural Processing SDK to represent and store deep learning models. Qualcomm® AI Engine Direct now supports DLCs, which simplifies cross-product workflows with the Qualcomm® Neural Processing SDK: developers can leverage the model preparation and analysis tools of the Qualcomm® Neural Processing SDK while using the same model artifact. Qualcomm® AI Engine Direct can also now prepare and execute large (> 2 GB) models.

Utilizing DLCs in QNN requires model conversion tools from the Qualcomm® Neural Processing SDK and execution tools and libraries from Qualcomm® AI Engine Direct.

This tutorial uses Inception V3 as the source framework model and the qnn-net-run executable as the example application. This tutorial covers execution on CPU, GPU, and HTP backends on host (CPU/HTP) and client devices.

This tutorial is divided into the following sections:

  1. Tutorial Setup

  2. Model Conversion

  3. Executing Example Model

Note

DLC in QNN is only supported on Linux x86 and AArch64 platforms.

Note

DLC in QNN should be used with matching SNPE and QNN releases, i.e., a DLC produced with version 2.15 of SNPE should be used with version 2.15 of QNN.

Tutorial Setup

This tutorial assumes the general setup instructions for both QNN and SNPE have been followed. In particular, converting models to DLCs requires the PYTHONPATH and SNPE_ROOT environment variables to be set appropriately.

Additionally, this tutorial requires the acquisition of the Inception V3 TensorFlow model file and sample images. This is handled by the provided setup script setup_inceptionv3.py. The script is located at:

${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/setup_inceptionv3.py

Usage is as follows:

usage: setup_inceptionv3.py [-h] -a ASSETS_DIR [-d] [-c] [-q]

Prepares the inception_v3 assets for tutorial examples.

required arguments:
  -a ASSETS_DIR, --assets_dir ASSETS_DIR
                        directory containing the inception_v3 assets

optional arguments:
  -d, --download        Download inception_v3 assets to inception_v3 example
                        directory
  -c, --convert_model   Convert and compile model once acquired.
  -q, --quantize_model  Quantize the model during conversion. Only available
                        if the -c or --convert_model option is chosen

Before using the script, please set the environment variable TENSORFLOW_HOME to point to the location where TensorFlow package is installed. The script uses TensorFlow utilities like optimize_for_inference.py, which are present in the TensorFlow installation directory.

  1. Find the location of the TensorFlow package:

    $ python3 -m pip show tensorflow
    
  2. Set the TENSORFLOW_HOME environment variable using the installation location of the TensorFlow package (the location field from the output in step #1):

    $ export TENSORFLOW_HOME=<tensorflow-location>/tensorflow_core
    
  3. Install the Inception V3 TensorFlow model and sample images using the setup_inceptionv3.py script:

    $ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/setup_inceptionv3.py -a ~/tmpdir -d
    

The model file should now be present at the following location:

${QNN_SDK_ROOT}/examples/Models/InceptionV3/tensorflow/inception_v3_2016_08_28_frozen.pb

The raw images should now be present at the following location:

${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped

Model Conversion

After acquiring the model assets, the model can be converted to a DLC using the conversion tools in the Qualcomm® Neural Processing SDK.

Note

A quantized model is needed for use on HTP and DSP backends. See Model Quantization to generate a quantized DLC.

Convert the Inception V3 model using the snpe-tensorflow-to-dlc tool:

$ ${SNPE_ROOT}/bin/x86_64-linux-clang/snpe-tensorflow-to-dlc \
  --input_network ${QNN_SDK_ROOT}/examples/Models/InceptionV3/tensorflow/inception_v3_2016_08_28_frozen.pb \
  --input_dim input 1,299,299,3 \
  --out_node InceptionV3/Predictions/Reshape_1 \
  --output_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc

This produces the ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc DLC file.

The DLC contains the serialized model: the network topology and the associated model data.
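To verify what the conversion produced, the DLC can be inspected with the snpe-dlc-info tool from the Qualcomm® Neural Processing SDK, assuming the same SNPE setup as above. A minimal invocation:

```shell
# Inspect the converted DLC: prints the network's layers, tensor shapes,
# and parameter counts to stdout.
${SNPE_ROOT}/bin/x86_64-linux-clang/snpe-dlc-info \
  -i ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc
```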

Model Quantization

The DLC can be quantized using the snpe-dlc-quantize tool. Example usage is below:

$ ${SNPE_ROOT}/bin/x86_64-linux-clang/snpe-dlc-quantize \
  --input_dlc ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc \
  --input_list ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped/raw_list.txt \
  --output_dlc ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc

This will produce the following artifact:

  • ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc

Note

When quantizing a model, the input list must contain absolute paths to the input data.
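One way to satisfy this requirement is to rewrite the relative paths in the generated input list before quantizing. A minimal sketch, assuming one raw-file path per line; the make_abs_list helper name is illustrative and not part of either SDK:

```shell
# Rewrite each relative path in an input list to an absolute path.
# Usage: make_abs_list <input_list> <output_list>
make_abs_list() {
  while IFS= read -r f; do
    # Skip blank lines; resolve each remaining path to an absolute one.
    [ -n "$f" ] && readlink -f "$f"
  done < "$1" > "$2"
}
```

For example, running make_abs_list data/cropped/raw_list.txt data/cropped/raw_list_abs.txt from the InceptionV3 example directory yields a list usable with --input_list.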

Execution requires the produced DLC and the provided utility library libQnnModelDlc.so, which extends the QNN Model API to compose a QNN graph and return its handle given a DLC path.

ModelError_t QnnModel_composeGraphsFromDlc(Qnn_BackendHandle_t backendHandle,
                                           QNN_INTERFACE_VER_TYPE interface,
                                           Qnn_ContextHandle_t contextHandle,
                                           const GraphConfigInfo_t **graphsConfigInfo,
                                           const char *dlcPath,
                                           const uint32_t numGraphsConfigInfo,
                                           GraphInfoPtr_t **graphsInfo,
                                           uint32_t *numGraphsInfo,
                                           bool debug,
                                           QnnLog_Callback_t logCallback,
                                           QnnLog_Level_t maxLogLevel)

This is identical to the QnnModel_composeGraphs API with the addition of the dlcPath input argument. The returned QNN graph handles can then be finalized and executed.
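The calling sequence can be sketched as follows. This is an illustrative C++ fragment, not a compilable program: it assumes the QNN SDK headers and the declaration above are available, and that the backend handle, context handle, and interface have already been created; error handling is omitted.

```cpp
// Sketch: compose graphs from a DLC (error handling omitted).
// backendHandle, contextHandle, and qnnInterface are assumed to have been
// created beforehand via the usual QNN backend/context setup.
GraphInfoPtr_t *graphsInfo = nullptr;
uint32_t numGraphsInfo = 0;

ModelError_t err = QnnModel_composeGraphsFromDlc(
    backendHandle,
    qnnInterface,
    contextHandle,
    nullptr,                 // graphsConfigInfo: no per-graph configs
    "Inception_v3.dlc",      // dlcPath
    0,                       // numGraphsConfigInfo
    &graphsInfo,             // out: composed graph info array
    &numGraphsInfo,          // out: number of composed graphs
    false,                   // debug tensors disabled
    nullptr,                 // no log callback
    QNN_LOG_LEVEL_ERROR);    // maxLogLevel

// Each returned graph can then be finalized and executed through the
// QNN interface function pointers before freeing graphsInfo.
```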

The following section demonstrates execution with a DLC.

CPU Backend Execution

Execute on a Linux host

  1. Execute the model using qnn-net-run with the libQnnModelDlc.so utility library as the --model argument and the Inception_v3.dlc as the --dlc_path argument.

    $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
    $ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-net-run \
                  --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnCpu.so \
                  --model ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnModelDlc.so \
                  --dlc_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc \
                  --input_list data/cropped/raw_list.txt
    

    The results will be located at ${QNN_SDK_ROOT}/examples/Models/InceptionV3/output.

  2. View the results.

    $ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                                                                   -o output/ \
                                                                                   -l data/imagenet_slim_labels.txt
    

Execute on Android

Running the CPU backend on an Android target is similar to running on a Linux x86 target.

  1. Create a directory for the example on the Android device.

    $ adb shell "mkdir /data/local/tmp/inception_v3"
    
  2. Push the necessary libraries and DLC to the device.

    $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnCpu.so /data/local/tmp/inception_v3
    $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc /data/local/tmp/inception_v3
    $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnModelDlc.so /data/local/tmp/inception_v3
    
  3. Push the input data and lists to the device.

    $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped /data/local/tmp/inception_v3
    $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/target_raw_list.txt /data/local/tmp/inception_v3
    
  4. Push the qnn-net-run tool to the device.

    $ adb push ${QNN_SDK_ROOT}/bin/aarch64-android/qnn-net-run /data/local/tmp/inception_v3
    
  5. Set up the device environment.

    $ adb shell
    $ cd /data/local/tmp/inception_v3
    $ export LD_LIBRARY_PATH=/data/local/tmp/inception_v3
    
  6. Run qnn-net-run with the following arguments.

    $ ./qnn-net-run --backend libQnnCpu.so --model libQnnModelDlc.so --dlc_path Inception_v3.dlc --input_list target_raw_list.txt
    

    Outputs from the run will be located at the default ./output directory.

  7. Exit the device and view the results.

    $ exit
    $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
    $ adb pull /data/local/tmp/inception_v3/output output_android
    $ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                                                                   -o output_android/ \
                                                                                   -l data/imagenet_slim_labels.txt
    

GPU Backend Execution

Execute on Android

Running the GPU backend on an Android target is similar to running the CPU backend on an Android target.

  1. Create a directory for the example on the Android device.

    $ adb shell "mkdir /data/local/tmp/inception_v3"
    
  2. Push the necessary libraries and DLC to the device.

    $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGpu.so /data/local/tmp/inception_v3
    $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc /data/local/tmp/inception_v3
    $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnModelDlc.so /data/local/tmp/inception_v3
    
  3. Push the input data and lists to the device.

    $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped /data/local/tmp/inception_v3
    $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/target_raw_list.txt /data/local/tmp/inception_v3
    
  4. Push the qnn-net-run tool to the device.

    $ adb push ${QNN_SDK_ROOT}/bin/aarch64-android/qnn-net-run /data/local/tmp/inception_v3
    
  5. Set up the device environment.

    $ adb shell
    $ cd /data/local/tmp/inception_v3
    $ export LD_LIBRARY_PATH=/data/local/tmp/inception_v3
    
  6. Run qnn-net-run with the following arguments.

    $ ./qnn-net-run --backend libQnnGpu.so --model libQnnModelDlc.so --dlc_path Inception_v3.dlc --input_list target_raw_list.txt
    

    Outputs from the run will be located at the default ./output directory.

  7. Exit the device and view the results.

    $ exit
    $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
    $ adb pull /data/local/tmp/inception_v3/output output_android
    $ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                                                                  -o output_android/ \
                                                                                  -l data/imagenet_slim_labels.txt
    

Execute on Windows

Please ensure the `Model Build on Windows Host`_ and `Compiling for CPU on Windows`_ sections have been completed before proceeding.

First, create the following folder on the Windows device: C:\dlc_qnn_test.

Next, copy the following files from the development host into C:\dlc_qnn_test on the Windows device:

- ${QNN_SDK_ROOT}/bin/aarch64-windows-msvc/qnn-net-run.exe
- ${QNN_SDK_ROOT}/lib/aarch64-windows-msvc/QnnGpu.dll
- ${QNN_SDK_ROOT}/lib/aarch64-windows-msvc/QnnModelDlc.dll
- ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc
- ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped
- ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/target_raw_list.txt

Finally, connect to the Windows device and use qnn-net-run.exe with the following command in PowerShell:

$ .\qnn-net-run.exe --backend .\QnnGpu.dll `
                    --model .\QnnModelDlc.dll `
                    --dlc_path .\Inception_v3.dlc `
                    --input_list .\target_raw_list.txt

This will produce the results at .\output.

To view the result:

First, create a directory ${QNN_SDK_ROOT}\tmp\qnn_inception_v3_test_package on the development host.

Copy the output folder from the Windows device back to ${QNN_SDK_ROOT}\tmp\qnn_inception_v3_test_package.

Also copy the following files and directories to ${QNN_SDK_ROOT}\tmp\qnn_inception_v3_test_package.

- ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped
- ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/imagenet_slim_labels.txt
- ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py

On the development host, open Developer PowerShell for VS 2022 to view the result:

$ cd ${QNN_SDK_ROOT}\tmp\qnn_inception_v3_test_package
$ py -3 .\show_inceptionv3_classifications.py `
     -i .\cropped\raw_list.txt `
     -o .\output `
     -l .\imagenet_slim_labels.txt

HTP Backend Execution

Execute on a Linux host

Note

The HTP backend can be exercised on a Linux host using the HTP emulation backend.

  1. Execute the model using qnn-net-run with the libQnnModelDlc.so utility library as the --model argument, and the Inception_v3_quantized.dlc as the --dlc_path argument.

    $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
    $ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-net-run \
                  --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
                  --model ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnModelDlc.so \
                  --dlc_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc \
                  --input_list data/cropped/raw_list.txt
    

    Note

    The HTP Emulation backend requires a quantized model. For more information on quantization see Model Quantization.

    The results will be located at ${QNN_SDK_ROOT}/examples/Models/InceptionV3/output.

  2. View the results.

    $ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                                                                   -o output/ \
                                                                                   -l data/imagenet_slim_labels.txt
    

Execute on Android

Running the HTP backend on an Android target is similar to running the CPU and GPU backends on an Android target, except that the HTP backend requires a quantized model and a user-generated serialized context. For more information on quantization, see Model Quantization.

  1. Generate a serialized context from a DLC by running qnn-context-binary-generator with libQnnModelDlc.so as the --model argument and the quantized DLC as the --dlc_path argument.

    $ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-context-binary-generator \
                  --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
                  --model ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnModelDlc.so \
                  --dlc_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc \
                  --binary_file Inception_v3_quantized.serialized
    

    The context will be created at ./output/Inception_v3_quantized.serialized.bin.

  2. Create a directory for the example on the Android device.

    $ adb shell "mkdir /data/local/tmp/inception_v3"
    
  3. Push the necessary libraries and DLC to the device.

    $ adb push ${QNN_SDK_ROOT}/lib/hexagon-v68/unsigned/libQnnHtpV68Skel.so /data/local/tmp/inception_v3
    $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpV68Stub.so /data/local/tmp/inception_v3
    $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtp.so /data/local/tmp/inception_v3
    $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/output/Inception_v3_quantized.serialized.bin /data/local/tmp/inception_v3
    

    Note

    This section demonstrates HTP execution on Android using an offline-prepared graph. To execute with on-device (online) graph preparation instead, push the on-device prepare library and the quantized DLC:

    $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpPrepare.so /data/local/tmp/inception_v3
    $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc /data/local/tmp/inception_v3
    
  4. Push the input data and lists to the device.

    $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped /data/local/tmp/inception_v3
    $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/target_raw_list.txt /data/local/tmp/inception_v3
    
  5. Push the qnn-net-run tool to the device.

    $ adb push ${QNN_SDK_ROOT}/bin/aarch64-android/qnn-net-run /data/local/tmp/inception_v3
    
  6. Set up the device environment.

    $ adb shell
    $ cd /data/local/tmp/inception_v3
    $ export LD_LIBRARY_PATH=/data/local/tmp/inception_v3
    $ export ADSP_LIBRARY_PATH="/data/local/tmp/inception_v3"
    
  7. Run qnn-net-run with the following arguments.

    $ ./qnn-net-run --backend libQnnHtp.so --input_list target_raw_list.txt --retrieve_context Inception_v3_quantized.serialized.bin
    

    Outputs from the run will be located at the default ./output directory.

  8. Exit the device and view the results.

    $ exit
    $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
    $ adb pull /data/local/tmp/inception_v3/output output_android
    $ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                                                                  -o output_android/ \
                                                                                  -l data/imagenet_slim_labels.txt