Tutorial - Running Inference Using the Qualcomm® AI Engine Direct Delegate

This tutorial demonstrates how to run the TFLite inception_v3_quant model using the Qualcomm® AI Engine Direct Delegate on the HTP backend.

Prerequisites

The following list of prerequisites must be met before starting this tutorial:

  1. Complete the QNN SDK setup; note that this is different from the Delegate Setup section. The QNN setup instructions can be accessed through $QNN_SDK_ROOT/docs/QNN/index.html. After opening index.html, the Setup section for QNN appears on the left side.

  2. A Qualcomm device with an ADB connection.

  3. Read the Overview and Setup pages to understand the different components of the Qualcomm® AI Engine Direct Delegate.

  4. A python3 environment with the numpy package installed.

  5. Set the environment variable TENSORFLOW_HOME to point to the location where the TensorFlow package is installed. TensorFlow 2.10.1 has been tested with this tutorial.
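The prerequisite checks above can be sketched in a few lines of Python. This is only an illustrative helper, not part of the SDK; the variable names mirror the ones used in this tutorial.

```python
import importlib.util
import os

def check_prerequisites():
    """Return a list of setup problems; an empty list means the basics look OK.

    Illustrative sketch only -- it mirrors the prerequisites listed above
    (numpy installed, TENSORFLOW_HOME and QNN_SDK_ROOT set).
    """
    problems = []
    if importlib.util.find_spec("numpy") is None:
        problems.append("numpy is not installed")
    if "TENSORFLOW_HOME" not in os.environ:
        problems.append("TENSORFLOW_HOME is not set")
    if "QNN_SDK_ROOT" not in os.environ:
        problems.append("QNN_SDK_ROOT is not set")
    return problems

if __name__ == "__main__":
    for p in check_prerequisites():
        print("WARNING:", p)
```

Running this before starting the tutorial surfaces missing pieces early instead of mid-way through the adb steps.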

Setup

In this tutorial, the inception_v3_quant model will be used to run inference with the delegate. The Qualcomm® AI Engine Direct Delegate comes with some artifacts for this model under $QNN_SDK_ROOT/examples/Models/InceptionV3.

First, to get the model file and images, run:

$ python3 $QNN_SDK_ROOT/examples/Models/InceptionV3/scripts/setup_inceptionv3.py -a ~/tmpdir -d

Notice the following files and directories under $QNN_SDK_ROOT/examples/Models/InceptionV3:

  • data/cropped: the .jpg images and their preprocessed versions.

  • data/target_raw_list.txt: The list of paths of the preprocessed images.

  • tensorflow/inception_v3_2016_08_28_frozen_opt.pb: the frozen TensorFlow model that will be converted to TFLite.
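The preprocessed files under data/cropped are flat binary tensor dumps with no header, so shape and dtype must be known out of band. A minimal numpy sketch of writing and reading such a file; the 299x299x3 shape is the standard Inception v3 input, and the float32 dtype is an assumption (the quantized model's actual input may be uint8):

```python
import numpy as np

# Write a dummy 299x299x3 tensor as a flat .raw file (no header, just bytes).
# Shape and dtype are assumptions here -- check the model's input tensor
# before relying on them.
tensor = np.zeros((299, 299, 3), dtype=np.float32)
tensor.tofile("example.raw")

# Reading it back requires knowing the shape/dtype out of band,
# since the file itself carries neither.
restored = np.fromfile("example.raw", dtype=np.float32).reshape(299, 299, 3)
assert restored.shape == (299, 299, 3)
```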

Follow the instructions below to convert inception_v3_2016_08_28_frozen_opt.pb into inception_v3_quant.tflite.

$ python3 $QNN_SDK_ROOT/examples/QNN/TFLiteDelegate/Models/InceptionV3Quant/scripts/convert_inceptionv3_tflite.py

The output inception_v3_quant.tflite is located at $QNN_SDK_ROOT/examples/QNN/TFLiteDelegate/Models/InceptionV3Quant. This is the model that will be used for inference.

Next, push the model, the cropped directory, and target_raw_list.txt to the device using adb.

$ adb shell mkdir -p /data/local/tmp/qnn_delegate/inception_v3_quant
$ adb push $QNN_SDK_ROOT/examples/QNN/TFLiteDelegate/Models/InceptionV3Quant/inception_v3_quant.tflite /data/local/tmp/qnn_delegate/inception_v3_quant/
$ adb push $QNN_SDK_ROOT/examples/Models/InceptionV3/data/cropped /data/local/tmp/qnn_delegate/inception_v3_quant/
$ adb push $QNN_SDK_ROOT/examples/Models/InceptionV3/data/target_raw_list.txt /data/local/tmp/qnn_delegate/inception_v3_quant/

This tutorial will use the qtld-net-run application (see tools:qtld-net-run) to run inference through the delegate. Push this application and the Qualcomm® AI Engine Direct Delegate to the device.

$ adb push $QNN_SDK_ROOT/bin/aarch64-android/qtld-net-run /data/local/tmp/qnn_delegate/

Finally, push the Qualcomm® AI Engine Direct HTP backend libraries to the device. Notice that for the HTP and DSP backends, there are two libraries that need to be pushed: the Stub library, which runs on the CPU, and the Skel library, which runs on the HTP or DSP.

Here is an example for the HTP backend.

$ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnSystem.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtp.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpPrepare.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV68Stub.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV69Stub.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV73Stub.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV75Stub.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/hexagon-v68/unsigned/libQnnHtpV68Skel.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/hexagon-v69/unsigned/libQnnHtpV69Skel.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/hexagon-v73/unsigned/libQnnHtpV73Skel.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/hexagon-v75/unsigned/libQnnHtpV75Skel.so /data/local/tmp/qnn_delegate/

Running the inception_v3_quant Model

Now that all artifacts are on the device, inference can be run on the inception_v3_quant model using the qtld-net-run application.

Run the following command to execute inference with qtld-net-run. Check out the tools:qtld-net-run page for a reference on all the supported command line options.

$ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
             export ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" &&
             cd /data/local/tmp/qnn_delegate/inception_v3_quant/ &&
             /data/local/tmp/qnn_delegate/qtld-net-run \
             --model inception_v3_quant.tflite \
             --input target_raw_list.txt \
             --output output \
             --backend htp'

The output should look something like the following. If there are any errors, revisit the instructions above.

TFLite model: [inception_v3_quant.tflite]
Input list file: [target_raw_list.txt]
Total number of inferences: [4]
Using QNN Backend: [htp]
Loaded model successfully.

INFO: Initialized TensorFlow Lite runtime.
INFO: TfLiteQnnDelegate delegate: 128 nodes delegated out of 128 nodes with 1 partitions.

=== Pre-invoke Interpreter State ===
Line 720: Allocated 1 input tensor(s)
Line 730: Allocated 1 output tensor(s)

=== Invoking Interpreter ===
Line 894: About to fout.write() output tensors with 4004 bytes
=== Invoking Interpreter ===
Line 894: About to fout.write() output tensors with 4004 bytes
=== Invoking Interpreter ===
Line 894: About to fout.write() output tensors with 4004 bytes
=== Invoking Interpreter ===
Line 894: About to fout.write() output tensors with 4004 bytes

Notice the line X nodes delegated out of Y nodes with N partitions. This is an info log from the TFLite framework stating how many nodes in the graph were successfully delegated to the Qualcomm® AI Engine Direct Delegate. If the Qualcomm® AI Engine Direct Delegate does not support an operator in the model, it will not be delegated but will instead fall back to other supported runtimes, creating multiple partitions.
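The partitioning behavior can be illustrated with a small sketch (this is a conceptual model, not the actual TFLite framework code): each contiguous run of delegate-supported nodes forms one partition, and every unsupported node falls back to the default runtime, splitting the graph.

```python
def count_partitions(ops, supported):
    """Count (delegated nodes, partitions) for a linear sequence of ops.

    Illustrative model of how a TFLite delegate partitions a graph:
    contiguous runs of supported ops each form one delegated partition.
    """
    partitions = 0
    delegated = 0
    in_partition = False
    for op in ops:
        if op in supported:
            delegated += 1
            if not in_partition:
                partitions += 1
                in_partition = True
        else:
            in_partition = False
    return delegated, partitions

# All ops supported -> one partition, matching a log line like
# "128 nodes delegated out of 128 nodes with 1 partitions".
print(count_partitions(["CONV_2D"] * 5, {"CONV_2D"}))
# One unsupported op in the middle splits the graph into two partitions.
print(count_partitions(["CONV_2D", "CUSTOM", "CONV_2D"], {"CONV_2D"}))
```

More partitions generally mean more transitions between the delegate and the CPU runtime, which is why a fully delegated graph (one partition) is the best case.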

After qtld-net-run has completed running, the output results can be pulled from the device and inspected.

$ cd $QNN_SDK_ROOT/examples/QNN/TFLiteDelegate/Models/InceptionV3Quant
$ adb pull /data/local/tmp/qnn_delegate/inception_v3_quant/output ./

Notice that under the output folder there are four result folders, one for each input image.

QNN provides a script, show_inceptionv3_classifications.py, to view the results. Run the convert_output.sh script (under $QNN_SDK_ROOT/examples/Models/InceptionV3/scripts) to convert the output directory into a format show_inceptionv3_classifications.py can read.

$ ./scripts/convert_output.sh

The converted output will be stored inside the folder output_android. Next, execute show_inceptionv3_classifications.py with the following:

$ python3 $QNN_SDK_ROOT/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py \
            -i $QNN_SDK_ROOT/examples/Models/InceptionV3/data/cropped/raw_list.txt \
            -o $QNN_SDK_ROOT/examples/QNN/TFLiteDelegate/Models/InceptionV3Quant/output_android/ \
            -l $QNN_SDK_ROOT/examples/Models/InceptionV3/data/imagenet_slim_labels.txt

The classification results should be similar to the following:

${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped/trash_bin.raw   0.695312 413 ashcan
${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped/plastic_cup.raw 0.996094 648 measuring cup
${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped/notice_sign.raw 0.175781 459 brass
${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped/chairs.raw      0.410156 832 studio couch
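Each result file is a flat dump of the model's 1001 class scores (1001 float32 values = 4004 bytes, matching the "4004 bytes" lines in the log above). What the viewer script does can be sketched conceptually as an argmax over the scores followed by a label lookup; the file and label names below are synthetic, and the float32 dtype of the converted output is an assumption.

```python
import numpy as np

def classify(raw_path, labels):
    """Read a flat raw output of float32 class scores and return
    (confidence, class index, label) for the top class.

    Conceptual sketch of the classification-viewing step, not the
    actual show_inceptionv3_classifications.py code.
    """
    scores = np.fromfile(raw_path, dtype=np.float32)
    idx = int(np.argmax(scores))
    return float(scores[idx]), idx, labels[idx]

# Synthetic demo: fake a 1001-class score vector with class 413 on top.
scores = np.zeros(1001, dtype=np.float32)
scores[413] = 0.5
scores.tofile("demo_output.raw")
labels = ["class_%d" % i for i in range(1001)]
print(classify("demo_output.raw", labels))  # (0.5, 413, 'class_413')
```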

Congratulations, you have just run your first inference with the Qualcomm® AI Engine Direct Delegate!

Get Profile Result from Inference

Run the following command to profile the inception_v3_quant model.

$ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
             export ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" &&
             cd /data/local/tmp/qnn_delegate/inception_v3_quant/ &&
             /data/local/tmp/qnn_delegate/qtld-net-run \
             --model inception_v3_quant.tflite \
             --input target_raw_list.txt \
             --output output \
             --profiling 1 \
             --profiling_output_dir profile_results \
             --backend htp'

For other profiling options, please refer to tools:qtld-net-run.

Warning

The profiling behavior of the Qualcomm® AI Engine Direct Delegate is subject to change in the near future. Please use it with caution.

View Profiling Result by qtld-profile-viewer

After qtld-net-run has completed running, the profiling output results can be pulled from the device.

$ adb pull /data/local/tmp/qnn_delegate/inception_v3_quant/profile_results/qnn_delegate_profiling_result.bin ./

The binary file can be converted into a .txt file with the following command.

$ $QNN_SDK_ROOT/bin/aarch64-android/qtld-profile-viewer \
   --input_profile_data qnn_delegate_profiling_result.bin \
   --topK <topK> \
   --output ./profiling_output.txt \
   --num_warmup 1

The options are described below.

  • --input_profile_data: A binary input file that contains the profiling result.

  • --topK: Only used in detailed profiling mode. Number of events to be printed under the Top by Computation Time section. Default is 5.

  • --output: An output .txt file containing the human-readable profiling result. If not specified, the profiling output is written to standard output.

  • --num_warmup: Number of initializations/executions to be counted as warmup runs. Default is 1.
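The effect of a warmup count like --num_warmup can be illustrated with a short sketch: the first N runs typically include one-time initialization cost, so they are dropped before averaging. This is a hedged model of that bookkeeping, not the viewer's actual implementation.

```python
def average_excluding_warmup(timings_us, num_warmup=1):
    """Average per-inference timings after dropping the first num_warmup runs.

    Illustrative sketch of what a --num_warmup style option typically does;
    timings are in microseconds, but any consistent unit works.
    """
    if num_warmup >= len(timings_us):
        raise ValueError("all runs would be discarded as warmup")
    steady = timings_us[num_warmup:]
    return sum(steady) / len(steady)

# The first run pays initialization cost; excluding it yields the
# steady-state inference time.
print(average_excluding_warmup([9000, 1200, 1150, 1250], num_warmup=1))  # 1200.0
```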