Tutorial - Benchmarking the Qualcomm® AI Engine Direct Delegate¶
This tutorial demonstrates how to benchmark models running through the Qualcomm® AI Engine Direct Delegate using the TFLite benchmark_model application.
Prerequisites¶
The following list of prerequisites must be met before starting this tutorial:
Complete the QNN Setup; note that this is different from the Delegate Setup section. The QNN Setup section can be accessed through $QNN_SDK_ROOT/docs/QNN/index.html. After opening index.html, the Setup section for QNN appears on the left side.
A Qualcomm device with an ADB connection.
Read the Overview and Setup pages to understand the different components of the Qualcomm® AI Engine Direct Delegate.
The TFLite native benchmark_model application. A precompiled version of benchmark_model can be used, since the delegate is loaded through the External Delegate interface.
Set the environment variable TENSORFLOW_HOME to point to the location where the TensorFlow package is installed. TensorFlow 2.10.1 has been tested and is compatible with this tutorial.
Setup¶
This tutorial uses the same inception_v3_quant model as the Tutorial qtld-net-run Setup to perform benchmarking.
Follow the instructions below to download the model and un-tar it. If you have already performed this setup, these commands can be skipped.
$ python3 $QNN_SDK_ROOT/examples/Models/InceptionV3/scripts/setup_inceptionv3.py -a ~/tmpdir -d
$ python3 $QNN_SDK_ROOT/examples/QNN/TFLiteDelegate/Models/InceptionV3Quant/scripts/convert_inceptionv3_tflite.py
There should be a file called inception_v3_quant.tflite under $QNN_SDK_ROOT/examples/QNN/TFLiteDelegate/Models/InceptionV3Quant. This is the model that will be used to run benchmarking.
Push the benchmark_model application, the TFLite model, the Qualcomm® AI Engine Direct Delegate, and the Qualcomm® AI Engine Direct backend libraries to the device using ADB.
$ adb shell mkdir -p /data/local/tmp/qnn_delegate/inception_v3_quant
$ adb push <PATH_TO_BENCHMARK_MODEL>/benchmark_model /data/local/tmp/qnn_delegate/
$ adb push inception_v3_quant.tflite /data/local/tmp/qnn_delegate/inception_v3_quant/
$ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnTFLiteDelegate.so /data/local/tmp/qnn_delegate/
# push QNN libraries
$ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnSystem.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtp.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpPrepare.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV68Stub.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV69Stub.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/aarch64-android/libQnnHtpV73Stub.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/hexagon-v68/unsigned/libQnnHtpV68Skel.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/hexagon-v69/unsigned/libQnnHtpV69Skel.so /data/local/tmp/qnn_delegate/
$ adb push $QNN_SDK_ROOT/lib/hexagon-v73/unsigned/libQnnHtpV73Skel.so /data/local/tmp/qnn_delegate/
Note that input data does not need to be pushed to the device, because the benchmark_model application automatically creates random data for benchmarking.
Running the Benchmark¶
Run the following command to benchmark the inception_v3_quant model.
$ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
export ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" &&
cd /data/local/tmp/qnn_delegate/inception_v3_quant/ &&
/data/local/tmp/qnn_delegate/benchmark_model \
--graph=inception_v3_quant.tflite \
--external_delegate_path=/data/local/tmp/qnn_delegate/libQnnTFLiteDelegate.so \
--external_delegate_options="backend_type:htp"'
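As the command above shows, the --external_delegate_options string is a semicolon-separated list of key:value pairs. As an illustrative sketch (the helper name is hypothetical, not part of the SDK), such a string can be composed programmatically:

```python
def make_delegate_options(options):
    """Compose a semicolon-separated key:value string for
    --external_delegate_options (hypothetical helper)."""
    return ";".join(f"{key}:{value}" for key, value in options.items())

# The option used in the benchmark command above:
print(make_delegate_options({"backend_type": "htp"}))
# => backend_type:htp

# Multiple options combine with semicolons:
print(make_delegate_options({"backend_type": "htp", "profiling": 2}))
# => backend_type:htp;profiling:2
```
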
To enable per-operator profiling measurements from the benchmark, add
the --enable_op_profiling=true benchmark_model option along with the
profiling:2 delegate option. See
External Delegate Options for more information. Run the following
command to get per-operator profiling measurements. You can also save a CSV file
with the --profiling_output_csv_file option of benchmark_model.
Warning
The profiling behavior of the Qualcomm® AI Engine Direct Delegate is subject to change in the near future. Please use with caution.
$ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
export ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" &&
cd /data/local/tmp/qnn_delegate/inception_v3_quant/ &&
/data/local/tmp/qnn_delegate/benchmark_model \
--graph=inception_v3_quant.tflite \
--external_delegate_path=/data/local/tmp/qnn_delegate/libQnnTFLiteDelegate.so \
--external_delegate_options="backend_type:htp;profiling:2" \
--enable_op_profiling=true'
Note that the benchmark_model application displays results in time units, whereas the HTP backend captures cycle events.
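Because the HTP backend reports cycle counts while benchmark_model reports time, relating the two requires the HTP clock frequency. A minimal sketch of the conversion, assuming a hypothetical 1 GHz clock (the actual frequency depends on the device and performance mode):

```python
def cycles_to_microseconds(cycles, clock_hz):
    """Convert a cycle count to wall-clock time in microseconds.
    clock_hz is an assumption; the real value depends on the
    device and the selected performance mode."""
    return cycles / clock_hz * 1e6

# e.g. 500,000 cycles at an assumed 1 GHz clock:
print(cycles_to_microseconds(500_000, 1_000_000_000))  # 500.0 microseconds
```
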
The HTP backend supports performance modes that can be configured to achieve
the best performance. Adding the htp_performance_mode:1 delegate option
enables the maximum performance mode; details on the different performance
mode options can be found in External Delegate Options.
Run the following command to enable the maximum performance mode.
$ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
export ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" &&
cd /data/local/tmp/qnn_delegate/inception_v3_quant/ &&
/data/local/tmp/qnn_delegate/benchmark_model \
--graph=inception_v3_quant.tflite \
--external_delegate_path=/data/local/tmp/qnn_delegate/libQnnTFLiteDelegate.so \
--external_delegate_options="backend_type:htp;htp_performance_mode:1"'
Note that this option applies only to the HTP backend. Other backends handle performance mode configurations internally.
Congratulations, you have just benchmarked the Qualcomm® AI Engine Direct Delegate!
Generating a Model Cache or Restoring from One¶
By specifying model_token and cache_dir in --external_delegate_options, the
model passed in through the --graph option will either be saved to the cache
directory for future use, or loaded from the cache file if that file is present.
Building on the example above:
$ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
export ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" &&
cd /data/local/tmp/qnn_delegate/inception_v3_quant/ &&
/data/local/tmp/qnn_delegate/benchmark_model \
--graph=inception_v3_quant.tflite \
--external_delegate_path=/data/local/tmp/qnn_delegate/libQnnTFLiteDelegate.so \
--external_delegate_options="backend_type:htp;htp_performance_mode:1;cache_dir:/data/local/tmp/;model_token:qnn_delegate_model"'
If the cache directory does not exist, model caching will fail.
If it exists but the cached data does not, model caching operates in SAVE MODE, meaning
the prepared model is saved to cache files. These files are created by the Qualcomm® AI Engine Direct Delegate.
The effect of caching can be seen if benchmark_model
is executed with the same options again; its console output will indicate that caching is operating
in RESTORE MODE.
The activation of the model caching feature is logged as INFO/WARNING logs.
One way to tell whether the model is being restored from the cache file, rather than prepared from the tflite
model file, is to compare the time taken to initialize the session. The time can be found on the
Initialized session line in the console output. For moderate or large models, there should be a noticeable
difference.
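The SAVE/RESTORE behavior described above can be summarized in a small sketch. This is illustrative logic only, not delegate source code; the cache file name is a hypothetical stand-in:

```python
import os
import tempfile

def cache_mode(cache_dir, cache_file):
    """Illustrative decision logic for the model caching feature
    (not the actual delegate implementation)."""
    if not os.path.isdir(cache_dir):
        return "FAIL"      # the cache directory must already exist
    if os.path.exists(os.path.join(cache_dir, cache_file)):
        return "RESTORE"   # load the prepared model from the cache
    return "SAVE"          # prepare the model and write the cache

# Demo with a temporary directory standing in for cache_dir:
with tempfile.TemporaryDirectory() as d:
    print(cache_mode(d, "qnn_delegate_model.bin"))  # SAVE: no cache yet
    open(os.path.join(d, "qnn_delegate_model.bin"), "w").close()
    print(cache_mode(d, "qnn_delegate_model.bin"))  # RESTORE: cache present
```

A second benchmark_model run with identical cache options corresponds to the RESTORE branch, which is why its session initialization is noticeably faster.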
One thing to note about the model caching feature: the Qualcomm® AI Engine Direct Delegate is designed to continue serving inference requests even when the caching feature fails. Here are some of the ways the caching feature can fail:
Files in cache_dir are not readable when restoring, or not writable when saving