Tutorial - Skip Delegation Ops Using the Qualcomm® AI Engine Direct Delegate
The Qualcomm® AI Engine Direct Delegate provides skip options that give users the flexibility to control whether specific Ops fall back to the TFLite runtime. This tutorial demonstrates how to use those options by running inferences with qtld-net-run. Users interested in how skipping delegation of Ops is implemented with the Qualcomm® AI Engine Direct Delegate can refer to the sample provided at $QNN_SDK_ROOT/examples/QNN/TFLiteDelegate/SkipNodeExample.
Prerequisites
The following prerequisite must be met before starting this tutorial:
Finish Tutorial qtld-net-run
How to Skip Delegation Ops
The Qualcomm® AI Engine Direct Delegate provides two ways to skip delegation of Ops:
Skip Delegation Ops based on Op IDs.
Skip Delegation Ops based on Node IDs.
Skip Delegation Ops with Op IDs
Each Op has a corresponding Op ID, which is defined in tensorflow/lite/builtin_ops.h.
To keep specific Op(s) from being delegated by the Qualcomm® AI Engine Direct Delegate,
specify their Op IDs.
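As a convenience, an Op ID can be looked up directly in the header rather than read off by eye. The following is a minimal sketch, not part of the SDK: the function name op_id is hypothetical, and it assumes the header lists entries in the form "kTfLiteBuiltinMaxPool2d = 17,".

```shell
# Print the numeric ID of a TFLite builtin Op.
#   $1: path to builtin_ops.h
#   $2: enum name, e.g. kTfLiteBuiltinMaxPool2d
# Assumption: each entry looks like "  kTfLiteBuiltinMaxPool2d = 17,".
op_id() {
    header="$1"
    name="$2"
    # Match the exact enum name followed by "=", then keep only the number.
    sed -n "s/^[[:space:]]*${name}[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p" "$header"
}
```

For example, op_id tensorflow/lite/builtin_ops.h kTfLiteBuiltinMaxPool2d should print 17, the Op ID used in this tutorial. Nothing is printed if the name is not found.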
Taking the inception_v3_quant model as an example, the model contains a total of 4 MaxPool2d operations.
To skip MaxPool2d, which has Op ID 17, run:
$ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
export ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" &&
cd /data/local/tmp/qnn_delegate/inception_v3_quant/ &&
/data/local/tmp/qnn_delegate/qtld-net-run \
--model inception_v3_quant.tflite \
--input target_raw_list.txt \
--output output \
--backend htp \
--skip_delegate_ops 17'
The output should look similar to the following:
TFLite model: [inception_v3_quant.tflite]
Input list file: [target_raw_list.txt]
Total number of inferences: [4]
Using QNN Backend: [htp]
Ops not to be delegated : [17]
Loaded model successfully.
INFO: Initialized TensorFlow Lite runtime.
INFO: Operator Builtin Code 17 MAX_POOL_2D not to be delegated
INFO: Operator Builtin Code 17 MAX_POOL_2D not to be delegated
INFO: Operator Builtin Code 17 MAX_POOL_2D not to be delegated
INFO: Operator Builtin Code 17 MAX_POOL_2D not to be delegated
INFO: TfLiteQnnDelegate delegate: 124 nodes delegated out of 128 nodes with 5 partitions.
=== Pre-invoke Interpreter State ===
Line 719: Allocated 1 input tensor(s)
Line 729: Allocated 1 output tensor(s)
=== Invoking Interpreter ===
Line 918: About to fout.write() output tensors with 4004 bytes
=== Invoking Interpreter ===
Line 918: About to fout.write() output tensors with 4004 bytes
=== Invoking Interpreter ===
Line 918: About to fout.write() output tensors with 4004 bytes
=== Invoking Interpreter ===
Line 918: About to fout.write() output tensors with 4004 bytes
From the output above, we can see that the Qualcomm® AI Engine Direct Delegate executes a total of 5 partitions,
because the 4 skipped MaxPool2d operations split the graph.
Notice that, unlike in Tutorial qtld-net-run, we added the following option when executing:
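The delegation summary can also be checked mechanically. The sketch below assumes the console output was saved to a file (for instance by redirecting the adb shell command to a hypothetical run.log) and that the summary line has the format shown above. The function name delegation_summary is illustrative only.

```shell
# Summarize the delegation line from a saved qtld-net-run log.
#   $1: path to the saved log
# Assumption: the log contains a line such as
#   INFO: TfLiteQnnDelegate delegate: 124 nodes delegated out of 128 nodes with 5 partitions.
delegation_summary() {
    sed -n 's/.*delegate: \([0-9][0-9]*\) nodes delegated out of \([0-9][0-9]*\) nodes with \([0-9][0-9]*\) partitions.*/\1 \2 \3/p' "$1" |
    while read -r delegated total partitions; do
        # Nodes that were not delegated fall back to the TFLite runtime.
        echo "delegated=$delegated fallback=$((total - delegated)) partitions=$partitions"
    done
}
```

For the log in this section, the fallback count is 128 - 124 = 4, which matches the 4 MaxPool2d operations that were skipped.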
--skip_delegate_ops 17
To skip more than one Op ID, for example MaxPool2d (Op ID 17) and AveragePool2d (Op ID 1),
pass a comma-separated list:
--skip_delegate_ops 1,17
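When the list of IDs is assembled in a script, a small helper can produce the comma-separated value. This is only a sketch: join_ids is a hypothetical name, and the sorting and de-duplication are conveniences for readability, not requirements stated by qtld-net-run.

```shell
# Join numeric IDs into a comma-separated list, e.g. "1,17".
# Rejects empty input and anything that is not a non-negative integer.
join_ids() {
    [ "$#" -gt 0 ] || { echo "no IDs given" >&2; return 1; }
    for id in "$@"; do
        case "$id" in
            ''|*[!0-9]*) echo "not a numeric ID: $id" >&2; return 1 ;;
        esac
    done
    # Sort numerically, drop duplicates, then join the lines with commas.
    printf '%s\n' "$@" | sort -n -u | paste -s -d, -
}
```

The result can then be passed as the option value, for example --skip_delegate_ops "$(join_ids 17 1)".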
Skip Delegation Ops with Node IDs
The second option to skip Delegation Op(s) is based on the Node ID(s).
Instead of --skip_delegate_ops, this option uses --skip_delegate_node_ids.
Taking the inception_v3_quant model as an example, the model has a total of 128 nodes, each with a corresponding node ID from 0 to 127.
Users can specify the node IDs to skip when executing. For example, to skip node IDs 2 and 5, run:
$ adb shell 'export LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH &&
export ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" &&
cd /data/local/tmp/qnn_delegate/inception_v3_quant/ &&
/data/local/tmp/qnn_delegate/qtld-net-run \
--model inception_v3_quant.tflite \
--input target_raw_list.txt \
--output output \
--backend htp \
--skip_delegate_node_ids 2,5'
The output should look similar to the following:
TFLite model: [inception_v3_quant.tflite]
Input list file: [target_raw_list.txt]
Total number of inferences: [4]
Using QNN Backend: [htp]
Node ids not to be delegated : [2,5]
Loaded model successfully.
INFO: Initialized TensorFlow Lite runtime.
INFO: Node 2 not to be delegated
INFO: Node 5 not to be delegated
INFO: TfLiteQnnDelegate delegate: 126 nodes delegated out of 128 nodes with 3 partitions.
=== Pre-invoke Interpreter State ===
Line 719: Allocated 1 input tensor(s)
Line 729: Allocated 1 output tensor(s)
=== Invoking Interpreter ===
Line 918: About to fout.write() output tensors with 4004 bytes
=== Invoking Interpreter ===
Line 918: About to fout.write() output tensors with 4004 bytes
=== Invoking Interpreter ===
Line 918: About to fout.write() output tensors with 4004 bytes
=== Invoking Interpreter ===
Line 918: About to fout.write() output tensors with 4004 bytes
From the output above, we can see that Node 2 and Node 5 are skipped, so the Qualcomm® AI Engine Direct Delegate executes a total of 3 partitions, because the 2 skipped nodes split the graph. As mentioned earlier, users can refer to $QNN_SDK_ROOT/examples/QNN/TFLiteDelegate/SkipNodeExample for the implementation of skipping delegation of Ops with Node IDs.
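To confirm that every requested node was actually skipped, the saved console output can be searched for the corresponding messages. The sketch below assumes the log was saved to a file and uses the "Node <id> not to be delegated" message format shown above. The function name check_skipped_nodes is hypothetical.

```shell
# Verify that each requested node ID was reported as skipped.
#   $1: path to the saved log
#   $2: comma-separated node IDs, e.g. "2,5"
# Returns 0 if all IDs were reported, 1 otherwise.
check_skipped_nodes() {
    log="$1"
    ids="$2"
    missing=0
    old_ifs=$IFS
    IFS=,                       # split the ID list on commas
    for id in $ids; do
        if ! grep -qF "Node ${id} not to be delegated" "$log"; then
            echo "node $id was not reported as skipped" >&2
            missing=1
        fi
    done
    IFS=$old_ifs
    return "$missing"
}
```

With the log from this section, check_skipped_nodes run.log 2,5 would succeed, while an ID that does not appear in the log would produce a message on standard error and a non-zero exit status.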