Tools¶
This page describes the various tools included in the Qualcomm® AI Engine Direct Delegate and their features.
qtld-net-run¶
The qtld-net-run tool performs inference with a model using an input list, saving the model’s output tensors to disk. It can run with TFLite’s own CPU runtime, or it can use the Qualcomm® AI Engine Direct Delegate to perform inference on a specific backend.
Note
Please check qtld-net-run --help for the latest options.
DESCRIPTION:
------------
Example application to load a TFLite model and perform inference with the QNN Delegate
or the TFLite CPU.
REQUIRED ARGUMENTS:
-------------------
--model <tflite model> Path to TFLite model.
--input <input list file> A text file that contains paths to
pre-processed raw input data.
OPTIONAL ARGUMENTS:
-------------------
--output <output directory path> Path of directory to save output tensor data.
Default is ./outputs
--backend <backend type> What QNN Backend to use. Supported backends:
gpu,
htp,
dsp.
Please specify no more than one backend.
If not given, falls back to the TFLite CPU runtime.
--library_path <file path> Path to QNN Library.
--skel_library_dir <Skel directory> Directory of QNN Skel Library.
--log_level <log level> Log Level between 0-4, higher is more verbose.
--gpu_precision <gpu precision> Precision for GPU backend. 0 = User Specified,
1 = Float32, 2 = Float16, 3 = Hybrid.
--gpu_performance_mode <gpu performance mode> Flag to enable gpu performance mode for gpu
backend, 0 = Default, 1 = High,
2 = Normal, 3 = Low.
--htp_use_conv_hmx This is the default option and is ignored when
--htp_disable_conv_hmx is given. Using short conv hmx may improve
performance, but convolutions with short depths and/or asymmetric
weights could produce inaccurate results.
--htp_disable_conv_hmx Disable short conv hmx. Clients whose graphs contain convolutions
with short depths and asymmetric weights should set this flag to
guarantee accurate results.
--htp_use_fold_relu Using fold relu may improve performance. This optimization is
correct only when the quantization ranges of the convolution are
equal to, or a subset of, those of the Relu operation.
--htp_performance_mode <htp performance mode> Flag to enable htp performance mode for htp
backend, 0 = Default,
1 = Sustained High Performance,
2 = Burst, 3 = High Performance,
4 = Power Saver, 5 = Low Power Saver,
6 = High Power Saver, 7 = Low Balance,
8 = Balance.
--htp_precision <htp precision> Precision for HTP backend. 0 = Quantized,
1 = Quantized and also Float16 on certain SoCs. (Default=1)
--htp_optimization_strategy <htp optimization strategy> HTP optimization_strategy 0 = optimize for inference,
1 = optimize for prepare,
2 = optimize for inference O3
--htp_vtcm_size <htp vtcm size> Set HTP VTCM Size in MB
--htp_num_hvx_threads <htp hvx threads> Set HTP number of HVX threads
--dsp_performance_mode <dsp performance mode> Flag to enable dsp performance mode for dsp
backend, 0 = Default,
1 = Sustained High Performance,
2 = Burst, 3 = High Performance,
4 = Power Saver, 5 = Low Power Saver,
6 = High Power Saver, 7 = Low Balance,
8 = Balance.
--dsp_pd_session <dsp pd session> Flag to enable pd session mode for dsp,
Supported mode: unsigned, signed, adaptive.
--skip_delegate_ops <id0,id1,id2> Manually set ops not to be delegated, based on op id(s).
To obtain the op ids, please refer to tensorflow/lite/builtin_ops.h.
Note that ALL ops of the same type are skipped.
For example, if you skip SquaredDifference,
every SquaredDifference op in the model will not be delegated.
--skip_delegate_node_ids <id0,id1,id2> Manually set nodes not to be delegated, based on node id(s).
Node ids can be obtained from a node's location information in the .tflite file.
--graph_priority <priority level> Sets the graph priority. 0 = QNN_PRIORITY_DEFAULT,
1 = QNN_PRIORITY_LOW, 2 = QNN_PRIORITY_NORMAL(Default),
3 = QNN_PRIORITY_NORMAL_HIGH, 4 = QNN_PRIORITY_HIGH
--cache_dir <cache_dir path> Path of directory to save or restore with cache data.
--model_token <token> Unique token with model cache.
--verbose <verbose if 1, else 0> Print verbose inference messages.
--profiling <profiling> 0 = no profiling, 1 = basic profiling, 2 = detailed profiling, 3 = linting profiling.
--profiling_output_dir <output binary path> Path to output binary file.
--htp_device_id <htp device id> Set the HTP device id.
If the SoC has more than one HTP device, you can choose a device by its id.
--help Show help message with all possible arguments.
An example input file list is shown below, demonstrating the format needed for inputs to qtld-net-run:
inputs/1.raw
inputs/2.raw
inputs/3.raw
inputs/4.raw
inputs/5.raw
This input list file showcases relative file paths (from where qtld-net-run is being executed) to the preprocessed input tensors on the device. In this example, there are 5 preprocessed input tensors within a folder called inputs in the same directory as the text file list.
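The raw files referenced by the list are flat, headerless binary buffers holding the preprocessed tensor data. As a minimal sketch (using only the Python standard library), the following generates five placeholder float32 input files and the matching input list; the tensor shape (1, 224, 224, 3) and the float32 element type are assumptions for illustration and must match your model’s actual input tensor:

```python
import os
import struct

# Assumed input tensor shape for illustration; replace with your model's.
shape = (1, 224, 224, 3)
num_elements = 1
for dim in shape:
    num_elements *= dim

os.makedirs("inputs", exist_ok=True)
with open("input_list.txt", "w") as list_file:
    for i in range(1, 6):
        path = f"inputs/{i}.raw"
        # Raw files are flat little-endian buffers with no header;
        # here we write zero-filled float32 data as a placeholder.
        with open(path, "wb") as f:
            f.write(struct.pack(f"<{num_elements}f", *([0.0] * num_elements)))
        list_file.write(path + "\n")
```

In practice the raw files would contain your preprocessed data (e.g. normalized image pixels) rather than zeros; only the file layout matters to qtld-net-run.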
Below is another example of an input list for a model with two inputs.
inputs/1_a.raw input/1_b.raw
inputs/2_a.raw input/2_b.raw
inputs/3_a.raw input/3_b.raw
inputs/4_a.raw input/4_b.raw
inputs/5_a.raw input/5_b.raw
Like above, each line corresponds to one inference. However, the space-separated list indicates that two input files should be loaded and passed to the model. The order of the input files, from left to right, must match the order of the input tensors in the TFLite model.
Another way to provide the relative file path in the input list is by specifying the input layer name. This can be specified with the below format:
<input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]
[<input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]]
...
Below is an example containing 3 sets of inputs with layer names “Input_1” and “Input_2”, and files located in the relative paths “inputs/” and “input/”:
Input_1:=inputs/1_a.raw Input_2:=input/1_b.raw
Input_1:=inputs/2_a.raw Input_2:=input/2_b.raw
Input_1:=inputs/3_a.raw Input_2:=input/3_b.raw
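The named format above can be generated programmatically. A minimal sketch, assuming the layer names and paths from the documentation example (in practice the names must match the input tensor names of your TFLite model):

```python
# Each dict is one inference; keys are input layer names, values are
# relative paths to the preprocessed raw files for that inference.
entries = [
    {"Input_1": "inputs/1_a.raw", "Input_2": "input/1_b.raw"},
    {"Input_1": "inputs/2_a.raw", "Input_2": "input/2_b.raw"},
    {"Input_1": "inputs/3_a.raw", "Input_2": "input/3_b.raw"},
]

with open("named_input_list.txt", "w") as f:
    for entry in entries:
        # One inference per line; space-separated <name>:=<path> pairs.
        f.write(" ".join(f"{name}:={path}" for name, path in entry.items()) + "\n")
```

The file name "named_input_list.txt" is arbitrary; pass whatever path you choose to the --input argument.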
Below is an example of how to run inference on a model through TFLite CPU Runtime:
$ adb shell '/data/local/tmp/qnn_delegate/qtld-net-run \
--model <path to model on device> \
--input <path to input file list> \
--output <output directory path on device>'
Below is an example of how to run inference on a model through a Qualcomm® AI Engine Direct backend with the Qualcomm® AI Engine Direct Delegate. The profiling options are optional; by default no profiling is performed. Note that using the delegate requires setting environment variables such as $LD_LIBRARY_PATH or $ADSP_LIBRARY_PATH, as described in On Device Environment Setup.
$ adb shell 'LD_LIBRARY_PATH=/data/local/tmp/qnn_delegate/:$LD_LIBRARY_PATH \
ADSP_LIBRARY_PATH="/data/local/tmp/qnn_delegate/" \
/data/local/tmp/qnn_delegate/qtld-net-run \
--model <path to model on device> \
--input <path to input file list> \
--output <output directory path on device> \
--backend <backend type> \
--profiling <profiling> \
--profiling_output_dir <output binary path> \
--library_path <path to QNN Library on device>'
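The output tensors written under the --output directory are flat raw buffers, like the inputs. A minimal sketch of decoding one back into floats after pulling it from the device; the file name "output_0.raw" and the float32 element type are assumptions for illustration (the actual names and dtype depend on the model's output tensors), and the file is created here only as a stand-in:

```python
import struct

# Stand-in for a file written by qtld-net-run under --output.
with open("output_0.raw", "wb") as f:
    f.write(struct.pack("<4f", 0.1, 0.2, 0.3, 0.4))

# A raw file is a headerless little-endian buffer, so the element count
# is the file size divided by the element size (4 bytes for float32).
with open("output_0.raw", "rb") as f:
    data = f.read()
values = struct.unpack(f"<{len(data) // 4}f", data)
print(values)
```

For quantized outputs the element type would differ (e.g. uint8), so adjust the struct format and element size accordingly.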