HTA¶
This section provides information specific to the QNN HTA backend.
API Specializations¶
This section contains information related to API specialization for the HTA backend. All QNN HTA
backend specializations are available under the <QNN_SDK_ROOT>/include/QNN/HTA/ directory.
The current version of the QNN HTA backend API is:
- QNN_HTA_API_VERSION_MAJOR 2
- QNN_HTA_API_VERSION_MINOR 0
- QNN_HTA_API_VERSION_PATCH 0
QNN HTA Supported Operations¶
QNN HTA supports running quantized 8-bit and quantized 16-bit networks. The list of operations supported by the QNN HTA Quant runtime can be seen under the Backend Support HTA column in Supported Operations.
QNN HTA 16-bit Integer Support Limitations¶
To enable 16-bit integer inference, set the quantization bit width of activations to 16 while keeping that of weights at 8. The input/output data format should be defined as 16-bit.
Use --act_bw 16 --weight_bw 8 with the QNN converter tools to generate a model with 16-bit activations and 8-bit weights.
QNN HTA Performance Infrastructure API¶
Clients can invoke QnnBackend_getPerfInfrastructure after loading the QNN HTA library and then invoke the methods available in File QnnHtaPerfInfrastructure.h. These APIs allow a client to control the HTA accelerator’s system settings, giving fine-grained control of the accelerator. A few use cases are:
- Set up the power mode of the accelerator.