HTA

This section provides information specific to the QNN HTA backend.

API Specializations

This section contains information related to API specialization for the HTA backend. All QNN HTA backend specializations are available under the <QNN_SDK_ROOT>/include/QNN/HTA/ directory.

The current version of the QNN HTA backend API is:

QNN_HTA_API_VERSION_MAJOR 2
QNN_HTA_API_VERSION_MINOR 0
QNN_HTA_API_VERSION_PATCH 0

QNN HTA Supported Operations

The QNN HTA backend supports running quantized 8-bit and quantized 16-bit networks. The list of operations supported by the QNN HTA Quant runtime can be found under the HTA column of Backend Support in Supported Operations.

QNN HTA 16-bit Integer Support Limitations

To enable 16-bit integer inference, set the activation quantization bit width to 16 while keeping the weight bit width at 8. The input/output data format should be defined as 16-bit.

  • Pass --act_bw 16 --weight_bw 8 to the QNN converter tools to generate a model with 16-bit activations and 8-bit weights.
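A hypothetical converter invocation showing the flags described above. The tool name (qnn-onnx-converter is one of the QNN converter tools), model path, input list, and output path are placeholders, and exact options may vary by SDK version; only --act_bw 16 --weight_bw 8 comes from this document.

```shell
# Placeholder paths; --act_bw/--weight_bw select 16-bit activations
# and 8-bit weights as described above.
qnn-onnx-converter \
    --input_network model.onnx \
    --input_list quantization_inputs.txt \
    --act_bw 16 \
    --weight_bw 8 \
    --output_path model_16bit.cpp
```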

QNN HTA Performance Infrastructure API

Clients can invoke QnnBackend_getPerfInfrastructure after loading the QNN HTA library and then invoke the methods declared in QnnHtaPerfInfrastructure.h. These APIs allow a client to control the HTA accelerator's system settings, thereby giving fine-grained control of the accelerator. A few use cases are:

  1. Set up the power mode of the accelerator.