Usage guidelines

Use QnnInterface to access QNN APIs

The QNN Interface is an abstraction that combines all component APIs and provides a single unified interface to the user. This lets users operate through a fixed set of API hooks instead of relying on dynamic symbol lookup via dlsym() calls.

The QNN Interface API is made available through what are referred to as ‘API providers’. API providers follow the publisher-subscriber paradigm: backends publish access points to the APIs they support, and clients subscribe to them dynamically.

QnnInterface_t is the central unit that holds information about a provider: its name, the backend that publishes it, and the collection of APIs it supports, exposed as function pointers.

Client code should invoke QnnInterface_getProviders() to query all providers available on the system, and access individual component APIs through them.
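The pattern below sketches how a client might enumerate providers and select one by name. The trimmed-down QnnInterface_t struct, the provider table, and the QnnInterface_getProviders() stub are mock stand-ins for illustration; the real definitions live in QnnInterface.h and are published by the backend library itself.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Mock stand-ins for the real QNN SDK types (assumption: illustrative only). */
typedef uint64_t Qnn_ErrorHandle_t;
#define QNN_SUCCESS 0

typedef struct {
  const char* providerName;  /* name published by the backend              */
  uint32_t apiVersionMajor;  /* API version the provider exposes           */
  /* ... in the real QnnInterface_t, component APIs follow here as
     function pointers (backend, context, graph, tensor tables, ...)  */
} QnnInterface_t;

/* Mock provider table; a real backend library publishes this itself. */
static const QnnInterface_t kProviders[] = {
    {"MOCK_BACKEND_A", 2},
    {"MOCK_BACKEND_B", 2},
};
static const QnnInterface_t* kProviderPtrs[] = {&kProviders[0], &kProviders[1]};

/* Mocked QnnInterface_getProviders(): reports all providers on the system. */
static Qnn_ErrorHandle_t QnnInterface_getProviders(
    const QnnInterface_t*** providerList, uint32_t* numProviders) {
  if (!providerList || !numProviders) return 1;
  *providerList = kProviderPtrs;
  *numProviders = 2;
  return QNN_SUCCESS;
}

/* Client-side pattern: query providers, then pick one by name. */
static const QnnInterface_t* selectProvider(const char* wantedName) {
  const QnnInterface_t** providers = NULL;
  uint32_t numProviders = 0;
  if (QnnInterface_getProviders(&providers, &numProviders) != QNN_SUCCESS) {
    return NULL;
  }
  for (uint32_t i = 0; i < numProviders; ++i) {
    if (strcmp(providers[i]->providerName, wantedName) == 0) {
      return providers[i];
    }
  }
  return NULL;
}
```

Once a provider is selected, the client calls component APIs through the function pointers it carries rather than looking up each symbol individually.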

Use initializer macros for struct data types

Initializer macros defined for struct data types in the API headers initialize fields to their recommended default values. Clients should use these macros when declaring data structure variables to avoid depending on compiler defaults, which may vary by platform or architecture.

The following code snippet shows a sample initialization of Qnn_Tensor_t using the QNN_TENSOR_INIT macro.

// Default initialization using init macro
Qnn_Tensor_t tensor          = QNN_TENSOR_INIT;

// Update fields with context-specific values
tensor.version               = QNN_TENSOR_VERSION_1;
tensor.v1.name               = "tensor";
tensor.v1.type               = QNN_TENSOR_TYPE_APP_WRITE;
tensor.v1.dataType           = QNN_DATATYPE_FLOAT_32;
tensor.v1.rank               = rank;
tensor.v1.dimensions         = dimensions;
tensor.v1.clientBuf.data     = data;
tensor.v1.clientBuf.dataSize = numElements * sizeof(float);

Read error codes from QNN API

QNN APIs return error codes bundled into Qnn_ErrorHandle_t. Note that the actual error code must be read from the least significant 16 bits of this 64-bit handle.

Users may use the QNN_GET_ERROR_CODE macro to extract the error code, and it is recommended to always do so when handling return values from QNN APIs. A successful invocation of a QNN API returns QNN_SUCCESS, in which case no additional processing is required.
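A minimal sketch of this pattern follows. The typedef and the mask value here are mock reconstructions based on the description above (error code in the least significant 16 bits); the real macro and handle type are defined in QnnTypes.h.

```c
#include <stdint.h>

typedef uint64_t Qnn_ErrorHandle_t;  /* mock typedef for illustration */
#define QNN_SUCCESS 0

/* Per the text above, the error code occupies the least significant
   16 bits of the 64-bit handle; mask them out before comparing. */
#define QNN_GET_ERROR_CODE(errorHandle) ((errorHandle) & 0xFFFFu)

static int handleQnnResult(Qnn_ErrorHandle_t result) {
  uint16_t code = (uint16_t)QNN_GET_ERROR_CODE(result);
  if (code == QNN_SUCCESS) {
    return 0;  /* success: no additional processing required */
  }
  /* error path: map or report `code` here */
  return (int)code;
}
```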

Multi-Threading

QNN APIs invoked concurrently on the same handle (e.g., Qnn_GraphHandle_t), a parent handle (e.g., a Qnn_ContextHandle_t is a parent of a Qnn_GraphHandle_t), or a dependent handle (e.g., a Qnn_ContextHandle_t depends upon a Qnn_DeviceHandle_t) are not thread safe.
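Consequently, clients that share a handle across threads must serialize access themselves. A minimal sketch using a pthread mutex follows; the handle typedef and the executeOnGraph() stub are mock stand-ins for an actual QNN call on a shared Qnn_GraphHandle_t.

```c
#include <pthread.h>
#include <stddef.h>

typedef void* Qnn_GraphHandle_t;                 /* mock handle type */
static pthread_mutex_t g_graphLock = PTHREAD_MUTEX_INITIALIZER;
static int g_executeCount = 0;                   /* stands in for backend-side state */

/* Mock stand-in for a QNN call that mutates handle-associated state. */
static void executeOnGraph(Qnn_GraphHandle_t graph) {
  (void)graph;
  ++g_executeCount;                              /* not atomic on its own */
}

/* Every thread sharing `graph` must funnel calls through the same lock. */
static void safeExecute(Qnn_GraphHandle_t graph) {
  pthread_mutex_lock(&g_graphLock);
  executeOnGraph(graph);
  pthread_mutex_unlock(&g_graphLock);
}

static void* worker(void* graph) {
  for (int i = 0; i < 10000; ++i) safeExecute(graph);
  return NULL;
}

/* Runs two threads against one shared handle; returns the final count. */
static int runSerializedDemo(void) {
  pthread_t t1, t2;
  Qnn_GraphHandle_t graph = (void*)0x1;          /* dummy shared handle */
  g_executeCount = 0;
  pthread_create(&t1, NULL, worker, graph);
  pthread_create(&t2, NULL, worker, graph);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  return g_executeCount;
}
```

The same lock must also cover calls on parent or dependent handles, since the thread-safety constraint spans the whole handle hierarchy.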

Allocate external memory to register with a backend

QNN APIs defined in QnnMem.h provide a mechanism to register externally allocated memory with a backend. Data passed via shared memory can be read directly, allowing for zero-copy inference, unlike data passed via a raw pointer, which may be copied before graph execution.

Currently, ION is the only external memory type supported by QNN. ION is only available on Android platforms. One of the methods to allocate ION memory is via the RPCMem framework provided by the Hexagon SDK.

Specifically, the following RPCMem APIs are required:

  • rpcmem_alloc() – to allocate ION memory.

  • rpcmem_to_fd() – to obtain a file descriptor that refers to this allocated memory, which can be registered with a backend via QnnMem_register().

  • rpcmem_free() – to free the allocated memory.

Refer to the Hexagon SDK documentation for more information about RPCMem.
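The allocate, obtain-fd, register, free flow from the list above can be sketched as follows. All types and functions here are simplified mocks: the real rpcmem_* signatures are defined in the Hexagon SDK's rpcmem.h, and the real QnnMem_register takes memory descriptors rather than a bare fd (see QnnMem.h); the heap id and flags below are placeholders, not real values.

```c
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

/* Mock stand-ins for RPCMem and QNN types; illustration only. */
typedef uint64_t Qnn_ErrorHandle_t;
typedef void*    Qnn_ContextHandle_t;
typedef void*    Qnn_MemHandle_t;
#define QNN_SUCCESS 0

static void* rpcmem_alloc(int heapId, uint32_t flags, int size) {
  (void)heapId; (void)flags;
  return malloc((size_t)size);        /* real impl allocates ION memory  */
}
static int rpcmem_to_fd(void* buf) {  /* real impl returns an ION fd     */
  return buf ? 42 : -1;               /* mock fd                         */
}
static void rpcmem_free(void* buf) { free(buf); }

/* Simplified mock; the real QnnMem_register takes descriptor structs. */
static Qnn_ErrorHandle_t QnnMem_register(Qnn_ContextHandle_t ctx, int fd,
                                         Qnn_MemHandle_t* outHandle) {
  (void)ctx;
  if (fd < 0) return 1;
  *outHandle = (Qnn_MemHandle_t)(intptr_t)fd;  /* mock handle */
  return QNN_SUCCESS;
}

/* Allocate shared memory, register it, use it, then free it. */
static int registerSharedBuffer(Qnn_ContextHandle_t ctx, int size) {
  void* buf = rpcmem_alloc(/*heapId=*/0, /*flags=*/0, size);
  if (!buf) return -1;
  int fd = rpcmem_to_fd(buf);
  Qnn_MemHandle_t memHandle = NULL;
  Qnn_ErrorHandle_t err = QnnMem_register(ctx, fd, &memHandle);
  /* ... run zero-copy inference against memHandle here ... */
  rpcmem_free(buf);                   /* free once no longer registered  */
  return err == QNN_SUCCESS ? 0 : -1;
}
```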

Qnn_TensorV2_t

The Qnn_TensorV2_t data structure introduces three new features to the Qnn_Tensor_t structure: sparsity, dynamic shape, and tensor production notification.

Warning

In QNN SDK 2.20.0, there is no backend or tool support for Qnn_TensorV2_t features. Support will begin to be added in QNN SDK 2.21.0 and subsequent releases.

Sparsity

Backend support for the tensor sparsity feature can be queried via QnnProperty_hasCapability using the QNN_PROPERTY_TENSOR_SUPPORT_SPARSITY capability.

Tensor sparsity can be expressed via Qnn_TensorV2_t by setting the Qnn_TensorV2_t::dataFormat field to QNN_TENSOR_DATA_FORMAT_SPARSE and populating Qnn_TensorV2_t::sparseParams. The Qnn_SparseParams_t data structure is extensible, but currently supports only a hybrid COO sparse format. This sparse format allows for partially sparse tensors; more details will follow in subsequent releases.
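A sketch of marking a tensor sparse follows. The struct layout and field names here are mock stand-ins chosen for illustration; the real Qnn_TensorV2_t and Qnn_SparseParams_t layouts are defined in QnnTypes.h and may differ.

```c
#include <stdint.h>

/* Mock subset of the sparsity-related tensor fields; illustration only. */
#define MOCK_TENSOR_DATA_FORMAT_SPARSE 1

typedef struct {
  uint32_t numSpecifiedElements;  /* mock: elements actually stored        */
  uint32_t numSparseDimensions;   /* mock: dims encoded via COO indices    */
} MockHybridCooParams;

typedef struct {
  uint32_t dataFormat;
  MockHybridCooParams sparseParams;
} MockTensorV2;

/* Mark a tensor sparse: set the data format, then describe the layout. */
static void makeSparse(MockTensorV2* t, uint32_t specified, uint32_t sparseDims) {
  t->dataFormat = MOCK_TENSOR_DATA_FORMAT_SPARSE;
  t->sparseParams.numSpecifiedElements = specified;
  t->sparseParams.numSparseDimensions = sparseDims;
}
```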

New operations have been added to support networks taking advantage of tensor sparsity: CreateSparse, GetSparseIndices, GetSparseValues, and SparseToDense. Other operations will be updated to support sparsity as needed.

Dynamic dimensions

Backend support for the tensor dynamic shape feature can be queried via QnnProperty_hasCapability using the QNN_PROPERTY_TENSOR_SUPPORT_DYNAMIC_DIMENSIONS capability.

Dynamic shape allows tensors to change shape from inference to inference. A tensor can be created with a dynamic shape by setting the Qnn_TensorV2_t::isDynamicDimensions field to an array of uint8_t values: a non-zero value indicates that the corresponding dimension has a dynamic size, while a zero value indicates that the corresponding dimension is statically sized. If the Qnn_TensorV2_t::isDynamicDimensions field is set to NULL, the entire tensor has a static shape.

The Qnn_TensorV2_t::dimensions field is interpreted differently depending on context. When a tensor is created by calling QnnTensor_createGraphTensor or QnnTensor_createContextTensor, it is interpreted as the maximum size each dimension can have. When the tensor is provided during a call to QnnGraph_execute, it is interpreted as the actual tensor shape; for dynamically shaped graph input/output tensors, it must be populated with actual dimensions. If the maximum dimensions for a tensor are exceeded during a call to QnnGraph_execute, the QNN_GRAPH_ERROR_DYNAMIC_TENSOR_SHAPE error code will be returned. In this scenario, the tensor data may not be valid.
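The two interpretations of the dimensions field can be sketched as a validation check. The trimmed-down tensor struct below is a mock stand-in for the relevant Qnn_TensorV2_t fields; the real structure is defined in QnnTypes.h.

```c
#include <stdint.h>
#include <stddef.h>

/* Mock subset of Qnn_TensorV2_t; illustration only. */
typedef struct {
  uint32_t rank;
  uint32_t* dimensions;          /* max sizes at creation; actual sizes at execute */
  uint8_t* isDynamicDimensions;  /* non-zero => dynamic dim; NULL => fully static  */
} MockTensorV2;

/* At execute time, actual dims must not exceed the creation-time maximums
   for dynamic dimensions, and must match exactly for static ones. */
static int validateExecuteDims(const MockTensorV2* t, const uint32_t* maxDims) {
  for (uint32_t i = 0; i < t->rank; ++i) {
    int dynamic = t->isDynamicDimensions && t->isDynamicDimensions[i];
    if (dynamic ? (t->dimensions[i] > maxDims[i])
                : (t->dimensions[i] != maxDims[i])) {
      return -1;  /* backend would report QNN_GRAPH_ERROR_DYNAMIC_TENSOR_SHAPE */
    }
  }
  return 0;
}
```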

Early termination

Backend support for the early termination and output tensor production feature can be queried via QnnProperty_hasCapability using the QNN_PROPERTY_GRAPH_SUPPORT_EARLY_TERMINATION capability.

An operation may have the capability to terminate graph execution early. If a call to QnnGraph_execute terminates early, the QNN_GRAPH_ERROR_EARLY_TERMINATION error code will be returned. When this error code is detected, it is possible that not all graph output tensors were produced. If the Qnn_TensorV2_t::isProduced field is non-zero, the graph output tensor data was produced and can be consumed. If the Qnn_TensorV2_t::isProduced field is zero, the tensor data was not produced and is not valid.
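The consumption rule above can be sketched as follows. The output-tensor struct and the error-code value are mock stand-ins; the real isProduced field and error code are defined in QnnTypes.h and QnnGraph.h.

```c
#include <stdint.h>
#include <stddef.h>

/* Mock stand-ins; illustration only. */
typedef struct {
  uint8_t isProduced;  /* non-zero => output data was produced */
} MockOutputTensor;

#define MOCK_GRAPH_ERROR_EARLY_TERMINATION 0x100  /* placeholder value */

/* After an early-terminated execute, consume only produced outputs;
   after a successful execute, all outputs are valid. */
static int countConsumableOutputs(uint64_t executeError,
                                  const MockOutputTensor* outputs,
                                  size_t numOutputs) {
  int consumable = 0;
  for (size_t i = 0; i < numOutputs; ++i) {
    if (executeError != MOCK_GRAPH_ERROR_EARLY_TERMINATION ||
        outputs[i].isProduced) {
      ++consumable;  /* data is valid and may be read */
    }
  }
  return consumable;
}
```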

Control graph priority

Backend support for the graph priority control feature can be queried via QnnProperty_hasCapability using the QNN_PROPERTY_GRAPH_SUPPORT_PRIORITY_CONTROL capability.

When a backend supports graph priority control, clients can assign a priority configuration during graph creation (using QnnGraph_create) or modify the priority of an existing graph (using QnnGraph_setConfig). In general, the priority configuration controls how inference jobs are scheduled and executed; however, the exact behavior varies across backends. See the backend-specific documentation for more information about priority and its related behavior.
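As a sketch, building a priority configuration might look like the following. The config struct, option constant, and priority enum here are mocks modeled loosely on the QnnGraph.h config pattern; consult QnnGraph.h for the real names, values, and the exact shape of the config list that QnnGraph_create and QnnGraph_setConfig accept.

```c
#include <stddef.h>

/* Mock stand-ins; names and values are assumptions, not SDK definitions. */
typedef enum {
  MOCK_PRIORITY_LOW,
  MOCK_PRIORITY_NORMAL,
  MOCK_PRIORITY_HIGH
} MockPriority;

#define MOCK_GRAPH_CONFIG_OPTION_PRIORITY 1

typedef struct {
  int option;            /* selects which config field below is set */
  MockPriority priority;
} MockGraphConfig;

/* Fill one config entry and terminate the list with NULL, mirroring the
   pointer-array style in which QNN graph configs are passed. */
static size_t buildPriorityConfig(MockGraphConfig* cfg,
                                  const MockGraphConfig** list,
                                  MockPriority prio) {
  cfg->option = MOCK_GRAPH_CONFIG_OPTION_PRIORITY;
  cfg->priority = prio;
  list[0] = cfg;
  list[1] = NULL;
  return 1;  /* number of config entries */
}
```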