QNN LPAI Integration¶
This section is intended for developers building applications with the QNN Common API and targeting the LPAI backend. Successful integration requires a comprehensive understanding of both the QNN and LPAI subsystems, particularly in the areas of memory management and data structure interoperability.
The LPAI backend introduces specific constraints and requirements that differ from other QNN backends. Developers must be familiar with:
Memory Allocation Strategies: LPAI imposes strict limitations on memory usage, necessitating precise control over buffer allocation, alignment, and lifecycle.
Understanding how QNN interacts with LPAI’s memory model is critical for avoiding runtime errors and crashes, and for optimizing performance.
LPAI-Specific Data Structures and Enumerations: The LPAI API defines a set of custom data types, enumerations, and configuration parameters that must be correctly instantiated and passed to QNN interfaces.
These include tensor descriptors, execution contexts, and backend-specific metadata.
For detailed guidance, refer to the following sections:
Proper integration ensures compatibility, stability, and optimal performance of your application when deployed on LPAI-enabled hardware.
QNN LPAI Memory Allocations¶
There are three types of memory pools used by the LPAI runtime: scratch, persistent, and IO.
Each serves a distinct purpose in managing memory during network execution.
Scratch Memory¶
Scratch memory is used to hold intermediate results during network execution that can be reused (i.e., overwritten). This memory is essential for optimizing performance and minimizing memory footprint during inference.
Key characteristics:
The user must allocate this memory pool by querying the QNN API for the scratch memory requirements specific to their model.
The allocated memory is passed into the LPAI Backend.
All tensors using scratch memory are memory-planned offline, ensuring proper alignment and efficient access.
Querying Scratch Memory Requirements¶
The following code snippet demonstrates how to query the required scratch memory size using the QNN LPAI API:
// Declare the variable that will receive the queried size
uint32_t scratchSize = 0;

// Create QNN LPAI custom property
QnnLpaiGraph_CustomProperty_t customGraphProp;
customGraphProp.option = QNN_LPAI_GRAPH_GET_PROP_SCRATCH_MEM_SIZE;
customGraphProp.property = &scratchSize;  // property is a void*; pass the address of the output variable

// Create QNN property
QnnGraph_Property_t graphProp;
graphProp.option = QNN_GRAPH_PROPERTY_OPTION_CUSTOM;
graphProp.customProperty = &customGraphProp;

// Prepare a null-terminated property pointer array
QnnGraph_Property_t *graphPropPtrs[2] = {0}; // graphPropPtrs[1] is nullptr
graphPropPtrs[0] = &graphProp;

// Query the graph for the scratch memory size
QnnGraph_getProperty(graphHandle, graphPropPtrs);
Allocating and Configuring Scratch Memory¶
Once the memory requirements are retrieved, it is the user’s responsibility to allocate the memory and pass the pointer back to the backend using the QnnGraph_setConfig() API:
// scratchBuffer: user-allocated buffer of scratchSize bytes, aligned per the
// backend's requirements; memType: a value from QnnLpaiMem_MemType_t

// Create LPAI memory configuration
QnnLpaiGraph_Mem_t lpaiGraphMem;
lpaiGraphMem.memType = memType;
lpaiGraphMem.size = scratchSize;
lpaiGraphMem.addr = scratchBuffer;

// Create QNN LPAI custom config
QnnLpaiGraph_CustomConfig_t customGraphCfg;
customGraphCfg.option = QNN_LPAI_GRAPH_SET_CFG_SCRATCH_MEM;
customGraphCfg.config = &lpaiGraphMem;

// Create QNN config
QnnGraph_Config_t graphConfig;
graphConfig.option = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
graphConfig.customConfig = &customGraphCfg;

// Prepare a null-terminated config pointer array
QnnGraph_Config_t *graphCfgPtrs[2] = {0}; // graphCfgPtrs[1] is nullptr
graphCfgPtrs[0] = &graphConfig;

// Set the configuration for the graph
QnnGraph_setConfig(graphHandle, (const QnnGraph_Config_t **)graphCfgPtrs);
Explanation:
QnnLpaiGraph_CustomProperty_t is used to specify the custom property type for LPAI.
QNN_LPAI_GRAPH_GET_PROP_SCRATCH_MEM_SIZE is the option used to request the scratch memory size.
The graphPropPtrs array is passed to QnnGraph_getProperty() to retrieve the required memory size.
The retrieved scratchSize is used to allocate memory, which is then passed back to the backend via QnnGraph_setConfig() before finalizing the graph.
Persistent Memory¶
Persistent memory holds intermediate results that cannot be reused during execution (i.e., they persist across operations). This type of memory is essential for maintaining state across time steps or layers in models such as RNNs.
Key characteristics:
A typical example is the RNN operator, where tensors store the previous state.
Like scratch memory, the user must allocate this pool by querying the QNN API for persistent memory requirements.
These tensors are also memory-planned offline with proper alignment to ensure efficient access.
Querying Persistent Memory Requirements¶
The following code snippet demonstrates how to query the required persistent memory size using the QNN LPAI API:
// Declare the variable that will receive the queried size
uint32_t persistentSize = 0;

// Create QNN LPAI custom property
QnnLpaiGraph_CustomProperty_t customGraphProp;
customGraphProp.option = QNN_LPAI_GRAPH_GET_PROP_PERSISTENT_MEM_SIZE;
customGraphProp.property = &persistentSize;  // property is a void*; pass the address of the output variable

// Create QNN property
QnnGraph_Property_t graphProp;
graphProp.option = QNN_GRAPH_PROPERTY_OPTION_CUSTOM;
graphProp.customProperty = &customGraphProp;

// Prepare a null-terminated property pointer array
QnnGraph_Property_t *graphPropPtrs[2] = {0}; // graphPropPtrs[1] is nullptr
graphPropPtrs[0] = &graphProp;

// Query the graph for the persistent memory size
QnnGraph_getProperty(graphHandle, graphPropPtrs);
Allocating and Configuring Persistent Memory¶
Once the memory requirements are retrieved, it is the user’s responsibility to allocate the memory and pass the pointer back to the backend using the QnnGraph_setConfig() API:
// persistentBuffer: user-allocated buffer of persistentSize bytes, aligned per
// the backend's requirements; memType: a value from QnnLpaiMem_MemType_t

// Create LPAI memory configuration
QnnLpaiGraph_Mem_t lpaiGraphMem;
lpaiGraphMem.memType = memType;
lpaiGraphMem.size = persistentSize;
lpaiGraphMem.addr = persistentBuffer;

// Create QNN LPAI custom config
QnnLpaiGraph_CustomConfig_t customGraphCfg;
customGraphCfg.option = QNN_LPAI_GRAPH_SET_CFG_PERSISTENT_MEM;
customGraphCfg.config = &lpaiGraphMem;

// Create QNN config
QnnGraph_Config_t graphConfig;
graphConfig.option = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
graphConfig.customConfig = &customGraphCfg;

// Prepare a null-terminated config pointer array
QnnGraph_Config_t *graphCfgPtrs[2] = {0}; // graphCfgPtrs[1] is nullptr
graphCfgPtrs[0] = &graphConfig;

// Set the configuration for the graph
QnnGraph_setConfig(graphHandle, (const QnnGraph_Config_t **)graphCfgPtrs);
Explanation:
QnnLpaiGraph_CustomProperty_t is used to define a custom property specific to LPAI.
QNN_LPAI_GRAPH_GET_PROP_PERSISTENT_MEM_SIZE is the option used to request the persistent memory size.
The QnnGraph_getProperty() function retrieves the required size, which is then used to allocate memory.
QnnGraph_setConfig() is used to pass the allocated memory back to the backend before finalizing the graph.
Get Memory Alignment Requirements¶
Before passing memory buffers to the LPAI Backend, the starting address must be correctly aligned. This ensures compatibility with hardware requirements and optimal performance.
To retrieve the alignment requirements for memory buffers, use the following QNN LPAI API call:
QnnLpaiBackend_BufferAlignmentReq_t bufferAlignmentReq;

// Create QNN LPAI backend custom property
QnnLpaiBackend_CustomProperty_t customBackendProp;
customBackendProp.option = QNN_LPAI_BACKEND_GET_PROP_ALIGNMENT_REQ;
customBackendProp.property = &bufferAlignmentReq;

// Create QNN property
QnnBackend_Property_t backendProp;
backendProp.option = QNN_BACKEND_PROPERTY_OPTION_CUSTOM;
backendProp.customProperty = &customBackendProp;

// Prepare a null-terminated property pointer array
QnnBackend_Property_t *backendPropPtrs[2] = {0}; // backendPropPtrs[1] is nullptr
backendPropPtrs[0] = &backendProp;

// Query the backend for alignment requirements
Qnn_ErrorHandle_t error = QnnBackend_getProperty(backendHandle, backendPropPtrs);

// On success, copy the results to the caller's output parameters
if (!error) {
    *startAddrAlignment = bufferAlignmentReq.startAddrAlignment;
    *sizeAlignment = bufferAlignmentReq.sizeAlignment;
}
Explanation:
QnnLpaiBackend_BufferAlignmentReq_t holds the alignment requirements for memory buffers.
QNN_LPAI_BACKEND_GET_PROP_ALIGNMENT_REQ is the custom property option used to query alignment constraints.
The QnnBackend_getProperty() function retrieves the alignment values, which are then stored in startAddrAlignment and sizeAlignment.
These values must be respected when allocating memory buffers for input, output, scratch, or persistent memory.
IO Memory¶
IO memory contains the input and output tensors.
This memory can be user-provided or planned into the scratch memory pool.
By default, input/output tensors are planned into scratch memory.
If the user provides the input/output buffer, the starting address must be correctly aligned before passing it to the LPAI Backend.
Allocations¶
Both persistent and scratch memory buffers must be provided to LPAI before calling QnnGraph_finalize().
These buffers must remain accessible for the entire lifetime of the LPAI instance, until QnnContext_free() is called.
The scratch memory buffer may be replaced during runtime, but there must always be an accessible buffer available.
QNN LPAI Data Structures and Enumerations¶
QnnBackend_Property_t¶
This structure provides a backend property. It is defined in the QnnBackend header file at <QNN_SDK_DIR>/include/QNN/.

| Parameters | Description |
|---|---|
| QnnBackend_PropertyOption_t option | Option used by clients to set or get any backend property. |
| QnnBackend_CustomProperty_t customProperty | Pointer to the backend property requested by the client. |
QnnLpaiBackend_GetPropertyOption_t¶
This enum contains the set of properties supported by the LPAI backend. Objects of this type are to be referenced through QnnBackend_CustomProperty_t.
This enum is defined in QnnLpaiBackend header file present at <QNN_SDK_DIR>/include/QNN/LPAI/.
| Property | Description |
|---|---|
| QNN_LPAI_BACKEND_GET_PROP_ALIGNMENT_REQ | Used to get the start address alignment and size alignment requirements of buffers. Struct: QnnLpaiBackend_BufferAlignmentReq_t |
| QNN_LPAI_BACKEND_GET_PROP_REQUIRE_PERSISTENT_BINARY | Used to query whether the cached binary buffer needs to remain persistent until QnnContext_free() is called. |
| QNN_LPAI_BACKEND_GET_PROP_UNDEFINED | Unused |
QnnContext_Config_t¶
This structure provides context configuration. It is defined in the QnnContext header file at <QNN_SDK_DIR>/include/QNN/.

| Parameters | Description |
|---|---|
| QnnContext_ConfigOption_t option | Option used to set context configs. See QnnContext_ConfigOption_t. |
| uint8_t isPersistentBinary | Used with QNN_CONTEXT_CONFIG_PERSISTENT_BINARY. |
QnnContext_ConfigOption_t¶
This enum defines context config options. This enum has multiple options, but the following option is specific to QNN-LPAI BE.
This enum is defined in QnnContext header file present at <QNN_SDK_DIR>/include/QNN/.
| Property | Description |
|---|---|
| QNN_CONTEXT_CONFIG_PERSISTENT_BINARY | Indicates that the context binary pointer remains available for the lifetime of the context. |
QnnLpaiDevice_DeviceInfoExtension_t¶
QnnDevice_getPlatformInfo() uses this structure to list the supported device features/information.
This data structure is defined in the QnnLpaiDevice header file at <QNN_SDK_DIR>/include/QNN/LPAI/.

| Parameters | Description |
|---|---|
| uint32_t socModel | An enum value, defined in the QNN header, that represents the SoC model. |
| uint32_t arch | The architecture of the device. |
| const char* domainName | The domain name of the device. |
QnnLpaiGraph_Mem_t¶
The QnnGraph_setConfig() API uses this structure to set custom configs for the scratch and persistent buffers.
This data structure is defined in the QnnLpaiGraph header file at <QNN_SDK_DIR>/include/QNN/LPAI/.

| Parameters | Description |
|---|---|
| QnnLpaiMem_MemType_t memType | An enum value defined in QnnLpaiMem_MemType_t indicating the memory type of the buffer. |
| uint32_t size | Size of the buffer. |
| void* addr | Pointer to the buffer. |
QnnLpaiMem_MemType_t¶
This enum contains the memory types supported by the LPAI backend.
This enum is defined in the QnnLpaiMem header file at <QNN_SDK_DIR>/include/QNN/LPAI/.

| Property | Description |
|---|---|
| QNN_LPAI_MEM_TYPE_DDR | Main memory; only available in non-island mode. |
| QNN_LPAI_MEM_TYPE_LLC | Last-level cache. |
| QNN_LPAI_MEM_TYPE_TCM | Tightly coupled memory for hardware. |
| QNN_LPAI_MEM_TYPE_UNDEFINED | Unused |
QnnGraph_Config_t¶
This structure provides graph configuration.
This data structure is declared in the QnnGraph header file at <QNN_SDK_DIR>/include/QNN/.

| Parameters | Description |
|---|---|
| QnnGraph_ConfigOption_t option | An enum value defined in QnnGraph_ConfigOption_t. |
| QnnGraph_CustomConfig_t customConfig | Pointer to custom graph configs. |
QnnLpaiGraph_CustomConfig_t¶
This structure is used by QnnGraph_setConfig() to set backend specific configurations before finalizing the graph.
This data structure is declared in QnnLpaiGraph header file present at <QNN_SDK_DIR>/include/QNN/LPAI/.
| Parameters | Description |
|---|---|
| uint32_t option | An enum value defined in QnnLpaiGraph_SetConfigOption_t, used to set backend-specific configs on the graph. |
| void* config | Pointer to the option-specific config payload (e.g., QnnLpaiGraph_Mem_t), as used in the code snippets above. |
QnnLpaiGraph_SetConfigOption_t¶
This enum contains the custom config options for an LPAI backend graph.
This enum is defined in the QnnLpaiGraph header file at <QNN_SDK_DIR>/include/QNN/LPAI/.

| Property | Description |
|---|---|
| QNN_LPAI_GRAPH_SET_CFG_SCRATCH_MEM | Used to set scratch memory configs. Struct: QnnLpaiGraph_Mem_t |
| QNN_LPAI_GRAPH_SET_CFG_PERSISTENT_MEM | Used to set persistent memory configs. Struct: QnnLpaiGraph_Mem_t |
| QNN_LPAI_GRAPH_SET_CFG_PERF_CFG | Used to set custom client perf configs. Struct: QnnLpaiGraph_PerfCfg_t |
| QNN_LPAI_GRAPH_SET_CFG_CORE_AFFINITY | Used to set core affinity configs. Struct: QnnLpaiGraph_CoreAffinity_t |
| QNN_LPAI_GRAPH_SET_CFG_UNDEFINED | Unused |
QnnLpaiBackend_BufferAlignmentReq_t¶
This structure contains the parameters needed to align the start address and size of a buffer.
This data structure is declared in the QnnLpaiBackend header file at <QNN_SDK_DIR>/include/QNN/LPAI/.

| Parameters | Description |
|---|---|
| uint32_t startAddrAlignment | Start address alignment of the buffer. The start address must be startAddrAlignment-byte aligned. |
| uint32_t sizeAlignment | Buffer size alignment. The allocated buffer size must be a multiple of sizeAlignment bytes. |
QnnLpaiGraph_CustomProperty_t¶
This structure is used by the QnnGraph_getProperty() API to get backend-specific properties.
This data structure is defined in the QnnLpaiGraph header file at <QNN_SDK_DIR>/include/QNN/LPAI/.

| Parameters | Description |
|---|---|
| uint32_t option | An enum value defined in QnnLpaiGraph_GetPropertyOption_t, used to retrieve a backend-specific property. |
| void* property | Pointer to the custom property. |
QnnLpaiGraph_GetPropertyOption_t¶
This enum contains the set of properties supported by the LPAI backend. Objects of this type are to be referenced through QnnLpaiGraph_CustomProperty_t.
This enum is defined in QnnLpaiGraph header file present at <QNN_SDK_DIR>/include/QNN/LPAI/.
| Property | Description |
|---|---|
| QNN_LPAI_GRAPH_GET_PROP_SCRATCH_MEM_SIZE | Get the size requirement of scratch memory. |
| QNN_LPAI_GRAPH_GET_PROP_PERSISTENT_MEM_SIZE | Get the size requirement of persistent memory. |
| QNN_LPAI_GRAPH_GET_PROP_UNDEFINED | Unused |
QnnLpaiGraph_CoreAffinity_t¶
This structure is used by QnnGraph_setConfig() to set the core affinity configuration (see QNN_LPAI_GRAPH_SET_CFG_CORE_AFFINITY).
This data structure is defined in the QnnLpaiGraph header file at <QNN_SDK_DIR>/include/QNN/LPAI/.

| Parameters | Description |
|---|---|
| QnnLpaiGraph_CoreAffinityType_t affinity | The affinity type for the selected eNPU core. See QnnLpaiGraph_CoreAffinityType_t. |
| uint32_t coreSelection | Selects the eNPU core(s) to which the affinity applies. |
QnnLpaiGraph_CoreAffinityType_t¶
This enum contains the possible set of affinities supported by the eNPU hardware.
This enum is defined in the QnnLpaiGraph header file at <QNN_SDK_DIR>/include/QNN/LPAI/.

| Property | Description |
|---|---|
| QNN_LPAI_GRAPH_CORE_AFFINITY_SOFT | Used to set affinity to soft. Struct: QnnLpaiGraph_CoreAffinity_t |
| QNN_LPAI_GRAPH_CORE_AFFINITY_HARD | Used to set affinity to hard. Struct: QnnLpaiGraph_CoreAffinity_t |
| QNN_LPAI_GRAPH_CORE_AFFINITY_UNDEFINED | Unused |
QnnLpaiGraph_PerfCfg_t¶
This structure is used to set a client’s performance requirements for eNPU usage. The user can configure it before finalizing the graph.
This data structure is declared in the QnnLpaiGraph header file at <QNN_SDK_DIR>/include/QNN/LPAI/.

| Parameters | Description |
|---|---|
| uint32_t fps | Used to set frames per second (fps). |
| uint32_t ftrtRatio | Used to set the FTRT ratio. |
| QnnLpaiGraph_ClientPerfType_t clientType | Used to set the client type (real-time or non-real-time). Enum: QnnLpaiGraph_ClientPerfType_t |
QnnLpaiGraph_ClientPerfType_t¶
This enum contains the client types that can be configured by the user before finalizing the graph.
This enum is defined in the QnnLpaiGraph header file at <QNN_SDK_DIR>/include/QNN/LPAI/.

| Property | Description |
|---|---|
| QNN_LPAI_GRAPH_CLIENT_PERF_TYPE_REAL_TIME | Used to set the client as real-time. Struct: QnnLpaiGraph_PerfCfg_t |
| QNN_LPAI_GRAPH_CLIENT_PERF_TYPE_NON_REAL_TIME | Used to set the client as non-real-time. Struct: QnnLpaiGraph_PerfCfg_t |
| QNN_LPAI_GRAPH_CLIENT_PERF_TYPE__UNDEFINED | Unused |
QNN API Call Flow¶
The integration of a QNN model using the LPAI backend follows a structured three-phase process. Each phase is critical to ensuring the model is correctly initialized, executed, and deinitialized within the QNN runtime environment.
Initialization¶
The initialization phase prepares the QNN runtime and the LPAI backend for model execution. This phase ensures that all required interfaces, memory resources, and configurations are correctly established before inference begins. It consists of the following key steps:
Interface Extraction
Retrieve the necessary interfaces to interact with the QNN runtime and the LPAI backend:
LPAI Backend Interface
Use QnnInterface_getProviders() to enumerate available backend providers.
Identify the LPAI backend using the backend ID QNN_LPAI_BACKEND_ID.
This interface is essential for accessing backend-specific APIs and properties.
QNN System Interface
Use QnnSystemInterface_getProviders() to obtain system-level interfaces.
Provides APIs for managing contexts, graphs, and binary metadata.
Handle Creation
Create runtime handles to manage backend and system-level resources:
Backend Handle: Created using QnnBackend_create(); this handle manages backend-specific operations.
System Context Handle: Created using QnnSystemContext_create(); this handle manages the system-level context and graph lifecycle.
Buffer Alignment Query
Query memory alignment requirements to ensure compatibility with the backend:
Use QnnBackend_getProperty() with QNN_LPAI_BACKEND_GET_PROP_ALIGNMENT_REQ.
Retrieve:
Start Address Alignment: Required alignment for buffer base addresses.
Buffer Size Alignment: Required alignment for buffer sizes.
Proper alignment is critical for correctness on hardware accelerators.
Memory Allocation for Context Binary
Allocate memory for the context binary, ensuring:
Alignment constraints are met.
Memory is allocated from the appropriate pool (e.g., Island or Non-Island memory).
Context Creation from Binary
Instantiate the QNN context using QnnContext_createFromBinary():
The context is immutable and encapsulates the model structure, metadata, and backend configuration.
This step effectively loads the model into the runtime.
Platform-specific configuration requirements:
Island Use Case: Pass the custom configuration QNN_LPAI_CONTEXT_SET_CFG_ENABLE_ISLAND to enable island execution.
Native ADSP Path: Use the common configuration QNN_CONTEXT_CONFIG_PERSISTENT_BINARY to enable persistent binary support.
FastRPC Path: No additional configuration is required.
Graph Metadata Retrieval
Use QnnSystemContext_getBinaryInfo() to extract metadata embedded in the binary:
Graph names
Versioning information
Backend-specific metadata
Graph Retrieval
Retrieve the graph handle using QnnGraph_retrieve():
Pass the graph name obtained in the previous step.
The graph handle is used for further configuration and execution.
Note
The following steps are specific to the Hexagon (aDSP) LPAI backend and are required for proper memory and performance configuration.
Scratch and Persistent Memory Allocation
Query memory requirements using QnnGraph_getProperty():
QNN_LPAI_GRAPH_GET_PROP_SCRATCH_MEM_SIZE: temporary memory used during inference.
QNN_LPAI_GRAPH_GET_PROP_PERSISTENT_MEM_SIZE: memory required across multiple inferences.
Allocate memory accordingly, ensuring alignment and memory pool selection.
Memory Configuration
Configure the graph with the allocated memory using QnnGraph_setConfig():
QNN_LPAI_GRAPH_SET_CFG_SCRATCH_MEM
QNN_LPAI_GRAPH_SET_CFG_PERSISTENT_MEM
This step binds the allocated memory to the graph for runtime use.
See QNN LPAI Memory Allocations for more details.
Performance and Core Affinity Configuration
Optimize execution by configuring:
Performance Profile: QNN_LPAI_GRAPH_SET_CFG_PERF_CFG (e.g., balanced, high-performance, low-power)
Core Affinity: QNN_LPAI_GRAPH_SET_CFG_CORE_AFFINITY (e.g., assign execution to specific DSP cores)
These settings help balance performance and power consumption.
Client Priority Configuration
Set the execution priority of the graph using QnnGraph_setConfig() with option QNN_GRAPH_CONFIG_OPTION_PRIORITY.
This is useful in multi-client or multi-graph environments where scheduling priority matters.
Graph Finalization
Finalize the graph using QnnGraph_finalize():
Locks the graph configuration.
Prepares internal structures for execution.
Must be called before any inference is performed.
Tensor Allocation
Retrieve and prepare input/output tensors:
Use QnnGraph_getInputTensors() and QnnGraph_getOutputTensors().
Set the tensor type to QNN_TENSORTYPE_RAW.
Allocate and bind client buffers to each tensor.
Proper tensor setup ensures correct data flow during inference.
LPAI Initialization Call Flow
Execution¶
The execution phase is responsible for running inference using the finalized QNN graph. This phase is typically repeated for each inference request and involves the following steps:
Input Buffer Preparation
Populate the input tensors with data from the client application.
Ensure that the data format, dimensions, and layout match the model’s input specification.
Input tensors must be bound to client-allocated buffers, typically of type QNN_TENSORTYPE_RAW.
Graph Execution
Invoke the model using QnnGraph_execute().
This function triggers the execution of the graph on the target hardware (e.g., eNPU).
The execution is synchronous; the function returns only after inference is complete.
Execution Flow:
Input data is transferred to the backend.
The backend schedules and executes the graph operations.
Intermediate results are computed and stored in backend-managed memory.
Final outputs are written to the output buffers.
Output Retrieval
After execution, output tensors contain the inference results.
These results are available in the client-provided output buffers.
The application can now post-process or consume the output data as needed.
Optional: Profiling and Logging
If profiling is enabled (via --profiling_level), performance data is collected during execution.
Profiling logs are written to the output directory and can be visualized using qnn-profile-viewer.
Error Handling
Check the return status of QnnGraph_execute().
Handle any runtime errors, such as invalid inputs, memory access violations, or hardware faults.
Important
Input and output buffers must remain valid and accessible throughout the execution.
Ensure that memory alignment and size requirements are met to avoid execution failures.
LPAI Execution Call Flow
Deinitialization¶
The deinitialization phase is responsible for releasing all resources allocated during the initialization and execution phases. Proper deinitialization ensures that memory is freed, handles are closed, and the system is left in a clean state. This is especially important in embedded or resource-constrained environments.
The following steps outline the deinitialization process:
Release QNN Context Handle
Call QnnContext_free() to release the context created via QnnContext_createFromBinary().
This step invalidates the context and all associated graph handles.
Release LPAI Backend Handle
Call QnnBackend_free() to release the backend handle created during initialization.
This step ensures that backend-specific resources (e.g., device memory, threads) are properly cleaned up.
Release QNN System Context Handle
Call QnnSystemContext_free() to release the system context.
This step finalizes the system-level interface and releases any associated metadata or configuration.
Free Scratch and Persistent Memory
If memory was allocated manually for scratch and persistent buffers (e.g., on Hexagon aDSP), it must be explicitly freed.
These buffers are typically allocated based on properties queried via QnnGraph_getProperty().
Free Input and Output Tensors
Release memory associated with input and output tensors.
This includes:
Client-allocated buffers bound to tensors
Any metadata or auxiliary structures used for tensor management
Optional: Logging and Diagnostics Cleanup
If profiling or logging was enabled, ensure that any open file handles or logging streams are closed.
Optionally, flush logs or export profiling data before shutdown.
Important
All deinitialization steps must be performed in the reverse order of initialization to avoid resource leaks or undefined behavior.
Failure to properly deinitialize may result in memory leaks, dangling pointers, or device instability.
LPAI Deinitialization Call Flow