QNN LPAI Memory Allocations
The LPAI runtime uses three types of memory pools: scratch, persistent, and IO.
Each serves a distinct purpose in managing memory during network execution.
Scratch Memory
Scratch memory is used to hold intermediate results during network execution that can be reused (i.e., overwritten). This memory is essential for optimizing performance and minimizing memory footprint during inference.
Key characteristics:
The user must allocate this memory pool by querying the QNN API for the scratch memory requirements specific to their model.
The allocated memory is passed into the LPAI Backend.
All tensors using scratch memory are memory-planned offline, ensuring proper alignment and efficient access.
Querying Scratch Memory Requirements
The following code snippet demonstrates how to query the required scratch memory size using the QNN LPAI API:
// Create QNN LPAI custom property
QnnLpaiGraph_CustomProperty_t customGraphProp;
customGraphProp.option = QNN_LPAI_GRAPH_GET_PROP_SCRATCH_MEM_SIZE;
customGraphProp.property = scratchSize;

// Create QNN property
QnnGraph_Property_t graphProp;
graphProp.option = QNN_GRAPH_PROPERTY_OPTION_CUSTOM;
graphProp.customProperty = &customGraphProp;

// Prepare property pointer array
QnnGraph_Property_t *graphPropPtrs[2] = {0}; // graphPropPtrs[1] is nullptr
graphPropPtrs[0] = &graphProp;

// Query the graph for scratch memory size
QnnGraph_getProperty(graphHandle, graphPropPtrs);
Allocating and Configuring Scratch Memory
Once the memory requirements are retrieved, it is the user’s responsibility to allocate the memory and pass the pointer back to the backend using the QnnGraph_setConfig() API:
// Create LPAI memory configuration
QnnLpaiGraph_Mem_t lpaiGraphMem;
lpaiGraphMem.memType = memType;
lpaiGraphMem.size = scratchSize;
lpaiGraphMem.addr = scratchBuffer;

// Create QNN LPAI custom config
QnnLpaiGraph_CustomConfig_t customGraphCfg;
customGraphCfg.option = QNN_LPAI_GRAPH_SET_CFG_SCRATCH_MEM;
customGraphCfg.config = &lpaiGraphMem;

// Create QNN config
QnnGraph_Config_t graphConfig;
graphConfig.option = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
graphConfig.customConfig = &customGraphCfg;

// Prepare config pointer array
QnnGraph_Config_t *graphCfgPtrs[2] = {0}; // graphCfgPtrs[1] is nullptr
graphCfgPtrs[0] = &graphConfig;

// Set the configuration for the graph
QnnGraph_setConfig(graphHandle, (const QnnGraph_Config_t **)graphCfgPtrs);
Explanation:
QnnLpaiGraph_CustomProperty_t is used to specify the custom property type for LPAI.
QNN_LPAI_GRAPH_GET_PROP_SCRATCH_MEM_SIZE is the option used to request the scratch memory size.
The graphPropPtrs array is passed to QnnGraph_getProperty() to retrieve the required memory size.
The retrieved scratchSize is used to allocate memory, which is then passed back to the backend using QnnGraph_setConfig().
Persistent Memory
Persistent memory holds intermediate results that cannot be reused during execution (i.e., they persist across operations). This type of memory is essential for maintaining state across time steps or layers in models such as RNNs.
Key characteristics:
A typical example is the RNN operator, where tensors store the previous state.
Like scratch memory, the user must allocate this pool by querying the QNN API for persistent memory requirements.
These tensors are also memory-planned offline with proper alignment to ensure efficient access.
Querying Persistent Memory Requirements
The following code snippet demonstrates how to query the required persistent memory size using the QNN LPAI API:
// Create QNN LPAI custom property
QnnLpaiGraph_CustomProperty_t customGraphProp;
customGraphProp.option = QNN_LPAI_GRAPH_GET_PROP_PERSISTENT_MEM_SIZE;
customGraphProp.property = persistentSize;

// Create QNN property
QnnGraph_Property_t graphProp;
graphProp.option = QNN_GRAPH_PROPERTY_OPTION_CUSTOM;
graphProp.customProperty = &customGraphProp;

// Prepare property pointer array
QnnGraph_Property_t *graphPropPtrs[2] = {0}; // graphPropPtrs[1] is nullptr
graphPropPtrs[0] = &graphProp;

// Query the graph for persistent memory size
QnnGraph_getProperty(graphHandle, graphPropPtrs);
Allocating and Configuring Persistent Memory
Once the memory requirements are retrieved, it is the user’s responsibility to allocate the memory and pass the pointer back to the backend using the QnnGraph_setConfig() API:
// Create LPAI memory configuration
QnnLpaiGraph_Mem_t lpaiGraphMem;
lpaiGraphMem.memType = memType;
lpaiGraphMem.size = persistentSize;
lpaiGraphMem.addr = persistentBuffer;

// Create QNN LPAI custom config
QnnLpaiGraph_CustomConfig_t customGraphCfg;
customGraphCfg.option = QNN_LPAI_GRAPH_SET_CFG_PERSISTENT_MEM;
customGraphCfg.config = &lpaiGraphMem;

// Create QNN config
QnnGraph_Config_t graphConfig;
graphConfig.option = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
graphConfig.customConfig = &customGraphCfg;

// Prepare config pointer array
QnnGraph_Config_t *graphCfgPtrs[2] = {0}; // graphCfgPtrs[1] is nullptr
graphCfgPtrs[0] = &graphConfig;

// Set the configuration for the graph
QnnGraph_setConfig(graphHandle, (const QnnGraph_Config_t **)graphCfgPtrs);
Explanation:
QnnLpaiGraph_CustomProperty_t is used to define a custom property specific to LPAI.
QNN_LPAI_GRAPH_GET_PROP_PERSISTENT_MEM_SIZE is the option used to request the persistent memory size.
The QnnGraph_getProperty() function retrieves the required size, which is then used to allocate memory.
QnnGraph_setConfig() is used to pass the allocated memory back to the backend before finalizing the graph.
Get Memory Alignment Requirements
Before passing memory buffers to the LPAI Backend, the starting address must be correctly aligned. This ensures compatibility with hardware requirements and optimal performance.
To retrieve the alignment requirements for memory buffers, use the following QNN LPAI API call:
QnnLpaiBackend_BufferAlignmentReq_t bufferAlignmentReq;

// Create QNN LPAI backend custom property
QnnLpaiBackend_CustomProperty_t customBackendProp;
customBackendProp.option = QNN_LPAI_BACKEND_GET_PROP_ALIGNMENT_REQ;
customBackendProp.property = &bufferAlignmentReq;

// Create QNN property
QnnBackend_Property_t backendProp;
backendProp.option = QNN_BACKEND_PROPERTY_OPTION_CUSTOM;
backendProp.customProperty = &customBackendProp;

// Prepare property pointer array
QnnBackend_Property_t *backendPropPtrs[2] = {0}; // backendPropPtrs[1] is nullptr
backendPropPtrs[0] = &backendProp;

// Query the backend for alignment requirements and keep the returned status
Qnn_ErrorHandle_t error = QnnBackend_getProperty(backendHandle, backendPropPtrs);

// Copy the results out only if the query succeeded (QNN_SUCCESS is 0)
if (!error) {
    *startAddrAlignment = bufferAlignmentReq.startAddrAlignment;
    *sizeAlignment = bufferAlignmentReq.sizeAlignment;
}
Explanation:
QnnLpaiBackend_BufferAlignmentReq_t holds the alignment requirements for memory buffers.
QNN_LPAI_BACKEND_GET_PROP_ALIGNMENT_REQ is the custom property option used to query alignment constraints.
The QnnBackend_getProperty() function retrieves the alignment values, which are then stored in startAddrAlignment and sizeAlignment.
These values must be respected when allocating memory buffers for input, output, scratch, or persistent memory.
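As an illustration of how the two alignment values might be applied when allocating a pool, the following is a minimal sketch in plain C. It does not call the QNN API. PoolBuffer, round_up_size(), pool_alloc(), and pool_free() are hypothetical helpers introduced here, and the sketch assumes that sizeAlignment means the pool size should be a multiple of that value and that startAddrAlignment applies to the first byte of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical holder for one memory pool; not part of the QNN API. */
typedef struct {
    void  *base;  /* pointer returned by malloc(); pass this one to free() */
    void  *addr;  /* aligned start address to hand to the backend */
    size_t size;  /* pool size, rounded up to a multiple of sizeAlignment */
} PoolBuffer;

/* Round value up to the next multiple of alignment (alignment must be > 0). */
static size_t round_up_size(size_t value, size_t alignment) {
    assert(alignment > 0);
    size_t rem = value % alignment;
    return rem == 0 ? value : value + (alignment - rem);
}

/* Allocate at least `requested` bytes honoring both alignment values.
 * Returns 0 on success and -1 on invalid arguments or allocation failure. */
static int pool_alloc(PoolBuffer *out, size_t requested,
                      size_t startAddrAlignment, size_t sizeAlignment) {
    if (out == NULL || startAddrAlignment == 0 || sizeAlignment == 0) {
        return -1;
    }
    size_t size = round_up_size(requested, sizeAlignment);

    /* Over-allocate so an aligned start address always fits in the block. */
    void *base = malloc(size + startAddrAlignment);
    if (base == NULL) {
        return -1;
    }

    /* Compute how many bytes to skip to reach the next aligned address. */
    uintptr_t raw = (uintptr_t)base;
    uintptr_t rem = raw % startAddrAlignment;
    size_t offset = rem == 0 ? 0 : (size_t)(startAddrAlignment - rem);

    out->base = base;
    out->addr = (unsigned char *)base + offset;
    out->size = size;
    return 0;
}

/* Release the pool and clear the holder so stale pointers are not reused. */
static void pool_free(PoolBuffer *pool) {
    if (pool != NULL) {
        free(pool->base);
        pool->base = NULL;
        pool->addr = NULL;
        pool->size = 0;
    }
}
```

In this sketch, the addr and size fields are the values that would be assigned to lpaiGraphMem.addr and lpaiGraphMem.size, while base is kept only so the block can be freed once the backend no longer uses it.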
IO Memory
IO memory contains the input and output tensors.
This memory can be user-provided or planned into the scratch memory pool.
By default, input/output tensors are planned into scratch memory.
If the user provides the input/output buffer, the starting address must be correctly aligned before passing it to the LPAI Backend.
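A user-provided input/output buffer could be validated before it is handed to the backend. The check below is a minimal sketch in plain C. io_buffer_is_aligned() is a hypothetical helper, not a QNN function, and it assumes that startAddrAlignment is the value obtained from the alignment query described earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true when addr is non-NULL and a multiple of startAddrAlignment.
 * A zero alignment is treated as invalid input and yields false. */
static bool io_buffer_is_aligned(const void *addr, size_t startAddrAlignment) {
    if (addr == NULL || startAddrAlignment == 0) {
        return false;
    }
    return ((uintptr_t)addr % startAddrAlignment) == 0;
}
```

If the check fails, one option is to copy the tensor data into a correctly aligned buffer before passing that buffer to the LPAI Backend.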
Allocations
Both persistent and scratch memory buffers must be provided to LPAI before calling QnnGraph_finalize().
These buffers must remain accessible for the entire lifetime of the LPAI instance, until QnnContext_free(Context) is called.
The scratch memory buffer may be replaced during runtime, but there must always be an accessible buffer available.
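The rule that an accessible scratch buffer must always be available suggests an ordering when replacing it: allocate the new buffer, register it, and only then release the old one. The sketch below illustrates that ordering in plain C. SetScratchFn is a hypothetical callback standing in for the QnnGraph_setConfig() sequence shown earlier, ScratchPool and scratch_replace() are hypothetical names, and record_scratch() exists only to demonstrate the flow. The sketch assumes that a failed registration leaves the previous buffer in use, and it omits alignment handling for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback standing in for the QnnGraph_setConfig() sequence
 * that registers a scratch buffer. Returns 0 on success. */
typedef int (*SetScratchFn)(void *addr, size_t size, void *userData);

/* Hypothetical record of the buffer currently registered with the backend. */
typedef struct {
    void  *addr;
    size_t size;
} ScratchPool;

/* Replace the scratch buffer so the backend never refers to freed memory:
 * allocate first, register second, free the old buffer last. */
static int scratch_replace(ScratchPool *pool, size_t newSize,
                           SetScratchFn setScratch, void *userData) {
    if (pool == NULL || setScratch == NULL || newSize == 0) {
        return -1;
    }
    void *fresh = malloc(newSize);
    if (fresh == NULL) {
        return -1;                 /* old buffer stays registered */
    }
    if (setScratch(fresh, newSize, userData) != 0) {
        free(fresh);               /* assumed: backend kept the old buffer */
        return -1;
    }
    free(pool->addr);              /* safe now: backend uses the new buffer */
    pool->addr = fresh;
    pool->size = newSize;
    return 0;
}

/* Demonstration callback: records the last buffer it was asked to register. */
static int record_scratch(void *addr, size_t size, void *userData) {
    ScratchPool *seen = (ScratchPool *)userData;
    seen->addr = addr;
    seen->size = size;
    return 0;
}
```

In a real integration, the callback body would build the QnnLpaiGraph_Mem_t and config structures and call QnnGraph_setConfig(), and the final buffer would be freed only after QnnContext_free() has been called.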