HTP VTCM Sharing¶
Warning
This feature is only enabled on Hexagon V73 onwards and select V69 SoCs. Sharing is not possible on earlier platforms (such as V68 and most V69 SoCs).
The VTCM sharing feature allows two entities that both want access to VTCM at the same time to divide the resource between each other.
Prior to the enablement of this feature, two threads (one being a QNN thread and the other a non-QNN thread) in the same process were unable to share the VTCM resource at the same time. Even if both threads requested page sizes that could fit side by side in VTCM (for example, 2 x 4MB pages on an 8MB chip), the QNN thread would always allocate the maximum VTCM size for each page, which made sharing impossible.
With the new VTCM sharing feature, it is possible for another entity to share a subset of VTCM alongside QNN. Please note that QNN performance is very sensitive to the amount of VTCM available.
Users of this feature will be able to:
Have a long-lived entity coexist with QNN.
Have another entity pass data to or from QNN through VTCM for an inference, reducing the number of copies to or from DDR.
HAP Compute Resource Manager API Shorthand¶
It is expected that users of this feature are fully familiar with the HAP Compute Resource Manager and have read the necessary HexagonSDK documentation. It is also expected that the programmer fully understands the benefits and limitations of using VTCM in their use case. Furthermore, given that VTCM sharing is an expert level feature, it is expected that users of this feature have very tight control over their use case and can reliably control their callflows.
The following shorthand will be used to enhance callflow legibility. Please note that only acquire and release will be shown to simplify the diagrams as much as possible.
// setup(vtcm_size, page_size)
HAP_compute_res_attr_set_cache_mode(&attr, true);
HAP_compute_res_attr_set_vtcm_param_v2(&attr, vtcm_size, page_size, vtcm_size);
auto ctx = HAP_compute_res_acquire(&attr, TIMEOUT_US);

// acquire(vtcm_size, page_size)
HAP_compute_res_acquire_cached(ctx, TIMEOUT_US);

// release()
HAP_compute_res_release_cached(ctx);
Overmap¶
An overmap occurs when the page size is larger than the usable portion. In this example, acquire(6,8), the VTCM size is 6MB and the page size is 8MB, resulting in a 2MB overmap at the end; an unaddressable portion.
Fundamentals¶
In the following diagrams, C1 denotes client 1 which is a user of VTCM. Unless noted, all clients and QNN are the same priority.
The below diagram illustrates what occurs without VTCM sharing.
QNN requests a maximum-size page mapping at time [1] which prevents other
clients from using VTCM. Since C1 and QNN are the same priority C1 follows
the HAP Compute Resource Manager cooperative acquisition rules.
The below diagram illustrates the new callflow. C1 is free to access its 2MB page while QNN freely accesses its 6MB. With VTCM sharing there are two key changes:
The usable portion of an overmapped request is aligned to the end.
If another client’s page size fits in the overmap, it may coexist in the overmap.
VTCM Sharing Rules¶
This feature is only enabled for cached acquire and release APIs.
This feature activates when an overmap is requested, and cannot be disabled.
Clients must be in the same PD (Process Domain) for coexistence to occur. This is due to platform enforced security which prevents cross-PD sharing.
Approved overmap requests will place the usable portion at the end of VTCM, not the beginning.
To coexist, a non-overmapped entity must make an acquire request where page_size fits in the overmap space. The order of the two entities does not matter, only that the requests fit.
This feature only works within a PD.
QNN must use >=50% of the hardware VTCM size for its graph size. If it is less, this feature will not activate.
When sharing pointers in VTCM:
Warning
To share VTCM pointers the non-QNN entity must be maximum priority and both entities must be in a Signed PD.
If an entity wants to share a VTCM pointer with QNN, the entity must never release VTCM until QNN returns.
This prevents a race condition where the non-owner entity tries to access VTCM before acquire is called.
In regards to yielding:
Acquire requests that don’t fit will follow normal cooperative scheduling rules as enforced by the HAP Compute Resource Manager.
All VTCM clients are individually responsible to yield and save/restore VTCM.
Yielding is still based on priority.
If a yield requires multiple clients to be evicted, the HAP Compute Resource Manager will only do so if all active clients are at a lower priority than the new client requesting VTCM.
If the HAP Compute Resource Manager can evict one entity to service an acquire, it will do that instead of asking each client to release.
Native HTP Use Case¶
In this use case pointer sharing requires no work from the C1 entity because both C1 and QNN share the same address space.
All that is required is for C1 to hold its VTCM acquire call until QNN is finished at [6].
FastRPC Use Case¶
Like the Native HTP use case, C1 must hold VTCM for the duration of the inference. In addition, this use case requires sending the VTCM VA back to the CPU so that it can be passed to QNN.
The following code sample demonstrates the API calls required to use this callflow.
//==============================================================================
//
// Copyright (c) 2022, 2024 Qualcomm Technologies, Inc.
// All Rights Reserved.
// Confidential and Proprietary - Qualcomm Technologies, Inc.
//
//==============================================================================
// From Backend-Specific API Headers
#include "QnnHtpMem.h"
int main(void) {
  // Setup the QNN Device or else registration will fail
  Qnn_DeviceHandle_t device = NULL;
  if (QNN_SUCCESS != QnnDevice_create(NULL, NULL, &device) || device == NULL) {
    // Do something
  }

  // Must bind the device to a context
  Qnn_ContextHandle_t context = NULL;
  QnnContext_create(backend, // user must provide this
                    device,  // from above
                    NULL,
                    &context);

  // This example will use an input tensor. You can do the
  // same for any graph output tensors
  Qnn_Tensor_t inputTensor = {};

  // Call graph create setting the right VTCM size
  QnnHtpGraph_CustomConfig_t customConfig;
  customConfig.option       = QNN_HTP_GRAPH_CONFIG_OPTION_VTCM_SIZE;
  customConfig.vtcmSizeInMB = 7; // put a number less than hardware page size
  QnnGraph_Config_t gConfig;
  gConfig.option       = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
  gConfig.customConfig = &customConfig;
  const QnnGraph_Config_t* graphConfig[] = {&gConfig, NULL};

  // Use the new graph size
  Qnn_GraphHandle_t graph = NULL;
  Qnn_ErrorHandle_t graphError = QnnGraph_create(context,     // from above
                                                 "QnnGraph",  // name
                                                 graphConfig, // from above
                                                 &graph);
  if (QNN_SUCCESS != graphError) {
    // Do something
  }

  // Finalize the graph
  // This call corresponds with step #2
  uint32_t dspVirtualAddress = getDspVirtualAddress();

  // Structs from QnnHtpMem.h
  QnnMemHtp_Descriptor_t inputHtpDescriptor{
      .type        = QNN_HTP_MEM_QURT,  // const from the header QnnHtpMem.h
      .size        = INPUT_TENSOR_SIZE, // user must provide this
      .qurtAddress = dspVirtualAddress};

  // Structs from QnnMem.h
  Qnn_MemShape_t qnnMemShapeInput = {
      .numDim      = INPUT_TENSOR_DIMS_LEN, // user must provide this
      .dimSize     = INPUT_TENSOR_DIMS,     // user must provide this
      .shapeConfig = NULL};
  Qnn_MemDescriptor_t inputQnnDescriptor{
      .memShape   = qnnMemShapeInput,    // from above
      .dataType   = INPUT_TENSOR_DTYPE,  // user must provide this
      .memType    = QNN_MEM_TYPE_CUSTOM, // from QnnHtpMem.h
      .customInfo = &inputHtpDescriptor, // above
  };

  Qnn_MemHandle_t inputMemHandle = NULL;
  // Register the memory, note that the context is the same as above that
  // used the Qnn_DeviceHandle_t
  int status = QnnMem_register(context, &inputQnnDescriptor, 1, &inputMemHandle);
  if (status != QNN_SUCCESS) {
    // Do something
  }

  inputTensor.v1.memType   = QNN_TENSORMEMTYPE_MEMHANDLE;
  inputTensor.v1.memHandle = inputMemHandle;
  // You can now use the inputTensor as an argument to graph execute
}
Pre-emption¶
Pre-emption is still a concern in that the HAP Compute Resource Manager expects cooperative pre-emption from its clients.
Two Graphs in the Same PD¶
In this use case two graphs (QNN1 and QNN2) coexist with C1. Graph 2 wants to execute and, because it is higher priority, the HAP Compute Resource Manager will trigger a preemption. C1 is not preempted because it is in the same PD as QNN, and its VTCM usage fits in Graph 2's overmap space. Therefore, only graph 1 is asked to yield.
Two PDs - Incoming is Higher Priority¶
In this case, C1 and QNN are in the same PD, and PD2 is any client in another PD. VTCM sharing is impossible across PDs, so normal scheduling rules apply. PD2 is a higher priority than either client in PD1, so the HAP Compute Resource Manager asks all active clients in PD1 to yield and then grants VTCM to PD2.
Two PDs - Incoming is Lower Priority¶
This case is identical to the previous, except one active client in PD1 is higher priority than the client in PD2. Following the normal queue rules, PD2 must wait until the high priority C1 entity decides to release. At that point the priority of PD2 is greater than the priority of PD1, and a release request is sent to QNN.
Warning
Please note that in this diagram the release at step [4] is not triggered by step [3]. Instead PD2 must wait until all high priority clients release, then the Hexagon OS will initiate preemption.
VTCM Windowing¶
Warning
Starting with Hexagon V79 certain chips may support the VTCM windowing functionality. Please refer to the Hexagon SDK documentation for a thorough list of supported platforms.
Please refer to the Hexagon SDK to fully understand the VTCM windowing feature and its limitations. With regard to VTCM sharing, windowing permits coexistence across PDs: another process's page can fit into QNN's VTCM overmap region, allowing the two PDs to execute in parallel.
Pointer sharing across PDs is not allowed and will result in undefined behaviour.