HTP Yielding and Pre-Emption
Warning
Please note that this document is meant to accompany the Hexagon SDK’s documentation on Resource Management and Multithreading notes. Please familiarize yourself with that material before proceeding.
Yielding and pre-emption of Hexagon clients (QNN graphs or non-ML use cases) are based on Hexagon OS client thread priority.
Every graph in QNN acquires VTCM with a priority specified by Qnn_Priority_t. The priority mappings are listed in the table below.
| Qnn_Priority_t | Hexagon OS Thread Priority |
|---|---|
| QNN_PRIORITY_LOW | |
| QNN_PRIORITY_NORMAL | |
| QNN_PRIORITY_NORMAL_HIGH | |
| QNN_PRIORITY_HIGH | |
Warning
This page does not apply to Windows platforms, where the yielding and pre-emption
of QNN graphs rely on the MCDM framework instead of Hexagon OS thread priority.
After QNN SDK 2.29.0, all four priorities on Windows platforms are mapped to the same
Hexagon OS thread priority, 0xc0; scheduling and pre-emption are instead handled
by the MCDM framework in Windows. The MCDM framework takes the graph priority into account,
but the OS determines the final priority and may not fully adhere to the priority set by clients.
Pre-emption gives a higher-priority client access to hardware resources (such as VTCM) and allows it to be scheduled without waiting for the lower-priority client to finish.
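1. Base Case: Single Client Pre-Emption
First, we document the base cases, which apply whether the clients are in the same PD or in different PDs. A client is any thread that has called HAP_compute_res_acquire_cached(). In each case below, Client A holds VTCM when Client B requests it. As per the Hexagon SDK documentation, a client sets its priority and then requests VTCM: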
```c
// Set the Hexagon OS client thread priority; see the table above for the
// values used by QNN. Only works prior to V81.
int err = nn_os_set_current_thread_priority(priority);

// Initiate a request for VTCM resources. Depending on the scenarios
// described below, this triggers either a pre-emption or queuing.
err = HAP_compute_res_acquire_cached(vtcm_context, TIMEOUT_US);
```
1.1 Same Priority
Client A and Client B have the same priority. In this case, Client B must wait for Client A to finish.
1.2 Lower Priority
Client A has a higher priority than Client B. Client B must wait for Client A to finish, exactly as in the case where both clients have the same priority.
1.3 Higher Priority
Client A has a lower priority than Client B. In this case, Client A is pre-empted.
Note that although Client B has the higher priority, it must still wait for Hexagon OS to evict Client A, so Client B cannot execute right away.
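The three base cases reduce to a single comparison between the holder and the requester. The following C sketch models that decision; `vtcm_decision`, `vtcm_on_request()`, `vtcm_base_cases()` and the integer ranks are hypothetical names for illustration, not QNN or Hexagon SDK APIs, and the rank is an abstract ordering (larger means more important) rather than an actual Hexagon OS priority value.

```c
#include <assert.h>

/* Hypothetical model of Sections 1.1-1.3. `rank` is an abstract ordering
 * where a larger value means a more important client; it is NOT the raw
 * Hexagon OS thread-priority value. */
typedef enum {
    VTCM_WAIT,    /* requester queues until the holder finishes */
    VTCM_PREEMPT  /* holder is evicted; requester runs once eviction completes */
} vtcm_decision;

/* Client A (holder_rank) holds VTCM when Client B (requester_rank) asks for it. */
vtcm_decision vtcm_on_request(int holder_rank, int requester_rank)
{
    /* 1.3: only a strictly higher-priority requester pre-empts the holder. */
    if (requester_rank > holder_rank)
        return VTCM_PREEMPT;
    /* 1.1 (same priority) and 1.2 (lower priority): wait for the holder. */
    return VTCM_WAIT;
}

/* Encodes the three base cases as checks. */
void vtcm_base_cases(void)
{
    assert(vtcm_on_request(2, 2) == VTCM_WAIT);    /* 1.1 same priority   */
    assert(vtcm_on_request(3, 1) == VTCM_WAIT);    /* 1.2 lower priority  */
    assert(vtcm_on_request(1, 3) == VTCM_PREEMPT); /* 1.3 higher priority */
}
```

`VTCM_PREEMPT` does not mean Client B runs instantly: as Section 1.3 notes, it still waits for Hexagon OS to evict Client A.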
1.4 Complex Example
In this use case there are three clients:
Two QNN graphs: Graph 1 (high priority) and Graph 2 (low priority)
One customer application, with a priority between those of Graph 1 and Graph 2
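The sketch below applies the base-case rule to the three clients of this example, one VTCM request at a time. The arrival orders and the numeric ranks are assumptions chosen for illustration, and `vtcm_client`, `vtcm_state` and `vtcm_request()` are hypothetical names, not QNN or Hexagon SDK APIs. The model only tracks who holds VTCM and counts pre-emptions and waits; it does not decide which waiter runs next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client: ranks are abstract (larger = higher priority). */
typedef struct {
    const char *name;
    int rank;
} vtcm_client;

/* Single VTCM resource shared by all clients in this model. */
typedef struct {
    const vtcm_client *holder; /* client currently holding VTCM, or NULL */
    int preemptions;           /* how many times a holder was evicted */
    int waits;                 /* how many requests had to wait */
} vtcm_state;

/* Apply the base-case rules from Sections 1.1-1.3 to one request. */
void vtcm_request(vtcm_state *s, const vtcm_client *c)
{
    assert(s != NULL && c != NULL);
    if (s->holder == NULL) {
        s->holder = c;                 /* VTCM is free: acquire it */
    } else if (c->rank > s->holder->rank) {
        s->preemptions++;              /* holder is evicted first ...      */
        s->holder = c;                 /* ... then the requester takes VTCM */
    } else {
        s->waits++;                    /* equal or lower priority waits */
    }
}
```

If Graph 2, then the customer application, then Graph 1 arrive, each arrival out-ranks the current holder, so the model records two pre-emptions and ends with Graph 1 holding VTCM. In the reverse order nobody is pre-empted and both later arrivals wait.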
2. Resource Sharing under Concurrency
Any two clients can execute simultaneously if they share the HVX and VTCM resources, but only one of them may hold the HMX unit. Refer to HTP VTCM Sharing to understand how to share VTCM.
Note that concurrency is not supported in a few scenarios:
Concurrency is not supported across PDs in any case.
Concurrency is not supported prior to Hexagon V73; see the HTP VTCM Sharing documentation.
Two QNN graphs can never execute concurrently.
QNN always takes the HMX unit, so another client that wishes to execute alongside QNN cannot use it. If another client holds the HMX unit, QNN pauses until it can acquire it.
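The rules in this list can be combined into one predicate. This is a minimal C sketch under stated assumptions: `htp_client`, its fields and `may_run_concurrently()` are hypothetical, the architecture is passed as a plain integer (73 for V73), and resource capacity (HVX threads, VTCM size) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a client; field names are illustrative. */
typedef struct {
    int pd;            /* protection domain the client runs in */
    bool is_qnn_graph; /* QNN graphs always take the HMX unit */
    bool needs_hmx;    /* whether a non-QNN client wants HMX */
} htp_client;

/* Returns true when the rules above allow a and b to run at the same time.
 * hexagon_arch is the Hexagon version number, e.g. 73 for V73. */
bool may_run_concurrently(const htp_client *a, const htp_client *b,
                          int hexagon_arch)
{
    assert(a != NULL && b != NULL);
    if (hexagon_arch < 73)                   /* no concurrency before V73 */
        return false;
    if (a->pd != b->pd)                      /* never across PDs */
        return false;
    if (a->is_qnn_graph && b->is_qnn_graph)  /* never two QNN graphs */
        return false;
    bool a_hmx = a->is_qnn_graph || a->needs_hmx;
    bool b_hmx = b->is_qnn_graph || b->needs_hmx;
    return !(a_hmx && b_hmx);                /* only one client may hold HMX */
}
```

For example, a QNN graph and an HVX-only client app in the same PD on V73 may run together, while the same pair split across two PDs may not.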
2.1. Same PD - 2 Graphs, 1 Client App
In this example there are 3 clients. Our device has 2 HVX threads, 1 HMX unit, and 4MB of VTCM.
| Client | Priority | HMX units | HVX units | VTCM |
|---|---|---|---|---|
| QNN Graph1 | High | 1 | 1 | 3MB |
| Client App | Medium | 0 | 1 | 1MB |
| QNN Graph2 | Low | 1 | 1 | 2MB |
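The resource columns of this table can be checked with simple arithmetic: Graph1 plus the client app needs 1 HMX unit, 2 HVX threads and 3 MB + 1 MB = 4 MB of VTCM, which exactly fits the device, whereas Graph1 plus Graph2 would need 2 HMX units. The C sketch below performs that capacity check; `demand`, `fits_together()` and the constants are hypothetical names, and the sketch checks capacity only, not the other concurrency rules (for example, that two QNN graphs never run together).

```c
#include <assert.h>
#include <stdbool.h>

/* Example device from Section 2.1. */
enum { DEV_HMX = 1, DEV_HVX = 2, DEV_VTCM_MB = 4 };

/* Resources one client requests (hypothetical type). */
typedef struct {
    int hmx;     /* HMX units */
    int hvx;     /* HVX threads */
    int vtcm_mb; /* VTCM in MB */
} demand;

/* Rows of the Section 2.1 client table. */
const demand GRAPH1     = {1, 1, 3};
const demand CLIENT_APP = {0, 1, 1};
const demand GRAPH2     = {1, 1, 2};

/* True when two clients' combined requests fit on the device at once. */
bool fits_together(demand a, demand b)
{
    assert(a.hmx >= 0 && a.hvx >= 0 && a.vtcm_mb >= 0);
    assert(b.hmx >= 0 && b.hvx >= 0 && b.vtcm_mb >= 0);
    return a.hmx + b.hmx <= DEV_HMX &&
           a.hvx + b.hvx <= DEV_HVX &&
           a.vtcm_mb + b.vtcm_mb <= DEV_VTCM_MB;
}
```

Either graph can share the device with the client app, which matches the timeline that follows, where the client app runs next to Graph 1 and later next to Graph 2.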
| Time | HMX | HVX0 | HVX1 | VTCM 0 | VTCM 1 | VTCM 2 | VTCM 3 |
|---|---|---|---|---|---|---|---|
| 0 | Graph 1 | Graph 1 | | | Graph 1 | Graph 1 | Graph 1 |
| 10 | Graph 1 | Graph 1 | Client App | Client App | Graph 1 | Graph 1 | Graph 1 |
| 30 | Graph 2 | Graph 2 | Client App | Client App | Graph 2 | Graph 2 | |
| 50 | Graph 1 | Graph 1 | Client App | Client App | Graph 1 | Graph 1 | Graph 1 |
| 60 | Graph 1 | Graph 1 | | | Graph 1 | Graph 1 | Graph 1 |
| 70 | Graph 2 | Graph 2 | | | Graph 2 | Graph 2 | |
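To sanity-check this timeline against the rules of Section 2, the C sketch below encodes each row and verifies that no two QNN graphs appear together, that HMX is held only by a client that requests it, and that every active client occupies exactly the HVX threads and VTCM it requested. The cell encoding, the assumption that each VTCM column is one 1 MB slot (4 MB across 4 columns), and all identifiers are hypothetical; the time values carry no units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of the Section 2.1 timeline cells. */
enum { FREE = 0, G1 = 1, APP = 2, G2 = 3, NCLIENTS = 4 };
/* Columns: HMX, HVX0, HVX1, VTCM 0..3 (assumed 1 MB per VTCM column). */
enum { COL_HMX = 0, COL_HVX0 = 1, COL_HVX1 = 2, COL_VTCM0 = 3, NCOLS = 7 };

/* {HMX, HVX, VTCM MB} per client, from the 2.1 client table. */
static const int DEMAND[NCLIENTS][3] = {
    [G1] = {1, 1, 3}, [APP] = {0, 1, 1}, [G2] = {1, 1, 2},
};

const int TIMELINE_2_1[][NCOLS] = {
    {G1, G1, FREE, FREE, G1, G1, G1},   /* t = 0  */
    {G1, G1, APP,  APP,  G1, G1, G1},   /* t = 10 */
    {G2, G2, APP,  APP,  G2, G2, FREE}, /* t = 30 */
    {G1, G1, APP,  APP,  G1, G1, G1},   /* t = 50 */
    {G1, G1, FREE, FREE, G1, G1, G1},   /* t = 60 */
    {G2, G2, FREE, FREE, G2, G2, FREE}, /* t = 70 */
};

/* Returns false on the first row that breaks one of the rules. */
bool timeline_is_consistent(const int (*rows)[NCOLS], size_t nrows)
{
    for (size_t r = 0; r < nrows; r++) {
        int hvx[NCLIENTS] = {0}, vtcm[NCLIENTS] = {0};
        bool active[NCLIENTS] = {false};
        for (int c = 0; c < NCOLS; c++) {
            int who = rows[r][c];
            assert(who >= FREE && who < NCLIENTS);
            active[who] = true;
            if (c == COL_HVX0 || c == COL_HVX1) hvx[who]++;
            if (c >= COL_VTCM0) vtcm[who]++;
        }
        if (active[G1] && active[G2])
            return false;                    /* two QNN graphs at once */
        for (int k = G1; k < NCLIENTS; k++) {
            if (!active[k]) continue;
            bool holds_hmx = rows[r][COL_HMX] == k;
            if (holds_hmx != (DEMAND[k][0] == 1)) return false;
            if (hvx[k] != DEMAND[k][1] || vtcm[k] != DEMAND[k][2]) return false;
        }
    }
    return true;
}
```

A row that placed Graph 1 on both HVX threads, or Graph 1 and Graph 2 side by side, would make `timeline_is_consistent()` return false.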
2.2. 2 PDs - 2 Graphs, 1 Client App
In this example there are 3 clients. Our device has 2 HVX threads, 1 HMX unit, and 4MB of VTCM.
| PD | Client | Priority | HMX units | HVX units | VTCM |
|---|---|---|---|---|---|
| 1 | QNN Graph1 | High | 1 | 1 | 3MB |
| 1 | Client App | Low | 1 | 1 | 1MB |
| 2 | QNN Graph2 | Medium | 1 | 1 | 2MB |
| Time | HMX | HVX0 | HVX1 | VTCM 0 | VTCM 1 | VTCM 2 | VTCM 3 |
|---|---|---|---|---|---|---|---|
| 0 | Graph 1 | Graph 1 | | | Graph 1 | Graph 1 | Graph 1 |
| 10 | Graph 1 | Graph 1 | Client App | Client App | Graph 1 | Graph 1 | Graph 1 |
| 40 | Graph 2 | Graph 2 | | | Graph 2 | Graph 2 | |
| 70 | Graph 1 | Graph 1 | Client App | Client App | Graph 1 | Graph 1 | Graph 1 |
| 100 | Graph 2 | Graph 2 | | | Graph 2 | Graph 2 | |
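Because concurrency is not supported across PDs, every row of this timeline should contain clients from only one PD. The C sketch below checks that property; it records only which clients are active at each time (not which VTCM columns they occupy), and `CLIENT_PD`, `ACTIVE_2_2`, `BIT()` and `single_pd()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical client indices for Section 2.2. */
enum { G1 = 0, APP = 1, G2 = 2, NCLIENTS = 3 };

/* PD assignment from the Section 2.2 client table. */
static const int CLIENT_PD[NCLIENTS] = { [G1] = 1, [APP] = 1, [G2] = 2 };

/* Clients active at each time step of the 2.2 timeline, as bit masks. */
#define BIT(c) (1u << (c))
const unsigned ACTIVE_2_2[] = {
    BIT(G1),            /* t = 0   */
    BIT(G1) | BIT(APP), /* t = 10  */
    BIT(G2),            /* t = 40  */
    BIT(G1) | BIT(APP), /* t = 70  */
    BIT(G2),            /* t = 100 */
};

/* True if all clients set in `mask` run in the same PD. */
bool single_pd(unsigned mask)
{
    assert(mask < BIT(NCLIENTS));
    int pd = -1;                         /* no active client seen yet */
    for (int c = 0; c < NCLIENTS; c++) {
        if (!(mask & BIT(c)))
            continue;
        if (pd == -1)
            pd = CLIENT_PD[c];           /* first active client fixes the PD */
        else if (CLIENT_PD[c] != pd)
            return false;                /* clients from two PDs together */
    }
    return true;
}
```

Graph 2 in PD 2 therefore never shares the device with Graph 1 or the client app in PD 1; a mixed set such as `single_pd(BIT(APP) | BIT(G2))` returns false.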