HTP Yielding and Pre-Emption

Warning

This document is meant to accompany the Hexagon SDK’s documentation on resource management and multithreading. Please familiarize yourself with that material before proceeding.

Yielding and pre-emption of Hexagon clients (QNN graphs or non-ML use cases) are based on the Hexagon OS client thread priority. Every QNN graph acquires VTCM with a priority specified by Qnn_Priority_t. The priority mappings are listed in the table below; a numerically lower thread priority value denotes a higher priority.

QNN HTP Priority Mapping

| Qnn_Priority_t           | Hexagon OS Thread Priority |
|--------------------------|----------------------------|
| QNN_PRIORITY_LOW         | 0xc5                       |
| QNN_PRIORITY_NORMAL      | 0xc0                       |
| QNN_PRIORITY_NORMAL_HIGH | 0xbd                       |
| QNN_PRIORITY_HIGH        | 0xbb                       |
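The mapping above can be written as a small lookup. This is only a sketch: `sketch_priority_t` and `sketch_thread_priority` are hypothetical stand-ins introduced here so the example compiles without the QNN headers, and the fallback value for unknown inputs is an assumption, not documented behavior.

```c
#include <stdint.h>

/* Hypothetical stand-in for Qnn_Priority_t; only the four levels
 * listed in the mapping table are modeled. */
typedef enum {
    SKETCH_PRIORITY_LOW,
    SKETCH_PRIORITY_NORMAL,
    SKETCH_PRIORITY_NORMAL_HIGH,
    SKETCH_PRIORITY_HIGH
} sketch_priority_t;

/* Returns the Hexagon OS thread priority from the mapping table.
 * A numerically smaller value is a higher scheduling priority. */
static uint8_t sketch_thread_priority(sketch_priority_t p)
{
    switch (p) {
    case SKETCH_PRIORITY_LOW:         return 0xc5;
    case SKETCH_PRIORITY_NORMAL:      return 0xc0;
    case SKETCH_PRIORITY_NORMAL_HIGH: return 0xbd;
    case SKETCH_PRIORITY_HIGH:        return 0xbb;
    }
    return 0xc0; /* assumption: treat unknown input as NORMAL */
}
```

A non-ML client that wants to sit between two QNN priority levels could pick a thread priority value between the corresponding entries, for example between 0xbb and 0xc5.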

Warning

This page does not apply to Windows platforms, where yielding and pre-emption of QNN graphs rely on the MCDM framework in the Windows OS instead of the Hexagon OS thread priority. After QNN SDK 2.29.0, all four priorities are mapped to the same Hexagon OS thread priority, 0xc0, on Windows platforms. The MCDM framework takes the graph priority into account, but the OS determines the final priority and may not fully adhere to the priority set by clients.

Pre-emption enables a higher-priority client to be given access to hardware resources (such as VTCM) and allows that client to be scheduled immediately.

1. Base Case: Single Client Pre-Emption

First we document the base cases, which apply whether the clients are in the same protection domain (PD) or in different PDs. A client is any thread that has called HAP_compute_res_acquire_cached. As per the SDK docs:

// Set the Hexagon OS client thread priority; see the table above for the
// values used by QNN. Only works prior to V81.
int err = nn_os_set_current_thread_priority(priority);

// Request VTCM resources. This triggers pre-emption or queuing,
// depending on the scenarios demonstrated below.
err = HAP_compute_res_acquire_cached(vtcm_context, TIMEOUT_US);

1.1 Same Priority

Client A and Client B have the same priority. In this case Client B must wait for Client A.

../../_static/resources/htp_yield/base_queue.png

1.2 Lower Priority

Client A has a higher priority than Client B. Client B must wait for Client A to finish, exactly as in the case where both clients have the same priority.

../../_static/resources/htp_yield/base_queue.png

1.3 Higher Priority

Client A has a lower priority than Client B. In this case Client A will be pre-empted.

Please note that although Client B has the higher priority, it must still wait for the Hexagon OS to evict Client A. Therefore, Client B cannot execute right away.

../../_static/resources/htp_yield/base_higher.png
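The three base cases reduce to a single comparison of thread priorities. The sketch below illustrates that rule only; it is not the Hexagon OS scheduler. `sketch_outcome_t` and `sketch_on_acquire` are hypothetical names, and the code assumes that a numerically smaller thread priority value is the higher priority, as in the mapping table.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef enum { SKETCH_WAIT, SKETCH_PREEMPT } sketch_outcome_t;

/* holder:    thread priority of the client currently holding VTCM
 * requester: thread priority of the client requesting VTCM
 * Assumption: a numerically smaller value is the higher priority. */
static sketch_outcome_t sketch_on_acquire(uint8_t holder, uint8_t requester)
{
    /* Only a strictly higher-priority requester pre-empts the holder;
     * an equal or lower priority requester is queued behind it. */
    return (requester < holder) ? SKETCH_PREEMPT : SKETCH_WAIT;
}
```

For example, a requester at 0xbb arriving while a 0xc5 client holds VTCM yields `SKETCH_PREEMPT`, while two clients at 0xc0 yield `SKETCH_WAIT`. Even in the pre-emption case the requester still waits for the holder to be evicted.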

1.4 Complex Example

In this use case there are three clients:

  • Two QNN graphs (Graph 1 with high priority and Graph 2 with low priority)

  • One customer application with a priority between those of Graph 1 and Graph 2

../../_static/resources/htp_yield/base_complex.png

2. Resource Sharing under Concurrency

Any two clients can execute simultaneously if they share the HVX and VTCM resources; however, only one of them may hold HMX. Refer to HTP VTCM Sharing to understand how to share VTCM.

Please note there are a few scenarios where concurrency is not supported:

  1. Concurrency is not supported across PDs in any case.

  2. Concurrency is not supported prior to Hexagon V73; see the HTP VTCM Sharing documentation.

  3. It is impossible for two QNN graphs to execute concurrently.

  4. QNN will always take the HMX unit, so another client may not use it if it wishes to execute alongside QNN. If another client holds the HMX unit, QNN will pause until it can acquire it.
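The restrictions above can be collected into one predicate. This is a minimal sketch rather than the actual resource manager: the struct layouts and the name `sketch_can_run_concurrently` are hypothetical, and the final check (that the combined HVX and VTCM requests fit within the device totals) is an assumption about what "sharing" requires.

```c
#include <stdbool.h>

/* Hypothetical description of one client's resource request. */
typedef struct {
    int  pd;            /* protection domain identifier */
    bool is_qnn_graph;  /* true for a QNN graph, false for other clients */
    int  hmx_units;
    int  hvx_units;
    int  vtcm_mb;
} sketch_client_t;

/* Hypothetical description of the device. */
typedef struct {
    int arch_version;   /* for example 73 for Hexagon V73 */
    int hvx_units;
    int vtcm_mb;
} sketch_device_t;

static bool sketch_can_run_concurrently(const sketch_device_t *dev,
                                        const sketch_client_t *a,
                                        const sketch_client_t *b)
{
    if (a->pd != b->pd) return false;                       /* never across PDs */
    if (dev->arch_version < 73) return false;               /* not before V73 */
    if (a->is_qnn_graph && b->is_qnn_graph) return false;   /* never two QNN graphs */
    if (a->hmx_units > 0 && b->hmx_units > 0) return false; /* only one may hold HMX */
    /* Assumption: both requests must fit within the device totals. */
    return a->hvx_units + b->hvx_units <= dev->hvx_units &&
           a->vtcm_mb + b->vtcm_mb <= dev->vtcm_mb;
}
```

On a V73 device with 2 HVX units and 4 MB of VTCM, a QNN graph requesting 1 HMX, 1 HVX, and 3 MB can run next to a client app in the same PD requesting 0 HMX, 1 HVX, and 1 MB. Moving either client to another PD, or replacing the client app with a second QNN graph, makes the predicate return false.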

2.1 Same PD - 2 Graphs, 1 Client App

In this example there are 3 clients. Our device has 2 HVX threads, 1 HMX unit, and 4MB of VTCM.

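| Client      | Priority | HMX units | HVX units | VTCM |
|-------------|----------|-----------|-----------|------|
| QNN Graph 1 | High     | 1         | 1         | 3MB  |
| Client App  | Medium   | 0         | 1         | 1MB  |
| QNN Graph 2 | Low      | 1         | 1         | 2MB  |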

../../_static/resources/htp_yield/same_pd.png

| Time | HMX     | HVX0    | HVX1       | VTCM (4MB total)                |
|------|---------|---------|------------|---------------------------------|
| 0    | Graph 1 | Graph 1 | -          | Graph 1 (3MB)                   |
| 10   | Graph 1 | Graph 1 | Client App | Client App (1MB), Graph 1 (3MB) |
| 30   | Graph 2 | Graph 2 | Client App | Client App (1MB), Graph 2 (2MB) |
| 50   | Graph 1 | Graph 1 | Client App | Client App (1MB), Graph 1 (3MB) |
| 60   | Graph 1 | Graph 1 | -          | Graph 1 (3MB)                   |
| 70   | Graph 2 | Graph 2 | -          | Graph 2 (2MB)                   |

2.2 2 PDs - 2 Graphs, 1 Client App

In this example there are 3 clients. Our device has 2 HVX threads, 1 HMX unit, and 4MB of VTCM.

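| PD | Client      | Priority | HMX units | HVX units | VTCM |
|----|-------------|----------|-----------|-----------|------|
| 1  | QNN Graph 1 | High     | 1         | 1         | 3MB  |
| 1  | Client App  | Low      | 1         | 1         | 1MB  |
| 2  | QNN Graph 2 | Medium   | 1         | 1         | 2MB  |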

../../_static/resources/htp_yield/multi_pd.png

| Time | HMX     | HVX0    | HVX1       | VTCM (4MB total)                |
|------|---------|---------|------------|---------------------------------|
| 0    | Graph 1 | Graph 1 | -          | Graph 1 (3MB)                   |
| 10   | Graph 1 | Graph 1 | Client App | Client App (1MB), Graph 1 (3MB) |
| 40   | Graph 2 | Graph 2 | -          | Graph 2 (2MB)                   |
| 70   | Graph 1 | Graph 1 | Client App | Client App (1MB), Graph 1 (3MB) |
| 100  | Graph 2 | Graph 2 | -          | Graph 2 (2MB)                   |