Overview¶
Qualcomm® AI Engine Direct is the Qualcomm Technologies Inc. (QTI) software architecture for AI/ML use cases on QTI chipsets and AI acceleration cores.
The Qualcomm® AI Engine Direct architecture is designed to provide a unified API and modular, extensible per-accelerator libraries, which together form a reusable basis for full-stack AI solutions, for both QTI's own and third-party frameworks (shown in AI Software Stack with Qualcomm AI Engine Direct).
AI Software Stack with Qualcomm AI Engine Direct
Features¶
Modularity based on hardware accelerators
The Qualcomm® AI Engine Direct architecture is designed to be modular, allowing a clean separation in software between the different hardware cores/accelerators, such as the CPU, GPU, and DSP, which are designated as backends.
Learn more about the Qualcomm® AI Engine Direct backends here.
The Qualcomm® AI Engine Direct backends for different hardware cores/accelerators are compiled into individual core-specific libraries that are packaged with the SDK.
Unified API across IP Cores
One of the key highlights of Qualcomm® AI Engine Direct is that it provides a unified API to delegate operations, such as graph creation and execution across all hardware accelerator backends. This allows users to treat Qualcomm® AI Engine Direct as a hardware abstraction API and port applications easily to different cores.
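To illustrate the portability this enables, here is a minimal sketch of the unified-API pattern. This is hypothetical Python, not the actual QNN C API; all names below are illustrative. The point is that the accelerator choice is a single parameter, while graph creation and execution calls stay identical:

```python
# Hypothetical sketch of a unified-API pattern: the same calls work
# against any backend; only the backend selection differs.
# (Names are illustrative, not the actual QNN C API.)

class Backend:
    """Per-accelerator library exposing one common interface."""
    def __init__(self, name):
        self.name = name

    def create_graph(self, model):
        return {"backend": self.name, "ops": list(model)}

    def execute(self, graph, inputs):
        # A real backend would dispatch to CPU/GPU/DSP kernels here.
        return [f"{graph['backend']}:{op}({x})"
                for op, x in zip(graph["ops"], inputs)]

def run(backend_name, model, inputs):
    backend = Backend(backend_name)        # the only accelerator-specific step
    graph = backend.create_graph(model)    # identical calls from here on
    return backend.execute(graph, inputs)

# Porting from CPU to HTP changes one argument, not the application logic.
cpu_out = run("cpu", ["conv", "relu"], [1, 2])
htp_out = run("htp", ["conv", "relu"], [1, 2])
```

In the real API the same idea holds: the application is written once against the common interface, and retargeting it to a different core means loading a different backend library.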
Right level of abstraction
The Qualcomm® AI Engine Direct API is designed to support an efficient execution model, with capabilities such as graph optimization handled internally. At the same time, however, it leaves broader functionality, such as model parsing and network partitioning, to higher-level frameworks.
Flexibility in composition
With Qualcomm® AI Engine Direct, users can choose appropriate tradeoffs between the capabilities provided by the backends and the footprint in terms of library size and memory utilization. This offers the ability to compose a Qualcomm® AI Engine Direct Operation Package with only the operations required to serve the set of models targeted by a use case [1]. With this, users can create nimble, low-memory-footprint applications that fit a wide variety of hardware products.
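The footprint tradeoff can be pictured as follows. This is an illustrative sketch, not SDK code: it computes the minimal operation set covering a group of target models, so everything else can be left out of the package:

```python
# Illustrative sketch (not SDK code): composing a minimal operation set
# that covers only the models a use case needs, to reduce footprint.

FULL_OP_LIBRARY = {"Conv2d", "Relu", "Softmax", "LSTM", "Transpose", "MatMul"}

def compose_op_package(models):
    """Union of the operations actually used by the targeted models."""
    required = set()
    for ops in models.values():
        required |= ops
    return required

def unsupported_ops(model_ops, op_package):
    """Ops a model needs that the trimmed package does not provide."""
    return model_ops - op_package

models = {
    "classifier": {"Conv2d", "Relu", "Softmax"},
    "matcher": {"Conv2d", "MatMul"},
}
package = compose_op_package(models)
# LSTM and Transpose stay out of the package, shrinking the library.
```

A model outside the targeted set may then report unsupported operations, which is the intended tradeoff: a smaller package serves a smaller, known set of models.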
Extensible operation support
Qualcomm® AI Engine Direct also provides support for clients to integrate custom operations to work seamlessly alongside the built-in operations.
Improved execution performance
With optimized network loading and asynchronous execution support, Qualcomm® AI Engine Direct provides a highly efficient interface for ML frameworks and applications to load and execute network graphs on their preferred hardware accelerator.
Supported Snapdragon devices¶
| Snapdragon Device/Chip | Supported Toolchains | SOC Model [2] | Hexagon Arch | LPAI Arch |
|---|---|---|---|---|
| Snapdragon X2 Elite Extreme (SC8480XP) | aarch64-windows-msvc, arm64x-windows-msvc | 88 | V81 | v5 |
| Snapdragon 8cx Gen 4 (SC8380XP) | aarch64-windows-msvc, arm64x-windows-msvc | 60 | V73 | – |
| Snapdragon 8cx Gen 3 (SC8280X) | aarch64-windows-msvc | 37 | V68 | – |
| Snapdragon 7c Gen 2 (SC7280X) | aarch64-windows-msvc | 44 | V68 | – |
| SD 8 Elite Gen 5 (SM8850) | aarch64-android | 87 | V81 | v6 |
| SD 8 Elite (SM8750) | aarch64-android | 69 | V79 | v5 |
| SD 8 Gen 3 (SM8650) | aarch64-android | 57 | V75 | – |
| SD 8 Gen 2 (SM8550) | aarch64-android | 43 | V73 | – |
| SD 8+ Gen 1 (SM8475) | aarch64-android | 42 | V69 | – |
| SD 8 Gen 1 (SM8450) | aarch64-android | 36 | V69 | – |
| 888+ (SM8350P), 888 (SM8350) | aarch64-android | 30 | V68 | – |
| 7 Gen 1 (SM7450) | aarch64-android | 41 | V69 | – |
| 778G (SM7325) | aarch64-android | 35 | V68 | – |
| QCM6490 | aarch64-android, aarch64-ubuntu-gcc9.4, aarch64-oe-linux-gcc11.2 | 35 | V68 | – |
| 865 (SM8250) | aarch64-android | 21 | V66 | – |
| 765 (SM7250) | aarch64-android | 25 | V66 | – |
| 750G (SM7225), 690 (SM6350) | aarch64-android | 29 | V66 | – |
| QRB5165 | aarch64-ubuntu-gcc9.4, aarch64-oe-linux-gcc9.3, aarch64-oe-linux-gcc11.2 | 21 | V66 | – |
| QCS7230 | aarch64-android, aarch64-oe-linux-gcc9.3, aarch64-oe-linux-gcc11.2 | 51 | V66 | – |
| 680 (SM6225) | aarch64-android | 40 | V66 | – |
| 480 (SM4350), 695 (SM6375) | aarch64-android | 31 | V66 | – |
| 460 (SM4250), 662 (SM6115), QCM4290 | aarch64-android | 28 | V66 | – |
| QCS610 | aarch64-android, aarch64-oe-linux-gcc9.3 | 16 | V66 | – |
| QCS410 | aarch64-android, aarch64-oe-linux-gcc9.3 | 33 | V66 | – |
| QCM6125 | aarch64-android | 19 | V66 | – |
| QRB4210 | aarch64-oe-linux-gcc9.3 | 49 | V66 | – |
| QCM4490 | aarch64-android | 59 | N/A | – |
| 780G (SM7350) | aarch64-android | 32 | V68 | – |
| SM8325 | aarch64-android | 34 | V68 | – |
| SM7315 | aarch64-android | 38 | V68 | – |
| 6 Gen 1 (SM6450) | aarch64-android | 50 | V73 | – |
| 7+ Gen 2 (SM7475) | aarch64-android | 54 | V69 | – |
| 4 Gen 2 (SM4450) | aarch64-android | 59 | N/A | – |
| 8s Gen 3 (SM8635) | aarch64-android | 68 | V73 | – |
| 7+ Gen 3 (SM7675) | aarch64-android | 70 | V73 | – |
| QCS/QCM8550 | aarch64-oe-linux-gcc11.2, aarch64-android | 66 | V73 | – |
| QCS9100 | aarch64-oe-linux-gcc11.2 | 77 | V73 | – |
| QCS/QCM6690 | aarch64-android | 78 | V73 | – |
| QCS/QCM2290 | aarch64-android | 83 | N/A | – |
| XR2-Gen 2 (SXR2230P) | aarch64-android | 53 | V69 | – |
| AR2-Gen 1 (SAR2130P) | aarch64-android | 46 | V73 | – |
| AR1-Gen1 Luna1 (SSG2115P) | aarch64-android | 46 | V73 | – |
| AR1-Gen1 Luna2 (SSG2125P) | aarch64-android | 58 | V73 | – |
| QCS8625 | aarch64-oe-linux-gcc11.2, aarch64-android | 90 | V75 | – |
Software architecture¶
The Qualcomm® AI Engine Direct API and the associated software stack provide all the constructs required by an application to construct, optimize, and execute network models on the preferred hardware accelerator core.
Key constructs are shown in Qualcomm AI Engine Direct Components - High Level View.
Qualcomm AI Engine Direct Components - High Level View
Device¶
The software abstraction of a hardware accelerator platform. Provides the constructs required to associate the preferred hardware accelerator resources with the execution of user-composed graphs. A platform may be broken down into multiple devices, and devices may have multiple cores.
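The platform/device/core hierarchy can be pictured with a small sketch. This is an illustrative data model, not the QNN API; the class and field names are invented for clarity:

```python
# Illustrative sketch (not the QNN API): the device abstraction as a
# hierarchy of platform -> devices -> cores, as described above.
from dataclasses import dataclass, field

@dataclass
class Core:
    core_id: int

@dataclass
class Device:
    device_id: int
    cores: list = field(default_factory=list)

@dataclass
class Platform:
    name: str
    devices: list = field(default_factory=list)

# A platform may expose multiple devices, each with multiple cores.
htp = Platform("htp", devices=[
    Device(0, cores=[Core(0), Core(1)]),
    Device(1, cores=[Core(0)]),
])
total_cores = sum(len(d.cores) for d in htp.devices)
```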
Backend¶
The backend is a top-level API component that hosts and manages most of the backend resources required for graph composition and execution, including an operation registry that stores all available operations.
Learn more about the Qualcomm® AI Engine Direct backends here.
Context¶
A construct that represents all Qualcomm® AI Engine Direct components required to sustain a user application. Hosts networks provided by the user and allows constructed entities to be cached into serialized objects for future use. It enables interoperability between multiple graphs by providing a shareable memory space in which tensors can be exchanged between graphs.
Graph¶
The Qualcomm® AI Engine Direct way of representing a loadable network model. Consists of nodes that represent operations and tensors that interconnect them to compose a directed acyclic graph. The Qualcomm® AI Engine Direct graph construct supports APIs that perform initialization, optimization, and execution of network models.
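The graph structure can be sketched as follows. This is illustrative Python, not the QNN graph API: operation nodes are connected by named tensors, and execution runs each node once all of its input tensors are available (assuming the graph is acyclic, as required above):

```python
# Illustrative sketch (not the QNN API): a graph of operation nodes
# interconnected by named tensors, executed in dependency order.
from collections import deque

class Graph:
    def __init__(self):
        self.nodes = {}  # op name -> (function, input tensor names, output name)

    def add_node(self, name, fn, inputs, output):
        self.nodes[name] = (fn, inputs, output)

    def execute(self, feeds):
        """Run each node once all of its input tensors are available.
        Assumes the graph is a DAG, as the API requires."""
        tensors = dict(feeds)
        pending = deque(self.nodes.items())
        while pending:
            name, (fn, inputs, output) = pending.popleft()
            if all(i in tensors for i in inputs):
                tensors[output] = fn(*(tensors[i] for i in inputs))
            else:
                pending.append((name, (fn, inputs, output)))  # not ready yet
        return tensors

g = Graph()
g.add_node("scale", lambda x: [2 * v for v in x], ["in"], "t0")
g.add_node("relu", lambda x: [max(0, v) for v in x], ["t0"], "out")
result = g.execute({"in": [-1, 3]})["out"]   # [0, 6]
```

In the real API, initialization and optimization passes run over this structure before execution, which is why finalized graphs can also be serialized for fast reload (see Context).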
Integration workflow¶
The Qualcomm® AI Engine Direct SDK provides tools and extensible per-accelerator libraries with a uniform API, enabling flexible integration and efficient execution of ML/DL neural networks on QTI chipsets. The Qualcomm® AI Engine Direct API is designed to support inference of trained neural networks; as such, clients are responsible for training an ML/DL network in a training framework of their choice. The training process is typically performed off-device, on server hosts. Once a network is trained, clients can use Qualcomm® AI Engine Direct to get it ready to deploy and run on-device.
This workflow is shown in Training vs. Inference Workflow.
Training vs. Inference Workflow
The Qualcomm® AI Engine Direct SDK includes tools to aid clients in integrating trained DL networks into their applications.
The basic integration workflow is shown in Qualcomm AI Engine Direct Integration Workflow.
Qualcomm AI Engine Direct Integration Workflow
Clients call the Qualcomm® AI Engine Direct converter tool by providing their trained network model file as input. The network must be trained in a framework supported by the Qualcomm® AI Engine Direct converter tools. See Tools for more details on Qualcomm® AI Engine Direct converters.
When source models contain operations that are not supported natively by Qualcomm® AI Engine Direct backends, clients must provide OpPackage definition files to the converter, expressing custom/client-defined operations. Optionally, users can use the OpPackage generator tool to generate skeleton code to implement and compile custom operations into OpPackage libraries. See qnn-op-package-generator for usage details.
The Qualcomm® AI Engine Direct model converter is a tool that generates the sequence of Qualcomm® AI Engine Direct API calls needed to construct a Qualcomm® AI Engine Direct graph representation of the trained network provided as input. The converter outputs the following files:
- `.cpp` – Source file (e.g., `model.cpp`) containing the required Qualcomm® AI Engine Direct API calls to construct a network graph.
- `.bin` – Binary file (e.g., `model.bin`) containing network weights and biases as float32 data.
Clients can optionally direct the converter to output a quantized model instead of the default one, as indicated in the diagram above as quantized model.cpp. In this case, the `model.bin` file will contain quantized data, and `model.cpp` will reference quantized tensor data types and include quantization encodings. Quantized models may be required by some Qualcomm® AI Engine Direct backend libraries, e.g., HTP or DSP (see Backend Supplements for information on supported data types). For details on the converter quantization function and options, see Quantization Support.

Clients can optionally use the Qualcomm® AI Engine Direct model library generator tool to produce a model library. See qnn-model-lib-generator for usage details.
Clients integrate the Qualcomm® AI Engine Direct model into their application by either dynamically loading a model library or compiling and statically linking the `model.cpp` and `model.bin` files. To prepare and execute the model (i.e., run inference), clients must load the required Qualcomm® AI Engine Direct backend accelerator and OpPackage libraries. The Qualcomm® AI Engine Direct OpPackage libraries are registered with and loaded by the backend.

Clients can optionally save the context binary cache with prepared and finalized graphs. See Context caching for reference. Such graphs can be repeatedly loaded from the cache without the need for the `model.cpp` file or library. Loading a model graph from the cache is significantly faster than preparing it through the sequence of graph composition API calls provided in `model.cpp` or the library. Cached graphs cannot be further modified; they are meant for deployment of prepared graphs, enabling faster initialization of client applications.

Clients can optionally utilize Deep Learning Containers (DLCs) produced by the Qualcomm Neural Processing SDK in conjunction with the provided `libQnnModelDlc.so` library to produce QNN graph handles from DLC paths in their application. This provides a single format for use across products, and support for large models that cannot be compiled into a shared model library. Details on usage can be found in Utilizing DLCs.
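The workflow above is typically driven from the command line. The following is a hedged sketch using the SDK tool names; the exact flags, output paths, and library names (e.g., `libmodel.so`) vary by SDK release and target, so confirm with each tool's `--help` output before use:

```shell
# Sketch of the integration workflow on the command line.
# Flags and file names are representative assumptions; verify against
# your SDK release with each tool's --help.

# 1. Convert a trained model (here, ONNX) into QNN model source + weights.
qnn-onnx-converter --input_network model.onnx --output_path model.cpp
# Produces model.cpp (graph-construction API calls) and model.bin (weights).

# 2. Optionally build a loadable model library for the target.
qnn-model-lib-generator -c model.cpp -b model.bin \
    -t aarch64-android -o model_libs/

# 3. Optionally serialize a prepared context for fast startup on device.
qnn-context-binary-generator --backend libQnnHtp.so \
    --model model_libs/aarch64-android/libmodel.so \
    --binary_file model_context
```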
Developers on Linux¶
Executables and libraries can be found under the target folders of the SDK with linux, ubuntu, or android in the name. See Release Folder for Different Platforms for reference. The operations mentioned above can be run on a Linux OS, such as an Ubuntu system.
Integration workflow on Windows¶
Developers on Windows¶
The Qualcomm SDK provides three different platforms for Windows hosts. For users familiar with Linux, we suggest using WSL (Windows Subsystem for Linux) on Windows. For developers who want to use the tools directly on a Windows PC through the PowerShell environment, Qualcomm® AI Engine Direct provides tools based on x86_64-windows. Check the following prerequisites.
WSL platform¶
For WSL developers: The workflow on a Windows host is the same as on a Linux host, though some steps require execution on WSL (x86) while others are executed natively on Windows, as outlined below. Because WSL runs a GNU/Linux environment, the model tools and libraries must be obtained from x86_64-linux-clang. To understand more about the WSL setup, visit Linux Platform Dependency.
Windows platform¶
For Windows native/x86_64 PC developers:
The tools and libraries are located within the x86_64-windows-msvc folder. The tools are closely tied to the Python version and configuration; therefore, the PowerShell environment must be set up before use. Refer to Windows Platform Dependencies for the Windows settings.
For Windows on Snapdragon developers:
The tools and libraries are located within the aarch64-windows-msvc folder. The PowerShell environment must be set up before use. Refer to Windows Platform Dependencies for the Windows settings.
For OP Customization, the Op Package skeleton code is generated by running the Linux OpPackage Generator tool on WSL (x86).
For Context Binary Generation, clients can use either the Linux Context Binary Generator tool on WSL (x86) or the Windows-native tool through PowerShell. The tool executable and libraries used must come from the corresponding folder, as mentioned in Release Folder for Different Platforms.
For Model Library Generation, the model library is produced by running the Windows Model Library Generator tool natively on Windows.
Tools mentioned in the integration workflow can be run through WSL or natively on Windows x86. The list of tools supported on each platform is given in Tools.
The ARM64X package format is supported for the CPU and HTP backends on SC8380XP. The tools and libraries are located within the arm64x-windows-msvc folder. See the ARM64X Tutorial for usage and details.
Note
When using WSL, the Model Tools must be obtained from the linux folder, since WSL is a subsystem for Linux.
Note
When running natively on Windows, the Model Library Generation Tool must be run with Python. See the Model Build on Windows Host section for an example.
Notes

[1] Future feature.

[2] The SOC Model number is intended to be used in QNN API calls, for example when configuring the QNN HTP backend.