Overview

Qualcomm® AI Engine Direct is the Qualcomm Technologies, Inc. (QTI) software architecture for AI/ML use cases on QTI chipsets and AI acceleration cores.

The Qualcomm® AI Engine Direct architecture is designed to provide a unified API together with modular, extensible per-accelerator libraries, which form a reusable basis for full-stack AI solutions, both for QTI's own frameworks and for third-party frameworks (shown in AI Software Stack with Qualcomm AI Engine Direct).

AI Software Stack with Qualcomm AI Engine Direct

../_static/resources/qnn_software_stack.png

Features

Modularity based on hardware accelerators

The Qualcomm® AI Engine Direct architecture is designed to be modular, allowing a clean separation in the software between the different hardware cores/accelerators, such as the CPU, GPU, and DSP, which are designated as backends.

Learn more about the Qualcomm® AI Engine Direct backends here.

The Qualcomm® AI Engine Direct backends for different hardware cores/accelerators are compiled into individual core-specific libraries that are packaged with the SDK.

Unified API across IP Cores

One of the key highlights of Qualcomm® AI Engine Direct is that it provides a unified API to delegate operations, such as graph creation and execution across all hardware accelerator backends. This allows users to treat Qualcomm® AI Engine Direct as a hardware abstraction API and port applications easily to different cores.
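
The portability this enables can be sketched in a few lines. The following is an illustrative Python sketch only, not the QNN C API: it shows that switching accelerators amounts to choosing a different backend library while the calling sequence stays identical. The per-backend library names (e.g., libQnnCpu.so) follow the SDK's naming convention; run_inference() is simplified pseudologic.

```python
# Illustrative sketch: with a unified API, only the backend library
# choice changes; the graph creation/execution sequence does not.
BACKEND_LIBRARIES = {
    "cpu": "libQnnCpu.so",
    "gpu": "libQnnGpu.so",
    "htp": "libQnnHtp.so",
}

def run_inference(backend: str, model: str) -> str:
    lib = BACKEND_LIBRARIES[backend]   # the only backend-specific choice
    steps = ["load " + lib,
             "create context",
             "compose graph from " + model,
             "finalize graph",
             "execute graph"]
    return " -> ".join(steps)

# The same call sequence works unchanged for any backend:
print(run_inference("htp", "model.cpp"))
```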

Right level of abstraction

The Qualcomm® AI Engine Direct API is designed to support an efficient execution model, with capabilities such as graph optimization taken care of internally. At the same time, it leaves broader functionality, such as model parsing and network partitioning, to higher-level frameworks.

Flexibility in composition

With Qualcomm® AI Engine Direct, users can choose appropriate tradeoffs between the capabilities provided by the backends and the footprint in terms of library size and memory utilization. This offers the ability to compose a Qualcomm® AI Engine Direct Operation Package with only the operations required to serve the set of models targeted by a use case [1]. With this, users can create nimble applications with a low memory footprint that fit a wide variety of hardware products.

Extensible operation support

Qualcomm® AI Engine Direct also provides support for clients to integrate custom operations to work seamlessly alongside the built-in operations.

Improved execution performance

With optimized network loading and asynchronous execution support, Qualcomm® AI Engine Direct serves to provide a highly efficient interface for ML frameworks and applications to load and execute network graphs on their preferred hardware accelerator.

Supported Snapdragon devices

| Snapdragon Device/Chip | Supported Toolchains | SOC Model [2] | Hexagon Arch | LPAI Arch |
| --- | --- | --- | --- | --- |
| Snapdragon X2 Elite Extreme (SC8480XP) | aarch64-windows-msvc, arm64x-windows-msvc | 88 | V81 | v5 |
| Snapdragon 8cx Gen 4 (SC8380XP) | aarch64-windows-msvc, arm64x-windows-msvc | 60 | V73 | |
| Snapdragon 8cx Gen 3 (SC8280X) | aarch64-windows-msvc | 37 | V68 | |
| Snapdragon 7c Gen 2 (SC7280X) | aarch64-windows-msvc | 44 | V68 | |
| SD 8 Elite Gen 5 (SM8850) | aarch64-android | 87 | V81 | v6 |
| SD 8 Elite (SM8750) | aarch64-android | 69 | V79 | v5 |
| SD 8 Gen 3 (SM8650) | aarch64-android | 57 | V75 | |
| SD 8 Gen 2 (SM8550) | aarch64-android | 43 | V73 | |
| SD 8+ Gen 1 (SM8475) | aarch64-android | 42 | V69 | |
| SD 8 Gen 1 (SM8450) | aarch64-android | 36 | V69 | |
| 888+ (SM8350P), 888 (SM8350) | aarch64-android | 30 | V68 | |
| 7 Gen 1 (SM7450) | aarch64-android | 41 | V69 | |
| 778G (SM7325) | aarch64-android | 35 | V68 | |
| QCM6490 | aarch64-android, aarch64-ubuntu-gcc9.4, aarch64-oe-linux-gcc11.2 | 35 | V68 | |
| 865 (SM8250) | aarch64-android | 21 | V66 | |
| 765 (SM7250) | aarch64-android | 25 | V66 | |
| 750G (SM7225), 690 (SM6350) | aarch64-android | 29 | V66 | |
| QRB5165 | aarch64-ubuntu-gcc9.4, aarch64-oe-linux-gcc9.3, aarch64-oe-linux-gcc11.2 | 21 | V66 | |
| QCS7230 | aarch64-android, aarch64-oe-linux-gcc9.3, aarch64-oe-linux-gcc11.2 | 51 | V66 | |
| 680 (SM6225) | aarch64-android | 40 | V66 | |
| 480 (SM4350), 695 (SM6375) | aarch64-android | 31 | V66 | |
| 460 (SM4250), 662 (SM6115), QCM4290 | aarch64-android | 28 | V66 | |
| QCS610 | aarch64-android, aarch64-oe-linux-gcc9.3 | 16 | V66 | |
| QCS410 | aarch64-android, aarch64-oe-linux-gcc9.3 | 33 | V66 | |
| QCM6125 | aarch64-android | 19 | V66 | |
| QRB4210 | aarch64-oe-linux-gcc9.3 | 49 | V66 | |
| QCM4490 | aarch64-android | 59 | N/A | |
| 780G (SM7350) | aarch64-android | 32 | V68 | |
| SM8325 | aarch64-android | 34 | V68 | |
| SM7315 | aarch64-android | 38 | V68 | |
| 6 Gen 1 (SM6450) | aarch64-android | 50 | V73 | |
| 7+ Gen 2 (SM7475) | aarch64-android | 54 | V69 | |
| 4 Gen 2 (SM4450) | aarch64-android | 59 | N/A | |
| 8s Gen 3 (SM8635) | aarch64-android | 68 | V73 | |
| 7+ Gen 3 (SM7675) | aarch64-android | 70 | V73 | |
| QCS/QCM8550 | aarch64-oe-linux-gcc11.2, aarch64-android | 66 | V73 | |
| QCS9100 | aarch64-oe-linux-gcc11.2 | 77 | V73 | |
| QCS/QCM6690 | aarch64-android | 78 | V73 | |
| QCS/QCM2290 | aarch64-android | 83 | N/A | |
| XR2-Gen 2 (SXR2230P) | aarch64-android | 53 | V69 | |
| AR2-Gen 1 (SAR2130P) | aarch64-android | 46 | V73 | |
| AR1-Gen1 Luna1 (SSG2115P) | aarch64-android | 46 | V73 | |
| AR1-Gen1 Luna2 (SSG2125P) | aarch64-android | 58 | V73 | |
| QCS8625 | aarch64-oe-linux-gcc11.2, aarch64-android | 90 | V75 | |

Software architecture

The Qualcomm® AI Engine Direct API and the associated software stack provide all the constructs required by an application to construct, optimize, and execute network models on the preferred hardware accelerator core.

Key constructs are shown in Qualcomm AI Engine Direct Components - High Level View.

Qualcomm AI Engine Direct Components - High Level View

../_static/resources/qnn_highlevel_view.png

Device

The software abstraction of a hardware accelerator platform. Provides all constructs required to associate the preferred hardware accelerator resources for execution of user-composed graphs. A platform is broken down into potentially multiple devices. Devices may have multiple cores.

Backend

The backend is a top-level API component that hosts and manages most of the backend resources required for graph composition and execution, including an operation registry that stores all available operations.

Learn more about the Qualcomm® AI Engine Direct backends here.

Context

A construct that represents all Qualcomm® AI Engine Direct components required to sustain a user application. Hosts networks provided by the user and allows constructed entities to be cached into serialized objects for future use. It enables interoperability between multiple graphs by providing a shareable memory space in which tensors can be exchanged between graphs.
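
The serialize-then-reload role of a context can be mirrored with a small, conceptual stand-in. This Python sketch is not the QNN API: it uses a plain dict plus pickle to model how a prepared context hosting graphs can be cached as a binary object and later restored without recomposing the graphs.

```python
import pickle

# Conceptual stand-in for a context: hosts user-provided graphs and a
# shareable tensor space through which graphs can exchange data.
context = {
    "graphs": {"classifier": ["conv", "relu", "fc"]},
    "shared_tensors": {},
}

blob = pickle.dumps(context)      # "cache" the prepared context once
restored = pickle.loads(blob)     # later runs: reload instead of recomposing
print(restored["graphs"]["classifier"])
```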

Graph

The Qualcomm® AI Engine Direct way of representing a loadable network model. Consists of nodes that represent operations and tensors that interconnect them to compose a directed acyclic graph. The Qualcomm® AI Engine Direct graph construct supports APIs that perform initialization, optimization, and execution of network models.
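
The directed-acyclic-graph structure implies that each operation can only run after the operations producing its input tensors. This hypothetical Python sketch (an assumed structure, not the QNN graph API) models a graph as operations keyed by the producers they depend on, and derives a valid execution order:

```python
from graphlib import TopologicalSorter

# op -> set of ops whose output tensors it consumes (illustrative names)
graph = {
    "conv1": set(),
    "relu1": {"conv1"},
    "conv2": {"relu1"},
    "add":   {"conv2", "relu1"},   # skip connection
}

# A topological order guarantees every op runs after its producers,
# which is exactly the constraint a DAG of ops and tensors encodes.
order = list(TopologicalSorter(graph).static_order())
print(order)
```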

Operation Package registry

A registry that maintains a record of all operations available to execute a model. These operations can be built-in or supplied by the user as custom operations.

Learn more about Operation Packages here.

Integration workflow

The Qualcomm® AI Engine Direct SDK provides tools and extensible per-accelerator libraries with a uniform API, enabling flexible integration and efficient execution of ML/DL neural networks on QTI chipsets. The Qualcomm® AI Engine Direct API is designed to support inference of trained neural networks and, as such, clients are responsible for training an ML/DL network in a training framework of their choice. The training process is typically performed on server hosts, off-device. Once a network is trained, clients can use Qualcomm® AI Engine Direct to get it ready to deploy and run on-device.

This workflow is shown in Training vs. Inference Workflow.

Training vs. Inference Workflow

../_static/resources/training_inference_workflow.png

The Qualcomm® AI Engine Direct SDK includes tools to aid clients in integrating trained DL networks into their applications.

The basic integration workflow is shown in Qualcomm AI Engine Direct Integration Workflow.

Qualcomm AI Engine Direct Integration Workflow

../_static/resources/qnn_basic_workflow.png
  1. Clients call the Qualcomm® AI Engine Direct converter tool by providing their trained network model file as input. The network must be trained in a framework supported by the Qualcomm® AI Engine Direct converter tools. See Tools for more details on Qualcomm® AI Engine Direct converters.

  2. When source models contain operations that are not supported natively by Qualcomm® AI Engine Direct backends, clients must provide OpPackage definition files to the converter, expressing custom/client-defined operations. Optionally, users can use the OpPackage generator tool to generate skeleton code to implement and compile custom operations into OpPackage libraries. See qnn-op-package-generator for usage details.

  3. The Qualcomm® AI Engine Direct model converter is a tool to aid clients in writing a sequence of Qualcomm® AI Engine Direct API calls to construct a Qualcomm® AI Engine Direct graph representation of a trained network that was provided as input to the tool. The converter outputs the following files:

    • .cpp – Source file (e.g., model.cpp) containing required Qualcomm® AI Engine Direct API calls to construct a network graph

    • .bin – Binary file (e.g., model.bin) containing network weights and biases as float32 data.

    Clients can optionally direct the converter to output a quantized model instead of the default one, as indicated in the diagram above as quantized model.cpp. In this case, the model.bin file will contain quantized data, and model.cpp will reference quantized tensor data types and include quantization encodings. Quantized models may be required by some Qualcomm® AI Engine Direct backend libraries, e.g., HTP or DSP (see general/api:Backend Supplements for information on supported data types). For details on converter quantization function and options, see Quantization Support.

  4. Clients optionally can use the Qualcomm® AI Engine Direct model library generator tool to produce a model library. See qnn-model-lib-generator for usage details.

  5. Clients integrate the Qualcomm® AI Engine Direct model into their application by either dynamically loading a model library or compiling and statically linking model.cpp and model.bin files. To prepare and execute the model (i.e., run inference), clients must load the required Qualcomm® AI Engine Direct backend accelerator and OpPackage libraries. The Qualcomm® AI Engine Direct OpPackage libraries are registered with and loaded by the backend.

  6. Clients can optionally save the context binary cache with prepared and finalized graphs. See Context caching for reference. Such graphs can be repeatedly loaded from the cache without the need for the model.cpp file or model library. Loading a model graph from the cache is significantly faster than preparing it through the sequence of graph composition API calls provided in the model.cpp file or model library. Cached graphs cannot be further modified; they are meant for deployment of prepared graphs, enabling faster initialization of client applications.

  7. Clients can optionally utilize Deep Learning Containers (DLCs) produced from Qualcomm Neural Processing SDK in conjunction with the provided libQnnModelDlc.so library to produce QNN graph handles from DLC paths in their application. This provides a single format for use across products and support for large models that cannot be compiled into a shared model library. Details on usage can be found in Utilizing DLCs.
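
The quantized-model option in step 3 can be illustrated with the common affine (scale/offset) encoding for float32 tensors. The sketch below is a generic illustration of that scheme, not the converter's exact algorithm or options:

```python
# Generic affine quantization sketch: float32 -> uint8 with a scale and
# offset, so that real_value ~= (q + offset) * scale. Illustrative only.

def quantize(values, num_bits=8):
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)          # encoding must cover zero
    scale = (hi - lo) / (2 ** num_bits - 1) or 1.0
    offset = round(lo / scale)
    qmax = 2 ** num_bits - 1
    q = [min(qmax, max(0, round(v / scale) - offset)) for v in values]
    return q, scale, offset

def dequantize(q, scale, offset):
    return [(v + offset) * scale for v in q]

q, scale, offset = quantize([-1.0, 0.0, 0.5, 2.0])
approx = dequantize(q, scale, offset)
print(q, approx)
```

Quantized encodings like this shrink the model.bin payload four-fold versus float32 and match the integer data paths of accelerators such as HTP, at the cost of a bounded rounding error per tensor element.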

Developers on Linux

Executables and libraries can be found under the target folders of the SDK with linux, ubuntu, or android in the name. See Release Folder for Different Platforms for reference. The operations described above can be run on a Linux OS, such as an Ubuntu system.

Integration workflow on Windows

Developers on Windows

The Qualcomm® AI Engine Direct SDK provides three different platform options for Windows hosts. For users familiar with Linux, we suggest using WSL (Windows Subsystem for Linux) on Windows. For developers who want to use the tools directly on a Windows PC through the PowerShell environment, Qualcomm® AI Engine Direct provides tools based on x86_64-windows. Check the following prerequisites.

WSL platform

For WSL developers: The workflow on a Windows host is the same as on a Linux host, though some steps require execution on WSL (x86) while others are executed natively on Windows, as outlined below. Because WSL runs a GNU/Linux environment, model tools and libraries should be obtained from the x86_64-linux-clang folder. To understand more about the WSL setup, visit Linux Platform Dependency.

Windows platform

For Windows native/x86_64 PC developers: The tools and libraries are located within the x86_64-windows-msvc folder. The tools depend heavily on the Python version and configuration; therefore, the PowerShell environment must be set up before use. Refer to Windows Platform Dependencies for the required settings.

For Windows on Snapdragon developers: The tools and libraries are located within the aarch64-windows-msvc folder. The PowerShell environment must be set up before use. Refer to Windows Platform Dependencies for the required settings.

  1. For OP Customization, the Op Package skeleton code is generated by running the Linux OpPackage Generator tool on WSL (x86).

  2. For Context Binary Generation, clients can use either the Linux Context Binary Generator tool on WSL (x86) or the Windows-native tool in PowerShell. The tool executable and libraries used must come from the corresponding folder, as mentioned in Release Folder for Different Platforms.

  3. For Model Library Generation, the model library is produced by running the Windows Model Library Generator tool natively on Windows.

  4. Tools mentioned in the integration workflow can be run through WSL or natively on Windows x86. The list of tools supported on each platform is given in Tools.

  5. The ARM64X package format is supported for CPU and HTP backend on SC8380XP. The tools and libraries are located within the arm64x-windows-msvc folder. See ARM64X Tutorial for the usage and details.

Note

When using WSL, the Model Tools must be obtained from the linux folder, since WSL provides a Linux environment.

Note

When run natively on Windows, the Model Library Generation Tool must be run with Python. See the Model Build on Windows Host section for an example.

Notes

[1] Future feature.

[2] The SOC Model number is intended to be used in QNN API calls, for example when configuring the QNN HTP backend.
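
A handful of entries from the table above, as a hypothetical Python lookup (not an SDK API), showing the chip-to-SOC-Model mapping a client would consult before passing the number into a QNN API call such as HTP backend device configuration:

```python
# A few SOC Model numbers taken from the supported-devices table above.
# The soc_model_for() helper is illustrative only, not part of the SDK.
SOC_MODEL = {
    "SM8650": 57,     # SD 8 Gen 3
    "SM8550": 43,     # SD 8 Gen 2
    "SC8380XP": 60,   # Snapdragon 8cx Gen 4
}

def soc_model_for(chip: str) -> int:
    return SOC_MODEL[chip]

print(soc_model_for("SM8650"))  # 57
```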