Using the qnn-op-package-generator

This section walks through using the qnn-op-package-generator to generate skeleton code and the makefiles needed to compile it; together these produce a QNN Op Package shared library. The tool accepts an XML config file that describes the package attributes and produces a QNN Op Package directory structure.

Creating a QNN Op Package Skeleton

This section assumes that the setup instructions have been run and that the qnn-op-package-generator is accessible on the command line. [1] The first step in creating a package skeleton is to define an XML OpDef configuration file, which describes package information such as the package name, version and domain, as well as the operations the package contains. The package information and operations are described with respect to a pre-defined XML schema, which primarily requires information about each operation's inputs, outputs and parameters. For information on defining an XML OpDef, see XML OpDef Schema Breakdown.

Sample configs can also be found at Example XML Op Def Configs and in the SDK at:

${QNN_SDK_ROOT}/examples/QNN/OpPackageGenerator

Once an XML config has been fully defined according to the spec, it can be passed to the tool using the --config_path or -p option. To generate multiple packages, specify the -p option multiple times, each time with a different config.

The tool can be run using a single XML config on the command line as:

On Linux

qnn-op-package-generator -p <QNN_SDK_ROOT>/examples/QNN/OpPackageGenerator/ExampleOpPackageHtp.xml -o <output_dir>

Note

The -p command line option can be specified multiple times to generate different packages, provided each package name is distinct. If package names are not distinct, the tool merges all ops defined in the configs that share a name into a single package directory.
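The merging rule in the note above can be modeled as grouping ops by package name. The sketch below is only an illustrative model of that rule, not the tool's implementation; OpDefConfig and groupOpsByPackage are hypothetical names introduced here.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one parsed XML OpDef config.
struct OpDefConfig {
  std::string packageName;
  std::vector<std::string> opNames;
};

// Groups ops by package name. Configs that share a package name contribute
// their ops to the same entry; distinct names yield separate entries.
std::map<std::string, std::set<std::string>> groupOpsByPackage(
    const std::vector<OpDefConfig>& configs) {
  std::map<std::string, std::set<std::string>> packages;
  for (const auto& config : configs) {
    assert(!config.packageName.empty());  // every config must name its package
    auto& ops = packages[config.packageName];
    ops.insert(config.opNames.begin(), config.opNames.end());
  }
  return packages;
}
```

Under this model, two configs that both name ExampleOpPackage and define Conv2D and Softmax respectively end up as one package containing both ops.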

Directory Structure on Linux:

The example command in the previous section outputs a package skeleton directory called ExampleOpPackage at the specified output path. [2] In this context, the package name is ExampleOpPackage. [3] The package also contains two ops: Conv2D and Softmax.

The package directory tree is shown and expanded below:

|-- Makefile
|-- makefiles
|   |-- Android.mk
|   `-- Application.mk
|-- config
|   `-- ExampleOpPackage.xml
|-- include
`-- src
    |-- ExamplePackageInterface.cpp
    |-- utils
    `-- ops
        |-- Conv2D.cpp
        `-- Softmax.cpp
  • Makefile: This file contains make targets and rules to compile the package source files for various known architectures. Note that make commands differ across backends, and that CPU targets require the additional makefiles in the makefiles directory for Android targets.

  • config: This directory contains all XML OpDef config(s) passed to the tool.

  • include: This directory is a placeholder for any additional include files the user may need to compile.

  • src: This directory contains generated source files for each operation defined in the config, as well as a generated interface source file.

    • ExamplePackageInterface: This file implements function pointers needed by the QNN API to load and execute a package. The file is always named <package_name>Interface.cpp. See Op Packages for more information about all other required function pointers.

    • utils: Helper utilities that enable ease of use across backends. Users should note that this directory will currently only be present for the CPU backend. The contents are left out here for brevity, and in general most users should not need to make changes to the files included.

    • ops: Each source file is named <op_name>.cpp. The source files implement API interface methods needed by a QNN Backend to enable op initialization, op destruction and kernel execution.

On Windows-x86

python ${QNN_SDK_ROOT}/bin/x86_64-windows-msvc/qnn-op-package-generator `
         -p ${QNN_SDK_ROOT}/examples/QNN/OpPackageGenerator/ReluOpPackageCpu.xml `
         -o <output_dir> `
         --gen_cmakelists

Note

Currently, the x86_64-windows-msvc platform only supports the CPU backend. On this platform, qnn-op-package-generator must be invoked through python, and the --gen_cmakelists option must be passed to generate the CMakeLists.txt file for the CMake build system. Run the command within Developer PowerShell for VS 2022.

Directory Structure on Windows-x86:

The package directory tree is shown and explained below:

|-- CMakeLists.txt
|-- config
|-- include
`-- src
    |-- CpuCustomOpPackage.cpp
    |-- ReluOpPackageInterface.cpp
    |-- ops
    `-- utils
  • CMakeLists.txt: Contains the rules, directives and basic commands needed to build Windows-x86 targets (.dll).

  • config: Contains the XML OpDef configuration in use.

  • src/ops: Contains the .cpp files with the basic API interface for each op. The function bodies need to be implemented by users.

Skeleton Code Overview

This section covers the two kinds of generated source files: interface files and op-specific files. The interface file generally requires no extra implementation, while the op source files contain empty function bodies that should be completed by users. The code used in this section references the generated package in Directory Structure. (Users on the Windows-x86 platform should refer to Directory Structure Windows.)

Note

The content of the generated op-specific files may vary across backends.

Note

For good performance and stability, avoid heap memory allocation in the completed op execution functions, that is, <op_name>Impl, <op_name>_executeOp, and execute for HTP, DSP, and CPU respectively, all of which run during graph execution. Heap memory allocation includes, but is not limited to, calling malloc, calling operator new, constructing STL containers such as std::vector with the default allocator, and adding items to such containers, for example by calling std::vector::push_back.

Heap memory allocation should be avoided because the time it takes is unbounded and can vary widely. On DSP and HTP in particular, heap memory allocation can trigger a CPU request in some cases and significantly reduce inference speed. Heap memory allocation can also fail, returning a null pointer or throwing an exception, and in that case there is usually no good way to continue execution. In applications with strict functional safety requirements, heap memory allocation after initialization is not permitted at all.

If a scratch buffer is required to carry out the op computation, here are some potential alternatives:

  • construct std::array instead of std::vector for local variables: Unlike std::vector, std::array uses stack memory. This works if the maximum memory size can be known in advance and the size is not large.

  • use output tensor space as scratch memory: Each execution function has at least one output tensor. You can use the space of the output tensor as the scratch buffer before you fill in the real output data. Please note that the output tensor space can only be safely written in the execution function which owns the output tensor.
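The two alternatives above can be sketched with plain float buffers. This is a minimal illustration under stated assumptions, not backend code: the function names, the raw-pointer signatures, and the kMaxScratch bound are hypothetical, and real execution functions receive backend tensor types instead. The example subtracts the median from every element, which needs a sortable copy of the input.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Assumed upper bound on the element count, chosen for illustration only.
constexpr std::size_t kMaxScratch = 64;

// Alternative 1: the sortable copy lives in a fixed-size std::array on the
// stack. Fails if n exceeds the bound known in advance. `in` and `out` must
// not alias.
bool subtractMedianStackScratch(const float* in, float* out, std::size_t n) {
  assert(in != nullptr && out != nullptr);
  if (n == 0 || n > kMaxScratch) return false;
  std::array<float, kMaxScratch> scratch{};  // stack memory, no heap allocation
  std::copy(in, in + n, scratch.begin());
  std::sort(scratch.begin(), scratch.begin() + n);  // sorts in place
  const float median = scratch[n / 2];  // upper median for even n
  for (std::size_t i = 0; i < n; ++i) out[i] = in[i] - median;
  return true;
}

// Alternative 2: the output buffer doubles as scratch space before the real
// output values are written into it. `in` and `out` must not alias.
bool subtractMedianOutputScratch(const float* in, float* out, std::size_t n) {
  assert(in != nullptr && out != nullptr);
  if (n == 0) return false;
  std::copy(in, in + n, out);
  std::sort(out, out + n);
  const float median = out[n / 2];
  for (std::size_t i = 0; i < n; ++i) out[i] = in[i] - median;  // overwrite scratch
  return true;
}
```

The stack variant trades flexibility for a hard size limit, while the output-buffer variant has no extra limit but is only valid inside the function that owns the output.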

Interface File

A snippet of the interface provider function obtained from the generated interface file is shown below:

1  Qnn_ErrorHandle_t ExamplePackageInterfaceProvider(QnnOpPackage_Interface_t* interface) {
2     interface->interfaceVersion   = {1,3,0};
3     interface->v1_3.init          = ExamplePackageInit;
4     interface->v1_3.terminate     = ExamplePackageTerminate;
5     interface->v1_3.createKernels = ExamplePackageCreateKernels;
6     interface->v1_3.getInfo       = ExamplePackageGetInfo;
7  ......

The file contains generated functions in accordance with the Op Package API determined by the backend. The interface provider is needed for all backends as input to the qnn-net-run tool. The function on line 1 is always named <package_name>InterfaceProvider, using the package name obtained from the config at generation. Users should note that the package name must always be a valid C++ identifier, which means it can only contain alphanumeric characters and underscores. Additionally, all other interface functions, such as those assigned on lines 3-6, are also prefixed with the package name.
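The naming rule above can be checked before a config is written. The sketch below is a hypothetical helper, not part of the tool: isValidIdentifier and interfaceProviderName are names introduced here, and the check additionally rejects a leading digit, which C++ identifiers do not allow.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True if `name` is non-empty, does not start with a digit, and contains
// only alphanumeric characters and underscores. Reserved C++ keywords are
// not rejected by this simple check.
bool isValidIdentifier(const std::string& name) {
  if (name.empty() || std::isdigit(static_cast<unsigned char>(name[0]))) {
    return false;
  }
  for (const unsigned char c : name) {
    if (!std::isalnum(c) && c != '_') return false;
  }
  return true;
}

// Name of the generated interface provider function for a package.
std::string interfaceProviderName(const std::string& packageName) {
  assert(isValidIdentifier(packageName));
  return packageName + "InterfaceProvider";
}
```

For the package prefix used in the snippet above, interfaceProviderName("ExamplePackage") yields ExamplePackageInterfaceProvider, while a name such as Example-Package is rejected.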

Op Source Files

This section describes the source files generated by the tool for available backends. The example below highlights the output for the Conv2D op defined in the example config.

HTP Conv2D.cpp Example

/* execute functions for ops */

template<typename TensorType,typename TensorType1>
GraphStatus conv2dImpl(TensorType& out_0,
                       const TensorType& in_0,
                       const TensorType& filter,
                       const TensorType1 &bias,
                       const Tensor& stride,
                       const Tensor& pad_amount,
                       const Tensor& group,
                       const Tensor& dilation) {
  /*
   * add code here
   * */

  return GraphStatus::Success;
}

__attribute__((unused)) static float conv2dCostFunc(const Op *op) {
  /*
  * add code here
  * */

  float cost = 0.0;  // add cost computation here
  return cost;
}

The functions shown above are used by the QNN HTP backend for execution and cost analysis. They are always named <op_name>Impl and <op_name>CostFunc respectively. Users should note that the op name must always be a valid C++ identifier, which means it can only contain alphanumeric characters and underscores. Each function is registered with the HTP backend using QNN HTP API macros, and should be completed by the user to enable accurate functionality. Each function to be completed has the add code here comment included in the function body. [4]

Users should also note that the template types are deduced from the XML OpDef config to enable simple creation of multiple execution functions, and are not strictly required by the QNN HTP backend. Each function signature can be specialized at the user’s discretion.
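The role of the template types can be illustrated without any backend headers. In the sketch below, MockFloatTensor, MockIntTensor, MockStatus and addBiasImpl are hypothetical stand-ins invented for this example; they are not QNN HTP types or functions. It shows one templated function serving several tensor type combinations, plus a non-template overload for one concrete signature.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for backend tensor types; these are NOT QNN HTP types.
struct MockFloatTensor {
  using value_type = float;
  std::array<float, 4> data{};
};
struct MockIntTensor {
  using value_type = int;
  std::array<int, 4> data{};
};

enum class MockStatus { Success, Failure };

// Generic execution function: one definition serves every combination of
// tensor types, with the template arguments deduced at the call site.
template <typename TensorType, typename TensorType1>
MockStatus addBiasImpl(TensorType& out, const TensorType& in, const TensorType1& bias) {
  assert(out.data.size() == in.data.size() && in.data.size() == bias.data.size());
  for (std::size_t i = 0; i < out.data.size(); ++i) {
    out.data[i] = static_cast<typename TensorType::value_type>(in.data[i] + bias.data[i]);
  }
  return MockStatus::Success;
}

// Non-template overload for one concrete signature. Overload resolution
// prefers it over the template when all three arguments are MockIntTensor.
// It clamps results to [0, 255] purely as an illustrative difference.
MockStatus addBiasImpl(MockIntTensor& out, const MockIntTensor& in, const MockIntTensor& bias) {
  for (std::size_t i = 0; i < out.data.size(); ++i) {
    const int sum = in.data[i] + bias.data[i];
    out.data[i] = sum < 0 ? 0 : (sum > 255 ? 255 : sum);
  }
  return MockStatus::Success;
}
```

Calling addBiasImpl with a float output and an integer bias selects the template, while calling it with three integer tensors selects the clamping overload.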

Additionally, users should be aware of the DEF_PACKAGE_PARAM_ORDER macro that is auto-generated into the source code. This macro is optional, and simply lists the order of parameters passed into execution functions and their corresponding default values, if any. Importantly, all tensor and string parameters defined in this macro are always set to mandatory with a default null pointer value, regardless of optionality. As such, users may need to manually change tensor param values to ensure accurate execution.

DSP Conv2D.cpp Example

Udo_ErrorType_t
conv2d_createOpFactory (QnnOpPackage_GlobalInfrastructure_t globalInfra,
   Udo_CoreType_t udoCoreType, void *perFactoryInfrastructure,
   Udo_String_t operationType, uint32_t numOfStaticParams,
   Udo_Param_t *staticParams, Udo_OpFactory_t *opFactory)
{
   if(operationType == NULL || opFactory == NULL) {
      return UDO_INVALID_ARGUMENT;
   }
   if(strcmp(operationType, g_conv2dOpType) == 0) {
      conv2dOpFactory_t* thisFactory = (conv2dOpFactory_t *)(*(globalInfra->dspGlobalInfra->hexNNv2Infra.udoMalloc))(sizeof(conv2dOpFactory_t));
      int size = strlen(operationType) + 1; // +1 to hold the '\0' character
      thisFactory->opType = (Udo_String_t)(*(globalInfra->dspGlobalInfra->hexNNv2Infra.udoMalloc))(size);
      strlcpy((thisFactory->opType), operationType, size);
      thisFactory->numOfStaticParams = numOfStaticParams;
      /*
       * if this op has static params, add code here
       */
      *opFactory = (Udo_OpFactory_t)thisFactory;
   } else {
      return UDO_INVALID_ARGUMENT;
   }
   return UDO_NO_ERROR;
}

Udo_ErrorType_t
conv2d_releaseOpFactory(QnnOpPackage_GlobalInfrastructure_t globalInfra,
                                             Udo_OpFactory_t opFactory)
{
   if(opFactory == NULL) {
      return UDO_INVALID_ARGUMENT;
   }
   conv2dOpFactory_t* thisFactory = (conv2dOpFactory_t *)(opFactory);
   (*(globalInfra->dspGlobalInfra->hexNNv2Infra.udoFree))((thisFactory->opType));
   (*(globalInfra->dspGlobalInfra->hexNNv2Infra.udoFree))(thisFactory);
   /*
    * if this op has static params, add code here
    */
   return UDO_NO_ERROR;
}

Udo_ErrorType_t
conv2d_validateOperation (Udo_String_t operationType, uint32_t numOfStaticParams,
   const Udo_Param_t *staticParams) {
   if(strcmp(operationType, g_conv2dOpType) == 0) {
      if (numOfStaticParams != g_conv2dStaticParamsNum) {
            return UDO_INVALID_ARGUMENT;
      }
      /*
       * If this op should validate others, add code here
       */
   } else {
      return UDO_INVALID_ARGUMENT;
   }
   return UDO_NO_ERROR;
}

Udo_ErrorType_t
conv2d_executeOp (QnnOpPackage_GlobalInfrastructure_t globalInfra,
   Udo_Operation_t operation, bool blocking, const uint32_t ID,
   Udo_ExternalNotify_t notifyFunc) {
   if(operation == NULL) {
      return UDO_INVALID_ARGUMENT;
   }
   OpParams_t* m_Operation = (OpParams_t*) operation;
   const char* opType = ((conv2dOpFactory_t*)(m_Operation->opFactory))->opType;
   if(opType == NULL) {
      return UDO_INVALID_ARGUMENT;
   }
   if(strcmp(opType, g_conv2dOpType) == 0) {
      /*
       * add code here
       */
      return UDO_NO_ERROR;
   } else {
      return UDO_INVALID_ARGUMENT;
   }
}

The functions shown above are used by the QNN DSP backend for op factory creation, op factory release, operation validation and op execution. They are always named <op_name>_createOpFactory, <op_name>_releaseOpFactory, <op_name>_validateOperation and <op_name>_executeOp respectively. Users should note that the op name must always be a valid C++ identifier, which means it can only contain alphanumeric characters and underscores. Each function is used in the DSP backend, and should be completed by the user to enable accurate functionality. Each function to be completed has the add code here comment included in the function body.

typedef struct OpParams {
   Udo_OpFactory_t opFactory;
   uint32_t numInputParams;
   Udo_TensorParam_t *InputParams;
   uint32_t numOutputParams;
   Udo_TensorParam_t *outputParams;
   Udo_HexNNv2OpInfra_t opInfra;
} OpParams_t;

typedef struct conv2dOpFactory {
   Udo_String_t opType;
   uint32_t numOfStaticParams;
   Udo_Param_t* staticParams;
} conv2dOpFactory_t;

The conv2dOpFactory_t and OpParams_t are defined in include/DspOp.hpp.

CPU Conv2D.cpp Example

Qnn_ErrorHandle_t validateOpConfig(Qnn_OpConfig_t opConfig) {
   QNN_CUSTOM_BE_ENSURE_EQ(
       strcmp(opConfig.typeName, "Conv2D"), 0, QNN_OP_PACKAGE_ERROR_INVALID_ARGUMENT)

   QNN_CUSTOM_BE_ENSURE_EQ(opConfig.numOfInputs, 3, QNN_OP_PACKAGE_ERROR_VALIDATION_FAILURE)
   QNN_CUSTOM_BE_ENSURE_EQ(opConfig.numOfOutputs, 1, QNN_OP_PACKAGE_ERROR_VALIDATION_FAILURE)

   return QNN_SUCCESS;
}

Qnn_ErrorHandle_t execute(CustomOp* operation) {
   /**
    * Add code here
    **/

   return QNN_SUCCESS;
}

CustomOpRegistration_t* register_Conv2DCustomOp() {
   using namespace conv2d;
   static CustomOpRegistration_t Conv2DRegister = {execute, finalize, free, validateOpConfig, populateFromNode};
   return &Conv2DRegister;
}

REGISTER_OP(Conv2D, register_Conv2DCustomOp);

The registration structure shown above is associated with a custom op package object, which is called indirectly by the interface functions shown in the previous section. The registration structure is defined below and can also be found in <QNN_SDK_ROOT>/share/QNN/OpPackageGenerator/CustomOp/CustomOpRegister.hpp.

typedef struct _CustomOpRegistration_t {
   Qnn_ErrorHandle_t (*execute)(utils::CustomOp* operation);
   Qnn_ErrorHandle_t (*finalize)(const utils::CustomOp* operation);
   Qnn_ErrorHandle_t (*free)(utils::CustomOp& op);

   QnnOpPackage_ValidateOpConfigFn_t validateOpConfig;

   Qnn_ErrorHandle_t (*initialize)(const QnnOpPackage_Node_t opNode,
                                   QnnOpPackage_GraphInfrastructure_t graphInfrastructure,
                                   utils::CustomOp* operation);
} CustomOpRegistration_t;

The auto generated skeleton code contains free function definitions for the functions to be registered. Note that users can customize the behavior of these functions by completing the function body. Once the functions are fully defined, each registration function needs to be associated with an op package instance using the REGISTER_OP macro. The op package instance is a singleton that holds all registration structures and calls the appropriate function as directed by the QNN API (via the interface) based on the op type.

For example, the createKernels function pointer shown in the previous section triggers a call to the initialize function defined in the op package registration. The manner in which each function is called can be observed either in the interface files or within the shared source code. Interested users are encouraged to explore the API for finer details. However, users should be aware that modifying the source code can adversely affect successful package loading and/or execution.
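The registration pattern described above, a singleton that stores one registration record per op type and dispatches calls by type, can be sketched in isolation. Everything in this example (MockRegistration, MockOpPackage, MOCK_REGISTER_OP, and the Relu op) is hypothetical and much simpler than the SDK's shared source code; it only illustrates the general technique.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified registration record. The SDK's
// CustomOpRegistration_t holds more function pointers than this.
struct MockRegistration {
  int (*execute)(int input);
};

// Singleton package object: maps op type names to registration records.
class MockOpPackage {
 public:
  static MockOpPackage& instance() {
    static MockOpPackage package;  // constructed on first use
    return package;
  }

  // Stores the record returned by the registration function.
  // Returns false if the op type was already registered.
  bool registerOp(const std::string& opType, MockRegistration* (*getRegistration)()) {
    assert(getRegistration != nullptr);
    return m_ops.emplace(opType, getRegistration()).second;
  }

  // Dispatches to the op's execute function based on the op type.
  // Returns false if no op of that type is registered.
  bool execute(const std::string& opType, int input, int* result) const {
    const auto it = m_ops.find(opType);
    if (it == m_ops.end() || result == nullptr) return false;
    *result = it->second->execute(input);
    return true;
  }

 private:
  MockOpPackage() = default;
  std::map<std::string, MockRegistration*> m_ops;
};

// Hypothetical macro in the spirit of REGISTER_OP: registration runs during
// static initialization, before main() starts.
#define MOCK_REGISTER_OP(opName, registerFn) \
  static const bool opName##_registered =    \
      MockOpPackage::instance().registerOp(#opName, registerFn)

namespace relu {
int execute(int input) { return input > 0 ? input : 0; }
}  // namespace relu

MockRegistration* register_ReluCustomOp() {
  static MockRegistration reluRegister = {relu::execute};
  return &reluRegister;
}

MOCK_REGISTER_OP(Relu, register_ReluCustomOp);
```

Because the singleton is a function-local static, it is safe to use from the static initializer that the macro creates, regardless of the order in which translation units are initialized.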

Lastly, users should note the CustomOp object shown above. This is a simple class that enables storage and retrieval of input, output and parameter data between the initialization, execution and finalization stages. It is a helper utility that users are free to modify to suit their needs. The utils are always included in the package once it is generated, and can also be found in <QNN_SDK_ROOT>/share/QNN/OpPackageGenerator/CustomOp/utils.

Compilation Instructions

The following sections describe compilation for each supported backend.

HTP Instructions

  1. The path to the QNN HTP include headers and the hexagon installation is set using:

    $ source <QNN_SDK_ROOT>/bin/x86_64-linux-clang/envsetup.sh
    $ source <HEXAGON_SDK_PATH>/setup_sdk_env.source
    

    (Optional) Users can export the additional environment variable HEXAGON_TOOLS_VERSION to override the default Hexagon tools version in the Makefile.

    Default HEXAGON_SDK_ROOT version:

    x86: QNN_HEXAGON_SDK_5.4.0
    v68: QNN_HEXAGON_SDK_4.2.0
    v69: QNN_HEXAGON_SDK_4.3.0
    v73: QNN_HEXAGON_SDK_5.4.0
    v75: QNN_HEXAGON_SDK_5.4.0

    HEXAGON_SDK_ROOT needs to be exported again if you intend to make a different variant.

    Default HEXAGON_TOOLS_VERSION:

    x86: 8.6.02
    v68: 8.4.09
    v69: 8.5.03
    v73: 8.6.02
    v75: 8.7.03

  2. Required for x86: Ensure a clang compiler is discoverable in your path, or set X86_CXX in your environment to point to a valid clang compiler path. [5]

  3. Required for ARM prepare: Both the ARM and Hexagon versions of the op package should be compiled and registered for ARM prepare.

  4. The package can then be compiled for a variety of targets using any of the following commands:

    • To generate both hexagon and linux targets:

      make all
      

      Note: "make all" includes htp_v68 as the default Hexagon target. For v69 or above, users can replace htp_v68 or use the separate commands below.

    • To generate hexagon target only:

      make htp_v68
      make htp_v69
      make htp_v73
      make htp_v75
      
    • To generate linux targets only:

      make htp_x86
      
    • To generate ARM targets only:

      make htp_aarch64
      

After any of the make targets in Step 4 is built, a shared library is produced at: <current_dir>/build/<target>/lib<package_name>.so.
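The override behavior described in step 1 can be modeled as a lookup with an optional override. The sketch below only mirrors the defaults listed in step 1; it is not how the Makefile is implemented, and resolveToolsVersion and resolveToolsVersionFromEnv are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Default HEXAGON_TOOLS_VERSION per variant, copied from the list in step 1.
const std::map<std::string, std::string>& defaultToolsVersions() {
  static const std::map<std::string, std::string> kDefaults = {
      {"x86", "8.6.02"}, {"v68", "8.4.09"}, {"v69", "8.5.03"},
      {"v73", "8.6.02"}, {"v75", "8.7.03"}};
  return kDefaults;
}

// Returns the override if one is given, otherwise the default for the
// variant. Returns an empty string for a variant without a listed default.
std::string resolveToolsVersion(const std::string& variant, const char* overrideValue) {
  assert(!variant.empty());
  if (overrideValue != nullptr && overrideValue[0] != '\0') return overrideValue;
  const auto& defaults = defaultToolsVersions();
  const auto it = defaults.find(variant);
  return it == defaults.end() ? std::string() : it->second;
}

// Convenience wrapper that reads the override from the environment.
std::string resolveToolsVersionFromEnv(const std::string& variant) {
  return resolveToolsVersion(variant, std::getenv("HEXAGON_TOOLS_VERSION"));
}
```

With no override, resolveToolsVersion("v73", nullptr) returns the listed default 8.6.02; passing a non-empty override returns that value instead.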

DSP Instructions

  1. The path to the QNN DSP include headers and the hexagon installation is set using:

    $ source <QNN_SDK_ROOT>/bin/envsetup.sh
    $ source <HEXAGON_SDK_ROOT>/setup_sdk_env.source
    
  2. The package can then be compiled for DSP targets using the following command:

    $ make
    

After Step 2, a shared library is produced at: <current_dir>/build/DSP/libQnn<package_name>.so.

CPU Instructions

Linux Platform

  1. Setup the QNN environment:

    $ source <QNN_SDK_ROOT>/bin/envsetup.sh
    
  2. Required for x86: Ensure a clang compiler is discoverable in your path, or set CXX in your environment to point to a valid clang compiler path. [5]

  3. Required for Android: Ensure the Android ndk-build compiler is discoverable in your path, and set ANDROID_NDK_ROOT to point to the location of the executable.

  4. Required for oe-targets: Ensure required oe-toolchains are pre-installed and discoverable in your path, and set ESDK_ROOT to point to the location of the toolchain. More details on installation can be found here.

  5. The package can then be compiled for a variety of targets using any of the following commands:

    • To generate both android and x86 targets:

      make all
      
    • To generate x86 target only:

      make cpu_x86
      
    • To generate android target only:

      make cpu_android
      
    • To generate aarch64-oe-linux-gcc11.2 target only:

      make cpu_oe-11.2
      
    • To generate aarch64-oe-linux-gcc9.3 target only:

      make cpu_oe-9.3
      

After any of the make targets in Step 5 is built, a shared library is produced at: <current_dir>/libs/<target>/lib<package_name>.so.

Note

  • Steps 1-2 can also be set manually in the makefiles or as a command line option to make without the use of scripts.

  • Step 3 can also be set manually or passed as an option to make.

Windows Platform

  1. Setup the QNN environment:

    After Windows Platform Dependencies have been satisfied, the user environment can be set with the provided envsetup.ps1 script.

    $ & "<QNN_SDK_ROOT>\bin\envsetup.ps1"
    
    • To generate Windows-x86 targets, run the following commands within Developer PowerShell for VS 2022:

    cmake -S . -B <build_dir> -A x64
    cd <build_dir>
    cmake --build . --config release
    

Note

When compiling the CPU Op Package for Windows, the user may select either the x64 or arm64 architecture using the following flag when invoking cmake: -A [x64 | arm64].

The above CMake operation for the Windows-x86 platform will produce the <package_name>.dll and <package_name>.lib files under the build/Release folder.

GPU Instructions

Linux Platform

  1. Setup the QNN environment:

    $ source <QNN_SDK_ROOT>/bin/envsetup.sh
    

    to point to a valid clang compiler path. [5]

  2. Required for Android: Ensure the Android ndk-build compiler is discoverable in your path, and set ANDROID_NDK_ROOT to point to the location of the executable.

  3. Required for oe-targets: Ensure required oe-toolchains are pre-installed and discoverable in your path, and set ESDK_ROOT to point to the location of the toolchain. More details on installation can be found here.

  4. The package can then be compiled for a variety of targets using any of the following commands:

    • To generate android target only:

      make gpu_android
      
    • To generate aarch64-oe-linux-gcc11.2 target only:

      make gpu_oe-11.2
      
    • To generate aarch64-oe-linux-gcc9.3 target only:

      make gpu_oe-9.3
      
    • Note that make all generates the Android target only.

After any of the make targets in Step 4 is built, a shared library is produced at: <current_dir>/libs/<target>/lib<package_name>.so.

Note

  • Steps 1-2 can also be set manually in the makefiles or as a command line option to make without the use of scripts.

  • Step 3 can also be set manually or passed as an option to make.

[1] For instructions on QNN tool setup, see Setup.

[2] If the directory already exists, the tool will only generate new files and attempt to append to existing files. To force a new package to be generated, use the --force-generation option.

[3] The tool currently supports generation only for the HTP and CPU backends.

[4] There may be additional functions generated for each op that may need to be completed, and macros that may need to be specialized. Users should inspect a generated package to see all generated functions and macros.

[5] The script <QNN_SDK_ROOT>/bin/check-linux-dependency.sh can also be used to download the appropriate clang version if it is not present.