Writing QNN HTP Op Package¶
Writing a QNN HTP op package can be divided into two major sections:
writing the op package interface
writing the op implementations and optimization rules
In the QNN HTP op package example located at examples/OpPackage/HTP/ in the QNN SDK, these sections are split into separate source files:
One op package interface file is needed per package: ExampleOpPackageInterface.cpp
Any number of individual op implementation files can be included: ExampleOpPackageRelu.cpp, ExampleOpPackageSoftmax.cpp, ExampleOpPackageMaxPool.cpp
This document explains the general rules and approaches for composing QNN HTP op packages. It focuses on the op package interface section and uses ExampleOpPackageInterface.cpp as the example file. For descriptions of how to write op implementations and optimization rules, please refer to the other documents located alongside this one. For a description of ExampleOpPackageRelu.cpp, please refer to relu_example.html.
ExampleOpPackageInterface.cpp can be used as a template for any QNN HTP op package.
APIs in QnnOpPackage.h¶
To write a QNN op package, please first examine QnnOpPackage.h, located in the include/ directory of the QNN SDK, for data structure and API definitions. ExampleOpPackageInterface.cpp defines HTP-specific op package structures and the QNN op package API functions listed in QnnOpPackage.h. Currently, HTP supports and requires only the following APIs:
typedef Qnn_ErrorHandle_t (*QnnOpPackage_InterfaceProvider_t)(QnnOpPackage_Interface_t* interface);
typedef Qnn_ErrorHandle_t (*QnnOpPackage_InitFn_t)(QnnOpPackage_GlobalInfrastructure_t infrastructure);
typedef Qnn_ErrorHandle_t (*QnnOpPackage_GetInfoFn_t)(const QnnOpPackage_Info_t** info);
typedef Qnn_ErrorHandle_t (*QnnOpPackage_LogInitializeFn_t)(QnnLog_Callback_t callback, QnnLog_Level_t maxLogLevel);
typedef Qnn_ErrorHandle_t (*QnnOpPackage_LogSetLevelFn_t)(QnnLog_Level_t maxLogLevel);
typedef Qnn_ErrorHandle_t (*QnnOpPackage_LogTerminateFn_t)(void);
typedef Qnn_ErrorHandle_t (*QnnOpPackage_TerminateFn_t)();
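As a minimal sketch of what implementations of these entry points look like, the snippet below implements init, getInfo, and terminate functions matching the listed signatures. The type definitions are stand-ins so the sketch compiles standalone; in a real op package they come from QnnOpPackage.h, and the exact layout of QnnOpPackage_Info_t is SDK-defined (the packageName field here is for illustration).

```cpp
#include <cassert>
#include <cstdint>

// Stand-in types mirroring the listed signatures; the real definitions
// come from QnnOpPackage.h in the QNN SDK.
typedef uint32_t Qnn_ErrorHandle_t;
static const Qnn_ErrorHandle_t QNN_SUCCESS = 0;

struct QnnOpPackage_Info_t {
  const char* packageName;  // illustrative field; see QnnOpPackage.h for the real layout
};

// Hypothetical package-level state and three core entry points.
static bool g_initialized = false;
static QnnOpPackage_Info_t g_info = {"ExamplePackage"};

Qnn_ErrorHandle_t ExamplePackage_init() {
  if (g_initialized) return 1;  // already initialized: non-zero error
  g_initialized = true;
  return QNN_SUCCESS;
}

Qnn_ErrorHandle_t ExamplePackage_getInfo(const QnnOpPackage_Info_t** info) {
  if (!info) return 1;
  *info = &g_info;  // hand back a pointer to package-owned static info
  return QNN_SUCCESS;
}

Qnn_ErrorHandle_t ExamplePackage_terminate() {
  g_initialized = false;
  return QNN_SUCCESS;
}
```

In the real example, these functions are gathered into a QnnOpPackage_Interface_t and returned through the interface provider; see ExampleOpPackageInterface.cpp for the authoritative version.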
Aside: the following structs listed in QnnOpPackage.h are not defined or used in the HTP backend:
QnnOpPackage_GlobalInfrastructure_t, QnnOpPackage_GraphInfrastructure_t, QnnOpPackage_Kernel_t, QnnOpPackage_Node_t, QnnOpPackage_Optimization_t
HTP Headers¶
HTP core provides a set of headers that HTP op packages can and shall use to implement ops, define optimization rules, register ops and optimization rules, specify op parameter orders, and more. These headers can be found in the QNN SDK under include/HTP/core/; more information is available in htp_core_headers.html. Please include the following headers directly in HTP op package source files:
#include "QnnOpPackage.h"
#include "constraints.h"
#include "op_package_feature_support.h"
#include "op_register_ext.h"
#include "optimize.h"
#include "HTP/core/simple_reg.h"
#include "HTP/core/unique_types.h"
HTP Macros¶
HTP op packages shall use a series of macros defined in the HTP core headers to register ops, define and register optimization rules, list op parameter orders, list axis parameters, and list per-channel scale ops.
Op registration
Besides defining an op execution function for each op in the package, HTP also requires op packages to register the ops.
/*
* op initialization
*
* needs to be global in the package
* one initialization per package before any op definitions
*/
INIT_PACKAGE_OP_DEF()
/*
* op registration
*
* unified core init function containing ops/opts registration
* needs to be global in the package
*/
INIT_PKG_CORE_INIT_FUNC()
/*
* op definition
*
* shall be used in individual op implementation source files
* needs to be global in the package
* one definition per op
* please refer to implementing_ops.html for more descriptions
*/
DEF_PACKAGE_OP(F,OP)
DEF_PACKAGE_OP_AND_COST_AND_FLAGS(F,OP,COST,...)
DEF_PACKAGE_OP_AND_COST_F_AND_FLAGS(F,OP,COST_F,...)
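To show the mechanics behind these macros, here is a self-contained conceptual sketch in which stand-in versions of INIT_PACKAGE_OP_DEF and DEF_PACKAGE_OP register an execution function into a package-level table. The macro bodies below are illustrative only; the real expansions live in the HTP core headers and register into the HTP op registry, and real op implementations operate on HTP tensor types rather than plain integers.

```cpp
#include <map>
#include <string>

// Stand-in op execution signature (real HTP ops take HTP tensors).
typedef int (*OpFunc)(int);

// Stand-in macro bodies, for illustration only; the real
// INIT_PACKAGE_OP_DEF / DEF_PACKAGE_OP are defined in the HTP core headers.
#define INIT_PACKAGE_OP_DEF() \
  static std::map<std::string, OpFunc> g_opTable;

#define DEF_PACKAGE_OP(F, OP) \
  static const bool registered_##F = (g_opTable[OP] = (F), true);

// One initialization per package, before any op definitions.
INIT_PACKAGE_OP_DEF()

// A toy "Relu" execution function standing in for a real HTP op impl.
static int reluImpl(int x) { return x > 0 ? x : 0; }

// One definition per op; in the example package this lives in
// ExampleOpPackageRelu.cpp.
DEF_PACKAGE_OP(reluImpl, "Relu")
```

The point of the pattern is that each DEF_PACKAGE_OP use runs at static-initialization time, so simply compiling the op's source file into the package is enough to make the op discoverable.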
Set tensor properties
Define op tensor properties to centralize decisions about the layout and memory placement of tensors.
DEF_TENSOR_PROPERTIES(Op("Op", "in", "in2"),
Flat("*", "in2"),
MainMemory("..."))
Constraint terms:¶
Flat: flat layout
Crouton: crouton layout
Tcm: in TCM
MainMemory: in main memory
These constraint terms specify the layout and memory requirements for operators.
Optimization rule registration
Op packages can define optimization rules which alter HTP graphs in desired ways. For more info, please refer to optimization_grammar.html.
/*
* optimization initialization
*
* needs to be global in the package
* one initialization per package before any optimization definitions
*/
INIT_PACKAGE_OPTIMIZATION_DEF()
/*
* optimization definition
*
* shall be used in individual op implementation source files
* needs to be global in the package
* one definition per optimization
* please refer to implementing_ops.html for more descriptions
*/
DEF_PACKAGE_OPTIMIZATION(PRIORITY,MATCHCODE,CONSTRAINTCODE,REPLACECODE)
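As a hypothetical illustration of the shape such a rule takes, the fragment below matches every node of one op and rewrites it into another with the same input. The op names are made up, and the priority and constraint tokens are only indicative; consult optimization_grammar.html for the actual grammar. This fragment depends on the HTP core headers and is not standalone.

```cpp
INIT_PACKAGE_OPTIMIZATION_DEF()

/* Hypothetical rule: rewrite every "SlowOp" node into "FastOp" with the
 * same input. The priority and constraint tokens follow the grammar in
 * optimization_grammar.html; the op names here are invented. */
DEF_PACKAGE_OPTIMIZATION(EARLY,
                         Op("SlowOp", "In"),
                         OK,
                         Op("FastOp", "In"))
```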
Op parameter order
Op packages can specify parameter orders and define default parameter values for any ops defined in the current package. This is optional in op packages; if no parameter order is specified for an op, the parameter order follows the order provided at QnnGraph_addNode.
/*
* op parameter order initialization
*
* needs to be global in the package
* one initialization per package before any op parameter order definitions
*/
INIT_PACKAGE_PARAM_ORDER_DEF()
/*
* op parameter order registration
*
 * needs to be placed in the op package initialization function
* registers all defined op parameter orders in the package
*/
REGISTER_PACKAGE_PARAM_ORDERS()
/*
* op parameter order definition
*
* shall be used in individual op implementation source files
* needs to be global in the package
* one definition per op, and this is optional
* please refer to implementing_ops.html for more descriptions
*/
DEF_PACKAGE_PARAM_ORDER(OP,PARAM1,MANDATORY1,DEFAULT1,PARAM2,MANDATORY2,DEFAULT2...)
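As a hypothetical sketch of the argument pattern, the fragment below declares a parameter order for an invented op with one mandatory and one optional parameter. The op and parameter names, and the default-value object, are made up for illustration; the exact type expected for DEFAULT is defined by the HTP core headers. This fragment is not standalone.

```cpp
/* Hypothetical op "MyPool" taking a mandatory "window" parameter and an
 * optional "stride" parameter. When MANDATORY is true the DEFAULT value
 * is unused (nullptr); defaultStrideParam is an invented default-value
 * object whose type is dictated by the HTP core headers. */
DEF_PACKAGE_PARAM_ORDER("MyPool",
                        "window", true,  nullptr,
                        "stride", false, &defaultStrideParam)
```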
Axis parameter (optional)
HTP currently supports only 4-dimensional tensors, whereas QNN supports tensors of any dimensionality, so the HTP backend backfills tensor dimensions. Because of this discrepancy, any op parameters that refer to axes need to be adjusted in the HTP backend. HTP allows op packages to list the op parameter names that should be treated as axis parameters, and the HTP backend automatically adjusts them to fit 4-dimensional tensors. This is optional in op packages.
/*
* axis parameter name list
*
* optional
* needs to be global in the package
* one list per package
* e.g. LIST_PACKAGE_AXIS_PARAMS("Axis", "AXIS", "axis")
*/
LIST_PACKAGE_AXIS_PARAMS(...)
/*
* op axis parameter name registration
*
* optional
 * used with LIST_PACKAGE_AXIS_PARAMS(...)
 * needs to be placed in the op package initialization function
* registers all axis parameter names in the package
*/
REGISTER_PACKAGE_AXIS_PARAMS()
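Putting the two macros together, a package might look like the sketch below. The init function name is invented for illustration; the macro calls follow the usage rules described above. This fragment depends on the HTP core headers and is not standalone.

```cpp
/* At global scope in a package source file: declare which parameter
 * names are axis parameters (example names from above). */
LIST_PACKAGE_AXIS_PARAMS("Axis", "AXIS", "axis")

/* Inside the op package initialization function (name invented here):
 * register all listed axis parameter names. */
Qnn_ErrorHandle_t ExampleOpPackage_init(/* ... */) {
  REGISTER_PACKAGE_AXIS_PARAMS()
  /* ... other registrations ... */
  return QNN_SUCCESS;
}
```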
Per-channel scale op (optional)
HTP supports the use of per-axis quantized tensors in op packages; however, support is limited to per-channel scale tensors only. That is, the axis used in per-axis quantized tensors can only be the last dimension, and the quantization offset can only be zero. In QNN tensors, per-axis quantization info is packed into the tensor when QNN_QUANTIZATION_ENCODING_AXIS_SCALE_OFFSET is set as the quantization encoding type. In the HTP backend, per-channel scale values arrive as separate HTP tensors, in addition to the HTP tensors containing data values.
Op packages shall list the op names that support per-channel scale tensors using the macros shown below. For ops listed with these macros, when any of the input tensors, parameters, or output tensors used in QnnGraph_addNode has the QNN_QUANTIZATION_ENCODING_AXIS_SCALE_OFFSET quantization encoding type, the HTP backend passes additional HTP tensors to the op execution functions to represent the per-channel scale values. For regular ops, the HTP tensors passed into op execution functions are: outputs, inputs, parameters. For per-channel scale ops, they are: outputs, inputs, parameters, output per-channel scale values, input per-channel scale values, parameter per-channel scale values. In this case, HTP fills in a default per-channel scale value of 1 for any non-QNN_QUANTIZATION_ENCODING_AXIS_SCALE_OFFSET tensors. The per-channel scale values of an output, input, or parameter can be accessed in op implementation functions as tensor(0,0,0,d), where tensor is the corresponding per-channel scale tensor and d is the channel of interest.
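The tensor(0,0,0,d) access pattern can be illustrated with a stand-in tensor type. The ToyTensor class below is invented so the sketch compiles standalone; real op implementations receive HTP tensor types from the HTP core headers, with their own indexing interfaces.

```cpp
#include <cstddef>
#include <vector>

// Stand-in 4D tensor with an operator() mirroring HTP-style indexing.
// Invented for illustration; real HTP tensor classes come from the
// HTP core headers.
struct ToyTensor {
  std::size_t b, h, w, d;
  std::vector<float> data;
  ToyTensor(std::size_t b_, std::size_t h_, std::size_t w_, std::size_t d_)
      : b(b_), h(h_), w(w_), d(d_), data(b_ * h_ * w_ * d_, 0.f) {}
  float& operator()(std::size_t i, std::size_t j, std::size_t k, std::size_t c) {
    return data[((i * h + j) * w + k) * d + c];
  }
};

// Sketch of applying per-channel scales inside an op implementation:
// the scale tensor is indexed as scale(0,0,0,c) for channel c, as
// described above.
void applyPerChannelScale(ToyTensor& out, ToyTensor& in, ToyTensor& scale) {
  for (std::size_t i = 0; i < in.b; ++i)
    for (std::size_t j = 0; j < in.h; ++j)
      for (std::size_t k = 0; k < in.w; ++k)
        for (std::size_t c = 0; c < in.d; ++c)
          out(i, j, k, c) = in(i, j, k, c) * scale(0, 0, 0, c);
}
```

Note how only the last index of the scale tensor varies: each channel carries one scale, matching the last-dimension-only restriction stated above.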
/*
* per-channel quantized op name list
*
* optional
* needs to be global in the package
* one list per package
*/
LIST_PACKAGE_PER_CHANNEL_QUANTIZED_OPS(...)
/*
* per-channel scale op name registration
*
* optional
 * used with LIST_PACKAGE_PER_CHANNEL_QUANTIZED_OPS(...)
 * needs to be placed in the op package initialization function
* registers all per-channel scale op names in the package
*/
REGISTER_PACKAGE_PER_CHANNEL_QUANTIZED_OPS()
Compilation¶
The name of a QNN HTP op package is independent of its source code; the name must be passed as a compilation flag in the following format:
-DTHIS_PKG_NAME=YOUR_PACKAGE_NAME
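The flag supplies the package name as a bare token. As a sketch of how such a -D definition can be turned into a string in code, the snippet below uses standard two-step preprocessor stringification. The fallback #define and the PKG_STR helper names are only so the snippet builds standalone; they are not part of the SDK.

```cpp
#include <cstring>

// Normally supplied on the command line: -DTHIS_PKG_NAME=YOUR_PACKAGE_NAME
#ifndef THIS_PKG_NAME
#define THIS_PKG_NAME ExamplePackage  // fallback so this sketch builds alone
#endif

// Standard two-step stringification turns the bare token into a C string.
#define PKG_STR_INNER(x) #x
#define PKG_STR(x) PKG_STR_INNER(x)

static const char* const kPackageName = PKG_STR(THIS_PKG_NAME);
```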
QNN HTP op packages must currently be recompiled after every QNN SDK release.
For an example makefile, please refer to the Makefile located at examples/OpPackage/HTP/ in the QNN SDK.