QNN HTP-FP16 Op Package - Relu Op Example¶
Overview¶
The typical usage of HTP-FP16 ops is to accelerate QNN float32 graphs. These graphs generally have tensors of TensorType QNN_TENSOR_TYPE_APP_WRITE and QNN_TENSOR_TYPE_APP_READ (See Enum Qnn_TensorType_t) and DataType QNN_DATATYPE_FLOAT32 (See Enum Qnn_DataType_t). Furthermore, the input and output activation tensors defined by the operations of these graphs have DataType QNN_DATATYPE_FLOAT32 (See Enum Qnn_DataType_t).
The client is expected to set up QNN graphs with float32 tensors and the QNN HTP accelerator will finalize and execute those QNN graphs using float16 math. See general/htp_backend/qnn-htp-precision.
This document outlines how to write ops in QNN HTP-FP16 op package with a basic example of relu op. The source code for this example is located at examples/OpPackage/HTP/ExampleOpPackageReluFp16.cpp.
For detailed descriptions of writing op implementations, defining optimization rules, and specifying op parameter orders, please read Implementing Ops. In addition, Optimization Grammar provides more information on defining optimization rules.
Writing Relu Op with HTP-FP16¶
ExampleOpPackageReluFp16.cpp contains a generic relu op and two specializations (relu1 and reluX).
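The element-wise semantics of these three variants can be sketched in plain C++ as follows. This is reference math only: the clamp bounds shown for relu1 ([-1, 1]) and reluX ([0, X]) follow common convention, and the actual HTP implementation is a templated, vectorized function (e.g. reluImplFp<PlainFloat16Tensor>), not this scalar code.

```cpp
#include <algorithm>
#include <cstddef>

// Reference semantics of the three relu variants (scalar float32 here;
// the HTP op package computes these with float16 HVX vector math).
float relu(float x)  { return std::max(x, 0.0f); }

// relu1 conventionally clamps to [-1, 1].
float relu1(float x) { return std::min(std::max(x, -1.0f), 1.0f); }

// reluX clamps to [0, X] for a caller-supplied upper bound X.
float reluX(float x, float X) { return std::min(std::max(x, 0.0f), X); }

// Applying the generic relu over a flat activation buffer.
void reluBuffer(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = relu(in[i]);
}
```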
Op Registration¶
Op implementation functions need to be registered with an op name, op cost and flags. Op registration can be achieved using HTP core macros listed below, and these macros should be placed in global scope in individual op implementation source files.
Registration with a user-specified cost value and flags.
Syntax
/*
* F - op implementation function
*
* OP - op name
*
* COST - pre-defined cost value names, one of GLACIAL, SNAIL, FAST, FREE
* (listed in descending order of value).
 *        The op implementation with the lowest cost is chosen, given that all
 *        other criteria are met.
 *
 * ... - zero or more flags; available flags include IS_CONST, INHIBIT_CONST_PROP,
 *       RESOURCE_HVX.
 *       IS_CONST marks that an op should be treated as a constant op.
 *       INHIBIT_CONST_PROP marks that an op should not participate in constant propagation.
 *       RESOURCE_HVX marks that this op will use HVX resources.
*/
DEF_PACKAGE_OP_AND_COST_AND_FLAGS(F,OP,COST,...)
Example
DEF_PACKAGE_OP_AND_COST_AND_FLAGS((reluImplFp<PlainFloat16Tensor>), "Relufp16", FAST)
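As a rough mental model (not the real HTP core data structures), registration can be thought of as a table mapping each op name to candidate implementations with costs, from which the backend picks the cheapest candidate whose criteria are met. The names registerOp, pick, and g_registry below are hypothetical:

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model of what DEF_PACKAGE_OP_AND_COST_AND_FLAGS records. Cost names
// mirror the documented ones, listed here in ascending order of value.
enum Cost { FREE = 0, FAST = 1, SNAIL = 2, GLACIAL = 3 };

struct Candidate {
    Cost cost;
    std::function<float(float)> fn;  // stand-in for a real op impl function
};

// Hypothetical registry: op name -> all registered candidate impls.
static std::map<std::string, std::vector<Candidate>> g_registry;

void registerOp(const std::string& name, Cost cost,
                std::function<float(float)> fn) {
    g_registry[name].push_back({cost, std::move(fn)});
}

// Select the lowest-cost candidate for an op name (assuming all other
// selection criteria are already satisfied).
const Candidate& pick(const std::string& name) {
    auto& cands = g_registry.at(name);
    std::size_t best = 0;
    for (std::size_t i = 1; i < cands.size(); ++i)
        if (cands[i].cost < cands[best].cost) best = i;
    return cands[best];
}
```

For example, registering a GLACIAL fallback and a FAST implementation under the same name "Relufp16" would make pick return the FAST one.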
Optimization Rule Definition¶
To handle operations correctly with float16 math, op writers are required to add a DEF_PACKAGE_OPTIMIZATION_WITH_FLAGS macro at GRAPH_CLEANUP priority that converts the appropriate float32 tensors to float16 tensors. This macro defines an optimization rule that is applied to the graph during the graph optimization phase (which happens during QnnGraph_finalize()). The purpose of this optimization rule is to insert a QNN_Cast from float32 to float16 on the inputs of the operation and a QNN_Cast from float16 to float32 on the outputs of the operation.
In a later pass of graph optimization, any sequence of a QNN_Cast from float16 to float32 followed by a QNN_Cast from float32 to float16 between consecutive FP ops is cancelled out. This results in a QNN graph containing only a QNN_Cast from float32 to float16 at the graph inputs and a QNN_Cast from float16 to float32 at the graph outputs.
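The cast-cancellation pass described above can be illustrated on a toy linear chain of op names. This is only a sketch of the idea: the real HTP optimizer operates on a graph IR via pattern-matching rules, not on a flat list, and the op/cast names below are illustrative.

```cpp
#include <string>
#include <vector>

// In a linear chain, a cast fp16->fp32 immediately followed by a cast
// fp32->fp16 is an identity pair and can be removed. After this pass,
// only the casts at the chain's boundaries survive.
std::vector<std::string> cancelCastPairs(const std::vector<std::string>& chain) {
    std::vector<std::string> out;
    for (const auto& op : chain) {
        if (!out.empty() && out.back() == "Cast_f16_to_f32" &&
            op == "Cast_f32_to_f16") {
            out.pop_back();  // drop both halves of the identity pair
        } else {
            out.push_back(op);
        }
    }
    return out;
}
```

For a chain of two fp16 relu ops, each initially wrapped in its own cast pair, the two interior casts cancel, leaving one cast at the input and one at the output.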
Syntax
/*
* PRIORITY - unsigned integer value, used for indicating optimization pass number,
* smaller number indicates earlier optimization pass.
* Predefined values include GRAPH_CLEANUP(0), EARLY(2000), MIDDLE(3000),
* LATE(4000).
*
 * FLAGS - used to trigger all rules containing that flag.
 *         relaxed_precision_flag - if the overall relaxed-precision flag is
 *         enabled, all rules containing this flag will be triggered.
*
* MATCHCODE - subgraph matching pattern which this optimization rule should apply on
*
* CONSTRAINTCODE - constraints applied to the match pattern
*
* REPLACECODE - new subgraph pattern which should replace the matching pattern if the
* constraints are met
*/
DEF_PACKAGE_OPTIMIZATION_WITH_FLAGS(PRIORITY,FLAGS,MATCHCODE,CONSTRAINTCODE,REPLACECODE)
Example
DEF_PACKAGE_OPTIMIZATION_WITH_FLAGS(
    GRAPH_CLEANUP,          // priority
    relaxed_precision_flag, // flag indicating this op should run relaxed (i.e. float16) math
    Op(QNN_OP_RELU, "In"),  // matchcode

    // constraintcode
    AND(EQ(DTYPE_OF("In"), DType::Float32), EQ(DTYPE_OF("*"), DType::Float32)),

    // replacecode
    WITH_OUTPUT_TYPE(DType::Float32, 0, 1.0f,
        Op(FROM_DEFAULT_PACKAGE("Cast"),
            WITH_SIZE("*",
                WITH_OUTPUT_TYPE(DType::Float16, 0, 1.0f,
                    Op(OP,
                        WITH_SIZE("In",
                            WITH_OUTPUT_TYPE(DType::Float16, 0, 1.0f,
                                Op(FROM_DEFAULT_PACKAGE("Cast"), "In")
                            )
                        )
                    )
                )
            )
        )
    )
)