Common terminology

This section provides some general Op definition and usage rules.

The QNN Ops are defined in QNN Operation Definitions.

Operation definition rules

For each operation:

  • Input tensors are defined in a list. Each has a designated index.

  • Output tensors are defined in a list. Each has a designated index.

  • Output tensors cannot be static.

  • Parameters are named and do not need to be ordered.

Each tensor has a data format, expressed via the Qnn_TensorDataFormat_t field in Qnn_Tensor_t. Generally, tensors are n-dimensional and laid out in memory as flat buffers, where the last dimension varies the fastest. The flat format is supported by all backends, and each backend may optionally provide a list of other data formats that it supports. Backends advertise the details of their support level via the Backend Supplements.
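The flat layout described above can be illustrated with a short Python sketch. The helper name flat_offset is hypothetical and is not part of the QNN API; it only shows how a multi-dimensional index maps to a buffer offset when the last dimension varies the fastest.

```python
from typing import Sequence


def flat_offset(shape: Sequence[int], index: Sequence[int]) -> int:
    """Return the element offset of `index` in a flat buffer in which
    the last dimension varies the fastest (row-major order)."""
    if len(shape) != len(index):
        raise ValueError("index rank must match tensor rank")
    offset = 0
    for dim, i in zip(shape, index):
        if not 0 <= i < dim:
            raise IndexError("index out of range")
        # Each step scales the running offset by the current dimension,
        # so the last dimension ends up with a stride of 1.
        offset = offset * dim + i
    return offset


# For shape [2, 3, 4], neighbours along the last dimension are adjacent.
print(flat_offset([2, 3, 4], [0, 0, 1]))  # 1
print(flat_offset([2, 3, 4], [0, 1, 0]))  # 4
print(flat_offset([2, 3, 4], [1, 2, 3]))  # 23
```

The offset is counted in elements, not bytes; the byte offset additionally depends on the element data type.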

Input tensors, output tensors, and parameter properties are defined as follows:

  • Mandatory – true or false.

    • true means that the property is required and it cannot be omitted. This does not imply “must not be static”.

    • false means that the property is optional. To leave out an optional tensor, the client must mark its tensor type as QNN_TENSOR_TYPE_NULL, unless the omitted optional tensors are at the end of the list. An optional parameter can be omitted; it is supported at least for its default value.

    A tensor that carries static data (such as the weights and biases of an op) must be marked as type QNN_TENSOR_TYPE_STATIC.

    For any OpDef version, a backend recognizes all defined parameters and can choose to reject values it does not support, as documented in the backend-specific Op definition supplement. The backend returns an error when an unrecognized parameter name is provided.

  • Data type – one or more data types as defined by Qnn_DataType_t. The value “backend specific” indicates that the supported data types are identified in the backend-specific Op definition supplement.

  • Shape – describes shape restrictions.

  • Default – present for optional tensors or parameters.
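As an illustration of the Mandatory property, the following Python sketch checks a list of provided input tensors against the mandatory flags of an op definition. It reflects one reading of the rules above: a skipped optional tensor is represented by a NULL placeholder, and optional tensors at the end of the list may be left out entirely. The function check_inputs and the use of None to stand in for QNN_TENSOR_TYPE_NULL are assumptions made for this example, not QNN API.

```python
from typing import List, Optional, Sequence

# Assumption: None stands in for a tensor marked QNN_TENSOR_TYPE_NULL.
NULL = None


def check_inputs(mandatory: Sequence[bool],
                 provided: Sequence[Optional[str]]) -> List[str]:
    """Return a list of problems; an empty list means the inputs are acceptable."""
    problems = []
    if len(provided) > len(mandatory):
        problems.append("more inputs provided than defined")
    for index, required in enumerate(mandatory):
        if index >= len(provided):
            # Entry omitted at the end of the list: fine only if optional.
            if required:
                problems.append(f"in[{index}] is mandatory but missing")
        elif provided[index] is NULL and required:
            problems.append(f"in[{index}] is mandatory but marked NULL")
    return problems


mandatory = [True, False, False]
print(check_inputs(mandatory, ["data", NULL, "bias"]))  # []
print(check_inputs(mandatory, ["data"]))                # []
print(check_inputs(mandatory, [NULL, "weights"]))
```

The last call reports that in[0] is mandatory but marked NULL.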

Operations marked DEPRECATED are subject to removal in the future, after expiration of the deprecation period. The deprecation period is at least 60 days from announcement. Deprecation marking can be revoked.

Conventions

The following pseudo-functions are used throughout the operation definitions, and should be interpreted as indicated in this section.

bracket indexing

Bracket indexing (e.g., ‘in[i]’) refers to a member of a sequence, which may be the sequence of inputs or outputs of an operation, or the shape of a tensor. Indexing starts from 0.

floor(value)

The greatest integer less than or equal to value.

max(value1, value2, ...)

The greatest value amongst all arguments to the pseudo-function.

rank(tensor)

Refers to the rank, i.e., the number of dimensions in the runtime shape of the tensor.

shape(tensor)

Refers to the runtime shape of the tensor as a sequence of integer values.

size(tensor)

Refers to the number of elements in the tensor.

sum(var, expression)

The sum of a range of numerical values. var is used within expression to indicate the range from which values will be extracted, usually by indexing a collection established previously in the definition of the operation.
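For readers who prefer code, the pseudo-functions above map naturally onto small Python helpers. This is a sketch only: the Tensor class and the name sum_over are hypothetical stand-ins introduced for this example, and floor and max correspond to Python's math.floor and built-in max.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Tensor:
    """Hypothetical stand-in for a tensor; only its runtime shape matters here."""
    dims: List[int]


def shape(t: Tensor) -> List[int]:
    """Runtime shape as a sequence of integers."""
    return list(t.dims)


def rank(t: Tensor) -> int:
    """Number of dimensions in the runtime shape."""
    return len(t.dims)


def size(t: Tensor) -> int:
    """Number of elements in the tensor."""
    return math.prod(t.dims)


def sum_over(values: Iterable[int], expression: Callable[[int], float]) -> float:
    """sum(var, expression): evaluate expression for each value of var and add up."""
    return sum(expression(v) for v in values)


inputs = [Tensor([2, 3]), Tensor([2, 5])]  # in[0], in[1]; indexing starts from 0

print(rank(inputs[0]))                     # 2
print(shape(inputs[1]))                    # [2, 5]
print(size(inputs[1]))                     # 10
print(math.floor(7 / 2), max(3, 7, 5))     # 3 7
# sum(i, shape(in[i])[1]) taken over all inputs
print(sum_over(range(len(inputs)), lambda i: shape(inputs[i])[1]))  # 8
```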

Element-wise operation rules

To perform an element-wise operation on two input tensors, their dimensions must be compatible. Two dimensions are compatible when they are equal, or when one of them is 1. The dimension that is equal to 1 is the one being broadcast. The size of the output along each dimension is the maximum of the input tensors' sizes along that dimension.
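The compatibility rule can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes both inputs already have the same rank.

```python
from typing import List, Sequence


def dims_compatible(a: int, b: int) -> bool:
    """Two dimensions are compatible when they are equal or one of them is 1."""
    return a == b or a == 1 or b == 1


def elementwise_output_shape(shape_a: Sequence[int],
                             shape_b: Sequence[int]) -> List[int]:
    """Output shape of an element-wise op on two equal-rank inputs."""
    if len(shape_a) != len(shape_b):
        raise ValueError("this sketch expects inputs of equal rank")
    out = []
    for axis, (a, b) in enumerate(zip(shape_a, shape_b)):
        if not dims_compatible(a, b):
            raise ValueError(f"dimension {axis} is incompatible: {a} vs {b}")
        # The dimension equal to 1 is broadcast to the larger size.
        out.append(max(a, b))
    return out


print(elementwise_output_shape([2, 1, 3], [2, 4, 1]))  # [2, 4, 3]
```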

Rank matching rules

For binary operations, the two input tensors do not need to have the same rank. The shape of the lower-rank tensor is 1-extended: its dimensions are aligned with the trailing dimensions of the higher-rank tensor, and dimensions of size 1 are prepended until the ranks are equal. For example:

shape(in[0]) = [2, 5, 1, 3]
shape(in[1]) = [4, 1]
1-extended shape(in[1]) = [1, 1, 4, 1]
shape(out[0]) = [2, 5, 4, 3]
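The example above can be reproduced with a minimal Python sketch. The helper names one_extend and broadcast_shapes are hypothetical; the sketch assumes bidirectional broadcasting and does not model any backend-specific restriction.

```python
from typing import List, Sequence


def one_extend(shape: Sequence[int], target_rank: int) -> List[int]:
    """Prepend 1s so that `shape` has `target_rank` dimensions."""
    if len(shape) > target_rank:
        raise ValueError("shape already exceeds the target rank")
    return [1] * (target_rank - len(shape)) + list(shape)


def broadcast_shapes(shape_a: Sequence[int], shape_b: Sequence[int]) -> List[int]:
    """Output shape of a binary element-wise op after rank matching."""
    target_rank = max(len(shape_a), len(shape_b))
    a = one_extend(shape_a, target_rank)
    b = one_extend(shape_b, target_rank)
    out = []
    for axis, (x, y) in enumerate(zip(a, b)):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"dimension {axis} is incompatible: {x} vs {y}")
        out.append(max(x, y))
    return out


print(one_extend([4, 1], 4))                     # [1, 1, 4, 1]
print(broadcast_shapes([2, 5, 1, 3], [4, 1]))    # [2, 5, 4, 3]
```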

In some cases, only unidirectional broadcasting is allowed, e.g., from in[1] to in[0]. Refer to the backend supplement for any backend-specific limitations.

Operation configuration and tensor sizing notes

QNN graph nodes have a fixed (static) configuration, which means that no node configuration changes are allowed after node creation (i.e., after a graph has been finalized). Nodes are created and added to a graph based on a client-provided op configuration. The op configuration includes information about the op parameters and about the shape, data type, and tensor type of the input/output tensors.

QNN tensor configuration provides for dynamically-sized tensors. The Qnn_Tensor_t struct defines a dimensions field. During tensor creation and graph finalization, this field is interpreted as the maximum dimensions of the tensor. During graph execution, it is interpreted as the current dimensions. The current dimensions cannot exceed the maximum dimensions. Dynamically-sized tensors are subject to QNN backend op support constraints.
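The constraint between maximum and current dimensions can be expressed as a simple check. The following Python sketch uses a hypothetical helper, dims_within_max, and assumes that the rank of the tensor is the same at finalization and at execution time; it does not call any QNN API.

```python
from typing import Sequence


def dims_within_max(max_dims: Sequence[int], current_dims: Sequence[int]) -> bool:
    """Return True if no current dimension exceeds the corresponding maximum."""
    # Assumption: the rank does not change between finalization and execution.
    if len(max_dims) != len(current_dims):
        return False
    return all(cur <= mx for cur, mx in zip(current_dims, max_dims))


max_dims = [1, 128, 64]   # dimensions given at tensor creation / graph finalization
print(dims_within_max(max_dims, [1, 100, 64]))  # True
print(dims_within_max(max_dims, [1, 129, 64]))  # False
```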