Block Ops Definitions

This page documents detailed operator definitions for block ops supported by QAIRT tools.

Block ops use the special domain name qti_aisw.

Buffer

This block op accumulates inputs across inferences into a buffer of size buffer_size and outputs the buffer with the collected inputs. When the buffer is full, the oldest inputs in the buffer are removed to make space for the incoming input; the number of inputs to remove is determined by stride. The remaining inputs are shifted within the buffer to preserve the order in which they were received.

Signature
Buffer(input data: TensorT[batch, height, width, channel],
        input reset: Boolean,
        parameter buffer_size: uint32,
        parameter buffer_dim: uint32,
        parameter buffer_padding: uint32,
        parameter stride: uint32,
        parameter mode: enum(uint32)
        → output data: TensorT[batch, height, width, channel])
Inputs

Name

Description

inp: T (Tensor)

Input data to be accumulated.

  • Mandatory

  • Shape: A tensor of rank N

reset: Boolean

The reset input determines if the buffer should be reset.

When set to true all frames in the buffer are removed.

  • Optional

  • Default: False

  • Shape: 0D tensor containing scalar value

Outputs

Name

Description

out: T (Tensor)

Output after inputs are accumulated.

  • Mandatory

  • Shape: A tensor of rank N, same shape as inp except where dim[buffer_dim] is equal to buffer_size.

  • Constraints: Same datatype and rank as input

Attributes/Parameters

Name

Description

buffer_size: uint32

Determines the number of inputs that a buffer can store.

  • Mandatory

  • Shape: Scalar

  • Constraints: Must be evenly divisible by Shape(inp)[buffer_dim]

buffer_dim: uint32

Determines the dimension that inputs are accumulated on.

  • Mandatory

  • Shape: Scalar

  • Constraints: value must be in range [0, N-1]

buffer_padding: uint32

Determines the number of frames of zero padding added to the buffer initially or after a reset.

  • Optional

  • Shape: Scalar

  • Default: BUFFER_PADDING_DEFAULT_VAL = 0

stride: uint32

Determines the number of inputs to remove from the buffer when the buffer is full to make space for the new incoming input.

The oldest existing inputs which reside at the beginning of the buffer are removed.

After removal the remaining inputs are kept in the order they were received.

  • Optional

  • Shape: Scalar

  • Default: BUFFER_STRIDE_DEFAULT_VAL = 1

  • Constraints: Value must be in range [1, buffer_size] and must be evenly divisible by Shape(inp)[buffer_dim]

mode: enum (uint32)

Determines blocking behavior. How the buffer is populated differs between the modes when the buffer is not full.

When the buffer is full eviction and population behavior is the same for all modes.

  • 0 – BLOCKING

    Execution is stopped on the existing branch of the graph if the buffer is not full.

    The buffer is populated from the beginning to the end.

    For example, an empty buffer with 3 slots (0,1,2) will be populated from slot 0 to slot 2.

  • 1 – NON_BLOCKING_LEFT

    The existing branch of the graph always executes, regardless of whether the buffer is full.

    The buffer is populated from the beginning to the end.

    For example, an empty buffer with 3 slots (0,1,2) will be populated from slot 0 to slot 2.

  • 2 – NON_BLOCKING_RIGHT

    The existing branch of the graph always executes, regardless of whether the buffer is full.

    The buffer is populated from the end.

    For example, in an empty buffer with 3 slots (0,1,2), the first incoming input is placed at slot 2.

    For the next incoming input, the previous input moves from slot 2 to slot 1 and the new input is placed at slot 2.

When the buffer is full the number of inputs removed is determined by stride.

Eviction behavior is the same for all modes where the oldest existing inputs are removed from the beginning of the buffer.

Population behavior is also the same for all modes when the buffer is full.

For example, a fully populated buffer with 3 slots (0,1,2) and a stride value of 2 will have the inputs at slot 0 and slot 1 removed and the input at slot 2 will now be at slot 0.

The incoming input is then placed at slot 1.

The next incoming input will then be placed at slot 2.

When the buffer is full again the same process is repeated.
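The eviction and population behavior described above can be sketched in pure Python. This is an illustrative reference model only (the class and method names are hypothetical, not part of the QAIRT API), operating on scalar "frames" instead of tensors:

```python
# Hypothetical sketch of the Buffer op's slot behavior; not QAIRT code.
class BufferSim:
    """Simulates one Buffer op along a single slot axis."""

    def __init__(self, buffer_size, stride=1, mode="NON_BLOCKING_LEFT"):
        self.size = buffer_size
        self.stride = stride
        self.mode = mode
        self.slots = []  # oldest input first

    def push(self, frame):
        """Add one input and return the buffer contents as output."""
        if len(self.slots) < self.size:
            # Not full: populate toward the end of the buffer.
            self.slots.append(frame)
        else:
            # Full: evict the oldest `stride` inputs, shift the rest
            # toward the beginning, then append the new input.
            self.slots = self.slots[self.stride:] + [frame]
        pad = [0] * (self.size - len(self.slots))
        if self.mode == "NON_BLOCKING_RIGHT":
            # Right-aligned output: zeros fill the leading slots.
            return pad + self.slots
        # LEFT (and BLOCKING, once full): zeros fill the trailing slots.
        return self.slots + pad
```

For a buffer of size 3 with stride 2, pushing 1, 2, 3, 4, 5 reproduces the walkthrough above: after 4 arrives, slots 0 and 1 are evicted, 3 shifts to slot 0, and 4 lands at slot 1.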

Note

For NON_BLOCKING_LEFT and NON_BLOCKING_RIGHT, the output is zero-filled if the buffer is not completely full.

  • Optional

  • Shape: Scalar

  • Default: BUFFER_MODE_BLOCKING = 0

  • Values:

    • BUFFER_MODE_BLOCKING = 0,

    • BUFFER_MODE_NON_BLOCKING_LEFT = 1,

    • BUFFER_MODE_NON_BLOCKING_RIGHT = 2

Type Constraints

TypeName

Datatypes

T

tensor(float32)

Supported Backends
  • QNN-CPU

  • QNN-LPAI

  • QNN-HTP

Note

It is not feasible to mimic the Buffer block op in ONNX Runtime because it has blocking and stateful behavior. It is therefore implemented as an identity op in ONNX Runtime, and the output of this op may not match the expected output there.

Note

HTP backend only supports NON_BLOCKING modes of Buffer Op.

MaskedSoftmax

This block op applies a Softmax operation on masked portions of the input tensor. For each batch the mask tensor is broadcast on the input before softmax computation. A mask tensor must be provided in either an UNCOMPRESSED or COMPRESSED format depending on the parameter mode selected. See input mask for details on how a boolean mask can be converted to an UNCOMPRESSED or COMPRESSED mask tensor.

Signature
MaskedSoftmax(input data: TensorT[batch, height, width, channel],
            input mask: TensorT1[batch, M],
            parameter mode: enum(int)
            → output data: TensorT[batch, height, width, channel])
Inputs

Name

Description

data: T (Tensor)

Input data to which the mask and softmax are applied.

  • Mandatory

  • Constraints: When parameter mode is COMPRESSED, width == channel

mask: T1 (Tensor)

The representation of this 2D tensor is determined by the parameter mode selected.

When parameter mode is set to UNCOMPRESSED, M = channel; when set to COMPRESSED, M = the number of sequences.

  • Mandatory

  • Constraints: When parameter mode is set to COMPRESSED the sum of the values in each batch must be less than or equal to channel.

Consider a boolean mask where a mask value of 1 indicates a position on which Softmax should be performed and a mask value of 0 indicates a position on which Softmax will not be performed. An uncompressed mask can be made from a boolean mask tensor by subtracting 1 element-wise and then multiplying the result element-wise by a large value.

mask = [[1,1,1,0,1]]
uncompressed_mask = (mask .+ -1) .* 10000
// uncompressed_mask = [[0,0,0,-10000,0]]
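To see how the uncompressed mask acts, here is a minimal numpy sketch of the UNCOMPRESSED-mode computation (the function name is illustrative; this is reference semantics, not the QAIRT kernel):

```python
import numpy as np

# Illustrative reference semantics for MaskedSoftmax in UNCOMPRESSED mode.
def masked_softmax_uncompressed(data, mask):
    x = data + mask                            # broadcast additive mask onto the input
    x = x - x.max(axis=-1, keepdims=True)      # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)   # softmax over the last axis

logits = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])
mask = np.array([[0.0, 0.0, 0.0, -10000.0, 0.0]])
probs = masked_softmax_uncompressed(logits, mask)
```

The large negative bias drives the masked position's probability to effectively zero, while the remaining positions sum to 1.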

A compressed mask can be made from multiple boolean mask vectors that are concatenated into a single batch and summed across the 2nd axis. Consider the following:

Let there be 3 mask tensors corresponding to the input sequences used to make in[0], where 0’s mark the positions that were padded to reach the max sequence length.

mask1 = [1,0,0,0]
mask2 = [1,1,1,0]
mask3 = [1,1,1,1]

The concatenated mask would then be the following:

concatenated_mask = [
[1,0,0,0],
[1,1,1,0],
[1,1,1,1]]

The compressed mask representation would be made from summing across the 2nd axis:

compressed_mask = [[1,3,4]]
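Both conversions above can be reproduced with a few lines of numpy (illustrative only; the QAIRT tooling may construct these masks differently):

```python
import numpy as np

# Boolean mask: 1 = apply softmax at this position, 0 = masked out.
mask = np.array([[1, 1, 1, 0, 1]], dtype=np.int32)

# UNCOMPRESSED: subtract 1 element-wise, then scale by a large value so
# masked positions carry a large negative bias into the softmax.
uncompressed_mask = (mask - 1) * 10000

# COMPRESSED: concatenate the padded per-sequence boolean masks into one
# batch, then sum across the 2nd axis to get each sequence's valid length.
concatenated_mask = np.array([[1, 0, 0, 0],
                              [1, 1, 1, 0],
                              [1, 1, 1, 1]], dtype=np.int32)
compressed_mask = concatenated_mask.sum(axis=1).reshape(1, -1)
```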
Outputs

Name

Description

out: T (Float32 Tensor)

Output after the input is masked and softmax is applied.

Attributes/Parameters

Name

Description

mode: enum(int)

Determines the format of the input mask. See input mask for details on the format of the mask tensor.

  • Optional

  • Default: UNCOMPRESSED

Type Constraints

TypeName

Datatypes

T

tensor(float32)

T1

tensor(int32)

Supported Backends
  • QNN-CPU

  • QNN-AIC

StatefulGRU

This block op computes a one-layer GRU.

Equations:

\[
\begin{aligned}
z_t &= f(X_t W_z^T + H_{t-1} R_z^T + Wb_z + Rb_z) \\
r_t &= f(X_t W_r^T + H_{t-1} R_r^T + Wb_r + Rb_r) \\
h_t &= g(X_t W_h^T + (r_t \odot H_{t-1}) R_h^T + Rb_h + Wb_h) \quad \text{when linear\_before\_reset} = 0 \\
h_t &= g(X_t W_h^T + r_t \odot (H_{t-1} R_h^T + Rb_h) + Wb_h) \quad \text{when linear\_before\_reset} \neq 0 \\
H_t &= (1 - z_t) \odot h_t + z_t \odot H_{t-1}
\end{aligned}
\]

Where,

  • \(X_{t}\) - input tensor

  • \(z_{t}\) - update gate

  • \(r_{t}\) - reset gate

  • \(h_{t}\) - hidden gate

  • \(t\) - time step (\(t-1\) means previous time step)

  • \(f\) - sigmoid activation function

  • \(g\) - tanh activation function

  • W[zrh] - W parameter weight matrix for update, reset, and hidden gates

  • R[zrh] - R recurrence weight matrix for update, reset, and hidden gates

  • Wb[zrh] - W bias vectors for update, reset, and hidden gates

  • Rb[zrh] - R bias vectors for update, reset, and hidden gates

  • WB[zrh] - W parameter weight matrix for backward update, reset, and hidden gates

  • RB[zrh] - R recurrence weight matrix for backward update, reset, and hidden gates

  • WBb[zrh] - W bias vectors for backward update, reset, and hidden gates

  • RBb[zrh] - R bias vectors for backward update, reset, and hidden gates

  • H - Hidden state

  • num_directions - 2 if direction == bidirectional else 1

  • \(\odot\) - element-wise product of two vectors.

Activation functions:

  • Tanh(x) - \((1 - e^{-2x})/(1 + e^{-2x})\)

  • Sigmoid(x) - \(1/(1 + e^{-x})\)

References:

ONNX: ops::GRU
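The equations above can be sketched in numpy for a single direction and a single time step. This is reference code under simplifying assumptions (biases set to 0, per-gate weight slices passed separately); the function name and parameter layout are illustrative, not the QAIRT kernel:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One GRU time step for one direction; biases omitted (assumed 0).
def gru_step(x, h_prev, Wz, Wr, Wh, Rz, Rr, Rh, linear_before_reset=0):
    z = sigmoid(x @ Wz.T + h_prev @ Rz.T)          # update gate
    r = sigmoid(x @ Wr.T + h_prev @ Rr.T)          # reset gate
    if linear_before_reset == 0:
        # Reset applied to H_{t-1} before the recurrent matmul.
        h_hat = np.tanh(x @ Wh.T + (r * h_prev) @ Rh.T)
    else:
        # Linear transformation applied before multiplying by the reset gate.
        h_hat = np.tanh(x @ Wh.T + r * (h_prev @ Rh.T))
    return (1 - z) * h_hat + z * h_prev            # new hidden state H_t
```

With all-zero weights, both gates evaluate to 0.5 and the candidate to 0, so the step halves the previous hidden state, which is a quick sanity check on the gating arithmetic.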

Signature

StatefulGru(
        input X: TensorT[seq_length, batch_size, input_size],
        input W: TensorT[num_directions, 3*hidden_size, input_size],
        input R: TensorT[num_directions, 3*hidden_size, hidden_size],
        parameter hidden_size: int,
        input B: TensorT[num_directions, 6*hidden_size] = None,
        input sequence_lens: TensorT1[batch_size] = None,
        input initial_h: TensorT[num_directions, batch_size, hidden_size] = None,
        input reset: BOOL = False,
        parameter clip: float = None,
        parameter direction: str = "forward",
        parameter linear_before_reset: int = 0,
    )
    → output Y: TensorT[seq_length, num_directions, batch_size, hidden_size],
    output Y_h: TensorT[num_directions, batch_size, hidden_size]
Inputs

Name

Description

X: T (Tensor)

The input sequence.

  • Mandatory

  • Shape: a tensor of shape [seq_length, batch_size, input_size]

W: T (Tensor)

The weight tensor for the gates. Concatenation of W[zrh] and WB[zrh] (if bidirectional) along dimension 0.

  • Mandatory

  • Shape: a tensor of shape [num_directions, 3*hidden_size, input_size]

R: T (Tensor)

The recurrence weight tensor. Concatenation of R[zrh] and RB[zrh] (if bidirectional) along dimension 0.

  • Mandatory

  • Shape: a tensor of shape [num_directions, 3*hidden_size, hidden_size]

B: T (Tensor)

The bias tensor for input gate. Concatenation of [Wb[zrh], Rb[zrh]] and [WBb[zrh], RBb[zrh]] (if bidirectional) along dimension 0.

  • Optional: If not specified, values are assumed to be 0.

  • Shape: a tensor of shape [num_directions, 6*hidden_size]

sequence_lens: T1 (Tensor)

Optional tensor specifying lengths of the sequences in a batch.

  • Optional: If not specified, all sequences in the batch are assumed to have length seq_length

  • Shape: a tensor of shape [batch_size]

initial_h: T (Tensor)

Optional initial value of the hidden.

  • Optional: If not specified, values are assumed to be 0.

  • Shape: a tensor of shape [num_directions, batch_size, hidden_size]

reset: Boolean

Determines if the internal state should be reset from the beginning of an inference pass.

When set to true, the internal state is reset to the input initial_h if it is provided, otherwise to all zeros. When set to false, the internal state is carried over from the previous step’s final hidden tensor.

This input is used to indicate the reset of the internal state at the beginning of an inference pass across all batch elements at time-step 0.

  • Optional

  • Default: False

  • Shape: 0D tensor containing scalar value

Note

The reset input is ignored in ONNX Runtime, so the behavior there is the same as a normal GRU.

Outputs

Name

Description

Y: T (Tensor)

A tensor that concatenates all the intermediate output values of the hidden state.

  • Optional

  • Shape: a tensor of shape [seq_length, num_directions, batch_size, hidden_size].

Y_h: T (Tensor)

The last output value of the hidden state.

  • Optional

  • Shape: a tensor of shape [num_directions, batch_size, hidden_size].

Attributes/Parameters

Name

Description

hidden_size: int

Number of neurons in the hidden layer

  • Mandatory

  • Shape: Scalar

clip: float

Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.

  • Optional

  • Shape: Scalar

direction: str

Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional.

  • Optional

  • Shape: Scalar

  • Default: “forward”

linear_before_reset: int

When computing the output of the hidden gate, apply the linear transformation before multiplying by the output of the reset gate.

  • Optional

  • Shape: Scalar

  • Default: 0

Type Constraints

TypeName

Datatypes

T

tensor(float32)

T1

tensor(int32)

Supported Backends
  • QNN-CPU

  • QNN-LPAI

  • QNN-HTP

StatefulLSTM

This block op computes a one-layer LSTM.

Equations:

\[
\begin{aligned}
i_t &= f(X_t W_i^T + H_{t-1} R_i^T + P_i \odot C_{t-1} + Wb_i + Rb_i) \\
f_t &= f(X_t W_f^T + H_{t-1} R_f^T + P_f \odot C_{t-1} + Wb_f + Rb_f) \\
c_t &= g(X_t W_c^T + H_{t-1} R_c^T + Wb_c + Rb_c) \\
C_t &= f_t \odot C_{t-1} + i_t \odot c_t \\
o_t &= f(X_t W_o^T + H_{t-1} R_o^T + P_o \odot C_t + Wb_o + Rb_o) \\
H_t &= o_t \odot h(C_t)
\end{aligned}
\]

Where,

  • \(X_{t}\) - input tensor

  • \(i_{t}\) - input gate

  • \(o_{t}\) - output gate

  • \(f_{t}\) - forget gate

  • \(c_{t}\) - cell gate

  • \(t\) - time step (\(t-1\) means previous time step)

  • \(f\) - sigmoid activation function

  • \(g\) - tanh activation function

  • \(h\) - tanh activation function

  • W[iofc] - W parameter weight matrix for input, output, forget, and cell gates

  • R[iofc] - R recurrence weight matrix for input, output, forget, and cell gates

  • Wb[iofc] - W bias vectors for input, output, forget, and cell gates

  • Rb[iofc] - R bias vectors for input, output, forget, and cell gates

  • P[iof] - P peephole weight vector for input, output, and forget gates

  • WB[iofc] - W parameter weight matrix for backward input, output, forget, and cell gates

  • RB[iofc] - R recurrence weight matrix for backward input, output, forget, and cell gates

  • WBb[iofc] - W bias vectors for backward input, output, forget, and cell gates

  • RBb[iofc] - R bias vectors for backward input, output, forget, and cell gates

  • PB[iof] - P peephole weight vector for backward input, output, and forget gates

  • H - Hidden state

  • num_directions - 2 if direction == bidirectional else 1

  • \(\odot\) - element-wise product of two vectors.

Activation functions:

  • Tanh(x) - \((1 - e^{-2x})/(1 + e^{-2x})\)

  • Sigmoid(x) - \(1/(1 + e^{-x})\)

References:

ONNX: ops::LSTM
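As with the GRU, the equations above can be sketched in numpy for a single direction and a single time step. This is reference code under simplifying assumptions (biases and peephole weights P set to 0, per-gate weight slices passed separately); the function name and parameter layout are illustrative, not the QAIRT kernel:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One LSTM time step for one direction; biases and peepholes omitted (assumed 0).
def lstm_step(x, h_prev, c_prev, Wi, Wo, Wf, Wc, Ri, Ro, Rf, Rc):
    i = sigmoid(x @ Wi.T + h_prev @ Ri.T)   # input gate
    f = sigmoid(x @ Wf.T + h_prev @ Rf.T)   # forget gate
    c_hat = np.tanh(x @ Wc.T + h_prev @ Rc.T)  # cell gate
    c = f * c_prev + i * c_hat              # new cell state C_t
    o = sigmoid(x @ Wo.T + h_prev @ Ro.T)   # output gate
    h = o * np.tanh(c)                      # new hidden state H_t
    return h, c
```

With all-zero weights, each gate evaluates to 0.5 and the cell gate to 0, so the cell state is halved each step, which serves as a quick sanity check on the gating arithmetic.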

Signature
StatefulLstm(
        input X: TensorT[seq_length, batch_size, input_size],
        input W: TensorT[num_directions, 4*hidden_size, input_size],
        input R: TensorT[num_directions, 4*hidden_size, hidden_size],
        parameter hidden_size: int,
        input B: TensorT[num_directions, 8*hidden_size] = None,
        input sequence_lens: TensorT1[batch_size] = None,
        input initial_h: TensorT[num_directions, batch_size, hidden_size] = None,
        input initial_c: TensorT[num_directions, batch_size, hidden_size] = None,
        input P: TensorT[num_directions, 3*hidden_size] = None,
        input reset: BOOL = False,
        parameter clip: float = None,
        parameter direction: str = "forward",
        parameter input_forget: int = 0,
    )
    → output Y: TensorT[seq_length, num_directions, batch_size,hidden_size],
    output Y_h: TensorT[num_directions, batch_size, hidden_size],
    output Y_c: TensorT[num_directions, batch_size, hidden_size]
Inputs

Name

Description

X: T (Tensor)

The input sequence.

  • Mandatory

  • Shape: a tensor of shape [seq_length, batch_size, input_size]

W: T (Tensor)

The weight tensor for the gates. Concatenation of W[iofc] and WB[iofc] (if bidirectional) along dimension 0.

  • Mandatory

  • Shape: a tensor of shape [num_directions, 4*hidden_size, input_size]

R: T (Tensor)

The recurrence weight tensor. Concatenation of R[iofc] and RB[iofc] (if bidirectional) along dimension 0.

  • Mandatory

  • Shape: a tensor of shape [num_directions, 4*hidden_size, hidden_size]

B: T (Tensor)

The bias tensor for input gate. Concatenation of [Wb[iofc], Rb[iofc]], and [WBb[iofc], RBb[iofc]] (if bidirectional) along dimension 0.

  • Optional: If not specified, values are assumed to be 0.

  • Shape: a tensor of shape [num_directions, 8*hidden_size]

sequence_lens: T1 (Tensor)

Optional tensor specifying lengths of the sequences in a batch.

  • Optional: If not specified, all sequences in the batch are assumed to have length seq_length

  • Shape: a tensor of shape [batch_size]

initial_h: T (Tensor)

Optional initial value of the hidden.

  • Optional: If not specified, values are assumed to be 0.

  • Shape: a tensor of shape [num_directions, batch_size, hidden_size]

initial_c: T (Tensor)

Optional initial value of the cell.

  • Optional: If not specified, values are assumed to be 0.

  • Shape: a tensor of shape [num_directions, batch_size, hidden_size]

P: T (Tensor)

The weight tensor for peepholes. Concatenation of P[iof] and PB[iof] (if bidirectional) along dimension 0.

  • Optional: If not specified, values are assumed to be 0.

  • Shape: a tensor of shape [num_directions, 3*hidden_size]

reset: Boolean

Determines if the internal cell and hidden state should be reset from the beginning of an inference pass.

When set to true, the internal states are reset to the inputs initial_h and initial_c if they are provided, otherwise to all zeros. When set to false, the internal states are carried over from the previous step’s final hidden and final cell tensors.

This input is used to indicate the reset of the internal states at the beginning of an inference pass across all batch elements at time-step 0.

  • Optional

  • Default: False

  • Shape: 0D tensor containing scalar value

Note

The reset input is ignored in ONNX Runtime, so the behavior there is the same as a normal LSTM.

Outputs

Name

Description

Y: T (Tensor)

A tensor that concatenates all the intermediate output values of the hidden state.

  • Optional

  • Shape: a tensor of shape [seq_length, num_directions, batch_size, hidden_size].

Y_h: T (Tensor)

The last output value of the hidden state.

  • Optional

  • Shape: a tensor of shape [num_directions, batch_size, hidden_size].

Y_c: T (Tensor)

The last output value of the cell.

  • Optional

  • Shape: a tensor of shape [num_directions, batch_size, hidden_size].

Attributes/Parameters

Name

Description

hidden_size: int

Number of neurons in the hidden layer

  • Mandatory

  • Shape: Scalar

clip: float

Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.

  • Optional

  • Shape: Scalar

direction: str

Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional.

  • Optional

  • Shape: Scalar

  • Default: “forward”

input_forget: int

Couple the input and forget gates if 1.

  • Optional

  • Shape: Scalar

  • Default: 0

Type Constraints

TypeName

Datatypes

T

tensor(float32)

T1

tensor(int32)

Supported Backends
  • QNN-CPU

  • QNN-LPAI

  • QNN-HTP