Block Ops Definitions¶
This page documents detailed operator definitions for block ops supported by QAIRT tools.
Block ops use the special domain name qti_aisw.
Buffer¶
This block op accumulates inputs across inferences into a buffer of size buffer_size and outputs the buffer with the collected inputs. When the buffer is full the oldest existing inputs in the buffer are removed to make space for the incoming new input. The number of inputs to remove is determined by stride. The remaining inputs are shifted in the buffer to maintain the order they were received.
- Signature
Buffer(
    input data: TensorT[batch, height, width, channel],
    input reset: Boolean,
    parameter buffer_size: uint32,
    parameter buffer_dim: uint32,
    parameter buffer_padding: uint32,
    parameter stride: uint32,
    parameter mode: enum(uint32),
)
→ output data: TensorT[batch, height, width, channel]
- Inputs
Name
Description
inp: T (Tensor)
Input data to be accumulated.
Mandatory
Shape: A tensor of rank N
reset: Boolean
The reset input determines if the buffer should be reset.
When set to true all frames in the buffer are removed.
Optional
Default: False
Shape: 0D tensor containing scalar value
- Outputs
Name
Description
out: T (Tensor)
Output after inputs are accumulated.
Mandatory
Shape: A tensor of rank N, same shape as inp except where dim[buffer_dim] is equal to buffer_size.
Constraints: Same datatype and rank as input
- Attributes/Parameters
Name
Description
buffer_size: uint32
Determines the number of inputs that a buffer can store.
Mandatory
Shape: Scalar
Constraints: Must be evenly divisible by Shape(inp)[buffer_dim]
buffer_dim: uint32
Determines the dimension that inputs are accumulated on.
Mandatory
Shape: Scalar
Constraints: value must be in range [0, N-1]
buffer_padding: uint32
Determines the number of zero-filled frames used to pad the buffer initially or after a reset.
Optional
Shape: Scalar
Default: BUFFER_PADDING_DEFAULT_VAL = 0
stride: uint32
Determines the number of inputs to remove from the buffer when the buffer is full to make space for the new incoming input.
The oldest existing inputs which reside at the beginning of the buffer are removed.
After removal the remaining inputs are kept in the order they were received.
Optional
Shape: Scalar
Default: BUFFER_STRIDE_DEFAULT_VAL = 1
Constraints: Value must be in range [1, buffer_size] and must be evenly divisible by Shape(inp)[buffer_dim]
mode: enum (uint32)
Determines blocking behavior. How the buffer is populated differs between the modes when the buffer is not full.
When the buffer is full eviction and population behavior is the same for all modes.
0 – BLOCKING
Execution is stopped on the existing branch of the graph if the buffer is not full.
The buffer is populated from the beginning to the end.
For example, an empty buffer with 3 slots (0,1,2) will be populated from slot 0 to slot 2.
1 – NON_BLOCKING_LEFT
The existing branch of the graph always executes regardless of whether the buffer is full.
The buffer is populated from the beginning to the end.
For example, an empty buffer with 3 slots (0,1,2) will be populated from slot 0 to slot 2.
2 – NON_BLOCKING_RIGHT
The existing branch of the graph always executes regardless of whether the buffer is full.
The buffer is populated from the end.
For example, in an empty buffer with 3 slots (0,1,2), the first incoming input is placed at slot 2.
When the next input arrives, the previous input shifts from slot 2 to slot 1 and the new input is placed at slot 2.
When the buffer is full the number of inputs removed is determined by stride.
Eviction behavior is the same for all modes where the oldest existing inputs are removed from the beginning of the buffer.
Population behavior is also the same for all modes when the buffer is full.
For example, a fully populated buffer with 3 slots (0,1,2) and a stride value of 2 will have the inputs at slot 0 and slot 1 removed and the input at slot 2 will now be at slot 0.
The incoming input is then placed at slot 1.
The next incoming input will then be placed at slot 2.
When the buffer is full again the same process is repeated.
Note
In NON_BLOCKING_LEFT and NON_BLOCKING_RIGHT modes, the unfilled portion of the output is zero-filled if the buffer is not completely full.
Optional
Shape: Scalar
Default: BUFFER_MODE_BLOCKING
Values:
BUFFER_MODE_BLOCKING = 0,
BUFFER_MODE_NON_BLOCKING_LEFT = 1,
BUFFER_MODE_NON_BLOCKING_RIGHT = 2
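The slot behavior described above can be modeled with a short Python sketch. This is illustrative only (the function name and list-of-frames representation are hypothetical, not a QAIRT API); each input is treated as one frame, and unfilled slots are zero-filled on output as in the NON_BLOCKING modes:

```python
def simulate_buffer(inputs, buffer_size, stride=1, padding=0):
    """Illustrative model of the Buffer op's slot behavior along buffer_dim,
    treating each input as one frame. Hypothetical helper, not a QAIRT API."""
    buf = [0] * padding          # buffer_padding zero-frames present initially
    outputs = []
    for frame in inputs:
        if len(buf) == buffer_size:
            # Buffer full: evict the oldest `stride` frames from the front,
            # shifting the remaining frames left to preserve arrival order.
            buf = buf[stride:]
        buf.append(frame)
        # NON_BLOCKING modes emit every step, zero-filling unfilled slots.
        outputs.append(buf + [0] * (buffer_size - len(buf)))
    return outputs

# stride=2 with a 3-slot buffer: once full, slots 0 and 1 are evicted,
# the frame at slot 2 shifts to slot 0, and the new frame lands at slot 1.
print(simulate_buffer([1, 2, 3, 4], buffer_size=3, stride=2))
```

The final step reproduces the worked example above: with a full 3-slot buffer and stride 2, the input at slot 2 moves to slot 0 and the incoming input is placed at slot 1.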
- Type Constraints
TypeName
Datatypes
T
tensor(float32)
- Supported Backends
QNN-CPU
QNN-LPAI
QNN-HTP
Note
It is not feasible to mimic the Buffer block op in ONNX Runtime because of its blocking and stateful behavior. Hence it is implemented as an identity op in ONNX Runtime, and its output there may not match the expected output.
Note
The HTP backend supports only the NON_BLOCKING modes of the Buffer op.
MaskedSoftmax¶
This block op applies a Softmax operation on masked portions of the input tensor. For each batch the mask tensor is broadcast on the input before softmax computation. A mask tensor must be provided in either an UNCOMPRESSED or COMPRESSED format depending on the parameter mode selected. See input mask for details on how a boolean mask can be converted to an UNCOMPRESSED or COMPRESSED mask tensor.
- Signature
MaskedSoftmax(
    input data: TensorT[batch, height, width, channel],
    input mask: TensorT1[batch, M],
    parameter mode: enum(int),
)
→ output data: TensorT[batch, height, width, channel]
- Inputs
Name
Description
data: T (Tensor)
Input data to which the mask and softmax are applied.
Mandatory
Constraints: When parameter mode is COMPRESSED, width == channel
mask: T1 (Tensor)
The representation of this 2D tensor is determined by the parameter mode selected.
When parameter mode is set to UNCOMPRESSED, M = channel; when set to COMPRESSED, M = number of sequences.
Mandatory
Constraints: When parameter mode is set to COMPRESSED the sum of the values in each batch must be less than or equal to channel.
Consider a boolean mask where a value of 1 marks positions on which Softmax should be performed and a value of 0 marks positions on which it should not. An uncompressed mask can be made from a boolean mask tensor by subtracting 1 element-wise and multiplying the result element-wise by a large value.
mask = [[1,1,1,0,1]]
uncompressed_mask = (mask .+ -1) .* 10000
// uncompressed_mask = [[0,0,0,-10000,0]]
A compressed mask can be made from multiple boolean mask tensors of vector lengths that are concatenated into a single batch and summed across the 2nd axis. Consider the following:
Let there be 3 mask tensors that correspond to the sequences of inputs used to form the data batch, where 0's mark positions where padding was added to reach the maximum sequence length.
mask1 = [1,0,0,0]
mask2 = [1,1,1,0]
mask3 = [1,1,1,1]
The concatenated mask would then be the following:
concatenated_mask = [ [1,0,0,0], [1,1,1,0], [1,1,1,1]]
The compressed mask representation would be made from summing across the 2nd axis:
compressed_mask = [[1,3,4]]
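The two conversions above can be sketched with numpy (an illustrative sketch; the variable names are hypothetical):

```python
import numpy as np

# Boolean masks for three sequences, zero-padded to max length 4.
masks = np.array([[1, 0, 0, 0],
                  [1, 1, 1, 0],
                  [1, 1, 1, 1]])

# UNCOMPRESSED: (mask - 1) * large_value turns masked-out positions into a
# large negative bias, which softmax drives toward zero probability.
uncompressed = (masks - 1) * 10000

# COMPRESSED: sum across the 2nd axis, i.e. the valid length of each sequence.
compressed = masks.sum(axis=1)

print(uncompressed[0])   # [     0 -10000 -10000 -10000]
print(compressed)        # [1 3 4]
```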
- Outputs
Name
Description
out: T (Float32 Tensor)
Output after the input is masked and softmax is applied.
- Attributes/Parameters
Name
Description
mode: enum(int)
Determines the format of the input mask. See input mask for details on the format of the mask tensor.
Optional
Default: UNCOMPRESSED
- Type Constraints
TypeName
Datatypes
T
tensor(float32)
T1
tensor(int32)
- Supported Backends
QNN-CPU
QNN-AIC
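For reference, UNCOMPRESSED-mode behavior can be approximated with a short numpy sketch. The helper name is hypothetical, not the QAIRT kernel, and it assumes the additive mask is simply broadcast onto the input before a softmax over the last axis:

```python
import numpy as np

def masked_softmax_uncompressed(data, mask):
    """Sketch of UNCOMPRESSED-mode behavior: the additive mask is broadcast
    onto the input, then softmax runs over the last axis. Hypothetical
    helper, not the QAIRT implementation."""
    x = data + mask                          # masked slots get a large negative bias
    x = x - x.max(axis=-1, keepdims=True)    # numerically stable softmax
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

data = np.zeros((1, 1, 1, 5), dtype=np.float32)
mask = np.array([[0, 0, 0, -10000, 0]], dtype=np.float32)
out = masked_softmax_uncompressed(data, mask)
print(out.round(2))   # masked slot ~0, remaining four slots ~0.25
```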
StatefulGRU¶
This block op computes a one-layer GRU.
Equations (following the ONNX GRU formulation referenced below):
\(z_{t} = f(X_{t} W_{z}^{T} + H_{t-1} R_{z}^{T} + Wb_{z} + Rb_{z})\)
\(r_{t} = f(X_{t} W_{r}^{T} + H_{t-1} R_{r}^{T} + Wb_{r} + Rb_{r})\)
\(h_{t} = g(X_{t} W_{h}^{T} + (r_{t} \odot H_{t-1}) R_{h}^{T} + Wb_{h} + Rb_{h})\) when linear_before_reset = 0
\(h_{t} = g(X_{t} W_{h}^{T} + r_{t} \odot (H_{t-1} R_{h}^{T} + Rb_{h}) + Wb_{h})\) when linear_before_reset != 0
\(H_{t} = (1 - z_{t}) \odot h_{t} + z_{t} \odot H_{t-1}\)
Where,
\(X_{t}\) - input tensor
\(z_{t}\) - update gate
\(r_{t}\) - reset gate
\(h_{t}\) - hidden gate
\(t\) - time step (\(t-1\) means previous time step)
\(f\) - sigmoid activation function
\(g\) - tanh activation function
W[zrh] - W parameter weight matrix for update, reset, and hidden gates
R[zrh] - R recurrence weight matrix for update, reset, and hidden gates
Wb[zrh] - W bias vectors for update, reset, and hidden gates
Rb[zrh] - R bias vectors for update, reset, and hidden gates
WB[zrh] - W parameter weight matrix for backward update, reset, and hidden gates
RB[zrh] - R recurrence weight matrix for backward update, reset, and hidden gates
WBb[zrh] - W bias vectors for backward update, reset, and hidden gates
RBb[zrh] - R bias vectors for backward update, reset, and hidden gates
H - Hidden state
num_directions - 2 if direction == bidirectional else 1
\(\odot\) - element-wise product of two vectors.
Activation functions:
Tanh(x) - \((1 - e^{-2x})/(1 + e^{-2x})\)
Sigmoid(x) - \(1/(1 + e^{-x})\)
References:
ONNX: ops::GRU
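One forward time step of the standard ONNX GRU formulation (with linear_before_reset = 0) can be sketched in numpy. The helper name gru_step is hypothetical and this is not the backend kernel:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, R, Wb, Rb):
    # W, R stack the z, r, h gate weights along axis 0 (shapes
    # [3*hidden, input_size] and [3*hidden, hidden]); Wb, Rb are the
    # matching stacked bias vectors.
    Wz, Wr, Wh = np.split(W, 3)
    Rz, Rr, Rh = np.split(R, 3)
    Wbz, Wbr, Wbh = np.split(Wb, 3)
    Rbz, Rbr, Rbh = np.split(Rb, 3)
    z = sigmoid(x_t @ Wz.T + h_prev @ Rz.T + Wbz + Rbz)             # update gate
    r = sigmoid(x_t @ Wr.T + h_prev @ Rr.T + Wbr + Rbr)             # reset gate
    h_cand = np.tanh(x_t @ Wh.T + (r * h_prev) @ Rh.T + Wbh + Rbh)  # hidden gate
    return (1 - z) * h_cand + z * h_prev                            # new hidden state
```

As a quick sanity check: with all-zero weights and biases, z = r = 0.5 and the candidate state is 0, so a step returns 0.5 * h_prev.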
Signature
StatefulGru(
input X: TensorT[seq_length, batch_size, input_size],
input W: TensorT[num_directions, 3*hidden_size, input_size],
input R: TensorT[num_directions, 3*hidden_size, hidden_size],
parameter hidden_size: int,
input B: TensorT[num_directions, 6*hidden_size] = None,
input sequence_lens: TensorT1[batch_size] = None,
input initial_h: TensorT[num_directions, batch_size, hidden_size] = None,
input reset: BOOL = False,
parameter clip: float = None,
parameter direction: str = "forward",
parameter linear_before_reset: int = 0,
)
→ output Y: TensorT[seq_length, num_directions, batch_size, hidden_size],
output Y_h: TensorT[num_directions, batch_size, hidden_size]
- Inputs
Name
Description
X: T (Tensor)
The input sequence.
Mandatory
Shape: a tensor of shape [seq_length, batch_size, input_size]
W: T (Tensor)
The weight tensor for the gates. Concatenation of W[zrh] and WB[zrh] (if bidirectional) along dimension 0.
Mandatory
Shape: a tensor of shape [num_directions, 3*hidden_size, input_size]
R: T (Tensor)
The recurrence weight tensor. Concatenation of R[zrh] and RB[zrh] (if bidirectional) along dimension 0.
Mandatory
Shape: a tensor of shape [num_directions, 3*hidden_size, hidden_size]
B: T (Tensor)
The bias tensor for the gates. Concatenation of [Wb[zrh], Rb[zrh]] and [WBb[zrh], RBb[zrh]] (if bidirectional) along dimension 0.
Optional: If not specified, values are assumed to be 0.
Shape: a tensor of shape [num_directions, 6*hidden_size]
sequence_lens : T1 (Tensor)
Optional tensor specifying lengths of the sequences in a batch.
Optional: If not specified, all sequences in the batch are assumed to have length seq_length
Shape: a tensor of shape [batch_size]
initial_h: T (Tensor)
Optional initial value of the hidden state.
Optional: If not specified, values are assumed to be 0.
Shape: a tensor of shape [num_directions, batch_size, hidden_size]
reset: Boolean
Determines if the internal state should be reset from the beginning of an inference pass.
When set to true, the internal state is reset to the input initial_h if it is provided; otherwise it is set to all zeros. When set to false, the internal state is carried over from the last step's final_hidden tensor.
This input is used to indicate the reset of the internal state at the beginning of an inference pass across all batch elements at time-step 0.
Optional
Default: False
Shape: 0D tensor containing scalar value
Note
The reset input is ignored in ONNX Runtime and the behavior is the same as a standard GRU.
- Outputs
Name
Description
Y: T (Tensor)
A tensor that concatenates all the intermediate output values of the hidden state.
Optional
Shape: a tensor of shape [seq_length, num_directions, batch_size, hidden_size].
Y_h: T (Tensor)
The last output value of the hidden state.
Optional
Shape: a tensor of shape [num_directions, batch_size, hidden_size].
- Attributes/Parameters
Name
Description
hidden_size: int
Number of neurons in the hidden layer
Mandatory
Shape: Scalar
clip: float
Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.
Optional
Shape: Scalar
direction: str
Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional.
Optional
Shape: Scalar
Default: “forward”
linear_before_reset: int
When computing the output of the hidden gate, apply the linear transformation before multiplying by the output of the reset gate.
Optional
Shape: Scalar
Default: 0
- Type Constraints
TypeName
Datatypes
T
tensor(float32)
T1
tensor(int32)
- Supported Backends
QNN-CPU
QNN-LPAI
QNN-HTP
StatefulLSTM¶
This block op computes a one-layer LSTM.
Equations (following the ONNX LSTM formulation referenced below):
\(i_{t} = f(X_{t} W_{i}^{T} + H_{t-1} R_{i}^{T} + P_{i} \odot C_{t-1} + Wb_{i} + Rb_{i})\)
\(f_{t} = f(X_{t} W_{f}^{T} + H_{t-1} R_{f}^{T} + P_{f} \odot C_{t-1} + Wb_{f} + Rb_{f})\)
\(c_{t} = g(X_{t} W_{c}^{T} + H_{t-1} R_{c}^{T} + Wb_{c} + Rb_{c})\)
\(C_{t} = f_{t} \odot C_{t-1} + i_{t} \odot c_{t}\)
\(o_{t} = f(X_{t} W_{o}^{T} + H_{t-1} R_{o}^{T} + P_{o} \odot C_{t} + Wb_{o} + Rb_{o})\)
\(H_{t} = o_{t} \odot h(C_{t})\)
Where,
\(X_{t}\) - input tensor
\(i_{t}\) - input gate
\(o_{t}\) - output gate
\(f_{t}\) - forget gate
\(c_{t}\) - cell gate
\(t\) - time step (\(t-1\) means previous time step)
\(f\) - sigmoid activation function
\(g\) - tanh activation function
\(h\) - tanh activation function
W[iofc] - W parameter weight matrix for input, output, forget, and cell gates
R[iofc] - R recurrence weight matrix for input, output, forget, and cell gates
Wb[iofc] - W bias vectors for input, output, forget, and cell gates
Rb[iofc] - R bias vectors for input, output, forget, and cell gates
P[iof] - P peephole weight vector for input, output, and forget gates
WB[iofc] - W parameter weight matrix for backward input, output, forget, and cell gates
RB[iofc] - R recurrence weight matrix for backward input, output, forget, and cell gates
WBb[iofc] - W bias vectors for backward input, output, forget, and cell gates
RBb[iofc] - R bias vectors for backward input, output, forget, and cell gates
PB[iof] - P peephole weight vector for backward input, output, and forget gates
H - Hidden state
num_directions - 2 if direction == bidirectional else 1
\(\odot\) - element-wise product of two vectors.
Activation functions:
Tanh(x) - \((1 - e^{-2x})/(1 + e^{-2x})\)
Sigmoid(x) - \(1/(1 + e^{-x})\)
References:
ONNX: ops::LSTM
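One forward time step of the standard ONNX LSTM formulation, with optional peepholes, can be sketched in numpy. The helper name lstm_step is hypothetical and this is not the backend kernel:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, R, Wb, Rb, P=None):
    # W, R stack the i, o, f, c gate weights along axis 0 in ONNX iofc
    # order (shapes [4*hidden, input_size] and [4*hidden, hidden]);
    # Wb, Rb are the matching stacked biases; P holds the i, o, f
    # peephole vectors.
    Wi, Wo, Wf, Wc = np.split(W, 4)
    Ri, Ro, Rf, Rc = np.split(R, 4)
    Wbi, Wbo, Wbf, Wbc = np.split(Wb, 4)
    Rbi, Rbo, Rbf, Rbc = np.split(Rb, 4)
    Pi, Po, Pf = np.split(P, 3) if P is not None else (0.0, 0.0, 0.0)
    i = sigmoid(x_t @ Wi.T + h_prev @ Ri.T + Pi * c_prev + Wbi + Rbi)  # input gate
    f = sigmoid(x_t @ Wf.T + h_prev @ Rf.T + Pf * c_prev + Wbf + Rbf)  # forget gate
    c_cand = np.tanh(x_t @ Wc.T + h_prev @ Rc.T + Wbc + Rbc)           # cell gate
    c = f * c_prev + i * c_cand                                        # new cell state
    o = sigmoid(x_t @ Wo.T + h_prev @ Ro.T + Po * c + Wbo + Rbo)       # output gate
    return o * np.tanh(c), c                                           # hidden, cell
```

As a quick sanity check: with all-zero weights, biases, and no peepholes, every gate is 0.5 and the candidate cell is 0, so the new cell state is 0.5 * c_prev.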
- Signature
StatefulLstm(
    input X: TensorT[seq_length, batch_size, input_size],
    input W: TensorT[num_directions, 4*hidden_size, input_size],
    input R: TensorT[num_directions, 4*hidden_size, hidden_size],
    parameter hidden_size: int,
    input B: TensorT[num_directions, 8*hidden_size] = None,
    input sequence_lens: TensorT1[batch_size] = None,
    input initial_h: TensorT[num_directions, batch_size, hidden_size] = None,
    input initial_c: TensorT[num_directions, batch_size, hidden_size] = None,
    input P: TensorT[num_directions, 3*hidden_size] = None,
    input reset: BOOL = False,
    parameter clip: float = None,
    parameter direction: str = "forward",
    parameter input_forget: int = 0,
)
→ output Y: TensorT[seq_length, num_directions, batch_size, hidden_size],
output Y_h: TensorT[num_directions, batch_size, hidden_size],
output Y_c: TensorT[num_directions, batch_size, hidden_size]
- Inputs
Name
Description
X: T (Tensor)
The input sequence.
Mandatory
Shape: a tensor of shape [seq_length, batch_size, input_size]
W: T (Tensor)
The weight tensor for the gates. Concatenation of W[iofc] and WB[iofc] (if bidirectional) along dimension 0.
Mandatory
Shape: a tensor of shape [num_directions, 4*hidden_size, input_size]
R: T (Tensor)
The recurrence weight tensor. Concatenation of R[iofc] and RB[iofc] (if bidirectional) along dimension 0.
Mandatory
Shape: a tensor of shape [num_directions, 4*hidden_size, hidden_size]
B: T (Tensor)
The bias tensor for the gates. Concatenation of [Wb[iofc], Rb[iofc]] and [WBb[iofc], RBb[iofc]] (if bidirectional) along dimension 0.
Optional: If not specified, values are assumed to be 0.
Shape: a tensor of shape [num_directions, 8*hidden_size]
sequence_lens : T1 (Tensor)
Optional tensor specifying lengths of the sequences in a batch.
Optional: If not specified, all sequences in the batch are assumed to have length seq_length
Shape: a tensor of shape [batch_size]
initial_h: T (Tensor)
Optional initial value of the hidden state.
Optional: If not specified, values are assumed to be 0.
Shape: a tensor of shape [num_directions, batch_size, hidden_size]
initial_c: T (Tensor)
Optional initial value of the cell state.
Optional: If not specified, values are assumed to be 0.
Shape: a tensor of shape [num_directions, batch_size, hidden_size]
P: T (Tensor)
The weight tensor for peepholes. Concatenation of P[iof] and PB[iof] (if bidirectional) along dimension 0.
Optional: If not specified, values are assumed to be 0.
Shape: a tensor of shape [num_directions, 3*hidden_size]
reset: Boolean
Determines if the internal cell and hidden state should be reset from the beginning of an inference pass.
When set to true, the internal states are reset to the inputs initial_h and initial_c if they are provided; otherwise they are set to all zeros. When set to false, the internal states are carried over from the last step's final_hidden and final_cell tensors.
This input is used to indicate the reset of the internal states at the beginning of an inference pass across all batch elements at time-step 0.
Optional
Default: False
Shape: 0D tensor containing scalar value
Note
The reset input is ignored in ONNX Runtime and the behavior is the same as a standard LSTM.
- Outputs
Name
Description
Y: T (Tensor)
A tensor that concatenates all the intermediate output values of the hidden state.
Optional
Shape: a tensor of shape [seq_length, num_directions, batch_size, hidden_size].
Y_h: T (Tensor)
The last output value of the hidden state.
Optional
Shape: a tensor of shape [num_directions, batch_size, hidden_size].
Y_c: T (Tensor)
The last output value of the cell state.
Optional
Shape: a tensor of shape [num_directions, batch_size, hidden_size].
- Attributes/Parameters
Name
Description
hidden_size: int
Number of neurons in the hidden layer
Mandatory
Shape: Scalar
clip: float
Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.
Optional
Shape: Scalar
direction: str
Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional.
Optional
Shape: Scalar
Default: “forward”
input_forget: int
Couple the input and forget gates if 1.
Optional
Shape: Scalar
Default: 0
- Type Constraints
TypeName
Datatypes
T
tensor(float32)
T1
tensor(int32)
- Supported Backends
QNN-CPU
QNN-LPAI
QNN-HTP