Struct Qnn_BwScaleOffset_t
Defined in File QnnTypes.h
Struct Documentation
struct Qnn_BwScaleOffset_t
A struct to express quantization parameters as a positive scale, a zero-point offset, and a bitwidth.
float_value = (quantized_value + offset) * scale
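As a minimal sketch of the formula above, the following C snippet applies it to a single value. The struct `ExampleBwScaleOffset`, its field types, and the helper `example_dequantize` are hypothetical stand-ins introduced for illustration; they are not copied from QnnTypes.h.

```c
#include <stdint.h>

/* Hypothetical stand-in for the encoding described here.
 * Field names and types are assumptions, not the QnnTypes.h definition. */
typedef struct {
    uint32_t bitwidth; /* true number of bits used to quantize the value */
    float    scale;    /* expected to be positive */
    int32_t  offset;   /* added to the quantized value before scaling */
} ExampleBwScaleOffset;

/* float_value = (quantized_value + offset) * scale */
static float example_dequantize(int32_t quantized_value,
                                const ExampleBwScaleOffset *enc)
{
    return (float)(quantized_value + enc->offset) * enc->scale;
}
```

With `bitwidth = 4`, `scale = 0.5f`, and `offset = -8`, a quantized value of 10 maps to (10 - 8) * 0.5 = 1.0, and a quantized value of 0 maps to -4.0.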
bitwidth must be > 0, and is used to express the true number of bits used to quantize the value, which may be different from the bitwidth of the tensor indicated by its data type. For example: the quantization encoding for a tensor of type QNN_DATATYPE_UFIXED_POINT_8 that is quantized to 4-bit precision may be expressed by setting bitwidth = 4. In such circumstances, data quantized to a lower precision will still occupy the full extent of bits allotted to the tensor as per its data type in unpacked form.
The datatype used must be the smallest type that can accommodate the bitwidth. For example: a tensor quantized to 4-bit precision must use an 8-bit datatype; 16-bit or larger datatypes are not permitted.
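The smallest-type rule can be sketched as a small check. This assumes the available container widths are 8, 16, and 32 bits; that set, and the helper names `example_min_container_bits` and `example_container_is_valid`, are assumptions made for illustration rather than part of the QNN API.

```c
#include <stdint.h>

/* Smallest container width, in bits, that can hold `bitwidth` bits.
 * Assumes containers of 8, 16, or 32 bits. Returns 0 for a bitwidth
 * that is zero or too large for any assumed container. */
static uint32_t example_min_container_bits(uint32_t bitwidth)
{
    if (bitwidth == 0u || bitwidth > 32u) return 0u;
    if (bitwidth <= 8u)  return 8u;
    if (bitwidth <= 16u) return 16u;
    return 32u;
}

/* Nonzero only when the container is exactly the smallest one that fits. */
static int example_container_is_valid(uint32_t bitwidth, uint32_t container_bits)
{
    uint32_t min_bits = example_min_container_bits(bitwidth);
    return min_bits != 0u && container_bits == min_bits;
}
```

Under these assumptions, a 4-bit encoding is accepted with an 8-bit container and rejected with a 16-bit one, matching the example in the text.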
Tensor elements are expected to occupy the least significant bits of the total size allotted to the datatype, and all bits above the specified bitwidth will be ignored. For example: an 8-bit datatype tensor quantized to 4-bit precision will be interpreted as a 4-bit value contained in the lower 4 bits of each element, and the upper 4 bits will be ignored. For signed datatypes, the value will be interpreted as a two’s complement integer where the sign bit is the most significant bit permitted by the specified bitwidth. For example: -3 would be represented as 0b11111101 as a signed 8-bit integer, but can also be represented as 0b00001101 as a signed 4-bit integer stored in an 8-bit container. Either of these representations is valid to express -3 as a 4-bit signed integer in an 8-bit container, and both will be treated identically because the upper 4 bits will be ignored.
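The interpretation described above can be illustrated with a short decoding sketch for elements stored in an 8-bit container. The helpers `example_decode_unsigned` and `example_decode_signed` are hypothetical and only model the described behavior; they assume 1 <= bitwidth <= 8 and say nothing about how a backend actually implements it.

```c
#include <stdint.h>

/* Keep only the lower `bitwidth` bits of an 8-bit element.
 * Assumes 1 <= bitwidth <= 8. */
static uint32_t example_decode_unsigned(uint8_t raw, uint32_t bitwidth)
{
    uint32_t mask = (1u << bitwidth) - 1u; /* bitwidth 4 gives 0x0F */
    return (uint32_t)raw & mask;
}

/* Interpret the lower `bitwidth` bits as a two's complement integer
 * whose sign bit is bit (bitwidth - 1). Assumes 1 <= bitwidth <= 8. */
static int32_t example_decode_signed(uint8_t raw, uint32_t bitwidth)
{
    uint32_t value    = example_decode_unsigned(raw, bitwidth);
    uint32_t sign_bit = 1u << (bitwidth - 1u);
    /* Flipping the sign bit and then subtracting its weight
     * sign-extends the value without implementation-defined shifts. */
    return (int32_t)(value ^ sign_bit) - (int32_t)sign_bit;
}
```

Both `example_decode_signed(0xFD, 4)` and `example_decode_signed(0x0D, 4)` yield -3, since the upper 4 bits are masked off before the value is sign-extended.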