INT4 encodings for weights
We have added support for int4 weight encodings for several ops and plan to support a few more in upcoming releases. The ops that support int4 are detailed in the table below. There are a few things to keep in mind when using int4:
- Weights must be static/constant data.
- The weight values must satisfy the following relation:

\[-8 \leq \text{INT4\_Weight\_Value} - \text{Offset} \leq 7\]

- Both per-tensor and per-channel quantization are supported for INT4 for these ops.
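The range constraint above can be illustrated with a short sketch. The snippet below is a hypothetical per-tensor quantizer, not the SDK's actual encoding routine: it picks a scale so that every rounded weight, after subtracting the offset, lands in the signed 4-bit range \[-8, 7\], then checks the constraint explicitly. The function name and the offset convention are illustrative assumptions.

```python
import numpy as np

def quantize_int4(weights, offset=0):
    """Illustrative per-tensor INT4 quantization sketch (not the SDK routine).

    Chooses a scale so that round(w / scale) fits the signed 4-bit
    range, then stores q = round(w / scale) + offset, so that
    q - offset is guaranteed to lie in [-8, 7].
    """
    # Scale must cover both the most negative (-8) and most positive (+7) level.
    scale = max(weights.max() / 7.0, -weights.min() / 8.0, 1e-12)
    q = np.round(weights / scale).astype(np.int32) + offset
    # The constraint from above: -8 <= INT4_Weight_Value - Offset <= 7
    assert np.all((q - offset >= -8) & (q - offset <= 7))
    dequantized = (q - offset) * scale
    return q, scale, dequantized
```

Per-channel quantization applies the same idea independently to each output channel, with one scale/offset pair per channel.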
The following table lists the ops and op configurations that see INT4 benefits:
| Op name | Op configurations with power benefits | Op configurations with IPS (latency) benefits* |
|---|---|---|
| Conv2D (1x1 only) | out_channels > 32 and stride 1x1 and filter_height = 1 and filter_width = 1 | out_channels > 32 and stride 1x1 and filter_height = 1 and filter_width = 1 |
| Fully Connected | out_channels > 32 | out_channels > 32 (*see note below) |
| MatMul | out_channels > 32 | out_channels > 32 (*see note below) |

*Note: latency benefits arise solely from the reduced transfer of data to/from VTCM due to the smaller weights.