Choice of Activation Functions
Using simple piecewise-linear or quadratic activation functions can significantly improve performance while having a negligible effect on accuracy. Some examples are listed below:
ReLU is the best choice of activation function because:

- it is the simplest to compute, and
- it generates many zeros in its output, which reduces downstream MAC/memory usage and energy.
It is denoted as:

\[f(x) = \max(0, x)\]
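As a minimal sketch of the point above, the following pure-Python snippet (illustrative only, not an HMX kernel) applies ReLU element-wise to a toy pre-activation vector and counts the exact zeros it produces — the values a downstream stage can skip:

```python
def relu(x):
    """f(x) = max(0, x), the simplest piecewise-linear activation."""
    return max(0.0, x)

# Every non-positive input maps to an exact 0.0, which downstream
# MAC/memory stages can skip, saving compute and energy.
pre_acts = [-2.0, -0.5, 0.0, 1.5, 3.0]
post_acts = [relu(v) for v in pre_acts]
zeros = sum(1 for v in post_acts if v == 0.0)  # 3 of 5 outputs are zero
```

The single `max` per element is also the reason ReLU is so cheap to compute: it lowers to one compare-and-select per value, with no transcendental functions involved.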
HMX hardware supports fusing several activation functions into the preceding convolution. It is recommended to use one of the supported activation functions so that fusion is possible and the HMX hardware can be fully utilized. In addition to trivial fusion of ReLU, ReLUX, and ReLUMinMax in A8 and A16 precisions (8-bit and 16-bit activations, respectively), we also support fusion of ReLUX and ReLUMinMax in FP16 precision (16-bit float activations) on HMX. Fusion of PReLU/LeakyReLU and HardSwish is also supported across all precisions (A8, A16, and FP16) on HMX. ReLU, ReLUX, and ReLUMinMax are the preferred activations for A8 and A16 runtimes.
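The activations named above are all piecewise-linear clamps or scaled ramps, which is what makes them cheap to fuse. The sketch below gives their commonly used definitions in plain Python; the clamp bounds (`cap`, `lo`, `hi`) and slope (`alpha`) are illustrative example values, not values mandated by the hardware, and the exact semantics of ReLUX/ReLUMinMax here are an assumption based on their conventional definitions:

```python
def relu_x(x, cap=6.0):
    # ReLUX: clamp to [0, cap]; cap=6 is the common ReLU6 case
    # (cap value here is an illustrative assumption).
    return min(max(0.0, x), cap)

def relu_min_max(x, lo=-1.0, hi=1.0):
    # ReLUMinMax: clamp to [lo, hi] (bounds are illustrative).
    return min(max(lo, x), hi)

def leaky_relu(x, alpha=0.01):
    # LeakyReLU: small linear slope alpha for negative inputs.
    return x if x >= 0.0 else alpha * x

def hard_swish(x):
    # HardSwish: piecewise-linear approximation of Swish,
    # x * relu6(x + 3) / 6.
    return x * min(max(0.0, x + 3.0), 6.0) / 6.0
```

Because each of these is just a few compares, selects, and multiplies per element, a convolution kernel can apply them to its accumulator before writing results out, which is what the fusion support above exploits.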