Quantization

Alternating Multi-bit Quantization for Recurrent Neural Networks

We address the inference latency of RNNs by quantizing both the weights and the activations of the network into multiple binary codes with entries in {−1, +1}. We formulate the quantization as an optimization problem.
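
To make the optimization view concrete, the following is a minimal sketch, not the method of the paper. It assumes the objective is to approximate a real vector w by a sum of k scaled binary codes, w ≈ alpha_1 * b_1 + ... + alpha_k * b_k with each b_i in {−1, +1}^n, by minimizing the squared error. It uses a simple greedy residual scheme rather than the alternating scheme named in the title, and the function names (greedy_multibit_quantize, reconstruct, squared_error) are hypothetical, chosen only for illustration.

```python
def greedy_multibit_quantize(w, k):
    """Approximate w by sum_i alphas[i] * codes[i], with code entries in {-1, +1}.

    Assumption: greedy residual fitting, used here only as an illustration.
    """
    residual = list(w)
    alphas, codes = [], []
    for _ in range(k):
        # For a fixed residual r, ||r - alpha * b||^2 is minimized by
        # b = sign(r) and alpha = mean(|r|).
        b = [1 if r >= 0 else -1 for r in residual]
        alpha = sum(abs(r) for r in residual) / len(residual)
        alphas.append(alpha)
        codes.append(b)
        # Subtract the fitted term and quantize what is left in the next round.
        residual = [r - alpha * s for r, s in zip(residual, b)]
    return alphas, codes


def reconstruct(alphas, codes):
    """Rebuild the real-valued approximation from coefficients and binary codes."""
    n = len(codes[0])
    return [sum(a * c[j] for a, c in zip(alphas, codes)) for j in range(n)]


def squared_error(w, w_hat):
    """Squared Euclidean distance between w and its approximation."""
    return sum((x - y) ** 2 for x, y in zip(w, w_hat))


if __name__ == "__main__":
    w = [0.9, -0.4, 0.1, -0.7]
    for k in (1, 2, 3):
        alphas, codes = greedy_multibit_quantize(w, k)
        print(k, squared_error(w, reconstruct(alphas, codes)))
```

In this sketch each extra bit can only lower (or leave unchanged) the squared error, since every round removes a non-negative amount from the residual norm. A scheme that re-fits all coefficients jointly would generally do at least as well for the same number of bits.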