HARP-Net: Hyper-Autoencoded Reconstruction Propagation for Scalable Neural Audio Coding

Darius Petermann, Minje Kim

SAIGE, Indiana University





Audio Examples:

48 kbps Models (+16 kbps LPC)

Model                        Target Bitrate  Num. Params  SNR (dB)  Num. Skips
Reference                    -               -            -         -
MP3                          -               -            -         -
Baseline, Large Bitrate, 1   48 kbps         218k         6.24      -
HARP-Net, Large Bitrate, 1   48 kbps         216k         14.19     1
Baseline, Large Bitrate, 2   48 kbps         275k         6.95      -
HARP-Net, Large Bitrate, 2   48 kbps         257k         13.52     2
Baseline, Large Bitrate, 3   48 kbps         314k         9.63      -
HARP-Net, Large Bitrate, 3   48 kbps         298k         14.12     3
Baseline, Large Bitrate, 4   48 kbps         350k         4.92      -
HARP-Net, Large Bitrate, 4   48 kbps         339k         14.05     4
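
The SNR (dB) column can be read with the standard waveform signal-to-noise ratio in mind: ten times the base-10 logarithm of the reference signal's energy over the energy of the reconstruction error. The sketch below is a minimal illustration of that textbook definition, not the authors' evaluation code; the function name `snr_db`, the 16 kHz sample rate, and the toy sine signal are assumptions made for the example only.

```python
import math


def snr_db(reference, decoded):
    """Standard waveform SNR in dB (hypothetical helper, not from the page)."""
    if len(reference) != len(decoded):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in reference)
    noise_energy = sum((s - d) ** 2 for s, d in zip(reference, decoded))
    if noise_energy == 0.0:
        return math.inf  # perfect reconstruction
    if signal_energy == 0.0:
        return -math.inf  # silent reference with a non-zero error
    return 10.0 * math.log10(signal_energy / noise_energy)


# Toy input: one second of a 440 Hz sine at an assumed 16 kHz sample rate.
reference = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
# A "decoded" signal whose error is 10% of the reference amplitude.
decoded = [0.9 * s for s in reference]

print(round(snr_db(reference, decoded), 2))
```

With an error at 10% of the reference amplitude, the error energy is 1% of the signal energy, so the toy example evaluates to about 20 dB. On this scale, the gap between a baseline and a HARP-Net row in the table corresponds to a multiplicative reduction in reconstruction-error energy.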



24 kbps Models (+16 kbps LPC)

Model                        Target Bitrate  Num. Params  SNR (dB)  Num. Skips
Reference                    -               -            -         -
MP3                          -               -            -         -
Baseline, Small Bitrate, 1   24 kbps         218k         3.75      -
HARP-Net, Small Bitrate, 1   24 kbps         216k         7.25      1
Baseline, Small Bitrate, 2   24 kbps         275k         6.22      -
HARP-Net, Small Bitrate, 2   24 kbps         257k         7.62      2
Baseline, Small Bitrate, 3   24 kbps         314k         9.08      -
HARP-Net, Small Bitrate, 3   24 kbps         298k         10.35     3
Baseline, Small Bitrate, 4   24 kbps         350k         1.41      -
HARP-Net, Small Bitrate, 4   24 kbps         339k         8.36      4







