HARP-Net: Hyper-Autoencoded Reconstruction Propagation for Scalable Neural Audio Coding
Darius Petermann, Minje Kim
SAIGE, Indiana University
Audio Examples:
48kbps Models (+16kbps LPC)
Model                        Target Bitrate   Num. Params   SNR (dB)   Num. Skips
Reference                    -                -             -          -
MP3                          -                -             -          -
Baseline, Large Bitrate, 1   48kbps           218k          6.24       -
HARPNET, Large Bitrate, 1    48kbps           216k          14.19      1
Baseline, Large Bitrate, 2   48kbps           275k          6.95       -
HARPNET, Large Bitrate, 2    48kbps           257k          13.52      2
Baseline, Large Bitrate, 3   48kbps           314k          9.63       -
HARPNET, Large Bitrate, 3    48kbps           298k          14.12      3
Baseline, Large Bitrate, 4   48kbps           350k          4.92       -
HARPNET, Large Bitrate, 4    48kbps           339k          14.05      4
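The SNR column can be read with the usual waveform definition, SNR = 10 log10(sum of s^2 / sum of (s - s_hat)^2), where s is the reference signal and s_hat the decoded one. This page does not spell out its evaluation protocol (for example, whether SNR is segmental or computed over the whole clip), so the sketch below is an assumption, not the authors' scoring code; `snr_db` is an illustrative name.

```python
import math


def snr_db(reference, estimate):
    """Waveform SNR in dB: 10*log10(signal energy / error energy).

    Assumes a plain, non-segmental SNR over equal-length sample lists.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# A 10% amplitude error gives an energy ratio of 100, i.e. 20 dB.
print(round(snr_db([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9]), 2))  # 20.0
```

Because the scale is logarithmic, each additional 10 dB means the error energy is ten times smaller relative to the signal.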
24kbps Models (+16kbps LPC)
Model                        Target Bitrate   Num. Params   SNR (dB)   Num. Skips
Reference                    -                -             -          -
MP3                          -                -             -          -
Baseline, Small Bitrate, 1   24kbps           218k          3.75       -
HARPNET, Small Bitrate, 1    24kbps           216k          7.25       1
Baseline, Small Bitrate, 2   24kbps           275k          6.22       -
HARPNET, Small Bitrate, 2    24kbps           257k          7.62       2
Baseline, Small Bitrate, 3   24kbps           314k          9.08       -
HARPNET, Small Bitrate, 3    24kbps           298k          10.35      3
Baseline, Small Bitrate, 4   24kbps           350k          1.41       -
HARPNET, Small Bitrate, 4    24kbps           339k          8.36       4
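To make the comparison in the two tables explicit, the short script below subtracts each baseline SNR from the HARPNET SNR of the same configuration (1 to 4) at each bitrate. The numbers are transcribed from the tables on this page; the names `SNR_DB` and `snr_gains` are illustrative only.

```python
# SNR values in dB, transcribed from the 48kbps and 24kbps tables on this page.
# Index i holds configuration i + 1 (the trailing number in each model name).
SNR_DB = {
    "48kbps": {
        "baseline": [6.24, 6.95, 9.63, 4.92],
        "harpnet": [14.19, 13.52, 14.12, 14.05],
    },
    "24kbps": {
        "baseline": [3.75, 6.22, 9.08, 1.41],
        "harpnet": [7.25, 7.62, 10.35, 8.36],
    },
}


def snr_gains(table):
    """Per-configuration SNR gain of HARPNET over the baseline, in dB."""
    return [round(h - b, 2) for b, h in zip(table["baseline"], table["harpnet"])]


for bitrate, table in SNR_DB.items():
    gains = snr_gains(table)
    mean_gain = sum(gains) / len(gains)
    print(f"{bitrate}: gains {gains}, mean {mean_gain:.2f} dB")
```

By this arithmetic HARPNET is ahead in all eight configurations, and the gap is largest for configuration 4 at both bitrates, which is also where the baseline SNR is lowest.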