Adam Finkelstein and Jiaqi Su: AI-driven method for producing high-quality audio recordings

Oct. 27, 2021

A new method could improve the listening experience for podcasts, video voice-overs and audio books by using artificial intelligence (AI) to transform low-quality recordings of human speech into crisp and clear studio-quality tracks.

Adam Finkelstein, Professor of Computer Science

Voice recordings made with consumer-grade equipment in natural environments — including interviews conducted by phone or video chat — typically include background noise, reverberation and distortion. Existing AI-based methods for improving speech recordings have generally tackled a single aspect of audio quality, such as filtering out background noise or removing reverb.

The new method, which the researchers call HiFi-GAN (short for high-fidelity generative adversarial network), is more of an all-in-one tool. Ultimately, the researchers hope to apply their framework to enable fully automated real-time speech enhancement.

The approach uses artificial neural networks, which are key tools of deep learning that mimic the interconnected architecture of biological neurons. The researchers train two separate networks that compete to improve audio quality. One network, called a generator, produces cleaned-up recordings of speech. The other network, called a discriminator, analyzes recordings to try to determine whether they are real studio-quality recordings or audio that has been cleaned by the generator.
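To make the generator–discriminator setup concrete, here is a minimal sketch in PyTorch of what such a pair of networks might look like for waveform-to-waveform speech enhancement. The layer sizes and structure are illustrative placeholders, not the actual HiFi-GAN architecture described by the researchers.

```python
# Illustrative sketch only: a small generator/discriminator pair for
# speech enhancement. Layer choices are placeholders, not HiFi-GAN itself.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a noisy, reverberant waveform to a cleaned-up waveform."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(32, 1, kernel_size=15, padding=7),
        )

    def forward(self, noisy_wave):        # shape: (batch, 1, samples)
        return self.net(noisy_wave)       # enhanced waveform, same shape

class Discriminator(nn.Module):
    """Scores a waveform: high for real studio audio, low for generator output."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=15, stride=4),
            nn.LeakyReLU(0.2),
            nn.Conv1d(32, 64, kernel_size=15, stride=4),
            nn.LeakyReLU(0.2),
            nn.Conv1d(64, 1, kernel_size=3),
        )

    def forward(self, wave):
        return self.net(wave).mean(dim=(1, 2))  # one realism score per example
```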

Jiaqi Su, Graduate Student in Computer Science

The competition between these adversarial networks is what drives the improvement in audio quality. The generator and discriminator engage in a kind of arms race, each becoming more effective as training progresses. When training is complete, the discriminator is discarded, leaving a generator capable of producing clear audio.
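The sketch below shows one common way to express that arms race as a training loop, here using a least-squares GAN objective plus an L1 term as an assumed stand-in for the more elaborate losses in the researchers' actual method. It assumes the `Generator` and `Discriminator` classes from the earlier sketch and paired batches `noisy` and `studio` of degraded and studio-quality waveforms.

```python
# Illustrative adversarial training loop; the loss functions are common
# generic choices, not the specific objectives used in HiFi-GAN.
import torch
import torch.nn.functional as F

G, D = Generator(), Discriminator()
g_opt = torch.optim.Adam(G.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=1e-4)

def training_step(noisy, studio):
    # 1) The discriminator learns to tell studio recordings from generator output.
    fake = G(noisy).detach()
    d_loss = ((D(studio) - 1) ** 2).mean() + (D(fake) ** 2).mean()
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) The generator learns to fool the discriminator while staying
    #    close to the studio-quality target.
    fake = G(noisy)
    g_loss = ((D(fake) - 1) ** 2).mean() + F.l1_loss(fake, studio)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

# After training, the discriminator is discarded; enhancement uses only G:
# enhanced = G(noisy_recording)
```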

"Deep learning has already had a huge impact in audio processing, and we expect it to become even more profound in the coming decade."  – Adam Finkelstein

Recorded speech patterns for auditory media

An AI-powered approach provides automatic cleanup of recorded speech for podcasts, interviews, video voice-overs and audio books.

Innovators:
Adam Finkelstein, Professor of Computer Science
Jiaqi Su, Graduate Student in Computer Science

Collaborators:
Zeyu Jin, Princeton Ph.D. 2017, Adobe Research

Team members:
Pranay Manocha and Yunyun Wang, Graduate Students in Computer Science

Funding:
Princeton University Dean for Research Innovation Fund for New Industrial Collaborations; Adobe Research

Learn more:
Email: [email protected]
Website: cs.princeton.edu/~af