Waveform Autoencoding at the Edge of Perceivable Latency

Franco Caspe; Andrew McPherson; Mark Sandler

Proceedings of the International Conference on New Interfaces for Musical Expression

Year: 2025
Location: Canberra, Australia
Track: Paper
Pages: 73–76
Article Number: 10
DOI: 10.5281/zenodo.15699550 (Link to paper and supplementary files)
PDF: http://nime.org/proceedings/2025/nime2025_10.pdf

Abstract

We introduce an audio plugin implementation of BRAVE, a waveform autoencoder presented recently, that affords Neural Audio Synthesis with low latency and jitter. As a redesign of the well-known RAVE model, BRAVE introduces a series of architectural modifications for supporting instrumental interaction with almost imperceptible latency (<10 ms) and jitter (~ 3 ms). By comparing both designs, we highlight key architectural differences between the models that impact their instrumental performance capability, arguing that no model fits all purposes, and calling for their careful selection for each interactive design. Finally, we discuss challenges and opportunities for leveraging low-latency waveform autoencoders to develop interactive systems, such as Digital Musical Instruments, that can foster control intimacy through enhanced responsiveness and space for nuance.

Citation

Franco Caspe, Andrew McPherson, and Mark Sandler. 2025. Waveform Autoencoding at the Edge of Perceivable Latency. Proceedings of the International Conference on New Interfaces for Musical Expression. DOI: 10.5281/zenodo.15699550

BibTeX Entry

@inproceedings{nime2025_10,
 abstract = {We introduce an audio plugin implementation of BRAVE, a waveform autoencoder presented recently, that affords Neural Audio Synthesis with low latency and jitter. As a redesign of the well-known RAVE model, BRAVE introduces a series of architectural modifications for supporting instrumental interaction with almost imperceptible latency (<10 ms) and jitter (~ 3 ms). By comparing both designs, we highlight key architectural differences between the models that impact their instrumental performance capability, arguing that no model fits all purposes, and calling for their careful selection for each interactive design. Finally, we discuss challenges and opportunities for leveraging low-latency waveform autoencoders to develop interactive systems, such as Digital Musical Instruments, that can foster control intimacy through enhanced responsiveness and space for nuance.},
 address = {Canberra, Australia},
 articleno = {10},
 author = {Franco Caspe and Andrew McPherson and Mark Sandler},
 booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression},
 doi = {10.5281/zenodo.15699550},
 editor = {Doga Cavdir and Florent Berthaut},
 issn = {2220-4806},
 month = {June},
 numpages = {4},
 pages = {73--76},
 title = {Waveform Autoencoding at the Edge of Perceivable Latency},
 track = {Paper},
 url = {http://nime.org/proceedings/2025/nime2025_10.pdf},
 year = {2025}
}