Waveform Autoencoding at the Edge of Perceivable Latency
Franco Caspe, Andrew McPherson, and Mark Sandler
Proceedings of the International Conference on New Interfaces for Musical Expression
- Year: 2025
- Location: Canberra, Australia
- Track: Paper
- Pages: 73–76
- Article Number: 10
- DOI: 10.5281/zenodo.15699550 (Link to paper and supplementary files)
- PDF: http://nime.org/proceedings/2025/nime2025_10.pdf
Abstract
We introduce an audio plugin implementation of BRAVE, a recently presented waveform autoencoder that affords Neural Audio Synthesis with low latency and jitter. As a redesign of the well-known RAVE model, BRAVE introduces a series of architectural modifications for supporting instrumental interaction with almost imperceptible latency (<10 ms) and jitter (~3 ms). By comparing both designs, we highlight key architectural differences between the models that impact their instrumental performance capability, arguing that no model fits all purposes, and calling for their careful selection for each interactive design. Finally, we discuss challenges and opportunities for leveraging low-latency waveform autoencoders to develop interactive systems, such as Digital Musical Instruments, that can foster control intimacy through enhanced responsiveness and space for nuance.
Citation
Franco Caspe, Andrew McPherson, and Mark Sandler. 2025. Waveform Autoencoding at the Edge of Perceivable Latency. Proceedings of the International Conference on New Interfaces for Musical Expression. DOI: 10.5281/zenodo.15699550
BibTeX Entry
@article{nime2025_10,
  abstract = {We introduce an audio plugin implementation of BRAVE, a waveform autoencoder presented recently, that affords Neural Audio Synthesis with low latency and jitter. As a redesign of the well-known RAVE model, BRAVE introduces a series of architectural modifications for supporting instrumental interaction with almost imperceptible latency (<10 ms) and jitter (~ 3 ms). By comparing both designs, we highlight key architectural differences between the models that impact their instrumental performance capability, arguing that no model fits all purposes, and calling for their careful selection for each interactive design. Finally, we discuss challenges and opportunities for leveraging low-latency waveform autoencoders to develop interactive systems, such as Digital Musical Instruments, that can foster control intimacy through enhanced responsiveness and space for nuance.},
  address = {Canberra, Australia},
  articleno = {10},
  author = {Franco Caspe and Andrew McPherson and Mark Sandler},
  booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression},
  doi = {10.5281/zenodo.15699550},
  editor = {Doga Cavdir and Florent Berthaut},
  issn = {2220-4806},
  month = {June},
  numpages = {4},
  pages = {73--76},
  title = {Waveform Autoencoding at the Edge of Perceivable Latency},
  track = {Paper},
  url = {http://nime.org/proceedings/2025/nime2025_10.pdf},
  year = {2025}
}