Cross-Modal Sig2Sig Machine Translation with Deep Generative Modeling for NIME Design

Lucy Strauss, Prashanth Thattai Ravikumar, and Matthew Yee-King

Proceedings of the International Conference on New Interfaces for Musical Expression

Abstract

NIME researchers frequently work with sensor signals that lack interpretability, such as signals from movement sensors and bioelectric sensors. However, there is a lack of NIME-specific approaches for building and evaluating deep generative models (DGMs) of such signals, even though DGMs are increasingly prevalent in NIME. Our research focuses on cross-modal Sig2Sig machine translation, a sensor-sound mapping task using DGMs. We present the Muscle-Listening Machine Learning Model for Live Music (MLMLMLM), a novel DGM intended for use within an interactive music system. MLMLMLM is trained on a bespoke time-aligned dataset of audio and electromyographic (EMG) signals and features a decoder-only Transformer and two RVQ-VAEs. We position the technical work of designing bespoke DGM architectures as a NIME practice in its own right and employ a Technical Practice Research (TPR) approach to document the process of building MLMLMLM. Through our TPR process, a new evaluation method emerged for DGMs with low-interpretability signals. The contributions of this research are two-fold: 1) a novel DGM architecture for EMG-conditioned sequence generation of audio signals; 2) a method for more effectively developing and evaluating DGMs of multi-channel time-domain signals with low interpretability.

Citation

Lucy Strauss, Prashanth Thattai Ravikumar, and Matthew Yee-King. 2026. Cross-Modal Sig2Sig Machine Translation with Deep Generative Modeling for NIME Design. Proceedings of the International Conference on New Interfaces for Musical Expression. DOI: 10.5281/zenodo.20784411

BibTeX Entry

@inproceedings{nime2026_133,
 abstract = {NIME researchers frequently work with sensor signals that lack interpretability, such as signals from movement sensors and bioelectric sensors. However, there is a lack of NIME-specific approaches for building and evaluating deep generative models (DGMs) of such signals, even though DGMs are increasingly prevalent in NIME. Our research focuses on cross-modal Sig2Sig machine translation, a sensor-sound mapping task using DGMs. We present the Muscle-Listening Machine Learning Model for Live Music (MLMLMLM), a novel DGM intended for use within an interactive music system. MLMLMLM is trained on a bespoke time-aligned dataset of audio and electromyographic (EMG) signals and features a decoder-only Transformer and two RVQ-VAEs. We position the technical work of designing bespoke DGM architectures as a NIME practice in its own right and employ a Technical Practice Research (TPR) approach to document the process of building MLMLMLM. Through our TPR process, a new evaluation method emerged for DGMs with low-interpretability signals. The contributions of this research are two-fold: 1) a novel DGM architecture for EMG-conditioned sequence generation of audio signals; 2) a method for more effectively developing and evaluating DGMs of multi-channel time-domain signals with low interpretability.},
 address = {London, United Kingdom},
 articleno = {133},
 author = {Lucy Strauss and Prashanth Thattai Ravikumar and Matthew Yee-King},
 booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression},
 doi = {10.5281/zenodo.20784411},
 editor = {Benedict Gaster and João Tragtenberg and Anna Xambó and Tom Mitchell},
 issn = {2220-4806},
 month = {June},
 note = {},
 numpages = {18},
 pages = {1084--1101},
 presentation-video = {https://youtu.be/Z7-ySfuF7lg},
 title = {Cross-Modal Sig2Sig Machine Translation with Deep Generative Modeling for NIME Design},
 track = {paper},
 url = {http://nime.org/proceedings/2026/nime2026_133.pdf},
 year = {2026}
}