Cross-Modal Sig2Sig Machine Translation with Deep Generative Modeling for NIME Design
Lucy Strauss, Prashanth Thattai Ravikumar, and Matthew Yee-King
Proceedings of the International Conference on New Interfaces for Musical Expression
- Year: 2026
- Location: London, United Kingdom
- Track: paper
- Pages: 1084–1101
- Article Number: 133
- DOI: 10.5281/zenodo.20784411 (Link to paper and supplementary files)
- PDF Link: http://nime.org/proceedings/2026/nime2026_133.pdf
- Presentation/Demo Video: https://youtu.be/Z7-ySfuF7lg
Abstract
NIME researchers frequently work with sensor signals that lack interpretability, such as signals from movement sensors and bioelectric sensors. However, there is a lack of NIME-specific approaches for building and evaluating deep generative models (DGMs) of such signals, even though DGMs are increasingly prevalent in NIME. Our research focuses on cross-modal Sig2Sig machine translation, a sensor-sound mapping task using DGMs. We present the Muscle-Listening Machine Learning Model for Live Music (MLMLMLM), a novel DGM intended for use within an interactive music system. MLMLMLM is trained on a bespoke time-aligned dataset of audio and electromyographic (EMG) signals and features a decoder-only Transformer and two RVQ-VAEs. We position the technical work of designing bespoke DGM architectures as a NIME practice in its own right and employ a Technical Practice Research (TPR) approach to document the process of building MLMLMLM. Through our TPR process, a new evaluation method emerged for DGMs of low-interpretability signals. The contributions of this research are two-fold: 1) a novel DGM architecture for EMG-conditioned sequence generation of audio signals; 2) a method for more effectively developing and evaluating DGMs of multi-channel time-domain signals with low interpretability.
Citation
Lucy Strauss, Prashanth Thattai Ravikumar, and Matthew Yee-King. 2026. Cross-Modal Sig2Sig Machine Translation with Deep Generative Modeling for NIME Design. Proceedings of the International Conference on New Interfaces for Musical Expression. DOI: 10.5281/zenodo.20784411
BibTeX Entry
@inproceedings{nime2026_133,
abstract = {NIME researchers frequently work with sensor signals that lack interpretability, such as signals from movement sensors and bioelectric sensors. However, there is a lack of NIME-specific approaches for building and evaluating deep generative models (DGMs) of such signals, even though DGMs are increasingly prevalent in NIME. Our research focuses on cross-modal Sig2Sig machine translation, a sensor-sound mapping task using DGMs. We present the Muscle-Listening Machine Learning Model for Live Music (MLMLMLM), a novel DGM intended for use within an interactive music system. MLMLMLM is trained on a bespoke time-aligned dataset of audio and electromyographic (EMG) signals and features a decoder-only Transformer and two RVQ-VAEs. We position the technical work of designing bespoke DGM architectures as a NIME practice in its own right and employ a Technical Practice Research (TPR) approach to document the process of building MLMLMLM. Through our TPR process, a new evaluation method emerged for DGMs of low-interpretability signals. The contributions of this research are two-fold: 1) a novel DGM architecture for EMG-conditioned sequence generation of audio signals; 2) a method for more effectively developing and evaluating DGMs of multi-channel time-domain signals with low interpretability.},
address = {London, United Kingdom},
articleno = {133},
author = {Lucy Strauss and Prashanth Thattai Ravikumar and Matthew Yee-King},
booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression},
doi = {10.5281/zenodo.20784411},
editor = {Benedict Gaster and João Tragtenberg and Anna Xambó and Tom Mitchell},
issn = {2220-4806},
month = {June},
note = {},
numpages = {18},
pages = {1084--1101},
presentation-video = {https://youtu.be/Z7-ySfuF7lg},
title = {Cross-Modal Sig2Sig Machine Translation with Deep Generative Modeling for NIME Design},
track = {paper},
url = {http://nime.org/proceedings/2026/nime2026_133.pdf},
year = {2026}
}