Shifting Time Scales: Supporting Live Gesture-Controlled Generative Music with Speculative Execution

Jason Smith and Bryan Pardo

Proceedings of the International Conference on New Interfaces for Musical Expression

Abstract

Generative AI has enabled the creation of new interfaces for musical expression (NIMEs) that dynamically generate sounds in response to user input. These systems have focused on coarse, text-based instructions delivered at time scales that are not suitable for fine-grained control of sound enabled by conducting-style gestures during a live performance. Additionally, audio generation introduces latency that impedes gesture-based control, limiting the ability of AI-based NIMEs to synchronize musical output with input gestures in real time. This paper presents Gesture Vocabulary, an interactive generative music system prototype that uses user-defined hand gestures and motions as input. This system employs speculative execution, generating sound based on predicted future gestures to mitigate latency and produce audio in response to gestures at a time scale suitable for real-time performance within a constrained gesture space. By examining the role of AI in music through the time scales of its interactions with users, we aim to support more expressive performance practices through the embodied control of generative music systems.

Citation

Jason Smith and Bryan Pardo. 2026. Shifting Time Scales: Supporting Live Gesture-Controlled Generative Music with Speculative Execution. Proceedings of the International Conference on New Interfaces for Musical Expression. DOI: 10.5281/zenodo.20784464

BibTeX Entry

@inproceedings{nime2026_153,
 abstract = {Generative AI has enabled the creation of new interfaces for musical expression (NIMEs) that dynamically generate sounds in response to user input. These systems have focused on coarse, text-based instructions delivered at time scales that are not suitable for fine-grained control of sound enabled by conducting-style gestures during a live performance. Additionally, audio generation introduces latency that impedes gesture-based control, limiting the ability of AI-based NIMEs to synchronize musical output with input gestures in real time. This paper presents Gesture Vocabulary, an interactive generative music system prototype that uses user-defined hand gestures and motions as input. This system employs speculative execution, generating sound based on predicted future gestures to mitigate latency and produce audio in response to gestures at a time scale suitable for real-time performance within a constrained gesture space. By examining the role of AI in music through the time scales of its interactions with users, we aim to support more expressive performance practices through the embodied control of generative music systems.},
 address = {London, United Kingdom},
 articleno = {153},
 author = {Jason Smith and Bryan Pardo},
 booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression},
 doi = {10.5281/zenodo.20784464},
 editor = {Benedict Gaster and João Tragtenberg and Anna Xambó and Tom Mitchell},
 issn = {2220-4806},
 month = {June},
 numpages = {7},
 pages = {1244--1250},
 title = {Shifting Time Scales: Supporting Live Gesture-Controlled Generative Music with Speculative Execution},
 track = {paper},
 url = {http://nime.org/proceedings/2026/nime2026_153.pdf},
 year = {2026}
}