Gesture-Driven DDSP Synthesis for Digitizing the Chinese Erhu
Wenqi WU and Hanyu QU
Proceedings of the International Conference on New Interfaces for Musical Expression
- Year: 2025
- Location: Canberra, Australia
- Track: Paper
- Pages: 505–510
- Article Number: 73
- DOI: 10.5281/zenodo.15698942 (paper and supplementary files)
- PDF: http://nime.org/proceedings/2025/nime2025_73.pdf
Abstract
This paper presents a gesture-controlled digital Erhu system that merges traditional Chinese instrumental techniques with contemporary machine learning and interactive technologies. By leveraging the Erhu’s expressive techniques, we develop a dual-hand spatial interaction framework using real-time gesture tracking. Hand movement data is mapped to sound synthesis parameters to control pitch, timbre, and dynamics, while a differentiable digital signal processing (DDSP) model, trained on a custom Erhu dataset, transforms basic waveforms into an authentic timbre that remains faithful to the instrument’s nuanced articulations. The system bridges traditional musical aesthetics with digital interactivity, emulating Erhu bowing dynamics and expressive techniques through embodied interaction. The study contributes a novel framework for digitizing Erhu performance practices, explores methods to align culturally informed gestures with DDSP-based synthesis, and offers insights into preserving traditional instruments within digital music interfaces.
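As a rough illustration of the mapping stage the abstract describes, the sketch below converts normalized dual-hand tracking coordinates into the fundamental-frequency and loudness features that DDSP-style models are commonly conditioned on. The hand-to-parameter assignments (left hand height for fingered pitch, right-hand lateral velocity for bowing dynamics), the pitch range, the frame rate, and all function and constant names are assumptions made here for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical gesture-to-synthesis mapping, sketched after the paper's
# description. Hand coordinates are assumed normalized to [0, 1];
# ranges and assignments below are illustrative assumptions.

ERHU_F0_MIN = 293.66   # D4, the Erhu's inner-string open pitch
ERHU_F0_MAX = 1174.66  # D6, upper end of a plausible playing range


def gesture_to_ddsp_features(left_y, right_dx, frame_rate=250.0):
    """Map left-hand height to f0 (fingering) and right-hand lateral
    velocity to loudness (bowing). Returns per-frame DDSP-style inputs."""
    # Pitch: interpolate on a log-frequency scale so equal hand motion
    # yields roughly equal musical intervals; hands higher in the image
    # (smaller y) produce higher pitches.
    f0_hz = float(np.exp(np.interp(
        1.0 - left_y,
        [0.0, 1.0],
        [np.log(ERHU_F0_MIN), np.log(ERHU_F0_MAX)],
    )))
    # Dynamics: a faster bowing gesture plays louder; clip the gesture
    # speed to a usable range and scale it into decibels.
    speed = np.clip(abs(right_dx) * frame_rate, 0.0, 2.0)
    loudness_db = -60.0 + 45.0 * (speed / 2.0)
    return {"f0_hz": f0_hz, "loudness_db": loudness_db}
```

Mapping pitch on a log-frequency scale mirrors how equal finger displacement along a string corresponds roughly to equal intervals, which is one plausible way to keep the gesture space musically uniform.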
Citation
Wenqi WU and Hanyu QU. 2025. Gesture-Driven DDSP Synthesis for Digitizing the Chinese Erhu. Proceedings of the International Conference on New Interfaces for Musical Expression. DOI: 10.5281/zenodo.15698942 [PDF]
BibTeX Entry
@inproceedings{nime2025_73,
  abstract = {This paper presents a gesture-controlled digital Erhu system that merges traditional Chinese instrumental techniques with contemporary machine learning and interactive technologies. By leveraging the Erhu’s expressive techniques, we develop a dual-hand spatial interaction framework using real-time gesture tracking. Hand movement data is mapped to sound synthesis parameters to control pitch, timbre, and dynamics, while a differentiable digital signal processing (DDSP) model, trained on a custom Erhu dataset, transforms basic waveforms into an authentic timbre that remains faithful to the instrument’s nuanced articulations. The system bridges traditional musical aesthetics with digital interactivity, emulating Erhu bowing dynamics and expressive techniques through embodied interaction. The study contributes a novel framework for digitizing Erhu performance practices, explores methods to align culturally informed gestures with DDSP-based synthesis, and offers insights into preserving traditional instruments within digital music interfaces.},
  address = {Canberra, Australia},
  articleno = {73},
  author = {Wenqi WU and Hanyu QU},
  booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression},
  doi = {10.5281/zenodo.15698942},
  editor = {Doga Cavdir and Florent Berthaut},
  issn = {2220-4806},
  month = {June},
  numpages = {6},
  pages = {505--510},
  title = {Gesture-Driven DDSP Synthesis for Digitizing the Chinese Erhu},
  track = {Paper},
  url = {http://nime.org/proceedings/2025/nime2025_73.pdf},
  year = {2025}
}