Gesture-Driven DDSP Synthesis for Digitizing the Chinese Erhu
Wenqi WU and Hanyu QU
Proceedings of the International Conference on New Interfaces for Musical Expression
- Year: 2025
- Location: Canberra, Australia
- Track: Paper
- Pages: 505–510
- Article Number: 73
- DOI: 10.5281/zenodo.15698942 (paper and supplementary files)
- PDF: http://nime.org/proceedings/2025/nime2025_73.pdf
Abstract
This paper presents a gesture-controlled digital Erhu system that merges traditional Chinese instrumental techniques with contemporary machine learning and interactive technologies. By leveraging the Erhu’s expressive techniques, we develop a dual-hand spatial interaction framework using real-time gesture tracking. Hand movement data is mapped to sound synthesis parameters to control pitch, timbre, and dynamics, while a differentiable digital signal processing (DDSP) model, trained on a custom Erhu dataset, transforms basic waveforms into an authentic timbre that remains faithful to the instrument’s nuanced articulations. The system bridges traditional musical aesthetics with digital interactivity, emulating Erhu bowing dynamics and expressive techniques through embodied interaction. The study contributes a novel framework for digitizing Erhu performance practices, explores methods to align culturally informed gestures with DDSP-based synthesis, and offers insights into preserving traditional instruments within digital music interfaces.
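As a rough illustration of the mapping stage the abstract describes, the sketch below converts normalized dual-hand tracking coordinates into the fundamental-frequency and loudness features that DDSP-style models are commonly conditioned on. The hand-to-parameter assignments (left hand height for fingered pitch, right-hand lateral velocity for bowing dynamics), the pitch range, the frame rate, and all function and constant names are assumptions made here for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical gesture-to-synthesis mapping, sketched after the paper's
# description. Hand coordinates are assumed normalized to [0, 1];
# ranges and assignments below are illustrative assumptions.

ERHU_F0_MIN = 293.66   # D4, the Erhu's inner-string open pitch
ERHU_F0_MAX = 1174.66  # D6, upper end of a plausible playing range


def gesture_to_ddsp_features(left_y, right_dx, frame_rate=250.0):
    """Map left-hand height to f0 (fingering) and right-hand lateral
    velocity to loudness (bowing). Returns per-frame DDSP-style inputs."""
    # Pitch: interpolate on a log-frequency scale so equal hand motion
    # yields roughly equal musical intervals; hands higher in the image
    # (smaller y) produce higher pitches.
    f0_hz = float(np.exp(np.interp(
        1.0 - left_y,
        [0.0, 1.0],
        [np.log(ERHU_F0_MIN), np.log(ERHU_F0_MAX)],
    )))
    # Dynamics: a faster bowing gesture plays louder; clip the gesture
    # speed to a usable range and scale it into decibels.
    speed = np.clip(abs(right_dx) * frame_rate, 0.0, 2.0)
    loudness_db = -60.0 + 45.0 * (speed / 2.0)
    return {"f0_hz": f0_hz, "loudness_db": loudness_db}
```

Mapping pitch on a log-frequency scale mirrors how equal finger displacement along a string corresponds roughly to equal intervals, which is one plausible way to keep the gesture space musically uniform.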
Citation
Wenqi WU and Hanyu QU. 2025. Gesture-Driven DDSP Synthesis for Digitizing the Chinese Erhu. Proceedings of the International Conference on New Interfaces for Musical Expression. DOI: 10.5281/zenodo.15698942 [PDF]
BibTeX Entry
@inproceedings{nime2025_73,
  abstract = {This paper presents a gesture-controlled digital Erhu system that merges traditional Chinese instrumental techniques with contemporary machine learning and interactive technologies. By leveraging the Erhu’s expressive techniques, we develop a dual-hand spatial interaction framework using real-time gesture tracking. Hand movement data is mapped to sound synthesis parameters to control pitch, timbre, and dynamics, while a differentiable digital signal processing (DDSP) model, trained on a custom Erhu dataset, transforms basic waveforms into an authentic timbre that remains faithful to the instrument’s nuanced articulations. The system bridges traditional musical aesthetics with digital interactivity, emulating Erhu bowing dynamics and expressive techniques through embodied interaction. The study contributes a novel framework for digitizing Erhu performance practices, explores methods to align culturally informed gestures with DDSP-based synthesis, and offers insights into preserving traditional instruments within digital music interfaces.},
  address = {Canberra, Australia},
  articleno = {73},
  author = {Wenqi WU and Hanyu QU},
  booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression},
  doi = {10.5281/zenodo.15698942},
  editor = {Doga Cavdir and Florent Berthaut},
  issn = {2220-4806},
  month = {June},
  numpages = {6},
  pages = {505--510},
  title = {Gesture-Driven DDSP Synthesis for Digitizing the Chinese Erhu},
  track = {Paper},
  url = {http://nime.org/proceedings/2025/nime2025_73.pdf},
  year = {2025}
}