Syntex: parametric audio texture datasets for conditional training of instrumental interfaces.

Wyse, Lonce; Ravikumar, Prashanth Thattai

Proceedings of the International Conference on New Interfaces for Musical Expression

Year: 2022
Location: The University of Auckland, New Zealand
Article Number: 15
DOI: 10.21428/92fbeb44.0fe70450 (Link to paper and supplementary files)
PDF: 128.pdf
Video: https://youtu.be/KZHXck9c75s

Abstract

An emerging approach to building new musical instruments is based on training neural networks to generate audio conditioned upon parametric input. We use the term "generative models" rather than "musical instruments" for the trained networks because it reflects the statistical way the instruments are trained to "model" the association between parameters and the distribution of audio data, and because "musical" carries historical baggage as a reference to a restricted domain of sound. Generative models are musical instruments in that they produce a prescribed range of sound playable through the expressive manipulation of an interface. To learn the mapping from interface to audio, generative models require large amounts of parametrically labeled audio data. This paper introduces the Synthetic Audio Textures (SynTex) collection of dataset generators. SynTex is a database of parameterized audio textures and a suite of tools for creating and labeling datasets designed for training and testing generative neural networks for parametrically conditioned sound synthesis. While there are many existing labeled speech and traditional musical instrument databases available for training generative models, most datasets of general (e.g. environmental) audio are oriented and labeled for the purpose of classification rather than expressive musical generation. SynTex is designed to provide an open, shareable reference set of audio for creating generative sound models, including their interfaces. SynTex sound sets are synthetically generated. This facilitates the dense and accurate labeling necessary for training generative networks conditioned on input parameter values. SynTex has several characteristics designed to support a data-centric approach to developing, exploring, training, and testing generative models.

Citation

Lonce Wyse and Prashanth Thattai Ravikumar. 2022. Syntex: parametric audio texture datasets for conditional training of instrumental interfaces. Proceedings of the International Conference on New Interfaces for Musical Expression. DOI: 10.21428/92fbeb44.0fe70450

BibTeX Entry

@inproceedings{NIME22_15,
 abstract = {An emerging approach to building new musical instruments is based on training neural networks to generate audio conditioned upon parametric input. We use the term "generative models" rather than "musical instruments" for the trained networks because it reflects the statistical way the instruments are trained to "model" the association between parameters and the distribution of audio data, and because "musical" carries historical baggage as a reference to a restricted domain of sound. Generative models are musical instruments in that they produce a prescribed range of sound playable through the expressive manipulation of an interface. To learn the mapping from interface to audio, generative models require large amounts of parametrically labeled audio data. This paper introduces the Synthetic Audio Textures (SynTex) collection of dataset generators. SynTex is a database of parameterized audio textures and a suite of tools for creating and labeling datasets designed for training and testing generative neural networks for parametrically conditioned sound synthesis. While there are many existing labeled speech and traditional musical instrument databases available for training generative models, most datasets of general (e.g. environmental) audio are oriented and labeled for the purpose of classification rather than expressive musical generation. SynTex is designed to provide an open, shareable reference set of audio for creating generative sound models, including their interfaces. SynTex sound sets are synthetically generated. This facilitates the dense and accurate labeling necessary for training generative networks conditioned on input parameter values. SynTex has several characteristics designed to support a data-centric approach to developing, exploring, training, and testing generative models.},
 address = {The University of Auckland, New Zealand},
 articleno = {15},
 author = {Wyse, Lonce and Ravikumar, Prashanth Thattai},
 booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression},
 doi = {10.21428/92fbeb44.0fe70450},
 issn = {2220-4806},
 month = {jun},
 pdf = {128.pdf},
 presentation-video = {https://youtu.be/KZHXck9c75s},
 title = {Syntex: parametric audio texture datasets for conditional training of instrumental interfaces},
 url = {https://doi.org/10.21428%2F92fbeb44.0fe70450},
 year = {2022}
}