Cross-modal Sound Mapping Using Deep Learning

Ohad Fried and Rebecca Fiebrink

Proceedings of the International Conference on New Interfaces for Musical Expression

Abstract:

We present a method for automatic feature extraction and cross-modal mapping using deep learning. Our system uses stacked autoencoders to learn a layered feature representation of the data. Feature vectors from two (or more) different domains are mapped to each other, effectively creating a cross-modal mapping. Our system can either run fully unsupervised, or it can use high-level labeling to fine-tune the mapping according to a user's needs. We show several applications for our method, mapping sound to or from images or gestures. We evaluate system performance both in standalone inference tasks and in cross-modal mappings.
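The pipeline the abstract describes — learn a feature representation per domain, then map feature vectors between domains — can be illustrated with a minimal sketch. This is not the paper's implementation: it uses a single-layer linear autoencoder with tied weights as a stand-in for each stacked autoencoder, random toy data in place of real sound and image features, and a least-squares fit (an assumption, one simple choice of mapping) between the two latent spaces. All names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, n_hidden, lr=0.01, epochs=500):
    """Train a single-layer linear autoencoder with tied weights by
    gradient descent; a stand-in for one domain's stacked autoencoder."""
    n_features = X.shape[1]
    W = rng.normal(scale=0.1, size=(n_features, n_hidden))
    for _ in range(epochs):
        H = X @ W            # encode
        X_hat = H @ W.T      # decode with tied (transposed) weights
        err = X_hat - X
        # gradient of ||X W W^T - X||^2 with respect to W
        grad = X.T @ (err @ W) + err.T @ (X @ W)
        W -= lr * grad / len(X)
    return W

# Toy paired data standing in for two modalities (e.g. sound and image
# feature vectors extracted from the same examples)
X_sound = rng.normal(size=(200, 8))
X_image = rng.normal(size=(200, 6))

# Learn a compact feature representation for each domain separately
W_sound = train_autoencoder(X_sound, n_hidden=3)
W_image = train_autoencoder(X_image, n_hidden=3)

# Map one latent space onto the other (here: simple least squares,
# assuming the rows of the two datasets are paired)
H_sound = X_sound @ W_sound
H_image = X_image @ W_image
M, *_ = np.linalg.lstsq(H_sound, H_image, rcond=None)

# Cross-modal inference: encode a new sound, map its latent code into
# the image latent space, and decode to image-domain features
x_new = rng.normal(size=(1, 8))
image_features = (x_new @ W_sound) @ M @ W_image.T
print(image_features.shape)  # (1, 6)
```

The key structural point is the middle step: each modality keeps its own encoder/decoder, and only the low-dimensional latent codes are connected, so either side can be swapped out (images for gestures, say) without retraining the other.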

Citation:

Ohad Fried and Rebecca Fiebrink. 2013. Cross-modal Sound Mapping Using Deep Learning. Proceedings of the International Conference on New Interfaces for Musical Expression. DOI: 10.5281/zenodo.1178528

BibTeX Entry:

@inproceedings{Fried2013,
 abstract = {We present a method for automatic feature extraction and cross-modal mapping using deep learning. Our system uses stacked autoencoders to learn a layered feature representation of the data. Feature vectors from two (or more) different domains are mapped to each other, effectively creating a cross-modal mapping. Our system can either run fully unsupervised, or it can use high-level labeling to fine-tune the mapping according to a user's needs. We show several applications for our method, mapping sound to or from images or gestures. We evaluate system performance both in standalone inference tasks and in cross-modal mappings.},
 address = {Daejeon, Republic of Korea},
 author = {Ohad Fried and Rebecca Fiebrink},
 booktitle = {Proceedings of the International Conference on New Interfaces for Musical Expression},
 doi = {10.5281/zenodo.1178528},
 issn = {2220-4806},
 keywords = {Deep learning, feature learning, mapping, gestural control},
 month = {May},
 pages = {531--534},
 publisher = {Graduate School of Culture Technology, KAIST},
 title = {Cross-modal Sound Mapping Using Deep Learning},
 url = {http://www.nime.org/proceedings/2013/nime2013_111.pdf},
 year = {2013}
}