Multimodal Dialogue

2021-08-26

Human dialogue and interaction are not limited to words. Because we rely on words so heavily, they appear to be at the center of communication, but they may not actually be. Humankind uses thousands of languages that differ widely in vocabulary, grammar, and parts of speech (nouns, verbs, etc.), yet in contrast to these differences, the way people conduct dialogue is almost the same everywhere (with some cultural differences, of course).

Multimodal means that information is transmitted through more than one method or channel. Factors such as face orientation, posture, gaze, voice volume, tone of voice, and intonation all affect dialogue. We are conducting research on how to properly process and integrate this multimodal information so that machines can interpret human language and interact with people.

Currently, we are conducting research on the relationship between breathing and dialogue.

[1] Laperrière, Lam, Funakoshi: “Packing, Stacking, and Tracking: An Empirical Study of Online User Adaptation”, 11th International Workshop on Spoken Dialogue Systems, pp.319-336 (2020) https://doi.org/10.1007/978-981-15-8395-7_24

[2] Malik, Saunier, Funakoshi, Pauchet: “Who Speaks Next? Turn Change and Next Speaker Prediction in Multimodal Multiparty Interaction”, 32nd IEEE International Conference on Tools with Artificial Intelligence, pp.349-354 (2020) https://doi.org/10.1109/ICTAI50040.2020.00062

[3] Funakoshi, Yamagami, Sugano, Nakano: “Response Obligation Estimation That Considers Users' Repetitive Utterances Using Knowledge-Guided Random Forest”, 2019 IEEE-RAS International Conference on Humanoid Robots (2019) https://doi.org/10.1109/Humanoids43949.2019.9035079