TY - JOUR
AU - Misu, Teruhisa
TI - Visual Saliency and Crowdsourcing-based Priors for an In-car Situated Dialog System
AB - This paper addresses issues in situated language understanding in a moving car. We propose a reference resolution method to identify user queries about specific target objects in their surroundings. We investigate methods of predicting which target object is likely to be queried given a visual scene and what kind of linguistic cues users naturally provide to describe a given target object in a situated environment. We propose methods to incorporate the visual saliency of the visual scene as a prior. Crowdsourced statistics of how people describe an object are also used as a prior. We have collected situated utterances from drivers using our research system, which was embedded in a real vehicle. We demonstrate that the proposed algorithms improve target identification rate by 15.1%.
DA - 2015-11-09
UR - https://www.deepdyve.com/lp/association-for-computing-machinery/visual-saliency-and-crowdsourcing-based-priors-for-an-in-car-situated-MNlxsisvms
DP - DeepDyve
ER -