TY - JOUR
AU -
AB - We demonstrate the surprising strength of unimodal baselines in multimodal domains, and make concrete recommendations for best practices in future research. Where existing work often compares against random or majority class baselines, we argue that unimodal approaches better capture and reflect dataset biases and therefore provide an important comparison when assessing the performance of multimodal techniques. We present unimodal ablations on three recent datasets in visual navigation and QA, seeing an up to 29% absolute gain in performance over published baselines. 1 Introduction All datasets have biases. Baselines should capture these regularities so that outperforming them [Figure 1: Navigating without vision leads to sensible navigation trajectories in response to commands like “walk past the bar and turn right”. At t, “forward” is unavailable as the agent would collide with the wall. Actions: Forward, turn Left & Right, tilt Up & Down, End.] We ablate models from three recent papers: (1) navigation (Figure 1) using images of real homes paired with crowdsourced language descriptions (Anderson
TI - Shifting the Baseline: Single Modality Performance on Visual Navigation &
JF - Proceedings of the 2019 Conference of the North
DO - 10.18653/v1/n19-1197
DA - 2019-01-01
UR - https://www.deepdyve.com/lp/unpaywall/shifting-the-baseline-single-modality-performance-on-visual-navigation-fCbr0cmobK
DP - DeepDyve
ER -