TY - JOUR
AU - Lewis, Patrick
AU - Oguz, Barlas
AU - Rinott, Ruty
AU - Riedel, Sebastian
AU - Schwenk, Holger
AB - Question answering (QA) models have shown rapid progress enabled by the availability of large, high-quality benchmark datasets. Such annotated datasets are difficult and costly to collect, and rarely exist in languages other than English, making building QA systems that work well in other languages challenging. In order to develop such systems, it is crucial to invest in high quality multilingual evaluation benchmarks to measure progress. We present MLQA, a multi-way aligned extractive QA evaluation benchmark intended to …
TI - MLQA: Evaluating Cross-lingual Extractive Question Answering
JF - Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
DO - 10.18653/v1/2020.acl-main.653
DA - 2020-01-01
UR - https://www.deepdyve.com/lp/unpaywall/mlqa-evaluating-cross-lingual-extractive-question-answering-bwFlTY8a0K
DP - DeepDyve
ER -