TY - JOUR
AU - Lewis, Patrick
AU - Oguz, Barlas
AU - Rinott, Ruty
AU - Riedel, Sebastian
AU - Schwenk, Holger
AB - Question answering (QA) models have shown rapid progress enabled by the availability of large, high-quality benchmark datasets. Such annotated datasets are difficult and costly to collect, and rarely exist in languages other than English, making building QA systems that work well in other languages challenging. In order to develop such systems, it is crucial to invest in high quality multilingual evaluation benchmarks to measure progress. We present MLQA, a multi-way aligned extractive QA evaluation benchmark intended to …
TI - MLQA: Evaluating Cross-lingual Extractive Question Answering
JF - Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
DO - 10.18653/v1/2020.acl-main.653
DA - 2020-01-01
UR - https://www.deepdyve.com/lp/unpaywall/mlqa-evaluating-cross-lingual-extractive-question-answering-bwFlTY8a0K
DP - DeepDyve
ER -