TY - JOUR
AU - Lippincott, Tom
AB - Book Review Automatic Language Identification in Texts Tommi Jauhiainen, Marcos Zampieri, Timothy Baldwin, and Krister Lindén University of Helsinki, George Mason University, MBZUAI, and University of Helsinki Springer (Synthesis Lectures on Human Language Technologies, edited by Graeme Hirst, volume 44), 2024, xiv+148 pp; ISBN 978-3-031-45821-7; ebook, ISBN 978-3-031-45822-4; doi:10.1007/978-3-031-45822-4 Reviewed by Tom Lippincott Johns Hopkins University Language identification (LI) for text data, in the ideal scenario, determines the human languages used at every location in a corpus. In practice this often means choosing the likeliest language at the document level: This is already quite useful, for example, when presenting a webpage to the user and deciding (a) whether to translate it and (b) which model to use for that purpose. However, nuances like code-switching (language alternation), dialect variation, and ambiguously short content are increasingly common with the ubiquity of digital communication like text messaging and micro-blogs. Geographic areas like Africa and the Indian subcontinent bring enormous linguistic diversity and flexibility that break the document-level LI paradigm. While standard references (Jurafsky and Martin 2023) introduce LI, touch on these subtleties, and often present related methods and models in other contexts, Automatic Language Identification in Texts is specifically dedicated
TI - Automatic Language Identification in Texts
JF - Computational Linguistics
DO - 10.1162/coli_r_00521
DA - 2025-03-15
UR - https://www.deepdyve.com/lp/mit-press/automatic-language-identification-in-texts-6W3KxgLMgT
SP - 339
EP - 341
VL - 51
IS - 1
DP - DeepDyve
ER -