TY - JOUR
AU - Lippincott, Tom
AB - Book Review Automatic Language Identification in Texts Tommi Jauhiainen, Marcos Zampieri, Timothy Baldwin, and Krister Lindén University of Helsinki, George Mason University, MBZUAI, and University of Helsinki Springer (Synthesis Lectures on Human Language Technologies, edited by Graeme Hirst, volume 44), 2024, xiv+148 pp; ISBN 978-3-031-45821-7; ebook, ISBN 978-3-031-45822-4; doi:10.1007/978-3-031-45822-4 Reviewed by Tom Lippincott Johns Hopkins University Language identification (LI) for text data, in the ideal scenario, determines the human languages used at every location in a corpus. In practice this often means choosing the likeliest language at the document level: This is already quite useful, for example, when presenting a webpage to the user and deciding (a) whether to translate it and (b) which model to use for that purpose. However, nuances like code-switching (language alternation), dialect variation, and ambiguously short content are increasingly common with the ubiquity of digital communication like text messaging and micro-blogs. Geographic areas like Africa and the Indian subcontinent bring enormous linguistic diversity and flexibility that break the document-level LI paradigm. While standard references (Jurafsky and Martin 2023) introduce LI, touch on these subtleties, and often present related methods and models in other contexts, Automatic Language Identification in Texts is specifically dedicated
TI - Automatic Language Identification in Texts
JF - Computational Linguistics
DO - 10.1162/coli_r_00521
DA - 2025-03-15
UR - https://www.deepdyve.com/lp/mit-press/automatic-language-identification-in-texts-6W3KxgLMgT
SP - 339
EP - 341
VL - 51
IS - 1
DP - DeepDyve
ER -