TY - JOUR
AU - Morariu, Vlad
AB - We introduce Dessurt, a relatively simple document understanding transformer capable of being fine-tuned on a greater variety of document tasks than prior methods. It receives a document image and task string as input and generates arbitrary text autoregressively as output. Because Dessurt is an end-to-end architecture that performs text recognition in addition to the document understanding, it does not require an external recognition model as prior methods do. Dessurt is a more flexible model than prior methods and is able to handle a variety of document domains and tasks. We show that this model is effective at 9 different dataset-task combinations.
TI - End-to-end Document Recognition and Understanding with Dessurt
JF - Computing Research Repository
DO - 10.48550/arxiv.2203.16618
DA - 2022-03-30
UR - https://www.deepdyve.com/lp/arxiv-cornell-university/end-to-end-document-recognition-and-understanding-with-dessurt-0AZm61RUnC
VL - 2023
IS - 2203
DP - DeepDyve
ER -