AnandaSky: A Vision-Language Model for Line-Level Transcription of Historical Sinographic Documents
Abstract
We present AnandaSky, a vision-language model for line-level transcription of historical sinographic documents. The model pairs a compact high-resolution visual encoder, using global attention, 10 px patches, and an uncompressed visual prefix, with a Qwen3-0.6B autoregressive decoder. It is trained at scale on 4M annotated lines from documents produced in China and Korea between the 8th and 20th centuries. Across in-domain and held-out public benchmarks, AnandaSky achieves sub-1% CER on five of eight datasets, sets a new state of the art on MTHv2 with 0.92% CER, and transfers strongly to unseen collections. For EvaHan 2026, full fine-tuning on the organizers' data to match task-specific annotation conventions reduces CER relative to the official baseline by 5.2% on prints and 12.1% on manuscripts, despite using one-tenth as many parameters.
