Conference paper, Year: 2026

AnandaSky: A Vision-Language Model for Line-Level Transcription of Historical Sinographic Documents

Abstract

We present AnandaSky, a vision-language model for line-level transcription of historical sinographic documents. The model pairs a compact, high-resolution visual encoder (global attention, 10 px patches, uncompressed visual prefix) with a Qwen3-0.6B autoregressive decoder. It is trained at scale on 4M annotated lines from documents produced in China and Korea between the 8th and 20th centuries. Across in-domain and held-out public benchmarks, AnandaSky achieves sub-1% CER on five of eight datasets, sets a new state of the art on MTHv2 with 0.92% CER, and shows strong transfer to unseen collections. For EvaHan 2026, full fine-tuning on the organizers' data to match task-specific annotation conventions reduces CER relative to the official baseline by 5.2% on prints and 12.1% on manuscripts, despite using one-tenth as many parameters.
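The abstract's "10 px patches, uncompressed visual prefix" implies that every patch of a line image becomes one visual token fed to the decoder, with no pooling or merging. As a minimal sketch of that token budget, the function below counts patch tokens for a given line-image size; the line height and width used in the example are illustrative assumptions, not values reported in the paper.

```python
import math

PATCH = 10  # patch side in px, from the abstract

def visual_prefix_length(height_px: int, width_px: int, patch: int = PATCH) -> int:
    """Number of visual tokens for one line image when the prefix is
    uncompressed: one token per patch, padding partial patches."""
    rows = math.ceil(height_px / patch)
    cols = math.ceil(width_px / patch)
    return rows * cols

# A hypothetical 60 x 1200 px text-line image yields a 6 x 120 patch grid,
# i.e. a 720-token visual prefix prepended to the decoder input.
print(visual_prefix_length(60, 1200))  # → 720
```

Because the prefix is uncompressed, prefix length grows linearly with line width, which is why a compact encoder and a small (0.6B) decoder keep inference tractable.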

Main file

AnandaSky_Technical_Report-4.pdf (1.98 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-05548531, version 1 (12-03-2026)
hal-05548531, version 2 (24-03-2026)

Identifiers

  • HAL Id: hal-05548531, version 2

Cite

Colin Brisson, Ayoub Kahfy, Frédéric Constant, Marc Bui. AnandaSky: A Vision-Language Model for Line-Level Transcription of Historical Sinographic Documents. The Fourth Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA 2026), May 2026, Majorca, Spain. ⟨hal-05548531v2⟩
956 views
207 downloads
