Conference Paper Year: 2025

Build and query indexes of clinical documents with easy-to-reuse pipelines

Abstract

Electronic Health Records (EHRs) are a central source of healthcare data, containing structured data alongside unstructured clinical texts. The latter capture detailed reasoning, observations, treatment plans, and clinical evolution, which are crucial for phenotyping and real-world evidence generation. Natural language processing enables the extraction, and thus the subsequent use, of these elements; however, such extractions remain one-off, study-specific efforts. This is detrimental, as the extracted elements could be valuable for future research. We present medkit Seshat, an open-source Python pipeline that (1) ingests free text, (2) recognizes relevant entities, (3) normalizes them with OMOP vocabularies, and (4) builds an index that can be searched either by concept or by document. In addition, we share a flexible web UI to illustrate the value of the built indexes in terms of search, text analysis, and export. Seshat aims to facilitate the reuse and adaptation of this prototypical pipeline for various purposes, with the main objective of enabling the secondary use of the results of phenotyping campaigns.
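The four steps listed in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the Seshat or medkit API: the function names, the toy vocabulary, and the `TOY:` identifiers are all hypothetical, the identifiers are placeholders rather than real OMOP concept IDs, and simple dictionary matching stands in for whatever entity recognition the actual pipeline uses.

```python
import re
from collections import defaultdict

# Hypothetical toy vocabulary: surface form -> placeholder concept ID.
# These IDs are invented for illustration; they are NOT real OMOP concept IDs.
TOY_VOCABULARY = {
    "diabetes": "TOY:0001",
    "diabetes mellitus": "TOY:0001",
    "hypertension": "TOY:0002",
    "high blood pressure": "TOY:0002",
}


def recognize_entities(text):
    """Step 2: return sorted (start, end, surface form) tuples found in text."""
    lowered = text.lower()
    found = []
    # Try longer terms first so "diabetes mellitus" wins over "diabetes".
    for term in sorted(TOY_VOCABULARY, key=len, reverse=True):
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            start, end = match.start(), match.end()
            # Skip matches overlapping an entity that was already accepted.
            if any(s < end and start < e for s, e, _ in found):
                continue
            found.append((start, end, term))
    return sorted(found)


def build_index(documents):
    """Steps 3 and 4: normalize entities and build both lookup directions."""
    by_concept = defaultdict(set)   # concept ID -> document IDs
    by_document = defaultdict(set)  # document ID -> concept IDs
    for doc_id, text in documents.items():
        for _, _, surface in recognize_entities(text):
            concept_id = TOY_VOCABULARY[surface]  # normalization step
            by_concept[concept_id].add(doc_id)
            by_document[doc_id].add(concept_id)
    return by_concept, by_document


# Step 1: ingest free text (here, an in-memory dict of made-up notes).
documents = {
    "doc1": "Patient with diabetes mellitus and high blood pressure.",
    "doc2": "Known hypertension, treated for ten years.",
    "doc3": "Routine follow-up visit.",
}

by_concept, by_document = build_index(documents)
print(sorted(by_concept["TOY:0002"]))   # search by concept
print(sorted(by_document["doc1"]))      # search by document
```

Keeping two mappings, one per lookup direction, is a simple way to support both search modes the abstract mentions. A real pipeline would also need to handle negation, context, and ambiguity, which this sketch ignores.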

Main file
2026_berthou_et_al_mie.pdf (966.78 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-05569688, version 1 (27-03-2026)

Identifiers

  • HAL Id: hal-05569688, version 1

Cite

Félix Berthou, Ghilsain Vaillant, Bastien Rance, Adrien Coulet. Build and query indexes of clinical documents with easy-to-reuse pipelines. MIE 2026 - Medical Informatics Europe, EFMI, May 2026, Genova, Italy. ⟨hal-05569688⟩