Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases

Ole Tørresen; Bastiaan Star; Pablo Mier; Miguel Andrade-Navarro; Alex Bateman; Patryk Jarnot; Aleksandra Gruca; Marcin Grynberg; Andrey Kajava; Vasilis Promponas; Maria Anisimova; Kjetill Jakobsen; Dirk Linke

doi:10.1093/nar/gkz841

Journal article, Nucleic Acids Research, 2019


Author identifiers

Andrey Kajava
PersonId: 180819
IdHAL: andrei-kaiava
ORCID: 0000-0002-2342-6886
IdRef: 111884926
Affiliation: Centre de recherche en Biologie cellulaire de Montpellier

Maria Anisimova
PersonId: 927367
ORCID: 0000-0001-8145-7966

Abstract

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences that contain repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone quantify. Here, we review the problems associated with tandem repeat sequences that originate at different stages of the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream

Subject area

Bioinformatics [q-bio.QM]

Main file

Ole-NAR-2020.pdf (782.09 KB)

Origin: Files produced by the author(s)

Contributor: Andrei KAIAVA

https://hal.science/hal-03089273

Submitted: Wednesday, 30 December 2020, 17:03:06

Last modified: Wednesday, 17 April 2024, 15:30:07

Long-term archiving: Wednesday, 31 March 2021, 18:47:54

Dates and versions

hal-03089273, version 1 (30-12-2020)

Identifiers

HAL Id: hal-03089273, version 1
DOI : 10.1093/nar/gkz841

Cite

Ole Tørresen, Bastiaan Star, Pablo Mier, Miguel Andrade-Navarro, Alex Bateman, et al. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Research, 2019, 47 (21), pp. 10994-11006. ⟨10.1093/nar/gkz841⟩. ⟨hal-03089273⟩


Collections

CNRS CRBM BS UNIV-MONTPELLIER

