Applying convolutional neural networks to speed up environmental DNA annotation in a highly diverse ecosystem

Abstract

High-throughput DNA sequencing is becoming an increasingly important tool for monitoring and better understanding biodiversity responses to environmental change in a standardized and reproducible way. Environmental DNA (eDNA) from organisms can be captured in ecosystem samples and sequenced using metabarcoding, but processing large volumes of eDNA data and annotating sequences to recognized taxa remains computationally expensive. Speed and accuracy are two major bottlenecks in this critical step. Here, we evaluated the ability of convolutional neural networks (CNNs) to process short eDNA sequences and associate them with taxonomic labels. Using a unique eDNA data set collected in highly diverse tropical South America, we compared the speed and accuracy of CNNs with those of a well-known bioinformatic pipeline (OBITools) in processing a short region (60 bp) of the 12S ribosomal DNA targeting freshwater fishes. We found that the taxonomic labels from the CNNs were comparable to those from OBITools, with high correlations for the composition of the regional fish fauna. The CNNs processed raw fastq files at a rate of approximately 1 million sequences per minute, about 150 times faster than OBITools. Given the good performance of CNNs in the highly diverse ecosystem considered here, the development of more elaborate CNNs promises fast deployment for future biodiversity inventories using eDNA.
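To make the input side of the CNN approach concrete, here is a minimal pure-Python sketch. It assumes reads arrive as 4-line FASTQ records and are one-hot encoded into a fixed 60 × 4 matrix (A, C, G, T channels, truncated or zero-padded), then scanned by a single convolutional filter followed by ReLU and global max pooling. The function names (`parse_fastq`, `one_hot`, `conv1d_relu_maxpool`) and the hand-set motif filter are hypothetical illustrations, not the architecture, encoding, or training procedure used in the study; in a real classifier the filters would be learned from labeled reference sequences and stacked ahead of layers that score taxonomic labels.

```python
# Hypothetical sketch only: NOT the CNN architecture used in the study.
from typing import Iterator, List, Tuple

READ_LEN = 60   # length of the 12S fragment mentioned in the abstract
BASES = "ACGT"  # one input channel per nucleotide

Matrix = List[List[float]]


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        yield header[1:].strip(), seq.strip().upper()


def one_hot(seq: str, length: int = READ_LEN) -> Matrix:
    """Encode a read as a (length x 4) matrix, truncating or zero-padding.

    Ambiguous bases such as N become all-zero rows.
    """
    seq = seq[:length]
    rows = [[1.0 if b == base else 0.0 for base in BASES] for b in seq]
    rows += [[0.0] * len(BASES) for _ in range(length - len(rows))]
    return rows


def conv1d_relu_maxpool(x: Matrix, kernel: Matrix, bias: float = 0.0) -> float:
    """Slide one (width x 4) filter along the read, apply ReLU at each
    position, then return the maximum activation (global max pooling)."""
    width = len(kernel)
    best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe start
    for start in range(len(x) - width + 1):
        s = bias
        for k in range(width):
            for c in range(len(BASES)):
                s += kernel[k][c] * x[start + k][c]
        best = max(best, max(s, 0.0))
    return best


if __name__ == "__main__":
    fastq = ["@read1", "ACGTGATCAA", "+", "IIIIIIIIII",
             "@read2", "TTTTTTTTTT", "+", "IIIIIIIIII"]
    motif = one_hot("GATC", length=4)  # hand-set filter matching "GATC"
    for name, seq in parse_fastq(fastq):
        print(name, conv1d_relu_maxpool(one_hot(seq), motif))
```

Because the filter here is a one-hot copy of `GATC`, a read containing that exact motif scores 4.0 (one point per matching base), while the poly-T read scores 1.0 because at most one base in any window matches. Taking the abstract's figures at face value, 1 million sequences per minute at about 150 times the OBITools rate implies roughly 6,700 sequences per minute for OBITools.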

Domains

Biodiversity and Ecology; Environments and Global Change

Main file

s41598-022-13412-w.pdf (2.56 MB)

Origin	Publisher files allowed on an open archive
License	Attribution

https://hal.umontpellier.fr/hal-03824009

Submitted on: Monday, June 5, 2023, 15:04:05

Last modified on: Wednesday, December 18, 2024, 10:08:56

Long-term archiving on: Wednesday, September 6, 2023, 19:07:35

Dates and versions

hal-03824009, version 1 (2023-06-05)

Identifiers

HAL Id: hal-03824009, version 1
DOI: 10.1038/s41598-022-13412-w
WOS: 000812565400032

Cite

Benjamin Flück, Laëtitia Mathon, Stéphanie Manel, Alice Valentini, Tony Dejean, et al. Applying convolutional neural networks to speed up environmental DNA annotation in a highly diverse ecosystem. Scientific Reports, 2022, 12 (1), pp. 10247. ⟨10.1038/s41598-022-13412-w⟩. ⟨hal-03824009⟩

Collections

IRD UNIV-SAVOIE EPHE UGA CNRS UNIV-MONTP3 OSUG LECA CEFE INSMI PSL UNIV-MONTPELLIER MARBEC INSTITUT-AGRO-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER ANR UNIV-UT3 UT3-TOULOUSEINP DPT_ECODIV INEE-CNRS DECOD OMP-CRBE
