Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning

Lucas Block Medin; Thomas Pellegrini; Lucile Gelin

doi:10.21437/Interspeech.2024-1095

Conference paper, Year: 2024

Lucas Block Medin

Role: Author

Lalilo [Paris]

Thomas Pellegrini

Role: Author
PersonId: 741962
IdHAL: thomas-pellegrini
ORCID: 0000-0001-8984-1399
IdRef: 127577955

Équipe Structuration, Analyse et MOdélisation de documents Vidéo et Audio

Lucile Gelin

Role: Author
PersonId: 742641
IdHAL: lucile-gelin
ORCID: 0000-0002-5623-9438
IdRef: 263409759

Équipe Structuration, Analyse et MOdélisation de documents Vidéo et Audio

Lalilo [Paris]

Abstract

Child speech recognition is still an underdeveloped area of research due to the lack of data (especially for non-English languages) and the specific difficulties of this task. Having explored various architectures for child speech recognition in previous work, in this article we turn to recent self-supervised models. We first compare wav2vec 2.0, HuBERT and WavLM models adapted to phoneme recognition in French child speech, and continue our experiments with the best of them, WavLM base+. We then further adapt it by unfreezing its transformer blocks during fine-tuning on child speech, which greatly improves its performance and makes it significantly outperform our base model, a Transformer+CTC. Finally, we study in detail the behaviour of these two models under the real conditions of our application, and show that WavLM base+ is more robust to various reading tasks and noise levels.
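The abstract mentions a CTC-based phoneme recognizer but does not say how outputs are decoded or scored. As a minimal sketch only, the Python below shows greedy CTC collapsing (merge repeated frame labels, then drop blanks) and a phoneme error rate based on edit distance. The blank symbol "_", the toy phoneme labels, and the choice of phoneme error rate as the metric are all assumptions made for illustration; none of them comes from this record.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol; the actual label inventory is not given here


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Turn per-frame labels into a phoneme sequence, CTC-style.

    Consecutive identical labels are merged first, then blanks are removed,
    so a blank between two identical phonemes keeps them distinct.
    """
    return [label for label, _ in groupby(frame_labels) if label != blank]


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Toy per-frame output (hypothetical), as if taken from a frame-wise argmax.
frames = ["_", "p", "p", "_", "a", "a", "_", "p", "a", "a", "_"]
hypothesis = ctc_greedy_collapse(frames)
reference = ["p", "a", "p", "a"]
print(hypothesis, phoneme_error_rate(reference, hypothesis))
```

In this toy input the second "p" directly follows a blank, which is what lets the collapsed sequence keep two separate "p" phonemes instead of merging them.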

Keywords

speech recognition; child speech; self-supervised learning

Domains

Artificial intelligence [cs.AI]; Multimedia [cs.MM]; Neural networks [cs.NE]; Signal and image processing [eess.SP]

Main file

Interspeech_2024-6.pdf (141.86 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-04694927

Submitted on: Wednesday, 11 September 2024, 17:23:52

Last modified on: Friday, 13 September 2024, 03:19:30

Dates and versions

hal-04694927, version 1 (11-09-2024)

Identifiers

HAL Id: hal-04694927, version 1
DOI: 10.21437/Interspeech.2024-1095

Cite

Lucas Block Medin, Thomas Pellegrini, Lucile Gelin. Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning. 25th Interspeech Conference (Interspeech 2024), Sep 2024, Kos, Greece. pp. 5168-5172, ⟨10.21437/Interspeech.2024-1095⟩. ⟨hal-04694927⟩


Collections

UNIV-TLSE2 CNRS UT1-CAPITOLE IRIT IRIT-SAMOVA IRIT-SI TOULOUSE-INP UNIV-UT3 UT3-TOULOUSEINP

166 views

77 downloads
