Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach - PaRis AI Research InstitutE
Preprint, Working Paper. Year: 2024

Maxime Poli
  • Role: Author
Emmanuel Dupoux
  • Role: Author

Abstract

Recent progress in Spoken Language Modeling has shown that learning language directly from speech is feasible. Generating speech through a pipeline that operates at the text level typically loses nuances, intonations, and nonverbal vocalizations. Modeling directly from speech opens up the path to more natural and expressive systems. On the other hand, speech-only systems require up to three orders of magnitude more data to catch up to their text-based counterparts in terms of semantic abilities. We show that fine-tuning speech representation models on phoneme classification leads to more context-invariant representations, and that language models trained on these units achieve lexical comprehension comparable to that of models trained on a hundred times more data.

Main file
2410.00025v2.pdf (320.47 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04847733, version 1 (19-12-2024)

Cite

Maxime Poli, Emmanuel Chemla, Emmanuel Dupoux. Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach. 2024. ⟨hal-04847733⟩