Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach

Maxime Poli; Emmanuel Chemla; Emmanuel Dupoux

doi:10.18653/v1/2024.emnlp-main.302

Preprint, Working Paper. Year: 2024

Maxime Poli

Role: Author

Emmanuel Chemla

Role: Author
PersonId: 742721
IdHAL: emmanuel-chemla
ORCID: 0000-0002-8423-5880
IdRef: 179351133

Laboratoire de sciences cognitives et psycholinguistique

Emmanuel Dupoux

Role: Author

Abstract

Recent progress in Spoken Language Modeling has shown that learning language directly from speech is feasible. Generating speech through a pipeline that operates at the text level typically loses nuances, intonations, and nonverbal vocalizations. Modeling directly from speech opens up the path to more natural and expressive systems. On the other hand, speech-only systems require up to three orders of magnitude more data to catch up to their text-based counterparts in terms of their semantic abilities. We show that fine-tuning speech representation models on phoneme classification leads to more context-invariant representations, and that language models trained on these units achieve lexical comprehension comparable to that of models trained on a hundred times more data.

Keywords

Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering

Domains

Cognitive sciences

Main file

2410.00025v2.pdf (320.47 KB)

Origin: Files produced by the author(s)

Contributor: Emmanuel Chemla

https://cnrs.hal.science/hal-04847733

Submitted on: Thursday, December 19, 2024, 10:11:57

Last modified on: Sunday, December 22, 2024, 03:16:56

Dates and versions

hal-04847733, version 1 (19-12-2024)

Identifiers

HAL Id: hal-04847733, version 1
arXiv: 2410.00025v2
DOI: 10.18653/v1/2024.emnlp-main.302

Cite

Maxime Poli, Emmanuel Chemla, Emmanuel Dupoux. Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach. 2024. ⟨hal-04847733⟩

Collections

ENS-PARIS CNRS EHESS LSCP DEC GENCI PSL ANR PRAIRIE-IA
