Conference paper, Year: 2026

Investigating speaker pronunciation variability in speech embeddings: speaker and L1 effects on French as a Second Language

Abstract

Speech variation between native and non-native speakers of French is addressed with a low-resource method based on a frame-wise comparison of wav2vec2 acoustic embeddings, using fine-grained phonetic transcriptions by expert annotators as a baseline. We explore z-normalisation and t-normalisation to assess what phonetically analysable information the embeddings contain, and more generally we explore unsupervised methods for answering basic speech-related research questions. Adapting Dynamic Time Warping to speech embeddings, we compare phonologically similar recordings of sentences read aloud by native vs. non-native speakers of French. We first ask whether XLSR-53 embeddings are more robust than MFCCs to inter-speaker vs. intra-speaker variability across different occurrences of the same words. We then investigate whether native speakers' productions are more or less stable than those of non-native speakers. Results suggest that the model supports phonetically meaningful correlative analyses. Working on the raw embeddings shows, however, that the representations are not speaker-independent. With a view to addressing L2 pronunciation variability, we show that t-normalisation offers a way to separate fluency and accuracy effects in L2 speech. This shows that wav2vec2 encodes time-dependent phonetic information in its embeddings, including speaker accent, which cannot easily be disentangled from other speaker-specific characteristics.
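
The frame-wise comparison described in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes that z-normalisation means standardising each embedding dimension over the frames of one recording, that frames are compared with cosine distance, and that the alignment cost is divided by the summed sequence lengths. The abstract specifies none of these choices. The function names are hypothetical, and small random arrays stand in for real wav2vec2 or MFCC frames.

```python
import numpy as np


def z_normalise(frames: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standardise each dimension over the frames of one recording (assumed definition)."""
    mean = frames.mean(axis=0, keepdims=True)
    std = frames.std(axis=0, keepdims=True)
    return (frames - mean) / (std + eps)


def cosine_distance_matrix(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Pairwise cosine distance between the frames of a (n, d) and b (m, d)."""
    a_unit = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_unit = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return 1.0 - a_unit @ b_unit.T


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Accumulated DTW cost between two frame sequences, divided by n + m."""
    cost = cosine_distance_matrix(a, b)
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # advance in a only
                acc[i, j - 1],      # advance in b only
                acc[i - 1, j - 1],  # advance in both sequences
            )
    return float(acc[n, m] / (n + m))


# Synthetic stand-ins for frame sequences (illustrative data only).
rng = np.random.default_rng(0)
reference = rng.normal(size=(40, 16))
slower = np.repeat(reference, 2, axis=0)  # same frames, each held twice as long
other = rng.normal(size=(55, 16))         # unrelated content

same_content = dtw_distance(z_normalise(reference), z_normalise(slower))
different_content = dtw_distance(z_normalise(reference), z_normalise(other))
```

In this toy setting the time-stretched copy aligns with the reference at near-zero cost, while the unrelated sequence does not. This is the property that makes DTW suitable for comparing productions of the same sentence at different speaking rates.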

Main file
30_Paper-finalfinal.pdf (1.69 MB)

Dates and versions

hal-05577663, version 1 (02-04-2026)

Identifiers

  • HAL Id: hal-05577663, version 1

Cite

Maxime Fily, Martine Adda-Decker, Guillaume Wisniewski. Investigating speaker pronunciation variability in speech embeddings: speaker and L1 effects on French as a Second Language. Speech Language Models in Low-Resource Settings: Performance, Evaluation, and Bias Analysis (SPEAKABLE, collocated in LREC), ELRA Language Resources Association, May 2026, Palma de Mallorca, Spain. ⟨hal-05577663⟩