A musicological pipeline for singing voice style analysis with neural voice processing and alignment
Abstract
The study of singing style is of great interest both for expressive vocal synthesis and for the musicological analysis of vocal performances, and it encourages a fruitful convergence between signal processing and musicology. For musicologists, however, such studies are often hindered by the lack of automatic analysis tools for voices recorded in a musical context, which leads to long and tedious manual annotation. This constraint forces researchers either to work with a small corpus or to restrict their study to experimental corpora of unaccompanied voices, and thus to forgo the unequalled interest of commercial recordings as accomplished artistic works. This article introduces a new protocol that uses deep learning techniques to give musicologists powerful tools for analysing singing voices, opening up new perspectives by automating the successive steps. We present a complete processing chain in support of musicological analysis, using neural models to isolate the singing voice, estimate its fundamental frequency (F0), and automatically align the syllables or notes to the audio despite the musical accompaniment. The effectiveness of this approach is demonstrated by applying it to two popular songs. These tools, developed in an ANR project, will soon be available to the scientific community.
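As an illustration of how the outputs of such a chain might be combined, the minimal sketch below takes an F0 track and a set of aligned segments and reports a median pitch per segment. It does not reproduce the models or code of the article: the function names, the data layout (frame times in milliseconds, 0 Hz marking unvoiced frames) and the toy values are all assumptions made for the example.

```python
import math
from statistics import median


def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def summarize_segments(times_ms, f0_hz, segments):
    """Return (label, median pitch in MIDI units) for each aligned segment.

    segments: list of (start_ms, end_ms, label), as an alignment step might produce.
    Frames with f0 <= 0 are treated as unvoiced (an assumption of this sketch).
    A segment with no voiced frame gets None.
    """
    summary = []
    for start, end, label in segments:
        voiced = [
            hz_to_midi(f)
            for t, f in zip(times_ms, f0_hz)
            if start <= t < end and f > 0
        ]
        summary.append((label, median(voiced) if voiced else None))
    return summary


# Hypothetical toy data: one frame every 10 ms, two sung syllables and a silence.
times_ms = [0, 10, 20, 30, 40, 50, 60, 70]
f0_hz = [0.0, 440.0, 440.0, 0.0, 880.0, 880.0, 0.0, 0.0]
segments = [(0, 30, "syl1"), (30, 60, "syl2"), (60, 80, "rest")]

result = summarize_segments(times_ms, f0_hz, segments)
for label, pitch in result:
    print(label, pitch)
```

Working in MIDI units rather than Hz makes pitch deviations comparable across registers, and the median is used here because it is less sensitive than the mean to isolated F0 estimation errors within a segment.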
Origin: Files produced by the author(s)