How to find the word error rate of a spoken sentence for a regression-based model?

I am working on visual speech synthesis using the GRID dataset, which consists of short sentences. The model I developed is regression-based: it takes a silent video as input and generates a speech signal. My aim is to find the word error rate of the output speech signal, but I don't know how to separate the words in the input and output signals in order to compute it.
Kindly guide me about this.

Accepted Answer

Drew on 25 Oct 2023
Word Error Rate (WER) is a widely used metric for evaluating Automatic Speech Recognition (ASR). To calculate WER for a visual speech synthesis (VSS) system, you need a reference word transcription and a hypothesis word transcription for each utterance. Standard word error rate alignment between the two then gives the WER, so there is no need to segment the speech signals into words yourself.
These word transcriptions can be obtained in various ways. For example, the reference transcriptions might come from the labels of the visual dataset. The hypothesis transcriptions might come from the VSS system itself (if it has an intermediate representation in words) or from running ASR on the synthesized speech.
Note that while WER is widely used, it does not capture all aspects of visual speech synthesis quality. Other evaluations, such as perceptual evaluation of speech quality (PESQ) or subjective user studies, can assess the system from other perspectives, including audio-visual synchronization, intelligibility, naturalness, and the overall usefulness of the synthesized speech.
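As a rough illustration of the alignment step, WER is usually defined as (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words and N is the number of words in the reference. The sketch below computes it with a word-level edit distance. It is written in Python rather than MATLAB, the function name `word_error_rate` is made up for this example, and the sentences are illustrative GRID-style strings; in practice the hypothesis would be whatever transcript your ASR system returns for the synthesized audio.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    # Assumption: lowercasing and whitespace splitting are enough normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative strings only: dataset label vs. a hypothetical ASR transcript.
reference = "bin blue at f two now"
hypothesis = "bin blue at s two now"
print(word_error_rate(reference, hypothesis))  # one substitution out of six words
```

For a whole test set, the usual convention is to sum the edit counts over all utterances and divide by the total number of reference words, rather than averaging the per-sentence rates.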
If this answer helps you, please remember to accept the answer.
