Speech to text model for a non-English language.

Murtaza Mohammadi on 22 Sep 2024
Commented: Umar on 24 Sep 2024
Hello
I want to develop a tool that can transcribe non-English audio using the letters of that language itself. What I have gathered so far is that I need to create some labelled data and train a deep learning model. Beyond that, I am unsure how to proceed. All online examples and discussions pertain to an existing English dataset or a model trained on the English language. I would like to develop something for a regional dialect which uses a different alphabet system.
Looking for some detailed help and guidance here.
Thank you.

Accepted Answer

Umar on 22 Sep 2024

Hi @Murtaza Mohammadi,

The first step in developing your transcription tool is to gather a dataset of audio recordings in the target language. This dataset should include:

Audio Files: Recordings of spoken language in various contexts (e.g., conversations, speeches).

Transcriptions: Text files that contain the corresponding transcriptions of the audio files in the target alphabet.

You may need to create this dataset manually or find existing resources. Ensure that the audio quality is high and that the recordings cover a diverse range of speakers and dialects.

Once you have your audio files, you need to label them. This involves creating a mapping between the audio and its corresponding text. You can use a simple CSV format for this purpose:

audio_file,transcription
audio1.wav,"transcription in target alphabet"
audio2.wav,"another transcription"
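As a rough illustration of how such a CSV could be used, the sketch below writes a tiny manifest, reads it back as UTF-8, and collects the distinct characters that appear in the transcriptions. The file name, the Greek sample text, and the variable names are placeholders chosen for this example, not part of the original answer.

```matlab
% Hypothetical two-row manifest, written to a temporary file so that the
% sketch is self-contained. Replace it with your own CSV file.
manifestFile = fullfile(tempdir, 'manifest.csv');
T = table(["audio1.wav"; "audio2.wav"], ["γεια σου"; "καλημέρα"], ...
    'VariableNames', {'audio_file', 'transcription'});
writetable(T, manifestFile, 'Encoding', 'UTF-8');

% Read the manifest back, keeping text as strings and decoding as UTF-8
% so that non-Latin letters survive the round trip.
manifest = readtable(manifestFile, 'TextType', 'string', ...
    'Encoding', 'UTF-8', 'Delimiter', ',');

% Collect every distinct character used in the transcriptions.
allText = char(join(manifest.transcription, ""));
vocabulary = unique(allText);      % sorted row vector of characters
numClasses = numel(vocabulary);    % candidate size for the output layer
```

The resulting character count is one possible way to choose the number of output classes for a character-level model; a real project may also need extra symbols such as a blank or end-of-sequence marker.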

Before you train a model, you must preprocess the audio data. This typically involves:

Resampling: Ensure all audio files are at the same sample rate.

Feature Extraction: Convert audio signals into a format suitable for model training, such as Mel-frequency cepstral coefficients (MFCCs).

Here’s a MATLAB code snippet to extract MFCC features from an audio file:

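[audioIn, fs] = audioread('audio1.wav'); % Read audio file
audioIn = mean(audioIn, 2); % Mix down to mono if the file has several channels
audioIn = resample(audioIn, 16000, fs); % Resample to 16 kHz (Signal Processing Toolbox)
coeffs = mfcc(audioIn, 16000); % MFCC features, one row per frame (Audio Toolbox)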

For more information on these functions, please refer to

https://www.mathworks.com/help/matlab/import_export/read-and-get-information-about-audio-files.html

https://www.mathworks.com/help/signal/ref/resample.html?searchHighlight=resample&s_tid=srchtitle_support_results_1_resample

https://www.mathworks.com/help/audio/ref/mfcc.html?searchHighlight=mfcc&s_tid=srchtitle_support_results_1_mfcc
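To apply the same extraction to a whole dataset, the features are usually stored one sequence per recording. The sketch below is one way this might look; it uses synthetic noise signals as stand-ins for real recordings, and the per-recording normalization step is an optional assumption of this example rather than something the answer above prescribes. It requires Audio Toolbox for mfcc.

```matlab
fs = 16000;                                  % common sample rate in Hz
% Synthetic stand-ins for recordings of 1 s and 2 s; replace these with
% audio loaded by audioread and resampled to fs.
signals = {randn(fs, 1), randn(2 * fs, 1)};

features = cell(numel(signals), 1);
for i = 1:numel(signals)
    coeffs = mfcc(signals{i}, fs);           % numFrames-by-numCoeffs
    % Optional: zero mean and unit variance per coefficient.
    coeffs = (coeffs - mean(coeffs, 1)) ./ std(coeffs, 0, 1);
    % Sequence networks expect features-by-time, so transpose.
    features{i} = coeffs.';
end

% Use this value as the input size of the network.
numFeatures = size(features{1}, 1);
```

Reading numFeatures from the data, instead of hard-coding it, avoids a size mismatch if mfcc returns an extra log-energy value per frame in your release.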

For transcription tasks, recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are commonly used. You can also consider Long Short-Term Memory (LSTM) networks, a type of RNN that is effective for sequence prediction problems. Here’s a simple example of defining an LSTM network in MATLAB:

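numFeatures = 13; % Must equal size(coeffs, 2), the number of MFCC features per frame
numClasses = 40; % Placeholder: number of distinct characters in your target alphabet
layers = [
  sequenceInputLayer(numFeatures) % Input layer for MFCC features
  lstmLayer(100, 'OutputMode', 'sequence') % LSTM layer, one output per frame
  fullyConnectedLayer(numClasses) % One score per character class
  softmaxLayer
  classificationLayer];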

For more information on lstmLayer, please refer to

https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.lstmlayer.html?searchHighlight=lstmLayer&s_tid=srchtitle_support_results_1_lstmLayer

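Once your model is defined, you can train it using the labeled data. Use the trainNetwork function in MATLAB:

options = trainingOptions('adam', ...
  'MaxEpochs', 100, ...
  'MiniBatchSize', 32, ...
  'Verbose', 0, ...
  'Plots', 'training-progress');
net = trainNetwork(trainingData, layers, options); % trainingData: a datastore of feature sequences and labels

In R2023b and later, trainnet is the recommended replacement. It takes a loss function as an additional argument and is used without an output layer such as classificationLayer. For more information on trainnet, please refer to

https://www.mathworks.com/help/deeplearning/ref/trainnet.html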
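Evaluating on a separate test set means holding some recordings out before training. A minimal sketch of a random split is shown here; the dataset size of 100 files and the 20% hold-out fraction are assumptions made for illustration only.

```matlab
rng(0);                               % fixed seed so the split is repeatable
numFiles = 100;                       % hypothetical number of recordings
testFraction = 0.2;                   % assumed share held out for testing

idx = randperm(numFiles);             % shuffle the file indices
numTest = round(testFraction * numFiles);
testIdx = idx(1:numTest);             % indices reserved for evaluation
trainIdx = idx(numTest + 1:end);      % indices used for training
```

If the same speaker appears in many recordings, it may be better to split by speaker instead of by file, so that the test set measures how well the model handles unseen voices.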

After training, evaluate your model's performance using a separate test dataset. Calculate metrics such as accuracy, precision, and recall to assess how well your model transcribes audio.

Once you are satisfied with the model's performance, you can deploy it as a standalone application or integrate it into a larger system. Consider using MATLAB's App Designer to create a user-friendly interface for your transcription tool. For more information on App Designer, please refer to

https://www.mathworks.com/help/matlab/ref/appdesigner.html?searchHighlight=App%20designer&s_tid=srchtitle_support_results_1_App%20designer
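For the evaluation step, the answer lists accuracy, precision, and recall. A complementary metric often used for transcription output is the character error rate, which is not mentioned above and is offered here only as a suggestion. The sketch computes it from the Levenshtein edit distance between a reference and a predicted transcription; both strings are made-up examples.

```matlab
% Hypothetical reference and predicted transcriptions.
reference = 'γεια σου';
predicted = 'γεια σο';

% Levenshtein distance by dynamic programming. D(i+1, j+1) holds the
% distance between the first i reference and first j predicted characters.
m = numel(reference);
n = numel(predicted);
D = zeros(m + 1, n + 1);
D(:, 1) = (0:m).';
D(1, :) = 0:n;
for i = 1:m
    for j = 1:n
        cost = double(reference(i) ~= predicted(j));
        D(i + 1, j + 1) = min([D(i, j + 1) + 1, ...  % deletion
                               D(i + 1, j) + 1, ...  % insertion
                               D(i, j) + cost]);     % substitution
    end
end

% Character error rate: number of edits divided by the reference length.
cer = D(m + 1, n + 1) / m;
```

Averaging this value over all test recordings gives a single number that reflects how many characters the model gets wrong, which per-frame accuracy alone does not capture.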

I hope this helps you get started with your project. Please let me know if you have any further questions.

2 Comments
Murtaza Mohammadi on 23 Sep 2024
Thanks for your detailed response. I will get going on this and keep you posted.
Umar on 24 Sep 2024
Hi @Murtaza Mohammadi,
Thank you for your prompt acknowledgment. I appreciate your commitment to moving forward with this matter. Please feel free to reach out if you have any questions or require further assistance as you proceed. I look forward to hearing from you soon.

Release

R2021b
