Chapter 3
Improving Quality and Quantity of Training Data
When is noise in your data a good thing? When it accurately reflects real-world conditions.
For speech and voice applications, typical existing large data sets will be recorded in ways that differ from real application scenarios. If your application is supposed to recognize a spoken trigger word, then it needs to cope with poor microphones, specific types of reverberation, and background noise.
These and other effects can be added artificially to grow a training data set using established signal processing methods and domain-specific applications through:
- Data augmentation
- Data synthesis
Signals can be difficult to measure or observe consistently enough to build a large data set, so this chapter looks at techniques for creating more training data. Data synthesis creates new signals from models or simulations, and data augmentation is a specific type of data synthesis that creates new variations of your existing data.
First, a brief overview of how deep learning works with signal data.
Data Augmentation
Starting from existing labeled samples, augmentation generates:
- Training data that is similar to your high-quality validation data
- Variations of the available data that the system may encounter in real-world scenarios
Augmentation effects are often domain-specific. Common augmentation effects for audio, speech, and acoustic data include time stretching, pitch shifting, volume control, and many more.
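As a rough illustration of how such effects multiply a data set, the sketch below applies two simple transformations to one clip using only the Python standard library. The function names `apply_gain_db` and `resample_linear` are hypothetical, chosen for this example. Note the assumption: linear-interpolation resampling changes duration and pitch together, so it only approximates the separate time-stretch and pitch-shift effects that dedicated audio tools provide.

```python
import math

def apply_gain_db(samples, gain_db):
    """Volume control: scale the signal by a gain given in decibels."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]

def resample_linear(samples, rate):
    """Naive speed change by linear interpolation (hypothetical helper).

    rate > 1 shortens the clip (faster, higher pitch);
    rate < 1 lengthens it (slower, lower pitch).
    Duration and pitch change together, unlike a true
    time stretch or pitch shift.
    """
    if len(samples) < 2:
        return list(samples)
    out_len = max(2, int(round(len(samples) / rate)))
    out = []
    for i in range(out_len):
        # Map output index i to a fractional position in the input.
        pos = i * (len(samples) - 1) / (out_len - 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out

# One labeled clip becomes several training variants with the same label.
fs = 8000
clip = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]  # 1 s, 440 Hz
quieter = apply_gain_db(clip, -6.0)
faster = resample_linear(clip, 1.25)
slower = resample_linear(clip, 0.8)
```

Each variant keeps the label of the original clip, which is what makes augmentation cheap compared with recording and labeling new data.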
[Audio examples: Kitchen Reverberation; Washing Machine Noise]
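Reverberation and background noise, the two effects named above, can both be approximated with basic signal processing. The sketch below is a minimal illustration, not a recorded room or appliance: it assumes a hypothetical room model (a direct path followed by exponentially decaying random reflections) and uses white Gaussian noise as a stand-in for a real noise recording. All function names are made up for this example.

```python
import math
import random

def convolve(signal, impulse_response):
    """Direct convolution; output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def synthetic_impulse_response(fs, decay_s, length_s, rng):
    """Assumed room model: direct path plus exponentially decaying noise."""
    n = max(1, int(round(fs * length_s)))
    ir = [0.1 * rng.gauss(0.0, 1.0) * math.exp(-k / (fs * decay_s))
          for k in range(n)]
    ir[0] = 1.0  # direct sound
    return ir

def power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(signal, noise, snr_db):
    """Scale the noise so that signal power / noise power equals snr_db."""
    noise = [noise[i % len(noise)] for i in range(len(signal))]  # loop noise
    scale = math.sqrt(power(signal) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    return [s + scale * v for s, v in zip(signal, noise)]

rng = random.Random(0)
fs = 4000
clip = [math.sin(2 * math.pi * 300 * n / fs) for n in range(1000)]
ir = synthetic_impulse_response(fs, decay_s=0.02, length_s=0.05, rng=rng)
reverbed = convolve(clip, ir)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
noisy = mix_at_snr(reverbed, noise, snr_db=10.0)
```

In practice, the impulse response and the noise would come from measurements of the target environment; varying the decay time and the signal-to-noise ratio then yields a range of conditions from a single clean recording.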

Synthesis
Data synthesis means generating training data from scratch using generative AI models, simulations, or a combination of both.
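To make the idea of simulation-based synthesis concrete, here is a minimal sketch that builds a small labeled data set without any recordings. The two signal classes (a steady tone and a linear chirp), the parameter ranges, and the function names are all assumptions picked for illustration; a real project would substitute a model of the signals it actually needs.

```python
import math
import random

def synth_tone(fs, duration_s, freq_hz):
    """Constant-frequency sinusoid."""
    n = int(round(fs * duration_s))
    return [math.sin(2 * math.pi * freq_hz * k / fs) for k in range(n)]

def synth_chirp(fs, duration_s, f0_hz, f1_hz):
    """Linear chirp: instantaneous frequency sweeps from f0_hz to f1_hz."""
    n = int(round(fs * duration_s))
    sweep = (f1_hz - f0_hz) / duration_s  # Hz per second
    out = []
    for k in range(n):
        t = k / fs
        out.append(math.sin(2 * math.pi * (f0_hz * t + 0.5 * sweep * t * t)))
    return out

def make_dataset(num_per_class, fs=8000, duration_s=0.1, seed=0):
    """Return a list of (signal, label) pairs with randomized parameters."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_per_class):
        data.append((synth_tone(fs, duration_s, rng.uniform(200, 2000)), "tone"))
        f0 = rng.uniform(200, 1000)
        f1 = f0 + rng.uniform(500, 2000)
        data.append((synth_chirp(fs, duration_s, f0, f1), "chirp"))
    return data

dataset = make_dataset(num_per_class=5)
```

Because the generator knows which model produced each signal, the labels come for free, which is the main appeal of synthesis when labeling real measurements is expensive.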
A few examples of domain-specific data synthesis include:
The text2speech function in MATLAB can help you generate high-quality synthetic voice signals using cloud-based services from IBM®, Microsoft®, or Google®, including Google's well-known WaveNet network.

This example shows how to classify pedestrians and bicyclists based on their micro-Doppler characteristics using a deep learning network and time-frequency analysis. The movements of different parts of an object placed in front of a radar produce micro-Doppler signatures that can be used to identify the object.

Communication signals are also very difficult to field-record off the air and then label. The WLAN Router Impersonation Detection example simulates realistic signals for RF fingerprinting. With the algorithm in place, you can use data collected from a software-defined radio to train and test the same system using actual data.
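The general pattern of simulating labeled RF data can be sketched in a few lines. The example below is not the method used in the WLAN Router Impersonation Detection example; it is a simplified stand-in that assumes each device is distinguished only by a small carrier frequency offset, applied to random QPSK symbols with additive white Gaussian noise. The device names, offset values, and function names are hypothetical.

```python
import cmath
import math
import random

QPSK = [complex(1, 1), complex(-1, 1), complex(-1, -1), complex(1, -1)]

def simulate_burst(num_symbols, cfo_cycles_per_symbol, snr_db, rng):
    """Complex baseband burst: QPSK symbols, frequency offset, then AWGN."""
    # Noise standard deviation per real dimension for unit signal power.
    noise_std = math.sqrt(1.0 / (2.0 * 10.0 ** (snr_db / 10.0)))
    burst = []
    for k in range(num_symbols):
        symbol = rng.choice(QPSK) / math.sqrt(2.0)  # unit average power
        # Device-specific impairment: phase rotation that grows with time.
        rotated = symbol * cmath.exp(2j * math.pi * cfo_cycles_per_symbol * k)
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        burst.append(rotated + noise)
    return burst

def make_rf_dataset(device_cfos, bursts_per_device,
                    num_symbols=64, snr_db=20.0, seed=0):
    """Return (burst, device_id) pairs; the label is known by construction."""
    rng = random.Random(seed)
    data = []
    for device_id, cfo in device_cfos.items():
        for _ in range(bursts_per_device):
            burst = simulate_burst(num_symbols, cfo, snr_db, rng)
            data.append((burst, device_id))
    return data

# Hypothetical devices and offsets, in cycles per symbol.
rf_dataset = make_rf_dataset({"router_a": 0.001, "router_b": -0.002},
                             bursts_per_device=3)
```

A pipeline built on simulated bursts like these can later be fed with captures from a software-defined radio, provided the captured data is brought into the same format.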

Test Your Knowledge
Which of these is not a common augmentation effect for audio data?
Answer: Flatten tone. It is the only option that is not a common augmentation effect for audio data.