How to pre-process Next Generation Sequencing data using MATLAB?
E V
on 22 Sep 2016
Commented: E V
on 22 Sep 2016
I am trying to use MATLAB to pre-process NGS data. Can anyone suggest comprehensive code for this procedure? I have tried the code suggested on this page, but it only covers a limited number of tasks. For example, I don't know how to filter (or mask) reads shorter than 10 nucleotides, or how to treat paired-end reads. Moreover, how can I filter reads that have more than two N nucleotides? Can anyone suggest a comprehensive reference for these tasks?
Accepted Answer
Luuk van Oosten
on 22 Sep 2016
Dear Ehsan,
The page you refer to is a good start for getting familiar with processing NGS data, but there are many more functions in the Bioinformatics Toolbox that will help you with preprocessing. Now to your specific questions:
(1) How to filter (or mask) reads shorter than 10 nucleotides (note that the file name is passed as a character vector):
your_filtered_data = seqfilter('yourdata.fastq', 'Method', 'MinLength', 'Threshold', 10)
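If the reads are already loaded into the workspace, the same length cutoff can be applied with a logical mask. This is only a minimal sketch using base MATLAB: the reads below are made-up examples, and the variable names (`headers`, `seqs`, `quals`, `keep`) are assumptions chosen for illustration. In practice the three cell arrays might come from `[headers, seqs, quals] = fastqread('yourdata.fastq')`.

```matlab
% Made-up reads for illustration; real data might instead come from
% [headers, seqs, quals] = fastqread('yourdata.fastq');
headers = {'read1', 'read2', 'read3'};
seqs    = {'ACGTACGTACGT', 'ACGT', 'ACGTNACGTA'};
quals   = {'IIIIIIIIIIII', 'IIII', 'IIIIIIIIII'};

minLength = 10;                        % cutoff taken from the question

% Length of every read, then a logical mask of the reads to retain.
readLengths = cellfun(@numel, seqs);
keep = readLengths >= minLength;

% Apply the same mask to headers, sequences, and qualities so that
% the three arrays stay aligned.
filteredHeaders = headers(keep);
filteredSeqs    = seqs(keep);
filteredQuals   = quals(keep);
```

Applying one mask to all three arrays keeps each header and quality string attached to its sequence.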
(2) How to treat paired-end reads:
You are in luck: there is a function called 'seqsplitpe', which lets you split merged paired-end sequences into separate files (if that is what you want).
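One thing to watch with paired-end data is that filtering each file independently can leave the two files out of sync. The sketch below is an assumption about how one might handle that in base MATLAB, not a Toolbox feature: it assumes the mates are held in two cell arrays (`r1`, `r2`, hypothetical names) in the same order, and keeps a pair only when both mates pass the criteria from the question.

```matlab
% Made-up mates for illustration; r1{k} and r2{k} are assumed to
% belong to the same fragment.
r1 = {'ACGTACGTACGT', 'ACGT',         'ACGTACGTACGTAA'};
r2 = {'TTGCATTGCATT', 'TTGCATTGCATT', 'TTNNNGCATTGCA'};

% A read passes if it has at least 10 bases and at most two N's.
passes = @(s) numel(s) >= 10 && sum(upper(s) == 'N') <= 2;

% Keep a pair only if BOTH mates pass, so the two sets stay in sync.
keepPair = cellfun(passes, r1) & cellfun(passes, r2);

r1Filtered = r1(keepPair);
r2Filtered = r2(keepPair);
```

Here the second pair is dropped because its first mate is too short, and the third because its second mate contains three N's.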
(3) How to filter reads that have more than two N nucleotides:
This is probably a combination of (a) importing your sequences and then (b) counting the N characters in each read and discarding the reads where that count exceeds two. I believe there is no ready-made function in MATLAB for this, but the Bioinformatics Toolbox has numerous functions for analyzing sequences.
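Counting N's per read needs nothing beyond base MATLAB. The following is a minimal sketch on made-up reads; the names `seqs`, `maxN`, `nCounts`, and `keep` are illustrative assumptions, and lowercase `n` is treated the same as `N`.

```matlab
% Made-up reads for illustration; real data might instead come from
% [headers, seqs, quals] = fastqread('yourdata.fastq');
seqs = {'ACGTNNACGT', 'ACNNNGTACG', 'acgtnacgtn', 'ACGTACGTAC'};

maxN = 2;                              % reads with more N's are dropped

% Count the N characters in each read (case-insensitive).
nCounts = cellfun(@(s) sum(upper(s) == 'N'), seqs);

% Logical mask of reads with at most maxN ambiguous bases.
keep = nCounts <= maxN;
filteredSeqs = seqs(keep);
```

The same `keep` mask can be applied to the header and quality arrays before writing the result back out, for example with `fastqwrite`.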
Best regards