Random amino acid sequence generation with a given amino acid count of a specified sequence
3 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
Sk. Hassan
el 8 de Jun. de 2022
Editada: Tim DeFreitas
el 22 de Jun. de 2022
I have a sequence
>sp|Q7RTX1|TS1R1_HUMAN Taste receptor type 1 member 1 OS=Homo sapiens OX=9606 GN=TAS1R1 PE=2 SV=1
MLLCTARLVGLQLLISCCWAFACHSTESSPDFTLPGDYLLAGLFPLHSGCLQVRHRPEVT
LCDRSCSFNEHGYHLFQAMRLGVEEINNSTALLPNITLGYQLYDVCSDSANVYATLRVLS
LPGQHHIELQGDLLHYSPTVLAVIGPDSTNRAATTAALLSPFLVPMISYAASSETLSVKR
QYPSFLRTIPNDKYQVETMVLLLQKFGWTWISLVGSSDDYGQLGVQALENQATGQGICIA
FKDIMPFSAQVGDERMQCLMRHLAQAGATVVVVFSSRQLARVFFESVVLTNLTGKVWVAS
EAWALSRHITGVPGIQRIGMVLGVAIQKRAVPGLKAFEEAYARADKKAPRPCHKGSWCSS
NQLCRECQAFMAHTMPKLKAFSMSSAYNAYRAVYAVAHGLHQLLGCASGACSRGRVYPWQ
LLEQIHKVHFLLHKDTVAFNDNRDPLSSYNIIAWDWNGPKWTFTVLGSSTWSPVQLNINE
TKIQWHGKDNQVPKSVCSSDCLEGHQRVVTGFHHCCFECVPCGAGTFLNKSDLYRCQPCG
KEEWAPEGSQTCFPRTVVFLALREHTSWVLLAANTLLLLLLLGTAGLFAWHLDTPVVRSA
GGRLCFLMLGSLAAGSGSLYGFFGEPTRPACLLRQALFALGFTIFLSCLTVRSFQLIIIF
KFSTKVPTFYHAWVQNHGAGLFVMISSAAQLLICLTWLVVWTPLPAREYQRFPHLVMLEC
TETNSLGFILAFLYNGLLSISAFACSYLGKDLPENYNEAKCVTFSLLFNFVSWIAFFTTA
SVYDGKYLPAANMMAGLSSLSSGFGGYFLPKCYVILCRPDLNSTEHFQASIQDYTRRCGS
T
I wish to get a set of 10000 sequences (in fasta format) having identical amino acid counts as in the above sequence.
I could not use properly the randseq function in getting what I need.
Any help would be highly appreciated.
0 comentarios
Respuesta aceptada
Tim DeFreitas
el 8 de Jun. de 2022
Editada: Tim DeFreitas
el 22 de Jun. de 2022
If you want exactly the same amino acid counts, then you want to randomly shuffle the input sequence, which can be done with randperm:
[~, sequences] = fastaread('pf00002.fa');
targetSeq = sequences{1}; % Select specific sequence from FA file.
randomSeqs = cell(1,10000);
for i=1:numel(randomSeqs)
randomSeqs{i} = targetSeq(:, randperm(numel(targetSeq)));
end
0 comentarios
Más respuestas (1)
Sam Chak
el 8 de Jun. de 2022
Hi @Sk. Hassan
I'm no expert in this, but it is possible to generate a sequence of numbers that associate with the Roman alphabet.
Here is a simple script that you can modify to generate a long sequence of characters like the PASTA spaghetti form.
function amino = generateAmino(n)
ASCII_L = 65;
ASCII_U = 90;
C = round((ASCII_U - ASCII_L).*rand(n, 1) + ASCII_L);
amino = char(C');
end
On the Command Window:
S1 = generateAmino(60)
S1 =
'TGNRWYODEGVGUGXJFGPMJVPOXHTTKOCBNTXDOMAIEUINEPHQRTLCGXEVNZCL'
You can then use a for loop to generate the number of sequences that you want.
numOfSeq = 10000;
for i = 1:numOfSeq
S(i,:) = generateAmino(60);
end
S
That's the basic idea. You may need to modify the script to select certain Roman alphabets in the ASCII chart.
0 comentarios
Ver también
Categorías
Más información sobre QSP, PKPD, and Systems Biology en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!