New to MATLAB: How to import a text file for analysis that contains numbers and characters?

I have a text file whose columns contain both numbers and characters (attached). What function can I use to read the file for further analysis? I need to calculate the average reaction times based on the types of stimuli.

Answers (2)

If you need this only once (or a few times), use MATLAB's Import tool (Home > Import Data, the green downward arrow) and choose the output type, either "table" or "numeric array", depending on what you want. To work with this kind of file repeatedly, you should write a program using file operations (fopen, fread, fclose, ...) and string processing tools. Check the MATLAB documentation.
With the textscan command you can read the required data from the input text file; the data is returned as a cell array.
Try this to extract the data from the input file:
fmt = '%d\t%d\t%s\t%d\t%d\t%d\t%s\t%d\t%d\t%s\t%s'; % format of the text file
f = fopen('T001_1.txt', 'rt'); % open input file
% get the file data to variable data(cell data type)
data = textscan(f, fmt ,'HeaderLines',1,'CollectOutput',true);
fclose(f); % close file identifier
%% do your calculation on extracted "data"

5 comments

The default delimiter includes the horizontal tab character, so explicitly specifying it in the format string is not required and is confusing. It should be removed from the format string:
fmt = '%d%d%s%d%d%d%s%d%d%s%s'; % not checked
If it is really required to specify a different delimiter, then use the 'Delimiter' option rather than forcing it into the format string.
It might be easier to access the imported data without the 'CollectOutput' option, as then each column is separately indexable.
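As a minimal sketch of the two suggestions in this comment (the 'Delimiter' option, and no 'CollectOutput'), the snippet below writes a small tab-delimited file and reads it back. The three-column layout, the column names, and the values are made up for illustration, since the attached file is not shown here.

```matlab
% Hypothetical three-column sample; the real T001_1.txt layout is not shown.
tmpFile = [tempname, '.txt'];
fid = fopen(tmpFile, 'wt');
fprintf(fid, 'Second\tType\tDuration\n');   % one header line
fprintf(fid, '12\tcueA\t300\n');
fprintf(fid, '15\tcueB\t250\n');
fclose(fid);

% The delimiter goes in the 'Delimiter' option, not in the format string.
fid = fopen(tmpFile, 'rt');
cols = textscan(fid, '%d%s%d', 'Delimiter', '\t', 'HeaderLines', 1);
fclose(fid);
delete(tmpFile);

% Without 'CollectOutput', each file column is its own cell.
seconds  = cols{1};   % int32 column vector
cueType  = cols{2};   % cell array of character vectors
duration = cols{3};   % int32 column vector
```

Each element of cols then corresponds to exactly one column of the file, so the index matches the column number.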
Thank you for this. When I ran the code I got an error message (Index in position 2 exceeds array bounds (must not exceed 6)). How can this be fixed?
%% Opening file to store Analysed data
targetfile= ['Outputfile',date,'.txt'];
fid1 = fopen(targetfile,'w');
fprintf(fid1,'Sub_Number \t Sub_Condition \t Avg_MDC_LB \t Avg_MDC_RB \t Avg_MDC_LG \t Avg_MDC_RG \t Avg_DC_LB \t Avg_DC_RB \t Avg_DC_LG \t Avg_DC_RG \t Percent_Error \n');
% MDC = Misdirectional Cue, DC = Directional Cue, LB = Left Blue, RB = Right Blue,
% LG = Left Green, RG = Right Green
subject_number = {'V001' 'V002' 'V003' 'V004' 'V005'};
subject_condition = {'1' '2'};
for sub_number = 1:length(subject_number)
    for sub_condition = 1:2
        fileName = cell2mat([subject_number(sub_number), '_', num2str(sub_condition) '.txt']);
        % Opening Data File
        fid = fopen(fileName,'r');
        if(fid==-1)
            continue;
        end
        % Allocate imported array to column variable names
        fmt = '%d%d%s%d%d%d%s%d%d%s%s'; % format of the text file
        fileData = textscan(fid, fmt,'HeaderLines',1,'CollectOutput',true);
        %% Calculate
        CueSecond = fileData (:,1);
        CueMillisecond = fileData (:,2);
        CueType = fileData (:,3);
        CueDuration = fileData (:,4);
        StimulusSecond = fileData (:,5);
        Stimulus_ms = fileData (:,6);
        StimType = fileData (:,7);
        ResponseSecond = fileData (:,8);
        Response_ms = fileData (:,9);
        ResponseType = fileData (:,10);
        ResponseCorrect = fileData (:,11);
        fprintf(fid1,'%s \t %g \t %g \t %g \t %g \t %g \t %g \t %g \t %g \t %g \t %g \t n',sub_number, sub_condition, Avg_MDC_LB, Avg_MDC_RB, Avg_MDC_LG, Avg_MDC_RG, Avg_DC_LB, Avg_DC_RB, Avg_DC_LG, Avg_DC_RG, Percent_Error);
    end
end
fclose(fid1);
When you use CollectOutput, all of the consecutive items with the same format are gathered into one output. So %d%d is put into one array, then %s into a second, then %d%d%d into a third, %s a fourth, %d%d a fifth, then %s%s into the sixth. That is why the output has only 6 cells, and indexing column 7 fails.
It might be easier for you to remove the CollectOutput option. And you should probably change fileData(:,1) to fileData{1}, fileData(:,2) to fileData{2} and so on.
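The grouping described above can be checked on a small example. This sketch runs textscan on a character vector instead of a file, using the same format string as the posted code; the two data rows are invented placeholders.

```matlab
% Two made-up rows matching the 11-field format from the posted code.
fmt = '%d%d%s%d%d%d%s%d%d%s%s';
txt = sprintf('1 2 a 3 4 5 b 6 7 c d\n8 9 e 10 11 12 f 13 14 g h');

% With 'CollectOutput', adjacent fields of the same class are merged.
grouped  = textscan(txt, fmt, 'CollectOutput', true);
% Without it, every field gets its own cell.
separate = textscan(txt, fmt);

nGrouped  = numel(grouped);     % 6 cells: %d%d | %s | %d%d%d | %s | %d%d | %s%s
nSeparate = numel(separate);    % 11 cells, one per field
thirdGroupSize = size(grouped{3});   % rows-by-3 block holding the %d%d%d fields
```

With only 6 cells in the grouped output, an index such as fileData(:,7) is out of bounds, which matches the reported error.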
fprintf(fid1,'%s \t %g \t %g \t %g \t %g \t %g \t %g \t %g \t %g \t %g \t %g \t n',sub_number, sub_condition, Avg_MDC_LB, Avg_MDC_RB, Avg_MDC_LG, Avg_MDC_RG, Avg_DC_LB, Avg_DC_RB, Avg_DC_LG, Avg_DC_RG, Percent_Error);
That is not going to work for you because of the way that fprintf goes down columns of data. You should have a look at compose()
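To illustrate the point about argument order, here is a small sketch comparing sprintf (which consumes its arguments the same way fprintf does) with compose. The vectors a and b are placeholder data, and compose requires R2016b or later.

```matlab
a = [1 2 3];
b = [4 5 6];

% sprintf/fprintf use up all of a, then all of b, refilling the format
% each time it runs out of operators: the pairs are 1 2, 3 4, 5 6.
s = sprintf('%d %d\n', a, b);

% compose formats one output per ROW of its input matrix,
% so pairing the vectors as columns gives 1 4, 2 5, 3 6.
rows = compose('%d %d', [a(:), b(:)]);
```

If each Avg_ value in the posted code is a scalar, the argument order is less of a concern, but note that sub_number there is a numeric loop index being passed to a %s operator.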
Thank you, this was very helpful. Also, I haven't realized how tricky it is to perform simple math with numbers stored in cell arrays. Any tips on how I can calculate the mean of (Response_ms - Stimulus_ms)?
"How can this be fixed?"
By reading the comment that I wrote two days ago, where I recommended removing the 'CollectOutput' option, and explained why.
"I haven't realized how tricky it is to perform simple math with numbers stored in cell arrays. Any tips on how I can calculate the mean of"
By reading Walter Roberson's comment from four hours ago, which includes the advice to change your cell array indexing (from parentheses to curly braces):
CueSecond = fileData{1}; % Get the CONTENT of the 1st cell
CueMillisecond = fileData{2}; % Get the CONTENT of the 2nd cell
etc.
You should also revise cell array indexing
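For the question about the mean of (Response_ms - Stimulus_ms), a rough sketch with curly-brace indexing might look like the following. The fileData cell here is a hand-built stand-in for textscan output without 'CollectOutput'; the values and the stimulus labels 'LB' and 'RB' are assumptions, and the sketch ignores the separate seconds columns, which real data may need to include.

```matlab
% Stand-in for textscan output without 'CollectOutput' (values are made up).
fileData = cell(1, 11);
fileData{6} = int32([100; 200; 300; 400]);   % Stimulus_ms
fileData{7} = {'LB'; 'RB'; 'LB'; 'RB'};      % StimType (hypothetical labels)
fileData{9} = int32([350; 500; 650; 600]);   % Response_ms

% Curly braces return the cell CONTENT; %d yields int32, so convert to double.
Stimulus_ms = double(fileData{6});
StimType    = fileData{7};
Response_ms = double(fileData{9});

reactionTime = Response_ms - Stimulus_ms;    % plain numeric vector
meanRT = mean(reactionTime);                 % mean over all trials

% Mean per stimulus type, using a logical mask from strcmp.
isLB = strcmp(StimType, 'LB');
meanRT_LB = mean(reactionTime(isLB));
meanRT_RB = mean(reactionTime(~isLB));
```

Once the content is pulled out of the cells, the arithmetic works on ordinary numeric arrays and no cell-specific handling is needed.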

Asked: 27 Dec 2019

Edited: 29 Dec 2019
