Parsing a comma-delimited column into multiple vectors and cell arrays

Hi,
I am importing a series of CSV files, each with 18 columns and a varying number of rows (up to 800,000), using the following code:
for i=1:135
%%Import the data
fullFileName=sprintf('%s%d%s', 'C:\Users\Joseph\Documents\MATLAB\CS\CSV\',i, '.csv') ;
fid = fopen(fullFileName, 'rt');
M=textscan(fid,'%s','collectoutput',1,'headerlines',0);
fclose(fid);
X=M{1,1};
end
The issue is that X is a cell array in which each row is a single comma-delimited string. For instance, the first two rows are the following. 1st row:
'CUSIP_ID,BOND_SYM_ID,COMPANY_SYMBOL,TRD_EXCTN_DT,TRD_EXCTN_TM,TRC_ST,ASCII_RPTD_VOL_TX,RPTD_PR,YLD_PT,DAYS_TO_STTL_CT,SALE_CNDTN_CD,SPCL_TRD_FL,DISS_RPTG_SIDE_CD,RPTD_HIGH_PR,HIGH_YLD_PT,RPTD_LOW_PR,LOW_YLD_PT,RPTD_LAST_PR'
2nd row:
'00846UAG6,A.GF,A,1/3/2011,17:21:06,T,1700000,101.636,4.78396,0,A,,B,0,0,0,0,0'
The first row holds the column headers and the second row contains data. All I want is to create cell and numeric variables (depending on the type of data) where each variable has the name of the respective name in the headers and has the corresponding data from the rest of rows. That is, a cell array called CUSIP_ID with the data {00846UAG6}, a vector RPTD_PR=[101.636], and so on.
Is there a way to parse the data of X?
1 comment
Jan
Jan on 8 Jul 2012
I do not understand the question. Would textscan(..., 'delimiter', ',') already solve the problem?
By the way, it is called "parsing", with an "s".
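A minimal sketch of what that suggestion could look like, using the two sample rows from the question. The temporary file, the variable names (tmpFile, header, fmt, C, D) and the choice of which columns are numeric are assumptions made for illustration; in particular, ASCII_RPTD_VOL_TX is kept as text here because its name suggests a text field.

```matlab
% Write the two sample rows from the question to a temporary CSV file
% so that the example is self-contained.
tmpFile = [tempname '.csv'];
fid = fopen(tmpFile, 'wt');
fprintf(fid, '%s\n', 'CUSIP_ID,BOND_SYM_ID,COMPANY_SYMBOL,TRD_EXCTN_DT,TRD_EXCTN_TM,TRC_ST,ASCII_RPTD_VOL_TX,RPTD_PR,YLD_PT,DAYS_TO_STTL_CT,SALE_CNDTN_CD,SPCL_TRD_FL,DISS_RPTG_SIDE_CD,RPTD_HIGH_PR,HIGH_YLD_PT,RPTD_LOW_PR,LOW_YLD_PT,RPTD_LAST_PR');
fprintf(fid, '%s\n', '00846UAG6,A.GF,A,1/3/2011,17:21:06,T,1700000,101.636,4.78396,0,A,,B,0,0,0,0,0');
fclose(fid);

% Assumed column types: %s for text columns, %f for numeric columns.
fmt = '%s %s %s %s %s %s %s %f %f %f %s %s %s %f %f %f %f %f';

fid = fopen(tmpFile, 'rt');
header = regexp(fgetl(fid), ',', 'split');   % first line: the 18 column names
C = textscan(fid, fmt, 'Delimiter', ',');    % remaining lines: one cell per column
fclose(fid);
delete(tmpFile);

% Collect the columns in one structure, one field per header name.
D = cell2struct(C, header, 2);
```

After this, D.CUSIP_ID is a cell array of strings and D.RPTD_PR is a numeric column vector. Empty numeric fields come back as NaN by default, and empty text fields as empty strings.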


Answers (1)

Walter Roberson
Walter Roberson on 8 Jul 2012
3 comments
Jan
Jan on 8 Jul 2012
Is this really the same question as above?
C = {'CUSIP_ID', 'BOND_SYM_ID', 'COMPANY_SYMBOL'};
FileName2 = ['Issuer' num2str(UIssuer(i))];
save(FileName2, C{:});
Walter Roberson
Walter Roberson on 8 Jul 2012
Edited: Walter Roberson on 8 Jul 2012
You wrote,
All I want is to create cell and numeric variables (depending on the type of data) where each variable has the name of the respective name in the headers and has the corresponding data from the rest of rows.
You are therefore asking to compute variable names. It is not a good idea to do that; there are many associated problems.
In your situation, I recommend using dynamic field names in a structure, and then saving with save() and the -struct flag.
The parsing is easy:
fieldnames = regexp( FirstRow, ',', 'split');
fieldvals = regexp( SecondRow, ',', 'split');
tempcell = [fieldnames; fieldvals];
savestruct = struct( tempcell{:} );
save( FileName, '-struct', 'savestruct');
The step that this misses is converting numeric-looking fields to numeric values. In order to do that, you have to know ahead of time which fields must be numeric, or you have to set rules about the forms that are okay to convert to numeric. Keep in mind as you construct those rules that some strings that contain the characters 'e', 'E', 'i', 'I', '-', '+' or '.' are considered to be convertible to numeric, so you can end up surprised if something you "know" should be a text field just happened to contain "E0", which is interpretable as "0E0" which is 0.
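One way to handle the conversion is to name the numeric fields ahead of time and convert only those. The following is a minimal sketch under that assumption: the list numericFields and the variable names are hypothetical, chosen from the headers in the question, and the output goes to a temporary MAT-file only to keep the example self-contained.

```matlab
% Sample rows copied from the question.
FirstRow  = 'CUSIP_ID,BOND_SYM_ID,COMPANY_SYMBOL,TRD_EXCTN_DT,TRD_EXCTN_TM,TRC_ST,ASCII_RPTD_VOL_TX,RPTD_PR,YLD_PT,DAYS_TO_STTL_CT,SALE_CNDTN_CD,SPCL_TRD_FL,DISS_RPTG_SIDE_CD,RPTD_HIGH_PR,HIGH_YLD_PT,RPTD_LOW_PR,LOW_YLD_PT,RPTD_LAST_PR';
SecondRow = '00846UAG6,A.GF,A,1/3/2011,17:21:06,T,1700000,101.636,4.78396,0,A,,B,0,0,0,0,0';

names  = regexp(FirstRow,  ',', 'split');
values = regexp(SecondRow, ',', 'split');

% Assumption: only these fields are numeric. Everything else stays text,
% so a symbol such as "E0" is never converted by accident.
numericFields = {'RPTD_PR', 'YLD_PT', 'DAYS_TO_STTL_CT', 'RPTD_HIGH_PR', ...
                 'HIGH_YLD_PT', 'RPTD_LOW_PR', 'LOW_YLD_PT', 'RPTD_LAST_PR'};

savestruct = struct();
for k = 1:numel(names)
    if ismember(names{k}, numericFields)
        % str2double returns NaN if the text is not a valid number.
        savestruct.(names{k}) = str2double(values{k});
    else
        savestruct.(names{k}) = values{k};
    end
end

% Each field becomes its own variable in the MAT-file.
FileName = [tempname '.mat'];
save(FileName, '-struct', 'savestruct');
```

Loading the file with load(FileName) then gives a structure with one field per header, so no variable names have to be computed at run time.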
