Appending datasets of varying length

I have a collection of monthly files of 1-minute averaged data that I would like to import, append, and process. I would like to handle the data in MATLAB using a dataset array:
wind = dataset('file','Halkirk1_12_2010_average1min.csv','delimiter',',','format',['%s' repmat(' %f',1,72)]);
Because of the nature of the files, they have varying lengths, so vertcat does not work. I will use datevec to pick up the next file to append. Is there a function or method that will append two datasets of varying lengths? Any tips or thoughts would be appreciated.
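A minimal sketch of generating the sequence of monthly file names to append, assuming every file follows the Halkirk1_MM_YYYY_average1min.csv pattern seen in this thread; the start date and the number of months are made up for illustration. datevec only splits the start date into year and month, and the month is then stepped with plain arithmetic so the year rolls over after December.

```matlab
% Split a hypothetical start date into [year month day hour minute second]
v = datevec('2010-12-01', 'yyyy-mm-dd');
startYear  = v(1);
startMonth = v(2);
nMonths    = 3;                        % hypothetical number of files to append

fileNames = cell(nMonths, 1);
for k = 0:nMonths-1
    m0 = startMonth - 1 + k;           % zero-based month counter from January of startYear
    y  = startYear + floor(m0 / 12);   % carry into the next year after December
    m  = mod(m0, 12) + 1;              % map back to 1..12
    % Assumed naming pattern, taken from the file names in this thread
    fileNames{k+1} = sprintf('Halkirk1_%02d_%d_average1min.csv', m, y);
end
disp(fileNames)
```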

2 comments

Oleg Komarov on 17 Apr 2011
Do you mean you have a different number of columns?
Braden on 18 Apr 2011
No, the CSV files have different numbers of rows, i.e., January's file contains more 1-minute averages than February's because January has 31 days and February has 28.
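To put numbers on this: a complete month of 1-minute averages has days × 24 × 60 rows, assuming no missing minutes. A small sketch using eomday (core MATLAB) to get the number of days in each month; the months chosen are just examples around the files mentioned in this thread.

```matlab
% Expected row counts for complete months of 1-minute averages (assumes no gaps)
months = [12 1 2];                        % Dec 2010, Jan 2011, Feb 2011
years  = [2010 2011 2011];
daysInMonth  = eomday(years, months);     % [31 31 28]
rowsPerMonth = daysInMonth * 24 * 60;     % one row per minute
disp(rowsPerMonth)
```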


Accepted Answer

Oleg Komarov on 18 Apr 2011
The number of rows is not a problem:
A = dataset({rand(10,1),'col1'});
B = dataset({rand(20,1),'col1'});
C = [A;B]
If you have an error, report the full error msg.
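Building on the accepted answer, a sketch of appending many monthly pieces at once: collect each dataset in a cell array and call vertcat a single time, rather than regrowing the result inside the loop. It assumes the Statistics Toolbox dataset class used throughout this thread; the random data and the variable name WindSpeed are placeholders. The row counts differ per piece, which is fine; what every piece must share is the same set of variables.

```matlab
% Requires the Statistics Toolbox 'dataset' class used in this thread.
rowCounts = [44640 44640 40320];          % e.g. Dec, Jan, Feb at 1-minute resolution
parts = cell(numel(rowCounts), 1);
for k = 1:numel(rowCounts)
    % Placeholder data; every piece has the same single variable 'WindSpeed'
    parts{k} = dataset({rand(rowCounts(k), 1), 'WindSpeed'});
end
wind = vertcat(parts{:});                 % one concatenation; differing row counts are fine
disp(size(wind))
```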

More Answers (2)

Laura Proctor on 18 Apr 2011

0 votes

You can merge datasets using the JOIN function.

4 comments

Braden on 18 Apr 2011
From what I read, it sounded like the join function is meant for cases where the observations overlap. I got this error when I tried using join:
>> wind = dataset('file','Halkirk1_12_2010_average10min.csv','delimiter',',','format',['%s' repmat(' %f',1,72)]);
>> wind2 = dataset('file','Halkirk1_01_2011_average10min.csv','delimiter',',', 'format',['%s' repmat(' %f',1,72)]);
>> C = join(wind,wind2)
??? Error using ==> dataset.join>simplejoin at 335
The key variable for B must have contain all values in the key variable for A.
Error in ==> dataset.join at 249
ir = simplejoin(leftkey,rightkey);
Laura Proctor on 19 Apr 2011
A full outer join will return the union of observations from both datasets.
c = join(a,b,'Type','outer')
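The difference between appending and an outer join can be seen without dataset at all. The sketch below uses plain numeric arrays with made-up timestamps and speeds, and hand-rolls a full outer join on the time key with union and ismember. It mimics the shape of an outer join with the keys merged into one column; it does not call dataset's join. Appending keeps one speed column and stacks rows, while the join spreads the two files into separate columns padded with NaN.

```matlab
% Two hypothetical monthly files with non-overlapping timestamps
tA = [1; 2];     speedA = [5.1; 5.3];
tB = [3; 4; 5];  speedB = [6.0; 6.2; 6.4];

% Appending: rows are stacked, one Time column and one speed column
appended = [tA speedA; tB speedB];        % 5-by-2

% Full outer join on the Time key (keys merged into one column):
% one row per unique key, one speed column per input, NaN where a key is missing
keys = union(tA, tB);
joined = NaN(numel(keys), 3);
joined(:, 1) = keys;
[~, ia] = ismember(tA, keys);
joined(ia, 2) = speedA;
[~, ib] = ismember(tB, keys);
joined(ib, 3) = speedB;

disp(appended)
disp(joined)
```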
Oleg Komarov on 19 Apr 2011
@Laura: which is not what the OP wants if she wants to append.
Braden on 19 Apr 2011
That's correct, Oleg. 'He' wants to append.


Richard Willey on 19 Apr 2011
Hi Oleg
This strikes me as more of a data representation issue than a question of MATLAB syntax. Your eventual solution will depend on how you want to describe "time". You seem to be assuming a "wide" format in which the observations for each month are stored as separate variables and each row represents a separate one-minute average (the first one-minute average in the month, the second one-minute average in the month, ...). This format will work fine; however, you might need to pad some of the months with NaNs.
You might find it easier to create a variable labeled "Time" and use it to timestamp all of your observations. You could then create separate variables that track which month each time value corresponds to, what day of the week it is, whether it's a holiday, and so on.
I'm attaching some code that I wrote a while back that grabs data from xls files and automatically creates nominal variables based on the file name.
Hope this proves helpful.
%% Loading Data into MATLAB
clear all
clc
% This script assumes that we have a set of XLS files.
% Each XLS file contains a separate spark sweep
% We're interested in combining all these files into a dataset array
% After which, we're going to identify the minimum BSFC for each spark
% sweep
% Identify where to search for files
Location = 'H:\Documents\MATLAB\BSFC\';
% Store information about all .xls files in the struct array D
D = dir([Location, '*.xls']);
% Create a dataset array from the file that is the first element in D.
% dir returns bare file names, so build the full path with fullfile;
% otherwise the read only works when the Current Folder is Location.
name = D(1).name;
Engine = dataset('xlsfile',fullfile(Location,name));
% Use the name of the file as a nominal variable
% The nominal variable can be used to note that all these rows came from
% the file with name = "name"
% Start by stripping off the ".xls" extension
name = name(1:end-4);
% Write the name to the dataset array (one copy per row) and convert to a nominal
Engine.Name = repmat(name,size(Engine,1),1);
Engine.Name = nominal(Engine.Name);
% Repeat for all the rest of the .xls files in the "Location".
% Each new file will be vertically concatenated with the
% original dataset array (parfor needs the Parallel Computing
% Toolbox to run in parallel; otherwise it runs serially)
f = @(x,y) vertcat(x,y);
parfor i = 2 : length(D)
    name = D(i).name;
    Engine2 = dataset('xlsfile',fullfile(Location,name));
    name = name(1:end-4);
    Engine2.Name = repmat(name,size(Engine2,1),1);
    Engine2.Name = nominal(Engine2.Name);
    Engine = f(Engine, Engine2);
end
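Richard's suggestion of a single Time variable plus helper variables such as the month or day of the week can be sketched with core date functions. The timestamp strings, their 'yyyy-mm-dd HH:MM' format, and the speeds are assumptions for illustration; the first (%s) column of the real files may use a different format. Because every row is one observation, months of any length stack without NaN padding, and the Month variable can then serve as a grouping index.

```matlab
% Hypothetical timestamps as they might appear in the first column of the CSV files
stamps = {'2010-12-31 23:59'; '2011-01-01 00:00'; '2011-01-01 00:01'};
speed  = [7.2; 7.4; 7.1];                  % hypothetical 1-minute average wind speeds

Time      = datenum(stamps, 'yyyy-mm-dd HH:MM');  % serial date numbers, one per row
v         = datevec(Time);                 % [year month day hour minute second]
Month     = v(:, 2);                       % which month each row belongs to
DayOfWeek = weekday(Time);                 % 1 = Sunday ... 7 = Saturday

% Long format: one row per observation, helper variables side by side
longData = [Time Month DayOfWeek speed];

% Example use: mean speed per month, with Month as the grouping variable
[monthIds, ~, g] = unique(Month);
monthlyMean = accumarray(g(:), speed, [], @mean);
disp([monthIds(:) monthlyMean])
```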

10 comments

Braden on 19 Apr 2011
Thanks for the further tips, Richard. They have proven helpful! I applied your method of using the data file names but got an error. I'm guessing that this has something to do with using 'name' in the dataset syntax.
Error:
??? Error using ==> dataset.dataset>readFile at 472
Unable to open the file Halkirk1_10_2010_average10min.csv for reading.
Error in ==> dataset.dataset>dataset.dataset at 267
a = readFile(a,fileArg,otherArgs);
Error in ==> Untitled2 at 18
wind = dataset('file',name,'delimiter',',','format',['%s' repmat(' %f',1,72)]);
Code used:
clear all
clc
%Identify where to search for files
Location = 'C:\Halkirk_Data_Complete\Averaged\Halkirk1_average10min\test\';
D = dir([Location, '*.csv']);
% Store the name of all .csv files as a vector D
name = D(1) .name;
% Create a dataset array from the file that is the first element in D
wind = dataset('file',name,'delimiter',',','format',['%s' repmat(' %f',1,72)]);
% Strip off .csv extension
name = name(1:end-4);
% Write the name to the dataset array and convert to a nominal
wind.Name = repmat(name,length(wind),1);
wind.Name = nominal(wind.Name);
% Repeat for all the rest of the .csv files in the "Location".
% Each new file with be vertically concatenated with the
% original dataset array
f = @(x,y) vertcat(x,y);
parfor i = 2 : length(D)
name = D(i) .name;
wind2 = dataset('file',name,'delimiter',',','format',['%s' repmat(' %f',1,72)]);
name = name(1:end-4);
wind2.Name = repmat(name,length(wind2),1);
wind2.Name = nominal(wind2.Name);
wind = f(wind, wind2);
end
Braden on 27 Apr 2011
Does anybody have an idea why this code isn't running? Am I correct in assuming that it has something to do with the dataset function?
Oleg Komarov on 28 Apr 2011
As the error says, MATLAB is unable to open the file Halkirk1_10_2010_average10min.csv.
Is the file there? Is the name correct? Try fopen('...\Halkirk1_10_2010_average10min.csv') and see what it returns (doc fopen).
Braden on 2 May 2011
Yes, the file is there and the name is correct. fopen returns a file identifier greater than or equal to 3, which signifies that it opened the file successfully.
Oleg Komarov on 3 May 2011
What happens if you run this line on its own:
wind2 = dataset('file',name,'delimiter',',','format',['%s' repmat(' %f',1,72)]);
Or try a plain for loop instead of parfor.
Braden on 3 May 2011
Thanks for the help, Oleg. It turns out I didn't have the Current Folder set correctly. Silly me.
However, I am now running into 'Out of Memory' issues when working with my dataset, even though it is only 32000x75 points...
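Since the failure came down to the Current Folder, one way to avoid depending on it is to join each name returned by dir with its folder via fullfile before reading. A minimal sketch: it writes two small throwaway CSV files into a hypothetical folder under tempdir, and uses csvread only as a stand-in for the dataset('file', ...) call in the thread.

```matlab
% Create a throwaway folder with two small CSV files (names are hypothetical)
Location = fullfile(tempdir, 'halkirk_demo');
if ~exist(Location, 'dir')
    mkdir(Location);
end
demoNames = {'demo_01.csv', 'demo_02.csv'};
for k = 1:numel(demoNames)
    fid = fopen(fullfile(Location, demoNames{k}), 'w');
    fprintf(fid, '%d,%.1f\n', [1:3; 5 + rand(1, 3)]);   % 3 rows: index, value
    fclose(fid);
end

% dir returns names without the folder, so rebuild the full path before reading
D = dir(fullfile(Location, '*.csv'));
nRows = zeros(numel(D), 1);
for i = 1:numel(D)
    fullName = fullfile(Location, D(i).name);   % independent of the Current Folder
    data = csvread(fullName);                   % stand-in for dataset('file', fullName, ...)
    nRows(i) = size(data, 1);
end

% Clean up the demo files
delete(fullfile(Location, '*.csv'));
rmdir(Location);
```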
Oleg Komarov on 3 May 2011
That's very odd. Datasets do add overhead, but even this cell array:
C(1:32000,1:75) = deal({'qwertyuiop'});
is only ~180 MB.
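To check how much memory a variable actually takes, whos reports its size in bytes. A sketch comparing the cell array above with a plain double matrix of the same 32000-by-75 shape; the figure for the cell array depends on the platform and MATLAB version, while the double matrix is reported at 8 bytes per element (about 18.3 MB).

```matlab
% The cell array from the comment above: every element holds the same 10-character string
C(1:32000, 1:75) = deal({'qwertyuiop'});
% The same shape stored as plain doubles, e.g. 75 numeric columns of wind data
M = zeros(32000, 75);

info = whos('C', 'M');                  % struct array with fields name, size, bytes, class, ...
for k = 1:numel(info)
    fprintf('%s: %.1f MB\n', info(k).name, info(k).bytes / 2^20);
end
```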
Braden on 6 May 2011
Thanks for the help, Oleg. The dataset, named 'wind', is 30000x75. I run this code to initialize storage for variables calculated from the dataset:
wind.vhub = zeros(size(wind,1),1);
wind.newWinV = zeros(size(wind,1));
wind.newWinVMax = zeros(size(wind,1));
wind.newWinVMin = zeros(size(wind,1));
wind.newSonWinV = zeros(size(wind,1));
wind.newSonWinVMax = zeros(size(wind,1));
wind.newSonWinVMin = zeros(size(wind,1));
wind.phub = zeros(size(wind,1),1);
wind.rho = zeros(size(wind,1),1);
wind.Cp = zeros(size(wind,1),1);
wind.normpow = zeros(size(wind,1),1);
This is the error that I get:
Error using ==> zeros
Out of memory. Type HELP MEMORY for your options
These are my memory stats:
Maximum possible array: 449 MB
Memory available for all arrays: 1335 MB
Memory used by MATLAB: 414 MB
Physical Memory (RAM): 3454 MB
Teja Muppirala on 6 May 2011
A very common mistake.
zeros(size(wind,1)) <--- Out of memory: with a single argument, zeros(n) creates an n-by-n matrix. This is not what you meant.
zeros(size(wind,1),1) <--- This is what you meant to write (an n-by-1 column)
Fix all of those lines to be:
wind.vhub = zeros(size(wind,1),1);
wind.newWinV = zeros(size(wind,1),1);
wind.newWinVMax = zeros(size(wind,1),1);
wind.newWinVMin = zeros(size(wind,1),1);
wind.newSonWinV = zeros(size(wind,1),1);
wind.newSonWinVMax = zeros(size(wind,1),1);
wind.newSonWinVMin = zeros(size(wind,1),1);
wind.phub = zeros(size(wind,1),1);
wind.rho = zeros(size(wind,1),1);
wind.Cp = zeros(size(wind,1),1);
wind.normpow = zeros(size(wind,1),1);
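The arithmetic behind this diagnosis: with one argument, zeros(n) allocates an n-by-n matrix of 8-byte doubles. A small sketch using the 30000 rows and the 449 MB "Maximum possible array" figure from Braden's comment. zeros(n) would need about 6866 MB, roughly fifteen times the limit, while zeros(n,1) needs well under 1 MB.

```matlab
n = 30000;                     % rows in the 'wind' dataset from the thread
maxArrayMB = 449;              % "Maximum possible array" from Braden's memory stats

squareMB = n * n * 8 / 2^20;   % zeros(n): n-by-n doubles at 8 bytes each
columnMB = n * 1 * 8 / 2^20;   % zeros(n,1): n-by-1 doubles

fprintf('zeros(n)   needs %9.1f MB, fits: %d\n', squareMB, squareMB <= maxArrayMB);
fprintf('zeros(n,1) needs %9.3f MB, fits: %d\n', columnMB, columnMB <= maxArrayMB);
```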
Braden on 6 May 2011
Ah yes! Thanks for picking up on that! Your help is much appreciated.



Asked on 17 Apr 2011
