Script fails only when processing a large number of files (Error: Matrix dimensions must agree)

I have 6004 files, each containing the energy readings for several appliances (and the total energy consumed) for one house-day. There are 47 different houses, each monitored for a different period of time and each with a different number of monitored appliances. The files are written in JSON, and I have to "translate" them into matrix format to manage them and do the analysis.
I wrote a function that takes the name of a file as input and returns: (1) a 144-by-25 matrix of doubles (144 energy readings a day; 25 is the maximum number of sensors a house can have) with the readings of sensor n in column n; (2) a column vector of 144 datetime elements with the date and time of each energy reading (from 00:00:00 until 23:50:00 of that day); and (3) the hexadecimal code name of the specific house.
I use this function in a script that reads the directory where the files are stored and returns a struct array with the following fields for each house (each house is one "row" of the struct): sensordata, dates, house. sensordata is a matrix with the energy readings of all the days of one house; dates is a column vector with the dates and times corresponding to the readings; and house is just the code name of the house. The script takes the name of one file at a time and calls the function above. If the name of the house is the same as the previous one, it appends the new matrix of energy readings below the previous ones (like this: matrix=[matrix;new_matrix]; I don't know how to preallocate it to avoid changing its size in each iteration). It does the same for the dates vector. If the name of the house is different, it starts the next "row" of the struct, corresponding to the next house, and follows the same process. Because of the way I built the loops, the first "row" of the struct is empty (the name of the first house doesn't match the initial placeholder name, so it starts the next row), so at the end I just do struct=struct(2:last_row) to get rid of it.
When I test the script with a small number of files it works perfectly. It correctly concatenates each day's data after the previous day's, separates the houses into different rows, and everything seems fine. But when I try to use it to read all the files, MATLAB gives me the following error:
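On the preallocation question, one common pattern is to store each day's matrix in a preallocated cell array and concatenate once at the end. The following is only a minimal sketch: the `days` cell array is a hypothetical stand-in for three 144-by-25 matrices as returned by defiles, and `chunks` is an illustrative name, not part of the original script.

```matlab
% Hypothetical stand-in for three daily 144x25 matrices returned by defiles
days = {ones(144,25), 2*ones(144,25), 3*ones(144,25)};

numDays = numel(days);
chunks = cell(numDays,1);        % one cell per file; the number of files is known from dir
for k = 1:numDays
    chunks{k} = days{k};         % store the day's matrix without copying earlier data
end
sensors = vertcat(chunks{:});    % a single concatenation at the end
disp(size(sensors))              % rows = 144 * number of days, columns = 25
```

The number of days per house is not known in advance, but the number of files is an upper bound, so the cell array can be sized from that and any unused cells left empty; `vertcat` skips empty entries.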
Matrix dimensions must agree.
Error in defiles (line 55)
sensor(:,i)=sensor(:,i)+decoded.series(n).data; % Adds the values of sensor i to colum i
Error in openfiles (line 37)
[sensor,datime,house]=defiles(pathname); % Run function to get the data out
I don't understand why it gives me the error only when the number of files is large enough. If I take, say, 100 files, the script works fine. I should also say that the number of files for which it works depends on which files it reads (it once worked for more than 400 files, but with other files it fails for 200). My guess is that it has something to do with the size of the data matrices, which are really big, and the way I concatenate them. But I should be able to have matrices with up to 1.032.125.000 elements, and I am not even close to that. The output of the memory command is the following:
Maximum possible array: 7874 MB (8.257e+09 bytes) *
Memory available for all arrays: 7874 MB (8.257e+09 bytes) *
Memory used by MATLAB: 1204 MB (1.263e+09 bytes)
Physical Memory (RAM): 8050 MB (8.441e+09 bytes)
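As a rough sanity check on the memory hypothesis, the size of the complete data set can be estimated from the figures given above (6004 files, 144 readings per day, 25 columns, 8 bytes per double). The variable names are illustrative only, and the datetime vectors and struct overhead are ignored in this sketch.

```matlab
numFiles       = 6004;   % files in the data set (from the question)
readingsPerDay = 144;    % rows per file
numColumns     = 25;     % sensor columns per file
bytesPerDouble = 8;

numElements = numFiles*readingsPerDay*numColumns;   % 21,614,400 elements in total
totalMB     = numElements*bytesPerDouble/2^20;      % size in MB (2^20 bytes each)
fprintf('%d elements, about %.0f MB\n', numElements, totalMB);
```

Even with all houses held in memory at once, the sensor data come to about 21.6 million doubles, well under 200 MB, so the limits reported by memory are not being approached.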
Any idea what could be happening? How could I solve it? These are my first real (non-exercise) functions, so I have no clue, and I didn't find any similar question.
The script also calls another function I wrote, which reads the metadata file describing what each sensor is and returns a struct with this information (information on sensor 1 in "row" 1, e.g. 'Freezer'), and adds this to the struct with the data, dates and name. But I don't think this plays any role in the error.
Here is the code in case it helps:
% Read the names of the files in the active folder (or the folder you wish), then
% calls the function defiles for each file, which decodes it and returns a struct
% with the data and the dates of the measurements, and the name of the house. This
% name is used as input for the function metadata, which returns a struct with the
% appliances each sensor was monitoring and the id code of the house. It finally
% puts it all nicely in a struct. Each row corresponds to a house.
function [all]=openfiles()
% names=dir('/MATLAB Drive/jsons'); where between ' ' is the path of the folder
% with the files to open
names=dir('C:\Chale\dat'); % Creates a struct with information on all the files in that
% folder. For each file there is a field called name that contains an array of
% chars with the name of the file.
[numfiles,~]=size(names); % returns the number of files in the current folder
% +2 (the first two "files" are '.' and '..').
prehouse='hola'; % Starting of char array with a random word.
preaplis='Sugurú'; % Starting randomly
preid='Gelida'; % Starting randomly
mm=1; % Starting the counter for different houses.
sensors=[]; % Start an array to store the data from the sensors
datatime=[]; % Start an array to store the date and time from the data
alls(50)=struct('sensordata',sensors,'dates',datatime,'apliances',preaplis,'house',prehouse,'id',preid);
% Preallocate fields for speed (in theory there are 47 houses, but better too many than too few)
for nn=3:100%numfiles % Omit rows 1 and 2 ("files" '.' and '..') and go to the last file.
% for nn=numfiles:-1:3
namefile=names(nn).name; % Introduces the name of the nnth file in namefile.
pathfile=names(nn).folder; % Introduces the path of the nnth file in pathfile.
pathname=[pathfile '\' namefile]; % Complete name of the file, including path.
[sensor,datime,house]=defiles(pathname); % Run function to get the data out
% of the nnth file. See function defiles.
[aplishouse,idhouse]=metadata(house);
if strcmp(house,prehouse)==0 % If the name of the house is different than the previous...
alls(mm)=struct('sensordata',sensors,'dates',datatime,'apliances',preaplis,'house',prehouse,'id',preid);
% In the first iteration, alls(1), 'sensordata' and 'dates' are empty
% and 'house' contains 'hola'. The rest, alls(n>1) contain the data,
% dates and names of each of the houses. For example, alls(3) contains
% ALL the data from all the days of the 2nd house (in the order that
% they appear from the command dir (which should be alphabetical order,
% taking into account the date of the measures)). It also contains all
% the dates and times corresponding to each data point in the field
% 'dates'. The field house is only a char with the codename of the house.
mm=mm+1; % Then add one to the house counter.
sensors=[]; % Start again the sensors array
datatime=[]; % Start again the dates array
end
% I don't know how to preallocate sensors and datatime in an easy way to make
% the function faster because I don't previously know how many days each house
% has been taking measurements
sensors=[sensors;sensor]; % Adds the sensor data of the following day
datatime=[datatime;datime]; % Adds the dates and times of the following day
prehouse=house; % Introduces the name of the current house in prehouse.
preaplis=aplishouse; % Same for apliances
preid=idhouse; % Same for id
end
alls(mm)=struct('sensordata',sensors,'dates',datatime,'apliances',aplishouse,'house',prehouse,'id',idhouse);
% Introduces the data, dates, and name of the last house in alls
all=alls(2:mm); % Gets rid of the first empty row as well as the unused trailing ones
end
.
% The argument, namefile, must be a char, therefore 'namefile'
% It returns 3 outputs:
% (1) a 144x25 matrix (sensor) with the readings of each sensor ordered
% by column (sensor n in column n + sensor 0 in column 25), and in each
% row the energy readings starting at 00:00 with increments of 10 min
% (144 rows)
% (2) a datetime column vector with the date and time corresponding
% to each reading from the previous matrix, starting at 00:00 of the date of
% the readings with increments of 10 min (row-to-row correspondence with the
% previous matrix).
% (3) a char array with the codename of the household the data comes from;
% these names seem to be 40 characters long
%
% [Data,time,house]=defiles('namefile')
%
% The script sums the values for all the Channels of each sensor row by row
function [sensor,datime,house]=defiles(namefile)
file = importdata(namefile); % Import data from namefile
data = {file{1}(10:end-2)}; % Strips the wrapping loadData() call, keeping only the JSON text
decoded=jsondecode(data{1}); % Produces a struct with 4 fields: uuid (contains
% the name of the house), date, status, and series (a struct with the
% data from all sensor-channel pairs)
[nn,~]=size(decoded.series); % Finds the total amount of pairs sensor-channel
sensornum=zeros(nn,1); % Produces a vector of the size equal to the previous amount
for n=1:nn
decoded.series(n).sensor=str2double(decoded.series(n).sensor); % Turns sensor numbers from char to double
decoded.series(n).channel=str2double(decoded.series(n).channel); % Turns channel numbers from char to double
sensornum(n)=decoded.series(n).sensor; % Writes the sensor numbers corresponding to each data struct
% I could check here whether decoded.series(n).data contains 144 readings. If it does not,
% there's a problem with the data of that channel. Make the rest 0? Set a warning? Just leave it empty as now?
end
mm=max(sensornum); % Find the largest sensor number in the file
sensor=zeros(144,25); % Start the matrix to store the readings from the sensors. Each column will be the data
% for one sensor.
% Here it finds which series correspond to sensor 0 and adds them to column 25;
% if there is more than one sensor 0, it adds up all the values in each row
for n=1:nn
if decoded.series(n).sensor==0 % Checks if the sensor in position n is sensor 0
sensor(:,25)=sensor(:,25)+decoded.series(n).data; % Adds the values of sensor 0 to column 25
end
% Now it goes to each series, checks which sensor it represents and adds it to the
% column of the same number (sensor 1 in column 1);
% if there is more than one sensor i, it adds up all the values in each row
for i=1:mm
if decoded.series(n).sensor==i % Checks if the sensor in position n is sensor i
sensor(:,i)=sensor(:,i)+decoded.series(n).data; % Adds the values of sensor i to colum i
% because each sensor may have more than one channel and they have to be all summed
end
end
end
% Now I create a datetime array with the date and the times from 00:00:00 to 23:50:00
datime=datetime(decoded.date)+minutes(0:10:1430);
datime=datime(:);
house=decoded.uuid;
end
.
% It returns aplishouse (struct with metadata info) and idhouse (double with the id
% number of the house).
% Input: name of a house
% It compares the input name with the fields from decodmeta.meters(n).hasheduuid,
% which are the codenames of the houses.
% When there is a match, it imports decodmeta.meters(match).sensors(kk).description;
% where kk is (1:end) indexing all description fields. Puts each description in the
% corresponding field of the struct, e.g. description corresponding to sensor
% 5 in return(5). Last one is return(25) with 'Total' (it corresponds to sensor 0).
% It also returns the number id of the house.
function [aplishouse,idhouse]=metadata(house)
% Add the corresponding path for the file 'chale_metadata2.json'
file = importdata('C:\Chale\chale_metadata2.json'); % Opens metadata file
decodmeta=jsondecode(file{1}); % Decodes the JSON text. Generates a struct called
% meters containing a struct with the fields id (house code number), hasheduuid
% (house codename) and sensors for each house. Sensors is a struct containing
% the id, sensor (sensor number in char), channel, type and description (what
% appliance the sensor has measured). There are data for 47 houses.
for nn=1:47 % There are data for 47 houses
if strcmp(house, decodmeta.meters(nn).hasheduuid) % If the name of the input
% house matches the name stored in the field hasheduuid of row nn, then...
break % To keep the value of nn (house should only match once with hashe...)
end
end
decodmeta.meters(nn).id=str2double(decodmeta.meters(nn).id); % Turns the field id
% from char to double
idhouse=decodmeta.meters(nn).id; % Double with the id number of the house
[numapl,~]=size(decodmeta.meters(nn).sensors); % Returns the number of appliances
% listed for this house.
for kk=1:numapl
decodmeta.meters(nn).sensors(kk).sensor=str2double(decodmeta.meters(nn).sensors(kk).sensor);
% Turns sensor numbers from char to double.
sensornum(kk)=decodmeta.meters(nn).sensors(kk).sensor; % Writes the sensor
% numbers corresponding to each data struct
end
maxsens=max(sensornum); % Find the largest sensor number for the house
aplishouse(25)=struct('apliance', 'Total'); % Initialize the struct to store what
% each sensor monitored. As sensor 0 is always total, it is put in field 25.
for mm=1:numapl % Run through all fields
for jj=1:maxsens % Run through all possible sensor numbers
if decodmeta.meters(nn).sensors(mm).sensor==jj % If, in field number mm, the sensor
% number equals jj (sensor 0 is already done, so we start at 1), then...
aplishouse(jj).apliance=decodmeta.meters(nn).sensors(mm).description;
% ...write whatever is in the field description to row jj of aplishouse
end
end
end
end
Sorry for the mega-long question. I hope it is at least understandable.

Accepted Answer

Cam Salzberger on 3 Sep 2017
Edited: Cam Salzberger on 3 Sep 2017
Hello Mi,
One thing that I can recommend is entering this command into the Command Window:
dbstop if error
Then try running your code. If/when the code runs into the same error, it will immediately pause execution and go into debug mode. Once you're in debug mode, you can get more information about what is happening. Things I would check while in debug mode:
  • What are the sizes of the arrays you are trying to add?
  • Which of the arrays is not the correct size?
  • How was that array computed?
  • Which file did that data come from?
  • When you try to re-read the file, does the data come in differently (could be a sporadic file-read error)?
It really helps to narrow the issue down to a single file, a couple arrays of data, and one line of code.
Hope this helps get you started!
-Cam
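As a non-interactive complement to the debugger, the question "which file did that data come from?" can also be answered by wrapping the per-file call in try/catch and recording the names that fail. This is only a sketch under stated assumptions: `files` and `readOne` are hypothetical stand-ins (in the real script the call would be defiles on each path), and the stand-in reader is rigged to fail on one name so the example runs without any data.

```matlab
% Hypothetical stand-ins: three file names and a reader that fails on one of them
files   = {'day1.json', 'day2.json', 'day3.json'};
readOne = @(name) assert(~strcmp(name, 'day2.json'), 'Matrix dimensions must agree.');

failed = {};                                  % names of the files that raised an error
for k = 1:numel(files)
    try
        readOne(files{k});                    % in the real script: defiles(pathname)
    catch err
        failed{end+1} = files{k};             %#ok<SAGROW> record the offending file
        fprintf('%s failed: %s\n', files{k}, err.message);
    end
end
```

After the loop, `failed` lists every problematic file in one pass, which makes it easy to open just those files and compare their contents with a file that reads correctly.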
3 comments
Miquel on 4 Sep 2017
Edited: Miquel on 4 Sep 2017
Hi, Cam. The issue is that some of the sensors occasionally had problems, so some series contain fewer than 144 readings. That's why the dimensions do not agree. Without being able to enter debug mode when the error arises, I wouldn't have found that in ages. Thank you very much for the tip! And thank you too, Walter. The answer is an empty array [], so it shouldn't be an issue... at least right now.
Cam Salzberger on 5 Sep 2017
Glad I could help! In that case, you probably want to predicate the indexing, assignment, and other operations based on how many readings there actually are from the file. Or throw out that data if you consider it to be bad when sensors have problems.
Good luck with the rest!
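One way to act on the second suggestion (discarding incomplete data) is to check the length of each series before adding it to its column. The sketch below is an assumption about how that could look, not the code that was eventually used: `series` is a hypothetical stand-in for decoded.series with the sensor numbers already converted to doubles, and `expected` and `skipped` are illustrative names.

```matlab
expected = 144;                       % readings per complete day
sensor   = zeros(expected, 25);       % same layout as in defiles

% Hypothetical stand-in for decoded.series: two full channels of sensor 1
% and one incomplete channel of sensor 2
series = struct('sensor', {1, 1, 2}, ...
                'data',   {ones(144,1), 2*ones(144,1), ones(100,1)});

skipped = [];                         % indices of series that were discarded
for n = 1:numel(series)
    d = series(n).data(:);            % force a column vector
    if numel(d) ~= expected           % incomplete day: do not add it
        skipped(end+1) = n;           %#ok<SAGROW>
        continue
    end
    col = series(n).sensor;
    sensor(:,col) = sensor(:,col) + d;   % sum the channels of the same sensor
end
```

Here the two complete channels of sensor 1 are summed, while the short series is recorded in `skipped` and its column stays at zero. Whether to skip, pad, or warn depends on how the missing readings should be treated in the later analysis.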
