Helping filling in missing data sections with NaN

6 visualizaciones (últimos 30 días)
Kendra Wright
Kendra Wright el 19 de Mzo. de 2015
Editada: dpb el 20 de Mzo. de 2015
What I have is a large data set with five minute data for about 5 months. It comes out to be quite a large array. It's called B and is 37493x16. The time stamp is saved as a serial number of day of the year. (For example, May 1st, at 6am is 121.25). There are a lot of missing data points in there, and to get to the next stage of processing the person helping me needs the five minute time stamp with NaN filled in when there is no data for that five minute period. Seems simple enough but....
What I tried to do is this:
I built an array of the five minute time stamp and 15 columns of NaN. It's called full 48096x16. Then I built a nested loop with the idea of taking each line of B, looking for the match of the first column of B to the array full, then replacing that line of full with the line of B.
So with the size of both loops, this is very time consuming, and it kills my old laptop before finishing. I suspect it doesn't work anyway. I am certain this is not the most efficient way of doing this.
My loops looks like this:
for i=1:length(y2)
for j = 3:length(B)
if B(j,1) == full(i,1)
full(i,:)=B(j,:)
end
end
end
I know I've heard beware nested loops. Can anyone either help to make this work or show me a more efficient method of getting those NaN data points in?
  10 comentarios
Guillaume
Guillaume el 20 de Mzo. de 2015
For some reason, I thought that date vectors had a resolution of a second, which would have lost the submillisecond differences between two datenum generated two different ways.
Turn out, I was wrong. You're right, the datevec doesn't help... unless you round to integer.
dpb
dpb el 20 de Mzo. de 2015
Editada: dpb el 20 de Mzo. de 2015
Ahhh...ok, that would make sense. Thanks for the response I was curious if I had somehow overlooked something obvious.
Meanwhile our poster seems to have disappeared so don't know if that means she got her solution or has given up...

Iniciar sesión para comentar.

Respuestas (2)

dpb
dpb el 19 de Mzo. de 2015
As Per notes, depending on how you built the time stamp vector vis a vis how it was generated in the sample data set it's almost certain the above floating point compare will fail for some desired cases.
The problem here in solving your problem is we don't know how the original timestamp was actually created in order to reproduce it exactly. The way in Matlab to generate a unique and reproducible timestamp is to use date numbers with the spacing at whatever level needed to create an integer vector array and use the internal smarts of datenumber to roll over time fields. In your case to generate it do something like--
Presume have start/stop dates and times as variables x0 and x1, respectively, then
nMins=(datenum(yr1,mo1,da1,hr1,mn1,0)-datenum(yr0,mo0,da0,hr0,mn0,0))*24*60;
dn=datenum(yr0,mo0,da0,hr0,[0:5:nMins].',0);
The robust thing to do would be to convert the given date stamps to calendar dates and then transform that vector also to Matlab date numbers; internally the conversions to floating point are done identically and hence the rounding issues are handled consistently.
To find the locations to fill in, use intersect on the column of dates. If this isn't sufficient, post a short section of the file and the info for the actual start/stop correlating times and can demonstrate the process numerically...I went through a lengthy treatise on this idea just the last few days with another similar question at creating-a-matrix-after-giving-initial-conditions

Guillaume
Guillaume el 19 de Mzo. de 2015
[~, column] = ismember(B(1, :), full(1, :));
full(:, column) = B
Note that full is a built-in function of matlab, so it may not be a good idea to use that name for a variable.
  4 comentarios
dpb
dpb el 19 de Mzo. de 2015
Editada: dpb el 20 de Mzo. de 2015
This has the same problem I outlined above of ismember is an exact floating point comparison and unless you've generated the time vectors consistently there IS going to be rounding error(*).
The result for the location array in ismember is--
[LIA,LOCB] = ismember(A,B) also returns an array LOCB containing the
highest absolute index in B for each element in A which is a member
of B and 0 if there is no such index.
Note the last phrase...and that '0' is not a positive integer and Matlab arrays are 1-based.
(*) Demonstration of the issue...
>> tstmp=0:5/(24*60):200; % generate a five-minute delta t by floats
>> dstmp=datenum(2015,1,1,0,[0:5:tstmp(end)*24*60],0); % and by integers
>> dstmp=dstmp-dstmp(1); % reduce to same range
>> all(tsmp==dstmp) % are they all the same? NO!!!
ans =
0
>> sum(tstmp==dstmp) % how many succeed?
ans =
6145
>> sum(tstmp==dstmp)/length(dstmp) % what fraction succeed?
ans =
0.1111
>>
Moral: Floating point series aren't precise when the delta isn't exactly representable as a binary fraction.
Guillaume
Guillaume el 19 de Mzo. de 2015
Editada: Guillaume el 20 de Mzo. de 2015
@Kendra,
Yes, no loop required. Just the two lines.
If you get the error Subscript indices must either be real positive integers or logicals, that's because one of your header in B is not found in C. The cause is indeed, as dpb and per warned, comparison of floating points that have been generated two different ways. They may appear the same, but small rounding errors mean that they differ by a tiny amount.
Since you're on a version before the introduction of datetime, one workaround would be:
Bdates = round(datevec(B(1, :))); %convert to date vectors
Cdates = round(datevec(C(1, ;))); %convert to date vectors
[~, columns] = ismember(Bdates, Cdates, 'rows');
assert(all(column > 0), 'some dates in B are not in C')
C(:, columns) = B
edit: After dpb's comment, I realised that a simple conversion to datevec doesn't help. Rounding the date vector does, so I've added that to the code.

Iniciar sesión para comentar.

Categorías

Más información sobre Data Type Conversion en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by