Cleaning of large files of data

Abm on 7 Jun 2019
Commented: Walter Roberson on 11 Aug 2021
Hello all,
I am working with large data files of solar irradiance and electricity consumption in order to study the reliability of a microgrid. The data contain one-minute samples for one year, and there are many missing values in each file, so I would like to clean the data before analyzing them, but I don't know how. I am a beginner in MATLAB and would appreciate any help you can offer.
One approach from the literature for dealing with this issue is to group the data into 24-hour subsets, each corresponding to a calendar day. Any 24-hour subset with more than four hours of missing data, consecutive or not, is discarded. The missing values in the remaining 24-hour subsets can be synthesized through linear interpolation. Then I want to convert the minutely data to hourly averages.
The most important thing for me is that both the solar irradiance and the load data have data at the same times, so that I can analyze them together.
Many thanks for all feedback.
Best regards.
  17 comments
dpb on 9 Jun 2019
See my logic above: look at the differences in the time vector after removing the isnan() elements. Any day in which that difference is >= 4 hours is to be removed entirely; all other days can be retained from the full file and interpolated.
That's doable without looping I believe.
Abm on 9 Jun 2019
@dpb, thank you so much! Could you suggest a way to write it in MATLAB? I am still confused!


Accepted Answer

dpb on 10 Jun 2019
Edited: dpb on 10 Jun 2019
Start with something like
ttclean=rmmissing(tt); % the subset with no missing data only
dt=diff(ttclean.Time); % the difference in timestamps in the clean data
is4hrgap=any(dt>=hours(4)); % are there any gaps of 4hr or more?
If the above is false (0), you can use the previous result with linear interpolation; you've proved there is no gap of 4 hours or more in the data.
On the other hand, if the above is true (1), then you need to find which days those are. This can be done by retrieving the individual elements of the logical vector:
ixGap=find(dt>=hours(4))+1; % the indices in clean series of gaps
From these indices, get the year, month, and day of those elements, and then remove those days from the interpolated series. (If you remove the days first and then re-interpolate with retime, it will just fill in all the elements again, so you have to remove them afterwards.)
gapDays=unique(dateshift(ttclean.Time(ixGap),'start','day')); % the unique days
Those days should be removed from the interpolated result above as Walter showed.
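Note that the diff() test above only flags consecutive gaps of four hours or more. If the criterion is instead the total missing time per day, consecutive or not, as the question describes, the following is a minimal sketch of an alternative that counts missing samples per calendar day. The data, the variable names (x, dayOf, badDays), and the assumption that every minute has a row (with NaN marking a missing value) are all made up for illustration.

```matlab
% Synthetic one-minute series covering three days (values are placeholders).
t = (datetime(2019,1,1):minutes(1):datetime(2019,1,3,23,59,0))';
x = ones(numel(t),1);
dayOf = dateshift(t,'start','day');          % calendar day of each sample

% Day 1: five separate one-hour holes (300 min in total, none >= 4 h long).
x(dayOf == datetime(2019,1,1) & ismember(hour(t),[1 5 9 13 17])) = NaN;
% Day 2: a single two-hour hole (120 min in total).
x(dayOf == datetime(2019,1,2) & ismember(hour(t),[10 11])) = NaN;

% Count the missing samples in each day.
[G, dayList] = findgroups(dayOf);
nMissing = splitapply(@(v) sum(isnan(v)), x, G);

% Days with more than four hours (240 one-minute samples) missing.
badDays = dayList(nMissing > 4*60);
```

With these made-up holes, only 1 Jan 2019 exceeds the threshold, even though it contains no single gap of four hours. If minutes are absent from the file altogether rather than present as NaN, the series would first need to be put on a regular one-minute grid for this count to be meaningful.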
  6 comments
dpb on 11 Jun 2019
Oh... I was typing directly in the edit window and originally left off the 'start','day' arguments to dateshift when computing the day times. (I would have thought that would have been apparent when you looked up the documentation for dateshift to see what the error was and saw the missing parameter; the point of the exercise is to get a list of the day(s) that have gaps.)
At that point, the set-membership functions should be useful for finding the intersection between that list and the times in the data. Remember that, to compare, you also have to reduce the timetable time stamps to their day-only part by applying dateshift to them as well; a day won't match a day+time value.
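A minimal sketch of that matching step, on made-up data: the series is interpolated and averaged to hourly values first, and the flagged days are dropped last, with both sides of the comparison reduced to the start of the day. The variable name Irradiance, the synthetic values, and the hard-coded gapDays list (standing in for the result of the gap search) are assumptions for illustration only.

```matlab
% Synthetic one-minute timetable covering three days (values are placeholders).
t = (datetime(2019,1,1):minutes(1):datetime(2019,1,3,23,59,0))';
x = (1:numel(t))';
x(t >= datetime(2019,1,2,6,0,0) & t < datetime(2019,1,2,11,0,0)) = NaN; % 5 h hole on 2 Jan
x(100:130) = NaN;                                                       % short hole on 1 Jan
tt = timetable(x,'RowTimes',t,'VariableNames',{'Irradiance'});
tt.Properties.DimensionNames{1} = 'Time';   % make sure tt.Time is valid

% Assumed result of the gap search: the days to discard.
gapDays = datetime(2019,1,2);

% Interpolate linearly over the row times, then average to hourly values.
ttfilled = fillmissing(tt,'linear');
tthourly = retime(ttfilled,'hourly','mean');

% Compare day to day: shift the time stamps to the start of their day
% before testing membership in the list of flagged days.
rowDay   = dateshift(tthourly.Time,'start','day');
tthourly = tthourly(~ismember(rowDay,gapDays),:);
```

Dropping the days last matters here: calling retime after the removal would recreate rows for the removed day. Running the same steps on both the irradiance and the load timetables with a common gapDays list (for example the union of the two lists) would leave both series on the same hourly time stamps.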
Geospatial Engineer on 11 Aug 2021
Edited: Geospatial Engineer on 11 Aug 2021
Hello everyone. I am working with snow data that has time in one column and, in the other column, ... I have some missing values spanning 7 minutes. I would like to fill these 7-minute gaps with the value from the last time interval before each gap. Can someone help me, please?


More Answers (1)

Cris LaPierre on 9 Jun 2019
Consider looking through this example in MATLAB. It might give you some ideas of how to handle the missing data.
  2 comments
Geospatial Engineer on 11 Aug 2021
Edited: Geospatial Engineer on 11 Aug 2021
Hello everyone. I am working with snow data that has time in one column and another parameter in the other column. I have some missing values spanning 7 minutes. I would like to fill these 7-minute gaps with the value from the last time interval before each gap. Can someone help me, please?
Walter Roberson on 11 Aug 2021
If you convert the data to a timetable(), then you can use retime with the 'previous' method.
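A minimal sketch of that suggestion, assuming the rows for the missing minutes are absent from the data entirely; the variable name snow and all values are invented for illustration.

```matlab
% Made-up one-minute snow series with the rows for minutes 4..10 missing.
t = datetime(2021,8,11,0,0,0) + minutes([0 1 2 3 11 12 13])';
snow = [10 11 12 13 20 21 22]';
tt = timetable(snow,'RowTimes',t);

% Resample onto a regular one-minute grid; each new row takes the value
% of the nearest preceding row in the original data.
ttfull = retime(tt,'minutely','previous');
```

Here the seven inserted rows all carry the value 13, the last observation before the gap. If the rows are present but hold NaN instead, fillmissing(tt,'previous') would be the analogous call.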
