Unix timestamp check: remove the data points falling outside 9:00 am to 4:15 pm in a second-by-second dataset

I have a list of about 70 million rows. I want to clean the dataset by deleting the following:
  1. Any values that are 0 or no greater than 0.001.
  2. Any values that lie outside the range of 9:00 am to 4:15 pm.
  3. If multiple quotes share the same timestamp, replace them with a single entry holding the median price.
I am able to achieve the third point, but not the first and second. Can someone guide me with this? Thanks
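Points 1 and 2 can be sketched with plain integer arithmetic on the Unix timestamps, without converting them to dates. The sketch below is in Python for illustration only (not MATLAB), and every name in it (`in_session`, `filter_rows`, `utc_offset_s`) is hypothetical. It assumes the timestamps are whole seconds since 1970-01-01 UTC, that the 9:00 to 4:15 window is expressed in a time zone with a fixed offset from UTC, and that both window ends are inclusive.

```python
SECONDS_PER_DAY = 86_400
OPEN_S = 9 * 3600               # 09:00:00 as seconds after midnight
CLOSE_S = 16 * 3600 + 15 * 60   # 16:15:00 as seconds after midnight
MIN_PRICE = 0.001               # assumption: prices at or below this are dropped


def in_session(ts, utc_offset_s=0):
    """True if the Unix timestamp lies within 09:00:00-16:15:00 (inclusive)."""
    # The remainder after dividing by one day is the second of the day.
    second_of_day = (int(ts) + utc_offset_s) % SECONDS_PER_DAY
    return OPEN_S <= second_of_day <= CLOSE_S


def filter_rows(rows, utc_offset_s=0):
    """Keep (timestamp, price) pairs with a usable price inside the session."""
    return [(ts, price) for ts, price in rows
            if price > MIN_PRICE and in_session(ts, utc_offset_s)]
```

The same modulo test should carry over to a vectorised logical mask in MATLAB. If the market's time zone observes daylight saving time, a fixed offset is not enough and a proper date conversion is the safer route.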
  4 comments
Harsh Rob on 20 Aug 2019
Apologies for the confusion caused.
This is the description of the RAW dataset I have:
Column 1 contains the timestamp in Unix format - NEEDS to be part of the cleaned data
The timestamps are stored as Unix time (a number). I want to delete all the data points that fall on a weekend or outside the range of 9:00 hrs to 16:15 hrs. This can be done either by converting the timestamps to dd/mm/yyyy hh:mm:ss format or, if possible, by filtering directly on the Unix numbers.
Column 2 contains the price data - NEEDS to be part of the cleaned data
If the price is 0, delete the entire row.
If the price is less than 0.001, delete the entire row.
If several rows share the same timestamp, take the median price for that unique timestamp. (I have figured this one out using the unique and accumarray functions.)
Column 3 - NOT NEEDED in the cleaned data
Not required for my calculations, but part of the RAW data. It can be deleted as well.
Does this explanation make sense?
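The weekend rule and the median-per-timestamp rule described above can also be worked out directly from the Unix numbers. The following is a minimal Python sketch, not MATLAB code; the function names and the `utc_offset_s` parameter are hypothetical, and it assumes whole-second timestamps and a fixed offset from UTC. It relies on the fact that 1 January 1970 was a Thursday.

```python
from collections import defaultdict
from statistics import median

SECONDS_PER_DAY = 86_400


def is_weekend(ts, utc_offset_s=0):
    """True for Saturday or Sunday, counting Monday as weekday 0."""
    days_since_epoch = (int(ts) + utc_offset_s) // SECONDS_PER_DAY
    # Day 0 (1970-01-01) was a Thursday, i.e. weekday 3.
    weekday = (days_since_epoch + 3) % 7
    return weekday >= 5


def median_per_timestamp(rows):
    """Collapse duplicate timestamps into one (timestamp, median price) row.

    Each row is (timestamp, price, ...); any extra columns, such as the
    unused third column, are ignored and therefore dropped.
    """
    prices = defaultdict(list)
    for ts, price, *_ in rows:
        prices[ts].append(price)
    return [(ts, median(prices[ts])) for ts in sorted(prices)]
```

Filtering out bad prices and out-of-hours rows before taking the median keeps zero quotes from distorting the result; whether that order matches the intended cleaning is a judgment call for the data owner.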
Jan on 21 Aug 2019
@Harsh Rob: I cannot know what "RAW dataset" means. Is it a binary or text file? Have you been able to import it already? Converting the time to a datevec or datetime object allows you to create a matching filter easily.
It is still not clear how your data are represented. A "timestamp in unix format" could be a UINT64, or a string containing the digits of the UINT64, or something else.
Please post a small example of the inputs.
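To illustrate the conversion route suggested here, the sketch below turns a timestamp into a date-time object and filters on its weekday and time of day. It is written in Python rather than MATLAB, the helper names are hypothetical, and it assumes the timestamp is either an integer or a string of digits counting seconds since 1970-01-01 UTC, with the trading window also interpreted in UTC.

```python
from datetime import datetime, time, timezone

SESSION_OPEN = time(9, 0)     # 09:00:00
SESSION_CLOSE = time(16, 15)  # 16:15:00


def to_datetime(raw):
    """Accept an integer or a string of digits; return a UTC datetime."""
    return datetime.fromtimestamp(int(raw), tz=timezone.utc)


def is_trading_time(raw):
    """True on Monday-Friday between 09:00:00 and 16:15:00 inclusive (UTC)."""
    dt = to_datetime(raw)
    return dt.weekday() < 5 and SESSION_OPEN <= dt.time() <= SESSION_CLOSE
```

For example, `to_datetime("1566291600").strftime("%d/%m/%Y %H:%M:%S")` gives `20/08/2019 09:00:00`, the dd/mm/yyyy hh:mm:ss layout mentioned in the question.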

Answers (0)

This question is closed.
