Can I specify the records in each datastore partition?

Siva on 9 Jan 2022
Commented: Siva on 15 Sep 2023
I create a datastore from a single CSV file which I partition into 3 parts.
>> ds = tabularTextDatastore('airlinesmall.csv') ;
>> subds= partition( ds, 3, 1) ; preview( subds)
ans =
8×29 table
Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier FlightNum TailNum ActualElapsedTime CRSElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay
____ _____ __________ _________ _______ __________ _______ __________ _____________ _________ _______ _________________ ______________ _______ ________ ________ _______ _______ ________ ______ _______ _________ ________________ ________ ____________ ____________ ________ _____________ _________________
1987 10 21 3 642 630 735 727 {'PS'} 1503 {'NA'} 53 57 {'NA'} 8 12 {'LAX'} {'SJC'} 308 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 26 1 1021 1020 1124 1116 {'PS'} 1550 {'NA'} 63 56 {'NA'} 8 1 {'SJC'} {'BUR'} 296 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 23 5 2055 2035 2218 2157 {'PS'} 1589 {'NA'} 83 82 {'NA'} 21 20 {'SAN'} {'SMF'} 480 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 23 5 1332 1320 1431 1418 {'PS'} 1655 {'NA'} 59 58 {'NA'} 13 12 {'BUR'} {'SJC'} 296 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 22 4 629 630 746 742 {'PS'} 1702 {'NA'} 77 72 {'NA'} 4 -1 {'SMF'} {'LAX'} 373 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 28 3 1446 1343 1547 1448 {'PS'} 1729 {'NA'} 61 65 {'NA'} 59 63 {'LAX'} {'SJC'} 308 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 8 4 928 930 1052 1049 {'PS'} 1763 {'NA'} 84 79 {'NA'} 3 -2 {'SAN'} {'SFO'} 447 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 10 6 859 900 1134 1123 {'PS'} 1800 {'NA'} 155 143 {'NA'} 11 -1 {'SEA'} {'LAX'} 954 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
>>
I would like to specify the rows of the source dataset I want in each partition. Is that possible?
>> subds= partition( ds, 3, 2) ; preview( subds)
ans =
0×29 empty table
>> subds= partition( ds, 3, 3) ; preview( subds)
ans =
0×29 empty table
>>
Incidentally, why are my 2nd and 3rd partitions empty?

Answers (1)

Piyush Dubey on 15 Sep 2023
Hi @Siva,
I understand that you are attempting to create partitions of your datastore and are experiencing difficulties with it. You also observed that the partitions created are empty tables.
Please note that in order to create datastore partitions, the datastore needs to have more than one file. The datastore referred to in the code snippet has only one CSV file and thus cannot be partitioned. You can determine the number of partitions in a datastore using the numpartitions() function and the syntax for this function is demonstrated below:
X=numpartitions(subds)
You will see that the number of partitions is 1 both before and after the call to partition, because the datastore contains only one file. This is why any partition requested with an index greater than 1 is an empty table.
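A minimal sketch of one way to avoid empty partitions, assuming airlinesmall.csv is on the MATLAB path: request only as many partitions as numpartitions reports, then count the rows in each. The variable names n and rowsPerPart are illustrative only.

```matlab
ds = tabularTextDatastore('airlinesmall.csv');
n  = numpartitions(ds);            % partitions the datastore can actually supply

rowsPerPart = zeros(n, 1);
for k = 1:n
    subds = partition(ds, n, k);   % k-th of n partitions
    rowsPerPart(k) = height(readall(subds));   % rows held by this partition
end
disp(rowsPerPart)
```

If n comes back as 1, the single partition should hold every row of the file, which matches the empty second and third partitions seen in the question.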
If you would like to partition a file row-wise, a possible workaround is to read the data into a table with readall, which returns every row (a single call to read returns only one block, 20000 rows by default). The table can then be sliced and indexed as shown below:
ds = tabularTextDatastore('airlinesmall.csv');
temp = readall(ds);   % read every row into one table
% extract a particular column
monthCol = temp.Month;
% extract the first of 3 row-wise partitions
n = floor(height(temp)/3);
subds = temp(1:n, :);
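To address the original question about choosing which rows go into each part, one option is to compute the row boundaries yourself after loading the table. The sketch below assumes the file fits in memory and that three roughly equal contiguous blocks are wanted. The names numParts, edges and parts are illustrative, and the edges vector could be replaced by any boundaries you prefer.

```matlab
ds = tabularTextDatastore('airlinesmall.csv');
T  = readall(ds);                  % whole file as one table (must fit in memory)

numParts = 3;                      % assumed number of row blocks
edges = round(linspace(0, height(T), numParts + 1));   % block boundaries

parts = cell(numParts, 1);
for k = 1:numParts
    rows = (edges(k) + 1):edges(k + 1);   % rows assigned to block k
    parts{k} = T(rows, :);
end

% Check that every row was assigned to a block
assert(sum(cellfun(@height, parts)) == height(T));
```

Each parts{k} is an ordinary table, so it can be previewed with head(parts{k}) or processed like any other table.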
Please refer to the following MathWorks documentation links for more information on DataStore and Indexing:
  1. https://www.mathworks.com/help/matlab/import_export/what-is-a-datastore.html
  2. https://www.mathworks.com/company/newsletters/articles/matrix-indexing-in-matlab.html
Hope this helps.

Release

R2021b
