
Trailing rows in datastore with multiple csv files

MATLAB R2015b

I have several large (100-300 MB) CSV files that I want to merge into one while filtering out some of the columns. They are shaped like this:

timestamp         | variable1 | ... | variable200
01.01.16 00:00:00 | 1.59      | ... | 0.5
01.01.16 00:00:01 | ...
...

For this task I am using a datastore that includes all the CSV files:

ds = datastore('file*.csv');
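Column filtering can be done at read time by setting the datastore's `SelectedVariableNames` property before reading — a minimal sketch, where the variable names are placeholders for the actual column headers:

```matlab
% Build a datastore over all matching CSV files
ds = datastore('file*.csv');

% Keep only the columns of interest (names here are hypothetical;
% use the actual headers from the CSV files)
ds.SelectedVariableNames = {'timestamp', 'variable1', 'variable200'};
```

This reduces the amount of data read into memory, although it does not by itself address the spurious rows described below.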

When I read all of the entries and try to write them back to a CSV file using writetable, I get an error saying that the input has to be a cell array.

When looking at the data read from the datastore in debug mode, I noticed that there are several rows containing only a timestamp, which are not in the original files. These rows appear between the last row of one file and the first rows of the following one. The timestamps of these rows are the logical continuation of the last timestamp (as you would get by extending a series in Excel).

Is this a bug or intended behaviour?

Can I avoid reading these rows in the first place, or do I have to filter them out afterwards?

Thanks in advance.

As it seems nobody else had this problem, I will share how I dealt with it in the end:

% Find rows where the second column is an empty string
toDelete = strcmp(data.(2), '');
% Remove those rows via logical indexing
data(toDelete, :) = [];

I took the second column of the table and checked for empty strings. Then I deleted all faulty rows by assigning an empty array via logical indexing (as shown in the MATLAB documentation).

Sadly I found no way to prevent loading the faulty data in the first place, but in the end the amount of data was not too big to do this processing step in memory.
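For reference, the complete in-memory pipeline — read everything, drop the spurious rows whose second column is an empty string, and write the merged result — can be sketched as follows (the file pattern and output name are placeholders):

```matlab
% Read all CSV files into a single table
ds = datastore('file*.csv');
data = readall(ds);

% Remove the spurious rows that contain only a timestamp:
% in those rows the second column was read as an empty string
toDelete = strcmp(data.(2), '');
data(toDelete, :) = [];

% Write the merged, filtered table back out
writetable(data, 'merged.csv');
```

This keeps the whole dataset in memory at once, which was acceptable here; for datasets that do not fit, the same filtering could be applied chunk by chunk inside a `while hasdata(ds)` loop.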
