
Trailing rows in datastore with multiple csv files

Matlab 2015b

I have several large (100-300 MB) csv files that I want to merge into one, filtering out some of the columns along the way. They are shaped like this:

timestamp         | variable1 | ... | variable200
01.01.16 00:00:00 | 1.59      | ... | 0.5
01.01.16 00:00:01 | ...
.
.

For this task I am using a datastore that includes all the csv files:

ds = datastore('file*.csv');

When I read all of the entries and try to write them back to a csv file using writetable, I get an error saying that the input has to be a cell array.

When looking at the cell array read from the datastore in debug mode, I noticed several rows that contain only a timestamp and that are not in the original files. These rows sit between the last row of one file and the first row of the following one. Their timestamps are the logical continuation of the last timestamp (as you would get by auto-filling the column in Excel).

Is this a bug or intended behaviour?

Can I avoid reading these rows in the first place, or do I have to filter them out afterwards?

Thanks in advance.

As it seems nobody else has had this problem, I will share how I dealt with it in the end:

toDelete = strcmp(data.(2), '');   % true for rows whose second column is an empty string
data(toDelete, :) = [];            % assigning [] deletes those rows

I took the second column of the table and checked it for empty strings. Afterwards I deleted all faulty rows by assigning an empty array to them via logical indexing (as shown in the Matlab documentation).
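To make the effect of those two lines concrete, here is a minimal sketch on a small hand-built table. The variable names and values are made up for illustration, and it assumes (as in the answer) that the second column was read as text, i.e. as a cell array of strings:

```matlab
% Small stand-in for the table read from the datastore (values are invented).
% Rows 2 and 3 mimic the faulty rows: a timestamp but an empty second column.
timestamp = {'01.01.16 00:00:00'; '01.01.16 00:00:01'; ...
             '01.01.16 00:00:02'; '01.01.16 00:00:03'};
variable1 = {'1.59'; ''; ''; '2.10'};
data = table(timestamp, variable1);

% Logical column vector, true where the second column is an empty string.
toDelete = strcmp(data.(2), '');

% Assigning [] to the selected rows removes them from the table.
data(toDelete, :) = [];
```

Only the first and last rows remain afterwards. If the second column were numeric instead of text, strcmp would not flag anything, so a different check would be needed.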

Sadly I found no way to prevent the faulty rows from being loaded, but in the end the amount of data was not too big to do this processing step in memory.
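For reference, the whole in-memory workflow might look roughly like the sketch below. It is not the exact code from the question: the two tiny csv files, the temporary folder, the selected column names and the output name merged.csv are stand-ins. The isnan branch is an assumption for the case where the second column is read as numeric and empty fields arrive as NaN.

```matlab
% Create two tiny csv files in a temporary folder (stand-ins for the real data).
folder = tempname;
mkdir(folder);
for k = 1:2
    fid = fopen(fullfile(folder, sprintf('file%d.csv', k)), 'w');
    fprintf(fid, 'timestamp,variable1,variable2\n');
    fprintf(fid, '01.01.16 00:00:0%d,1.59,0.5\n', k);
    fclose(fid);
end

% One datastore over all matching files.
ds = datastore(fullfile(folder, 'file*.csv'));

% Keep only the columns of interest before reading.
ds.SelectedVariableNames = {'timestamp', 'variable1'};

% readall returns one table holding the rows of all files.
data = readall(ds);

% Drop rows whose second column is empty.
col = data.(2);
if iscell(col)
    toDelete = strcmp(col, '');   % text column: empty string marks a faulty row
else
    toDelete = isnan(col);        % assumption: numeric column, empty field read as NaN
end
data(toDelete, :) = [];

% writetable takes the table itself as its first argument.
writetable(data, fullfile(folder, 'merged.csv'));
```

Setting SelectedVariableNames before reading means the unwanted columns are never loaded, which helps keep the merged table small enough to filter in memory.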
