Matlab: how to handle abnormal data files
I am trying to import a large number of files into Matlab for processing. A typical file looks like this:
mass intensity
350.85777 238
350.89252 3094
350.98688 2762
351.87899 468
352.17712 569
352.28449 426
Some text and numbers here, describing the experimental setup, eg
Scan 3763 @ 81.95, contains 1000 points:
The numbers in the two columns are separated by 8 spaces. However, sometimes the experiment goes wrong and the machine produces a data file like this one:
mass intensity
Some text and numbers here, describing the experimental setup, eg
Scan 3763 @ 81.95, contains 1000 points:
I found that treating the files as space-separated with a single header row, i.e.
importdata(path_to_file,' ', 1);
works best for the normal files. However, it fails completely on all the abnormal files. What would be the easiest way to fix this? Should I stick with importdata (I have already tried every setting I could find, and it just doesn't work), or should I try writing my own parser? Ideally, I would like to get the values in an Nx2 matrix for normal files and [0 0] for abnormal files.
Thanks.
I don't think you need to create your own parser, nor is this all that abnormal. Using textscan is your best option here.
fid = fopen('input.txt', 'rt');
data = textscan(fid, '%f %u', 'Headerlines', 1);
fclose(fid);
mass = data{1};
intensity = data{2};
This yields:
mass =
350.8578
350.8925
350.9869
351.8790
352.1771
352.2845
intensity =
238
3094
2762
468
569
426
for your first file, and:
mass =
Empty matrix: 0-by-1
intensity =
Empty matrix: 0-by-1
for your empty one.
By default, textscan treats whitespace as a delimiter, and it reads only what the format string describes until it can no longer do so; thus it ignores the final lines in your file. If you want to pick up those additional fields, you can run a second textscan after this one:
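To get the Nx2 matrix (or [0 0] for a file without data rows) that the question asks for, the textscan call can be wrapped in a small function. The following is only a sketch: the function name readSpectrum is hypothetical, and it assumes every file has exactly one header line followed by zero or more numeric rows. It reads both columns with %f rather than %u, because concatenating a double column with a uint32 column would convert the whole matrix to uint32 and round the masses.

```matlab
function spectrum = readSpectrum(filename)
% READSPECTRUM  Read a two-column mass/intensity file (hypothetical helper).
%   Returns an N-by-2 double matrix, or [0 0] if the file has no data rows.

    fid = fopen(filename, 'rt');
    if fid == -1
        error('readSpectrum:cannotOpen', 'Could not open %s', filename);
    end
    cleanup = onCleanup(@() fclose(fid));   % close the file even on error

    % Skip the single header line, then read numeric pairs until a row
    % no longer matches the format (the trailing description lines).
    data = textscan(fid, '%f %f', 'HeaderLines', 1);
    mass = data{1};
    intensity = data{2};

    % Guard against a partially read last row.
    n = min(numel(mass), numel(intensity));
    if n == 0
        spectrum = [0 0];                   % abnormal file: no data rows
    else
        spectrum = [mass(1:n), intensity(1:n)];
    end
end
```

For a whole folder of files, the function could be called in a loop over the result of dir, storing each matrix in a cell array.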
fid = fopen('input.txt', 'rt');
data = textscan(fid, '%f %u', 'Headerlines', 1);
mass = data{1};
intensity = data{2};
data = textscan(fid, '%*s %u %*c %f %*c %*s %u %*s', 'Headerlines', 1);
scan = data{1};
level = data{2};
points = data{3};
fclose(fid);
Along with your mass and intensity data, this gives:
scan =
3763
level =
81.9500
points =
1000
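If the description line always follows the pattern of the example ("Scan 3763 @ 81.95, contains 1000 points:"), the same three values could also be pulled out of a line of text with a regular expression. This is a sketch of an alternative, not the approach used above: the function name parseScanLine and the field names are hypothetical, and the pattern is inferred from the single example line in the question.

```matlab
function meta = parseScanLine(txt)
% PARSESCANLINE  Extract scan number, level and point count from text
%   such as 'Scan 3763 @ 81.95, contains 1000 points:' (hypothetical helper).
%   Fields are NaN when the pattern is not found.

    pattern = 'Scan\s+(\d+)\s+@\s+([\d.]+),\s+contains\s+(\d+)\s+points';
    tok = regexp(txt, pattern, 'tokens', 'once');

    if isempty(tok)
        meta = struct('scan', NaN, 'level', NaN, 'points', NaN);
    else
        vals = str2double(tok);             % 1-by-3 numeric array
        meta = struct('scan', vals(1), 'level', vals(2), 'points', vals(3));
    end
end
```

The input can be a single line read with fgetl or the whole file contents returned by fileread.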
What do you mean by 'totally fails on abnormal files'?
You can check whether importdata found any data using, e.g.
>> imported = importdata(path_to_file,' ', 1);
>> isfield(imported, 'data')
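Building on that check, a wrapper around importdata might look like the sketch below. The function name importSpectrum is hypothetical. What importdata returns for an abnormal file is not described in this thread, so the code simply treats anything other than a struct with a non-empty, two-column data field as "no data". It also treats any error raised by importdata (including a missing file) as an abnormal file, which may or may not be what you want.

```matlab
function spectrum = importSpectrum(path_to_file)
% IMPORTSPECTRUM  Wrap importdata and fall back to [0 0] (hypothetical helper).

    spectrum = [0 0];                       % default for abnormal files

    try
        imported = importdata(path_to_file, ' ', 1);
    catch
        return;                             % assumption: any error means "no data"
    end

    % Accept the result only if it is a struct with usable numeric data.
    if isstruct(imported) && isfield(imported, 'data') ...
            && ~isempty(imported.data) && size(imported.data, 2) == 2
        spectrum = imported.data;
    end
end
```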