Matlab: How to handle abnormal data files

Question

I am trying to import a large number of files into Matlab for processing. A typical file looks like this:

    mass      intensity
 350.85777         238
 350.89252        3094
 350.98688        2762
 351.87899         468
 352.17712         569
 352.28449         426
Some text and numbers here, describing the experimental setup, eg  
Scan 3763 @ 81.95, contains 1000 points:

The numbers in the two columns are separated by 8 spaces. However, sometimes the experiment goes wrong and the machine produces a data file like this one:

mass      intensity

Some text and numbers here, describing the experimental setup, eg  
Scan 3763 @ 81.95, contains 1000 points:

I found that treating the files as space-delimited with a single header row, i.e.

importdata(path_to_file,' ',  1);

works best for the normal files. However, it fails completely on all the abnormal files. What would be the easiest way to fix this? Should I stick with importdata (I have already tried every setting I could find, and it just doesn't work) or should I try writing my own parser? Ideally, I would like to get the values in an Nx2 matrix for normal files and [0 0] for abnormal files.

Thanks.

Answer 1

I don't think you need to create your own parser, nor is this all that abnormal. Using textscan is your best option here.

fid = fopen('input.txt', 'rt');
data = textscan(fid, '%f %u', 'Headerlines', 1);
fclose(fid);

mass = data{1};
intensity = data{2};

Yields:

mass =
  350.8578
  350.8925
  350.9869
  351.8790
  352.1771
  352.2845

intensity =
         238
        3094
        2762
         468
         569
         426

for your first file, and:

    mass =
       Empty matrix: 0-by-1

    intensity =
       Empty matrix: 0-by-1

for your empty one.

By default, textscan treats whitespace as a delimiter, and it reads only the fields you tell it to until it can no longer match them; thus it ignores the final lines in your file. You can also run a second textscan after this one if you want to pick up those additional fields:

fid = fopen('input.txt', 'rt');
data = textscan(fid, '%f %u', 'Headerlines', 1);

mass = data{1};
intensity = data{2};

data = textscan(fid, '%*s %u %*c %f %*c %*s %u %*s', 'Headerlines', 1);

scan = data{1};
level = data{2};
points = data{3};

fclose(fid);

Along with your mass and intensity data, this gives:

    scan =
            3763

    level =
       81.9500

    points =
            1000
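
If you want the N-by-2 matrix (or [0 0] for an abnormal file) that the question asks for, the two columns can be combined after the read. What follows is only a sketch under a few assumptions: it writes a small made-up sample file to a temporary location so that it runs on its own, and the names fname and result are illustrative. It also reads both columns with %f rather than %u, because concatenating a double column with a uint32 column would convert the whole result to uint32 and round the masses.

```matlab
% Write a small sample file so the sketch is self-contained (made-up data).
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'mass      intensity\n');
fprintf(fid, ' 350.85777         238\n');
fprintf(fid, ' 350.89252        3094\n');
fprintf(fid, 'Some text and numbers here\n');
fprintf(fid, 'Scan 3763 @ 81.95, contains 1000 points:\n');
fclose(fid);

% Read both columns as doubles so they can share one matrix.
fid = fopen(fname, 'rt');
data = textscan(fid, '%f %f', 'HeaderLines', 1);
fclose(fid);
delete(fname);

% Keep only complete rows, in case the last row was cut short.
n = min(numel(data{1}), numel(data{2}));

if n == 0
    result = [0 0];                          % abnormal file: no numeric rows
else
    result = [data{1}(1:n), data{2}(1:n)];   % normal file: N-by-2 matrix
end
```

For a batch of files, the same steps could be wrapped in a function that takes the file name and returns result.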

Answer 2

What do you mean by 'totally fails on abnormal files'?

You can check whether importdata finds any data using, e.g.:

>> imported = importdata(path_to_file,' ',  1);
>> isfield(imported, 'data')
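
Building on that check, one possible way to get the [0 0] fallback is sketched below. It is an assumption-laden example, not a tested recipe for your files: the sample file contents are made up, the try/catch is there because I don't know whether importdata raises an error or just returns something without a data field for your abnormal files, and the two-column test is an extra guard I added.

```matlab
% Sample abnormal file (made-up contents), written to a temporary path.
path_to_file = [tempname '.txt'];
fid = fopen(path_to_file, 'wt');
fprintf(fid, 'mass      intensity\n\n');
fprintf(fid, 'Some text and numbers here\n');
fprintf(fid, 'Scan 3763 @ 81.95, contains 1000 points:\n');
fclose(fid);

result = [0 0];   % fallback used whenever no usable data is found
try
    imported = importdata(path_to_file, ' ', 1);
    % isfield returns false if imported is not a struct
    if isfield(imported, 'data') && size(imported.data, 2) == 2
        result = imported.data;
    end
catch
    % importdata raised an error; keep the [0 0] fallback
end
delete(path_to_file);
```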

Answer 1: score 4, accepted, 2010-08-31 23:48:27

Answer 2: score 1, 2010-08-31 11:57:26
