
Matlab: how to handle abnormal data files

I am trying to import a large number of files into Matlab for processing. A typical file looks like this:

    mass      intensity
 350.85777         238
 350.89252        3094
 350.98688        2762
 351.87899         468
 352.17712         569
 352.28449         426
Some text and numbers here, describing the experimental setup, eg  
Scan 3763 @ 81.95, contains 1000 points:

The numbers in the two columns are separated by 8 spaces. However, sometimes the experiment goes wrong and the machine produces a data file like this one:

mass      intensity

Some text and numbers here, describing the experimental setup, eg  
Scan 3763 @ 81.95, contains 1000 points:

I found that treating the files as space-separated with a single header row, i.e.

importdata(path_to_file,' ',  1);

works best for the normal files. However, it fails completely on all the abnormal files. What would be the easiest way to fix this? Should I stick with importdata (I have already tried every setting I could find and it just doesn't work), or should I try writing my own parser? Ideally, I would like to get the values in an Nx2 matrix for normal files and [0 0] for abnormal files.

Thanks.

I don't think you need to write your own parser, and this case is not all that abnormal. Using textscan is your best option here.

fid = fopen('input.txt', 'rt');
data = textscan(fid, '%f %u', 'Headerlines', 1);
fclose(fid);

mass = data{1};
intensity = data{2};

Yields:

mass =
  350.8578
  350.8925
  350.9869
  351.8790
  352.1771
  352.2845

intensity =
         238
        3094
        2762
         468
         569
         426

for your first file, and:

    mass =
       Empty matrix: 0-by-1

    intensity =
       Empty matrix: 0-by-1

for your empty one.
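
To get the Nx2 matrix (or [0 0] for an abnormal file) that the question asks for, the two columns can be combined after reading. The following is only a minimal sketch under a few assumptions: both columns are read as doubles with '%f %f' (instead of '%f %u' as above) so they can be concatenated directly, and the two temporary demonstration files and the variable names (texts, results, values) are hypothetical.

```matlab
% Demonstration contents modelled on the two files in the question (hypothetical).
normalText = sprintf(['mass      intensity\n' ...
                      ' 350.85777         238\n' ...
                      ' 350.89252        3094\n' ...
                      'Some text here\n' ...
                      'Scan 3763 @ 81.95, contains 1000 points:\n']);
abnormalText = sprintf(['mass      intensity\n\n' ...
                        'Some text here\n' ...
                        'Scan 3763 @ 81.95, contains 1000 points:\n']);
texts = {normalText, abnormalText};
results = cell(size(texts));

for k = 1:numel(texts)
    % Write the demonstration text to a temporary file.
    fname = [tempname '.txt'];
    fid = fopen(fname, 'wt');
    fprintf(fid, '%s', texts{k});
    fclose(fid);

    % Read both columns as doubles so they fit in one numeric matrix.
    fid = fopen(fname, 'rt');
    cols = textscan(fid, '%f %f', 'HeaderLines', 1);
    fclose(fid);
    delete(fname);

    % Guard against a partially read last row, then build the Nx2 matrix.
    n = min(numel(cols{1}), numel(cols{2}));
    values = [cols{1}(1:n, 1), cols{2}(1:n, 1)];

    % No numeric rows were found: fall back to [0 0] as requested.
    if isempty(values)
        values = [0 0];
    end
    results{k} = values;
end
```

Here results{1} holds the two numeric rows of the normal file and results{2} holds the [0 0] placeholder for the abnormal one.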

By default, textscan treats whitespace as the delimiter, and it only reads what you tell it to until it can no longer do so; thus it ignores the final lines in your file. You can also run a second textscan after this one if you want to pick up those additional fields:

fid = fopen('input.txt', 'rt');
data = textscan(fid, '%f %u', 'Headerlines', 1);

mass = data{1};
intensity = data{2};

data = textscan(fid, '%*s %u %*c %f %*c %*s %u %*s', 'Headerlines', 1);

scan = data{1};
level = data{2};
points = data{3};

fclose(fid);

Along with your mass and intensity data, this gives:

    scan =
            3763

    level =
       81.9500

    points =
            1000

What do you mean by 'totally fails on abnormal files'?

You can check whether importdata found any data using, e.g.:

>> imported = importdata(path_to_file,' ',  1);
>> isfield(imported, 'data')
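
That check could be turned into the [0 0] fallback from the question roughly as follows. This is a sketch, not a tested recipe for every file: the temporary file stands in for an abnormal data file, and treating any import error, a missing 'data' field, or data that is not two columns wide as "abnormal" is an assumption on my part.

```matlab
% Demonstration file modelled on the abnormal example (hypothetical contents).
path_to_file = [tempname '.txt'];
fid = fopen(path_to_file, 'wt');
fprintf(fid, 'mass      intensity\n\n');
fprintf(fid, 'Some text here\n');
fprintf(fid, 'Scan 3763 @ 81.95, contains 1000 points:\n');
fclose(fid);

values = [0 0];   % fallback used whenever no usable data is found
try
    imported = importdata(path_to_file, ' ', 1);
    % Accept the result only if it is a struct carrying a non-empty,
    % two-column 'data' field; anything else keeps the fallback.
    if isstruct(imported) && isfield(imported, 'data') ...
            && ~isempty(imported.data) && size(imported.data, 2) == 2
        values = imported.data;
    end
catch
    % Assumption: any error raised while importing marks the file as abnormal.
end
delete(path_to_file);
```

Whatever importdata returns for a given file, values ends up as a numeric matrix with two columns.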
