
Error reading a fixed-width string with textscan in MATLAB

I'm reading fixed-width (9-character) data from a text file using textscan. textscan fails at a certain line containing the string:

'   9574865.0E+10  '

I would like to read two numbers from this:

957486 5.0E+10

The problem can be replicated like this:

dat = textscan('   9574865.0E+10  ','%9f %9f','Delimiter','','CollectOutput',true,'ReturnOnError',false);

The following error is returned:

Error using textscan
Mismatch between file and format string.
Trouble reading floating point number from file (row 1u, field 2u) ==> E+10

Surprisingly, if we add a minus sign, we don't get an error but a wrong result:

dat = textscan('  -9574865.0E+10  ','%9f %9f','Delimiter','','CollectOutput',true,'ReturnOnError',false);

Now dat{1} is:

    -9574865           0

Obviously, I need both cases to work. My current workaround is to insert commas between the fields and use a comma as the delimiter in textscan, but that is slow and not a nice solution. Is there any way to read this string correctly using textscan or another built-in MATLAB function (built-in for performance reasons)?
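For reference, the split being asked for can be produced without textscan by indexing character columns directly. This is a minimal sketch for comparison only, not a suggested replacement: it assumes every line is padded to exactly 18 characters (two 9-character fields), the variable names are illustrative, and it relies on str2double, which the benchmark further down reports as slow on large inputs.

```matlab
% Both problem lines from the question, stacked as an n-by-18 char matrix
% (assumes every line is padded to exactly 18 characters).
txt = ['   9574865.0E+10  '
       '  -9574865.0E+10  '];

% Columns 1-9 and 10-18 are the two fixed-width fields.
% cellstr turns each block of columns into a cell array of character vectors,
% and str2double ignores the surrounding spaces.
first  = str2double(cellstr(txt(:, 1:9)));
second = str2double(cellstr(txt(:, 10:18)));
values = [first, second]
```

Both rows come out as the intended pair (957486 and 5.0E+10, with the sign preserved on the second row), which is the result textscan should ideally return.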

I suspect textscan first trims leading whitespace and only then applies the field widths from the format string. I think so because if you change your format string from

'%9f%9f'

to

'%6f%9f'

your one-liner suddenly works. Also, if you try

'%9s%9s'

you'll see that the first string has its leading whitespace removed (and therefore contains 3 characters too many), but for some reason the last string keeps its trailing whitespace.
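The hypothesis can be made concrete with a few lines of plain MATLAB that model it: strip the leading whitespace first, then cut 9-character fields from what remains. This is only a model of the suspected behaviour, not textscan's actual implementation, and the variable names are illustrative. It does reproduce all three observations, though: the E+10 error from the question, the wrong -9574865 0 result with a minus sign, and the %6f%9f format that happens to work.

```matlab
% Model of the suspected behaviour (an assumption, not textscan's real code):
% leading whitespace is removed before the field widths are counted.
samples = {'   9574865.0E+10  ', '  -9574865.0E+10  '};
parsed = zeros(numel(samples), 2);
for k = 1:numel(samples)
    s = samples{k};
    t = s(find(~isspace(s), 1):end);   % drop leading whitespace
    % The 9-character fields now start at the first digit or minus sign.
    parsed(k, :) = str2double({t(1:9), t(10:end)});
end
% Row 1: '9574865.0' and 'E+10  '  -> 9574865 and NaN (textscan errors here)
% Row 2: '-9574865.' and '0E+10  ' -> -9574865 and 0 (the wrong result)
parsed

% With widths 6 and 9 the fields become '957486' and '5.0E+10  ',
% which is why '%6f%9f' happens to work for this particular line.
t = samples{1}(4:end);
alt = str2double({t(1:6), t(7:15)})
```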

Obviously, this means you'd have to know exactly how many digits each number has, which I'm guessing is not desirable.

A workaround could be something like the following:

% Split string on the "dot"; data is your fixed-width text as a char vector
dat = textscan(data,'%9s%9s',...
    'Delimiter'     , '.',...
    'CollectOutput' , true,...
    'ReturnOnError' , false);

% Correct the strings; move the last digit of the first string to the 
% front of the second string, and put the dot back
dat = cellfun(@(x,y) str2double({y(1:end-1),  [y(end) '.' x]}),  dat{1}(:,2), dat{1}(:,1), 'UniformOutput', false);

% Cast to regular array
dat  = cat(1, dat{:})
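The correction step does not depend on textscan, so it can be checked in isolation. The sketch below feeds the same cellfun line a hand-written cell array shaped like the dat that the split-on-dot call is expected to produce (one row per input line; the contents are assumed here, not captured from a real textscan run), including the minus-sign case from the question.

```matlab
% Assumed output of the split-on-dot textscan call: dat{1} is an n-by-2 cell
% array holding the text before and after the '.' on each line.
dat = {{'9574865', '0E+10'; '-9574865', '0E+10'}};

% Same correction as above: the last digit of column 1 moves in front of
% column 2, and the dot is put back before converting.
dat = cellfun(@(x,y) str2double({y(1:end-1), [y(end) '.' x]}), ...
    dat{1}(:,2), dat{1}(:,1), 'UniformOutput', false);
dat = cat(1, dat{:})
```

Writing the inputs out by hand also makes the approach's assumptions visible: the first number must not contain a decimal point, and the second number must have exactly one digit before its decimal point.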

I had a similar problem and solved it by calling textscan twice. This proved much faster than cellfun or str2double, and it works with any input that MATLAB's '%f' can interpret.

In your case, I would first call textscan with only string conversions and 'Whitespace' set to '' so that the field widths are applied exactly:

data = '   9574865.0E+10  ';
tmp = textscan(data, '%9s %9s', 'Whitespace', '');

Next, join the fields of each row with a delimiter that won't interfere with your data (for example ;) and append one more delimiter at the end:

tmp = [char(join([tmp{:}],';',2)) ';'];
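To see what the intermediate buffer looks like with more than one line, here is the same join step applied to a hand-written n-by-2 cell array (assumed to match what the first textscan call returns for two lines, including the minus-sign case; variable names are illustrative). The extra ; per row matters once the rows are read as a single stream: MATLAB stores char matrices column by column, so the transposed matrix lists the rows end to end, and the trailing ; keeps the last field of one row from running into the first field of the next.

```matlab
% Assumed result of the first textscan call for two 18-character lines:
% an n-by-2 cell array of 9-character fields with whitespace preserved.
fields = {'   957486', '5.0E+10  '
          '  -957486', '5.0E+10  '};

% Join each row with ';' (join requires R2016b or later), then add a final ';'.
joined = join(fields, ';', 2);                       % n-by-1 cell array
buffer = [char(joined), repmat(';', size(joined, 1), 1)]

% Reading the transposed matrix column-major lays the rows end to end.
stream = reshape(buffer', 1, [])
```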

Now you can apply the right numeric format by calling textscan again with that delimiter:

result = textscan(tmp, '%f %f', 'Delimiter', ';', 'CollectOutput', true);
format shortE
result{:}

ans =

9.5749e+05   5.0000e+10

Comparing the speed of this approach with str2double:

n = 50000;
data = repmat('   9574865.0E+10  ', n, 1);
% Approach 1 with str2double
tic
tmp = textscan(data', '%9s %9s', 'Whitespace', '');
result1 = str2double([tmp{:}]);
toc

Elapsed time is 2.435376 seconds.

% Approach 2 with double textscan
tic
tmp = textscan(data', '%9s %9s', 'Whitespace', '');
tmp = [char(join([tmp{:}],';',2)) char(59)*ones(n,1)]; % char(59) is just ';'
result2 = cell2mat(textscan(tmp', '%f %f', 'Delimiter', ';', 'CollectOutput', true));
toc

Elapsed time is 0.098833 seconds.
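If the field widths differ between files, the same idea can take the widths as a parameter. The sketch below is one possible variation, not part of either answer: it assumes the input is already an n-by-m char matrix padded to the full record width, and it swaps the first textscan call for plain column indexing to insert the ; delimiters, keeping the single numeric textscan call that did the heavy lifting in the benchmark. Names such as txt, widths and pieces are illustrative, and its speed has not been measured here.

```matlab
% Illustrative input: n-by-18 char matrix, every row padded to the full width.
txt = ['   9574865.0E+10  '
       '  -9574865.0E+10  '];
widths = [9 9];                  % assumed field widths, in characters

n = size(txt, 1);
edges = [0, cumsum(widths)];
assert(size(txt, 2) >= edges(end), 'Rows are shorter than the field widths.');

% Interleave each fixed-width block of columns with a column of ';'.
pieces = cell(1, 2 * numel(widths));
for k = 1:numel(widths)
    pieces{2*k - 1} = txt(:, edges(k)+1:edges(k+1));
    pieces{2*k}     = repmat(';', n, 1);
end
buffer = [pieces{:}];            % n-by-(sum(widths) + numel(widths)) char

% Read the rows end to end (column-major) and parse every field as %f.
fmt = strtrim(repmat('%f ', 1, numel(widths)));   % '%f %f' for two fields
result = cell2mat(textscan(reshape(buffer', 1, []), fmt, ...
    'Delimiter', ';', 'CollectOutput', true))
```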

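Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site or the original source. For questions, contact yoyou2525@163.com.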

 
© 2020-2024 STACKOOM.COM