
Fast string splitting in MATLAB

I've been using MATLAB to read through a bunch of output files and have noticed that it reads them fairly slowly compared to a reader I wrote in Python for the same files (on the order of 120 s for MATLAB versus 4 s for Python on the same set). The files contain a mix of letters and numbers. Each number I actually want has a unique string on the same line, but there is no real pattern to the rest of the file. Is there a faster way to read non-uniformly formatted text files in MATLAB?

I tried using the code profiler in MATLAB to see what takes the most time, and it seemed to be the strfind and strsplit functions. Deeper down, strfun\private\strescape, which is called by strsplit, seems to be the culprit: it takes up around 50% of the time.

I am currently using a combination of strfind and strsplit to search through a file for 5 specific strings, then convert the string that follows each one into a double.

lots of text before this

   ####  unique identifying text here

lots of text before this

sometext  X = #####
          Y = #####
          Z = #####
more text = ######

I am iterating through the file with approximately the following code, repeated for each number that is being found.

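fid = fopen(filename);
tline = fgets(fid);
while ischar(tline)
    % strfind takes the text and the pattern as two separate arguments
    if ~isempty(strfind(tline, 'X ='))
        % skip the leading label, then split the remainder on spaces
        tempstring = strsplit(tline(13:end), ' ');
        result = str2double(tempstring{2});
    end
    tline = fgets(fid);
end
fclose(fid);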

I'm guessing this will be a bit faster, but maybe not by much.

s = fileread('texto');
[X,s] = strtok(strsplit(s, "X = "){2}); X = str2num(X);
[Y,s] = strtok(strsplit(s, "Y = "){2}); Y = str2num(Y);
[Z,s] = strtok(strsplit(s, "Z = "){2}); Z = str2num(Z);

Obviously this is highly specific to your text example. You haven't given any more info on how the variables might change, so presumably you'll have to implement try/catch blocks if the files are not consistent.

PS. This is Octave syntax, which allows chaining operations. For MATLAB, split them into separate operations as appropriate.

EDIT: ach, never mind, here's the MATLAB-compatible one too. :)

s = fileread('texto');
C = strsplit(s, 'X = '); [X,s] = strtok(C{2}); X = str2num(X);
C = strsplit(s, 'Y = '); [Y,s] = strtok(C{2}); Y = str2num(Y);
C = strsplit(s, 'Z = '); [Z,s] = strtok(C{2}); Z = str2num(Z);
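
If the files are not consistent, one possible variation is to keep the single read of the whole file but pull each value out with regexp, which returns an empty result when a label is missing instead of raising an error. This is only a sketch under assumptions: extractValue is a hypothetical helper name, the sample text stands in for the result of fileread, and it assumes every wanted number directly follows "label =" on the same line. It has not been timed against the strsplit version, and it does not cover lines where the number comes before the identifying text.

```matlab
% Sample text standing in for s = fileread(filename); the values are made up.
s = sprintf('sometext  X = 1.5\n          Y = -2\n          Z = 3e4\nmore text = 42\n');

X = extractValue(s, 'X');
Y = extractValue(s, 'Y');
Z = extractValue(s, 'Z');
W = extractValue(s, 'W');   % label absent from the text, so W is NaN

% extractValue is a hypothetical helper, not part of MATLAB.
% Local functions in a script need R2016b or later; otherwise save it as extractValue.m.
function value = extractValue(text, label)
    % Pattern: the label (not preceded by a word character), '=', then one
    % whitespace-delimited token, which is captured.
    pattern = ['(?<!\w)' regexptranslate('escape', label) '\s*=\s*(\S+)'];
    tok = regexp(text, pattern, 'tokens', 'once');
    if isempty(tok)
        value = NaN;                  % label not found
    else
        value = str2double(tok{1});   % NaN if the token is not numeric
    end
end
```

Only the first occurrence of each label is used here, which matches the strsplit version above; dropping 'once' would return every match instead.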


 