MATLAB find cell array substrings in a cell array of strings

Let's say we have a cell array of substrings arrayOfSubstrings = {substr1;substr2} and a cell array of strings arrayOfStrings = {string1;string2;string3;string4}. How can I get a logical map into the cell array of strings that marks where at least one of the substrings is found? I have tried

cellfun('isempty',regexp(arrayOfSubstrings ,arrayOfStrings ))

and

cellfun('isempty', strfind(arrayOfSubstrings , arrayOfStrings ))

and some other permutations of functions, but am not getting anywhere.

The issue with both strfind and regexp is that you can't provide two cell arrays and have them automatically apply all patterns to all strings. You will need to loop through one or the other to make it work.

You can do this with an explicit loop:

strings = {'ab', 'bc', 'de', 'fa'};
substrs = {'a', 'b', 'c'};

% First you'll want to escape the regular expressions
substrs = regexptranslate('escape', substrs);

matches = false(size(strings));

for k = 1:numel(strings)
    matches(k) = any(~cellfun('isempty', regexp(strings{k}, substrs)));
end

% 1  1  0  1

Or, if you are loop-averse, you can use cellfun:

cellfun(@(s)any(~cellfun('isempty', regexp(s, substrs))), strings)
% 1  1  0  1
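
Because the patterns here are literal substrings rather than regular expressions, the same idea can also be sketched with strfind, which needs no escaping step. strfind accepts a cell array of strings but only a single pattern, so this version loops over the substrings instead of the strings. This is a minimal sketch using the same sample data as above, not a replacement for the answer's code:

```matlab
strings = {'ab', 'bc', 'de', 'fa'};
substrs = {'a', 'b', 'c'};

matches = false(size(strings));

% strfind takes one pattern at a time, so loop over the substrings
% and OR the per-substring results together
for k = 1:numel(substrs)
    hits = strfind(strings, substrs{k});   % cell array, empty where no match
    matches = matches | ~cellfun('isempty', hits);
end

% matches is 1  1  0  1
```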

A Different Approach

Alternatively, you could combine your substrings into a single regular expression:

pattern = ['(', strjoin(regexptranslate('escape', substrs), '|'), ')'];
%   (a|b|c)

output = ~cellfun('isempty', regexp(strings, pattern));
%   1  1  0  1

If you are using R2016b or later, you can just use contains:

>> strings = {'ab', 'bc', 'de', 'fa'};
>> substrs = {'a', 'b', 'c'};
>> contains(strings, substrs)

ans =

  1×4 logical array

   1   1   0   1
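
Since the question asks for a logical map, it may help to see how the result is typically consumed. The sketch below uses the output of contains as a mask to pull out the matching strings; the variable names mask and found are only illustrative:

```matlab
strings = {'ab', 'bc', 'de', 'fa'};
substrs = {'a', 'b', 'c'};

% Logical map: true where a string contains at least one substring
mask = contains(strings, substrs);   % 1  1  0  1

% Use the mask for logical indexing into the original cell array
found = strings(mask);               % {'ab', 'bc', 'fa'}
```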

contains is also the fastest of these approaches, especially if you use the new string datatype.

function profFunc()

    strings = {'ab', 'bc', 'de', 'fa'};
    substrs = {'a', 'b', 'c'};

    n = 10000;

    tic;
    for i = 1:n
        substrs_translated = regexptranslate('escape', substrs);

        matches = false(size(strings));

        for k = 1:numel(strings)
            matches(k) = any(~cellfun('isempty', regexp(strings{k}, substrs_translated)));
        end
    end
    toc

    tic;
    for i = 1:n
        cellfun(@(s)any(~cellfun('isempty', regexp(s, substrs))), strings);
    end
    toc

    tic;
    for i = 1:n
        pattern = ['(', strjoin(regexptranslate('escape', substrs), '|'), ')'];
        output = ~cellfun('isempty', regexp(strings, pattern)); %#ok<NASGU>
    end
    toc

    tic;
    for i = 1:n
        contains(strings,substrs);
    end
    toc

    %Imagine you were using strings for all your text!
    strings = string(strings);

    tic;
    for i = 1:n
        contains(strings,substrs);
    end
    toc
end

Timing results:

>> profFunc
Elapsed time is 0.643176 seconds.
Elapsed time is 1.007309 seconds.
Elapsed time is 0.683643 seconds.
Elapsed time is 0.050663 seconds.
Elapsed time is 0.008177 seconds.
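
Timings taken with tic/toc around a hand-written loop can be noisy. As a rough cross-check, the same comparison can be sketched with timeit, which calls the function handle repeatedly and returns a typical time for one call. The names tRegexp and tContains are assumptions made for this illustration, and the numbers will differ from the ones above depending on machine and release:

```matlab
strings = {'ab', 'bc', 'de', 'fa'};
substrs = {'a', 'b', 'c'};

% Build the combined pattern once, outside the timed call
pattern = ['(', strjoin(regexptranslate('escape', substrs), '|'), ')'];

% timeit expects a function handle that takes no inputs
tRegexp   = timeit(@() ~cellfun('isempty', regexp(strings, pattern)));
tContains = timeit(@() contains(strings, substrs));

fprintf('regexp:   %g s per call\n', tRegexp);
fprintf('contains: %g s per call\n', tContains);
```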
