MATLAB: find the number of replicates of an array of substrings in a cell array of strings

Question

I have a MATLAB cell array of strings and a second array with partial strings:

base = {'a','b','c','d'}
all2 = {'a1','b1','c1','d1','a2','b2','c2','d2','q8','r15'}

The output is:

base = 

    'a'    'b'    'c'    'd'


all2 = 

    'a1'    'b1'    'c1'    'd1'    'a2'    'b2'    'c2'    'd2'    'q8'    'r15'

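Problem/Requirement

If any of 'a1' 'b1' 'c1' 'd1' and any of 'a2' 'b2' 'c2' 'd2' are present in the all2 array, then return a variable numb=2.

If any of 'a1' 'b1' 'c1' 'd1', any of 'a2' 'b2' 'c2' 'd2', and any of 'a3' 'b3' 'c3' 'd3' are present in the all2 array, then return the variable numb=3.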

Attempts

1.

Based on strfind (this approach), I tried matches = strfind(all2,base); but I got this error:

`Error using strfind`

`Input strings must have one row.`
....

2.

Another approach using strfind seemed better, but it only gave me:

fun = @(s)~cellfun('isempty',strfind(all2,s));
out = cellfun(fun,base,'UniformOutput',false)
idx = all(horzcat(out{:}));
idx(1,1) 

out = 

[1x10 logical]    [1x10 logical]    [1x10 logical]    [1x10 logical]


ans =

     0

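Neither attempt worked. I think my logic is wrong.

3.

This answer finds all indices of an array of partial strings within an array of strings. Applied to my data: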

base = regexptranslate('escape', base);
matches = false(size(all2));
for k = 1:numel(all2)
    matches(k) = any(~cellfun('isempty', regexp(all2{k}, base)));
end
matches

Output:

matches =

     1     1     1     1     1     1     1     1     0     0

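The problem with this approach: how do I use the output matches to compute numb=2? I'm not sure this is the most relevant approach for my specific problem, since it only gives the indices of the matches.

Question

Is there a way to do this in MATLAB?

EDIT

Additional info:

The numbering in all2 will always be contiguous. For example, all2 = {'a1','b1','c1','d1','a3','b3','c3','d3','q8','r15'}, which skips 2, is not possible.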

Answer 1

Use a regular expression to find the unique suffixes of the elements of base:

base = {'a','b','c','d'};
all2 = {'a1','b1','c1','d1','a2','b2','c2','d2', 'a4', 'q8','r15'};

% Use sprintf to build the expression so we can concatenate all the values
% of base into a single string; this is the [c1c2c3] metacharacter.
% Assumes the values of base are going to be one character
%
% This regex looks for one or more digits preceded by a character from
% base and returns only the digits that match this criterion.
regexstr = sprintf('(?<=[%s])(\\d+)', [base{:}]);

% Use once to eliminate a cell array level
test = regexp(all2, regexstr, 'match', 'once');

% Convert the digits to a double array
digits = str2double(test);

% Return the number of unique digits. With isnan() we can use logical indexing
% to ignore the NaN values
num = numel(unique(digits(~isnan(digits))));

Which returns:

>> num

num =

     3

If you need the numbers to be consecutive, something like this should work:

base = {'a','b','c','d'};
all2 = {'a1','b1','c1','d1','a2','b2','c2','d2', 'a4', 'q8','r15'};

regexstr = sprintf('(?<=[%s])(\\d+)', [base{:}]);
test = regexp(all2, regexstr, 'match', 'once');
digits = str2double(test);

% Find the unique digits, with isnan() we can use logical indexing to ignore the
% NaN values
unique_digits = unique(digits(~isnan(digits)));

% Because unique returns sorted values, we can use this to find where the
% first difference between digits is greater than 1. Append Inf at the end to
% handle the case where all values are continuous.
num = find(diff([unique_digits Inf]) > 1, 1);  % Thanks @gnovice :)

Which returns:

>> num

num =

     2
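Both snippets above assume that every entry of base is a single character. As a rough sketch of how that assumption might be relaxed, the prefixes could be escaped and joined with | (alternation) instead of being placed in a character class. The sample data and the anchored pattern below are illustrative assumptions, not part of the original answer:

```matlab
% Hypothetical data: prefixes longer than one character, one with a regex
% metacharacter ('.') to show why escaping matters.
base = {'ab', 'c', 'x.y'};
all2 = {'ab1', 'c1', 'x.y2', 'xzy3', 'q8'};

% Escape metacharacters, then join the prefixes with | (alternation).
alt = strjoin(regexptranslate('escape', base), '|');
regexstr = ['^(?:' alt ')(\d+)$'];   % whole string = one prefix + digits

% With 'tokens' and 'once', each cell holds {'digits'} or an empty cell.
tok = regexp(all2, regexstr, 'tokens', 'once');
hasMatch = ~cellfun('isempty', tok);
digits = cellfun(@(t) str2double(t{1}), tok(hasMatch));

% Same consecutive-run logic as in the answer.
unique_digits = unique(digits);
num = find(diff([unique_digits Inf]) > 1, 1);   % 2 for this data
```

Unlike the lookbehind version, this pattern is anchored with ^ and $, so it requires the whole string to be a prefix followed by digits. Whether that stricter match is wanted depends on the data.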

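Breaking down the regexp and sprintf lines: because we know base contains only single characters, we can use the [c1c2c3] metacharacter, which matches any one of the characters inside the square brackets. So '[rp]ain' matches 'rain' or 'pain', but not 'gain'.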
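The character-class behaviour can be checked with a small sketch. The variable names here are chosen only for illustration:

```matlab
% [rp] matches exactly one character, either 'r' or 'p'.
words = {'rain', 'pain', 'gain'};
hits = regexp(words, '[rp]ain', 'match', 'once');

% Entries without a match come back empty.
isMatch = ~cellfun('isempty', hits);
disp(isMatch)   % 1 1 0
```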

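base{:} returns what MATLAB calls a comma-separated list. Wrapping it in square brackets concatenates the results into a single character array.

Without brackets: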

>> base{:}

ans =

    'a'


ans =

    'b'


ans =

    'c'


ans =

    'd'

With brackets:

>> [base{:}]

ans =

    'abcd'

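We can then use sprintf to insert this into the expression string. The result is (?<=[abcd])(\d+), which matches one or more digits preceded by one of a, b, c, d. (The doubled backslash in the sprintf format string becomes a single backslash in the output.)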
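To see what the assembled expression does, here is a minimal sketch run on a few sample strings. The sample cell array is an assumption made for illustration:

```matlab
base = {'a','b','c','d'};

% sprintf collapses the escaped \\ in the format into a single backslash.
regexstr = sprintf('(?<=[%s])(\\d+)', [base{:}]);
disp(regexstr)   % (?<=[abcd])(\d+)

% 'a1' -> '1', 'r15' -> '' (r is not in base), 'b22' -> '22'
sample = {'a1', 'r15', 'b22'};
tok = regexp(sample, regexstr, 'match', 'once');

% str2double maps the empty match to NaN.
vals = str2double(tok);   % [1 NaN 22]
```

The NaN entries are the ones that isnan() filters out in the answer's code before unique is called.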
