MATLAB: find number of replicates of substring array in cell array of strings
I have a MATLAB cell array of strings and a second array with partial strings:
base = {'a','b','c','d'}
all2 = {'a1','b1','c1','d1','a2','b2','c2','d2','q8','r15'}
The output is:
base =
'a' 'b' 'c' 'd'
all2 =
'a1' 'b1' 'c1' 'd1' 'a2' 'b2' 'c2' 'd2' 'q8' 'r15'
Problem/Requirement
If any of 'a1', 'b1', 'c1', 'd1' AND any of 'a2', 'b2', 'c2', 'd2' are present in the all2 array, then return a variable numb=2.
If any of 'a1', 'b1', 'c1', 'd1' AND any of 'a2', 'b2', 'c2', 'd2' AND any of 'a3', 'b3', 'c3', 'd3' are present in the all2 array, then return a variable numb=3.
Attempts
1. Based on strfind (this approach), I tried matches = strfind(all2,base); but I got this error:
`Error using strfind`
`Input strings must have one row.`
....
2. This other approach using strfind seemed better but just gave me:
fun = @(s)~cellfun('isempty',strfind(all2,s));
out = cellfun(fun,base,'UniformOutput',false)
idx = all(horzcat(out{:}));
idx(1,1)
out =
[1x10 logical] [1x10 logical] [1x10 logical] [1x10 logical]
ans =
0
Neither of these attempts has worked. I think my logic is incorrect.
3. This answer allows finding all indices of an array of partial strings in an array of strings:
base = regexptranslate('escape', base);
matches = false(size(all2));
for k = 1:numel(all2)
matches(k) = any(~cellfun('isempty', regexp(all2{k}, base)));
end
matches
Output:
matches =
1 1 1 1 1 1 1 1 0 0
My problem with this approach: How do I use the output matches to calculate numb=2? I am not sure if this is the most relevant logic for my specific question, since it only gives the matching indices.
Question
Is there a way to do this in MATLAB?
EDIT
Additional Information:
The array all2 WILL always be contiguous. A scenario of all2 = {'a1','b1','c1','d1','a3','b3','c3','d3','q8','r15'} is not possible.
Using a regex to find the unique suffixes of the base elements:
base = {'a','b','c','d'};
all2 = {'a1','b1','c1','d1','a2','b2','c2','d2', 'a4', 'q8','r15'};
% Use sprintf to build the expression so we can concatenate all the values
% of base into a single string; this is the [c1c2c3] metacharacter.
% Assumes the values of base are going to be one character
%
% This regex looks for one or more digits preceded by a character from
% base and returns only the digits that match this criteria.
regexstr = sprintf('(?<=[%s])(\\d+)', [base{:}]);
% Use once to eliminate a cell array level
test = regexp(all2, regexstr, 'match', 'once');
% Convert the digits to a double array
digits = str2double(test);
% Return the number of unique digits. With isnan() we can use logical indexing
% to ignore the NaN values
num = numel(unique(digits(~isnan(digits))));
Which returns:
>> num
num =
3
If you need continuous digits then something like this should be valid:
base = {'a','b','c','d'};
all2 = {'a1','b1','c1','d1','a2','b2','c2','d2', 'a4', 'q8','r15'};
regexstr = sprintf('(?<=[%s])(\\d+)', [base{:}]);
test = regexp(all2, regexstr, 'match', 'once');
digits = str2double(test);
% Find the unique digits, with isnan() we can use logical indexing to ignore the
% NaN values
unique_digits = unique(digits(~isnan(digits)));
% Because unique returns sorted values, we can use this to find where the
% first difference between digits is greater than 1. Append Inf at the end to
% handle the case where all values are continuous.
num = find(diff([unique_digits Inf]) > 1, 1); % Thanks @gnovice :)
Which returns:
>> num
num =
2
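To see why appending Inf handles the fully contiguous case, the diff step can be traced by hand on small vectors. This is only a minimal sketch: count_run is a hypothetical helper name introduced here for illustration, and it assumes its input is already a sorted row vector of unique suffix values, as unique returns in the code above.

```matlab
% count_run is a hypothetical helper used only for this illustration.
% It expects a sorted row vector of unique suffix values.
count_run = @(u) find(diff([u Inf]) > 1, 1);

% Gap after 2: diff([1 2 4 Inf]) is [1 2 Inf], so the first jump is at index 2.
n_gap = count_run([1 2 4]);

% No gap: diff([1 2 3 Inf]) is [1 1 Inf], so only the appended Inf triggers.
n_full = count_run([1 2 3]);

% Caveat: the run is counted from the smallest value present, not from 1.
n_offset = count_run([2 3]);
```

Note that this measures the length of the first contiguous run starting at the smallest suffix found. That equals numb only if the suffixes start at 1, which the question's examples suggest but do not state outright.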
Breaking down the regexp and sprintf lines: Because we know that base only consists of single characters, we can use the [c1c2c3] metacharacter, which will match any character inside the brackets. So if we have '[rp]ain' we'll match 'rain' or 'pain', but not 'gain'.
base{:} returns what MATLAB calls a comma-separated list. Adding the brackets concatenates the result into a single character array.
Without brackets:
>> base{:}
ans =
'a'
ans =
'b'
ans =
'c'
ans =
'd'
With brackets:
>> [base{:}]
ans =
'abcd'
Which we can insert into our expression string with sprintf. Because sprintf converts the escaped \\ into a single backslash, this gives us (?<=[abcd])(\d+), which matches one or more digits preceded by one of a, b, c, or d.
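As a quick check of the breakdown above, the expression can be built and tried on a few individual strings. This is a small sketch rather than part of the original answer; the variable names m1, m2 and m3 are made up for illustration.

```matlab
base = {'a','b','c','d'};

% sprintf turns the escaped '\\' into a single backslash, so regexstr
% holds the character vector (?<=[abcd])(\d+)
regexstr = sprintf('(?<=[%s])(\\d+)', [base{:}]);

% With 'once', regexp returns a character vector for a single input string;
% it is empty when nothing matches.
m1 = regexp('a1',  regexstr, 'match', 'once');  % digit after 'a'
m2 = regexp('r15', regexstr, 'match', 'once');  % 'r' is not in base
m3 = regexp('b12', regexstr, 'match', 'once');  % multi-digit suffix
```

Because the pattern is not anchored to the start of the string, it would also pick up digits that follow a base character in the middle of a longer string. If that matters for your data, anchoring the expression is one option to consider.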