Compare two arrays of strings
I have two lists of strings, each stored as a column in a table (PM25_spr{i}.MonitorID and O3_spr{i}.MonitorID). The lists are of different lengths. I want to compare the first 11 characters of each entry and pull out the indices in each list where they match.
Example
List 1:
'01-003-0010-44201'
'01-027-0001-44201'
'01-051-0001-44201'
'01-073-0023-44201'
'01-073-1003-44201'
'01-073-1005-44201'
'01-073-1009-44201'
'01-073-1010-44201'
'01-073-2006-44201'
'01-073-5002-44201'
'01-073-5003-44201'
'01-073-6002-44201'
List 2:
'01-073-0023-88101'
'01-073-2003-88101'
'04-013-0019-88101'
'04-013-9992-88101'
'04-013-9997-88101'
'05-119-0007-88101'
'05-119-1008-88101'
'06-019-0008-88101'
'06-029-0014-88101'
'06-037-0002-88101'
'06-037-1103-88101'
'06-037-4002-88101'
'06-059-0001-88101'
'06-065-8001-88101'
'06-067-0010-88101'
'06-073-0003-88101'
'06-073-1002-88101'
'06-073-1007-88101'
'08-001-0006-88101'
'08-031-0002-88101'
I tried intersect, which isn't the right approach for what I want to do. I'm not sure how to use ismember given that I only want to look at the first 11 characters.
I tried strncmp, but got the error "Inputs must be the same size or either one can be a scalar."
chars2compare = length('18-097-0083');
strncmp(O3_spr{i}.MonitorID, PM25_spr{i}.MonitorID,chars2compare)
PM25_spr_MID = cell(length(years),1); % Preallocate cell array
for n = 1:length(PM25_spr{i}.MonitorID)
s = char(PM25_spr{i}.MonitorID(n)); % Convert string to char
PM25_spr_MID{i}(n) = cellstr(s(1:11)); % Pull out 1-11 characters and convert to cell
end
O3_spr_MID = cell(length(years),1); % Preallocate cell array
for n = 1:length(O3_spr{i}.MonitorID)
s = char(O3_spr{i}.MonitorID(n));
O3_spr_MID{i}(n) = cellstr(s(1:11));
end
[C, ia, ib] = intersect(O3_spr_MID{i}, PM25_spr_MID{i})
PerCap_spr_O3{i} = O3_spr{i}(ia,:);
PerCap_spr_PM25{i} = PM25_spr{i}(ib,:);
Assuming list1 and list2 to be the two input cell arrays, you can use a few approaches.
With intersect -
%// Clip off after first 11 characters in each cell of the input cell arrays
list1_f11 = arrayfun(@(n) list1{n}(1:11),1:numel(list1),'uni',0)
list2_f11 = arrayfun(@(n) list2{n}(1:11),1:numel(list2),'uni',0)
%// Use intersect to find common indices in the input cell arrays
[~,idx_list1,idx_list2] = intersect(list1_f11,list2_f11)
With ismember -
%// Clip off after first 11 characters in each cell of the input cell arrays
list1_f11 = arrayfun(@(n) list1{n}(1:11),1:numel(list1),'uni',0)
list2_f11 = arrayfun(@(n) list2{n}(1:11),1:numel(list2),'uni',0)
%// Use ismember to find common indices in the input cell arrays
[LocA,LocB] = ismember(list1_f11,list2_f11);
idx_list1 = find(LocA)
idx_list2 = LocB(LocA)
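As a quick check of the two approaches above, here is a minimal sketch on a few entries taken from the question's example lists. The shortened lists are only for illustration, and cellfun is used in place of arrayfun purely as a matter of taste; the clipping it performs is the same.

```matlab
% Shortened versions of the question's two lists (illustration only)
list1 = {'01-003-0010-44201'; '01-027-0001-44201'; ...
         '01-073-0023-44201'; '01-073-2006-44201'};
list2 = {'01-073-0023-88101'; '01-073-2003-88101'; '04-013-0019-88101'};

% Keep only the first 11 characters of every entry
list1_f11 = cellfun(@(s) s(1:11), list1, 'UniformOutput', false);
list2_f11 = cellfun(@(s) s(1:11), list2, 'UniformOutput', false);

% intersect: one pair of indices per shared 11-character prefix
[~, idx_list1, idx_list2] = intersect(list1_f11, list2_f11);

% ismember: one index for every entry of list1 whose prefix occurs in list2
[LocA, LocB] = ismember(list1_f11, list2_f11);
idx_list1_ism = find(LocA);
idx_list2_ism = LocB(LocA);
```

Here the only shared prefix is '01-073-0023', so both approaches return index 3 for list1 and index 1 for list2. The results can differ when a prefix repeats within list1: intersect reports each shared prefix once, while ismember keeps every matching entry of list1.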
We can use char directly on the input cell arrays to get 2D char arrays, as working with them could be faster than working with cells.
With intersect + 'rows' -
%// Convert to char arrays
list1c = char(list1)
list2c = char(list2)
%// Clip char arrays after first 11 columns
list1c_f11 = list1c(:,1:11)
list2c_f11 = list2c(:,1:11)
%// Use intersect with 'rows' option
[~,idx_list1,idx_list2] = intersect(list1c_f11,list2c_f11,'rows')
We can convert the char arrays further to numeric arrays with just one column, as that could lead to faster solutions.
%// Convert to char arrays
list1c = char(list1)
list2c = char(list2)
%// Clip char arrays after first 11 columns
list1c_f11 = list1c(:,1:11)
list2c_f11 = list2c(:,1:11)
%// Remove char columns of hyphens (3 and 7 for the given input)
list1c_f11(:,[3 7])=[];
list2c_f11(:,[3 7])=[];
%// Convert char arrays to numeric arrays
ncols = size(list1c_f11,2);
list1c_f11num = (list1c_f11 - '0')*(10.^(ncols-1:-1:0))'
list2c_f11num = (list2c_f11 - '0')*(10.^(ncols-1:-1:0))'
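To make the conversion concrete, the sketch below applies the same steps to a single clipped ID from the question's lists. The variable names id, digitVals and num are hypothetical and used only for this illustration.

```matlab
% One 11-character ID, used only to illustrate the conversion
id = '01-073-0023';

% Remove the hyphens at positions 3 and 7, leaving '010730023'
id([3 7]) = [];

% Subtracting '0' turns the digit characters into the values 0..9
ncols = numel(id);
digitVals = id - '0';

% Weight each digit by its power of ten and sum via a matrix product
num = digitVals * (10.^(ncols-1:-1:0))';
```

This yields num equal to 10730023. The leading zero is lost in the number, but since every ID has the same width, distinct IDs still map to distinct numbers. Nine-digit values are also well within the range that doubles represent exactly, so comparing them with == is safe.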
From this point onwards, you have three more approaches to work with, listed next.
With ismember (would be memory efficient, but maybe not fast across all data sizes) -
[LocA,LocB] = ismember(list1c_f11num,list2c_f11num);
idx_list1 = find(LocA)
idx_list2 = LocB(LocA)
With intersect (could be slow) -
[~,idx_list1,idx_list2] = intersect(list1c_f11num,list2c_f11num)
With bsxfun (would be memory inefficient, but maybe fast for small to decent sized inputs) -
[idx_list1,idx_list2] = find(bsxfun(@eq,list1c_f11num,list2c_f11num'))
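On MATLAB R2016b and later, implicit expansion lets the == operator do the same pairwise comparison without bsxfun. The sketch below assumes that release or newer, and uses small hand-written numeric vectors in place of the converted lists.

```matlab
% Hand-written stand-ins for the converted ID vectors (illustration only)
list1c_f11num = [10030010; 10730023; 10732006];
list2c_f11num = [10730023; 40130019];

% Column vector == row vector expands to a 3-by-2 logical matrix,
% with a true entry wherever an ID from list 1 equals one from list 2
matches = (list1c_f11num == list2c_f11num.');

% Row subscripts index list 1, column subscripts index list 2
[idx_list1, idx_list2] = find(matches);
```

For these inputs the single match is the second entry of the first vector and the first entry of the second. Like the bsxfun version, this builds a full numel(list1)-by-numel(list2) matrix, so the same memory caveat applies.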