
Compare two arrays of strings

I have two lists of strings, each stored as a column in a table (PM25_spr{i}.MonitorID and O3_spr{i}.MonitorID). The lists are of different lengths. I want to compare the first 11 characters of each entry and pull out the indices in each list where they match.

Example

List 1:
    '01-003-0010-44201'
    '01-027-0001-44201'
    '01-051-0001-44201'
    '01-073-0023-44201'
    '01-073-1003-44201'
    '01-073-1005-44201'
    '01-073-1009-44201'
    '01-073-1010-44201'
    '01-073-2006-44201'
    '01-073-5002-44201'
    '01-073-5003-44201'
    '01-073-6002-44201'

List 2:
    '01-073-0023-88101'
    '01-073-2003-88101'
    '04-013-0019-88101'
    '04-013-9992-88101'
    '04-013-9997-88101'
    '05-119-0007-88101'
    '05-119-1008-88101'
    '06-019-0008-88101'
    '06-029-0014-88101'
    '06-037-0002-88101'
    '06-037-1103-88101'
    '06-037-4002-88101'
    '06-059-0001-88101'
    '06-065-8001-88101'
    '06-067-0010-88101'
    '06-073-0003-88101'
    '06-073-1002-88101'
    '06-073-1007-88101'
    '08-001-0006-88101'
    '08-031-0002-88101'

I tried intersect, which isn't the right approach for what I want to do. I'm not sure how to use ismember given that I only want to look at the first 11 characters.

I tried strncmp, but it fails with the error "Inputs must be the same size or either one can be a scalar."

chars2compare = length('18-097-0083'); 
strncmp(O3_spr{i}.MonitorID, PM25_spr{i}.MonitorID,chars2compare)
PM25_spr_MID = cell(length(years),1); % Preallocate cell array
for n = 1:length(PM25_spr{i}.MonitorID) 
    s = char(PM25_spr{i}.MonitorID(n)); % Convert string to char
    PM25_spr_MID{i}(n) = cellstr(s(1:11)); % Pull out 1-11 characters and convert to cell
end

O3_spr_MID = cell(length(years),1); % Preallocate cell array
for n = 1:length(O3_spr{i}.MonitorID)
    s = char(O3_spr{i}.MonitorID(n));
    O3_spr_MID{i}(n) = cellstr(s(1:11));
end

[C, ia, ib] = intersect(O3_spr_MID{i}, PM25_spr_MID{i}) 
PerCap_spr_O3{i} = O3_spr{i}(ia,:);
PerCap_spr_PM25{i} = PM25_spr{i}(ib,:);

Assuming list1 and list2 are the two input cell arrays, you can use a few approaches.

I. Operate on cell arrays

With intersect -

%// Clip off after first 11 characters in each cell of the input cell arrays
list1_f11 = arrayfun(@(n) list1{n}(1:11),1:numel(list1),'uni',0)
list2_f11 = arrayfun(@(n) list2{n}(1:11),1:numel(list2),'uni',0)

%// Use intersect to find common indices in the input cell arrays
[~,idx_list1,idx_list2] = intersect(list1_f11,list2_f11)
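
As a quick check, here is a minimal sketch that runs the intersect approach on shortened versions of the two example lists from the question. It clips the strings with cellfun instead of arrayfun, which is an equivalent alternative and not part of the approach as written above; the output name common is illustrative.

```matlab
% Shortened versions of the example lists from the question
list1 = {'01-003-0010-44201'; '01-073-0023-44201'; '01-073-1003-44201'};
list2 = {'01-073-0023-88101'; '01-073-2003-88101'; '04-013-0019-88101'};

% Keep only the first 11 characters of each entry
list1_f11 = cellfun(@(s) s(1:11), list1, 'UniformOutput', false);
list2_f11 = cellfun(@(s) s(1:11), list2, 'UniformOutput', false);

% Entries whose first 11 characters occur in both lists, plus their indices
[common, idx_list1, idx_list2] = intersect(list1_f11, list2_f11);
% common    is {'01-073-0023'}
% idx_list1 is 2 (second entry of list1)
% idx_list2 is 1 (first entry of list2)
```

Only '01-073-0023' is shared, so each index output has a single element.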

With ismember -

%// Clip off after first 11 characters in each cell of the input cell arrays
list1_f11 = arrayfun(@(n) list1{n}(1:11),1:numel(list1),'uni',0)
list2_f11 = arrayfun(@(n) list2{n}(1:11),1:numel(list2),'uni',0)

%// Use ismember to find common indices in the input cell arrays
[LocA,LocB] = ismember(list1_f11,list2_f11);
idx_list1 = find(LocA)
idx_list2 = LocB(LocA)
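
The two variants differ when a clipped ID appears more than once in a list. The sketch below uses made-up clipped IDs (not taken from the question's tables) to show this: ismember reports every matching row of the first list, while intersect reports each common value only once.

```matlab
% Hypothetical clipped IDs; the first list contains a repeated entry
list1_f11 = {'01-073-0023'; '01-073-0023'; '01-003-0010'};
list2_f11 = {'01-073-0023'; '04-013-0019'};

% ismember: one result per row of list1_f11
[tf, loc] = ismember(list1_f11, list2_f11);
idx_list1 = find(tf);   % [1; 2], both repeated rows are reported
idx_list2 = loc(tf);    % [1; 1], each maps to row 1 of list2_f11

% intersect: one result per unique common value
[~, ia, ib] = intersect(list1_f11, list2_f11);
n_intersect = numel(ia);   % 1, the repeated row is collapsed
```

If the monitor IDs can repeat within a table, the ismember variant is likely the better fit for pulling out all matching rows.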

II. Operate on char arrays

We can use char directly on the input cell arrays to get 2D char arrays, as working with them could be faster than working with cells.

With intersect + 'rows' -

%// Convert to char arrays
list1c = char(list1)
list2c = char(list2)

%// Clip char arrays after first 11 columns
list1c_f11 = list1c(:,1:11)
list2c_f11 = list2c(:,1:11)

%// Use intersect with 'rows' option
[~,idx_list1,idx_list2] = intersect(list1c_f11,list2c_f11,'rows')

III. Operate on numeric arrays

We can convert the char arrays further into single-column numeric arrays, as that could lead to faster solutions.

%// Convert to char arrays
list1c = char(list1)
list2c = char(list2)

%// Clip char arrays after first 11 columns
list1c_f11 = list1c(:,1:11)
list2c_f11 = list2c(:,1:11)

%// Remove char columns of hyphens (3 and 7 for the given input)
list1c_f11(:,[3 7])=[];
list2c_f11(:,[3 7])=[];

%// Convert char arrays to numeric arrays
ncols = size(list1c_f11,2);
list1c_f11num = (list1c_f11 - '0')*(10.^(ncols-1:-1:0))'
list2c_f11num = (list2c_f11 - '0')*(10.^(ncols-1:-1:0))'
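
To see what the conversion does, here is a small worked sketch for a single ID. The variable names (id, digitsOnly, weights, value) are illustrative only. Subtracting '0' turns each digit character into its numeric value, and multiplying by a column of powers of ten combines the digits into one number.

```matlab
id = '01-073-0023';

% Drop the hyphens at positions 3 and 7
digitsOnly = id;
digitsOnly([3 7]) = [];              % '010730023'

% Weight each digit by its place value and sum via a row-by-column product
ncols   = numel(digitsOnly);         % 9
weights = 10.^(ncols-1:-1:0);        % [1e8 1e7 ... 10 1]
value   = (digitsOnly - '0') * weights';   % 10730023
```

The leading zero is lost in the number, but because every ID has the same width the mapping stays one-to-one. Doubles represent integers exactly up to 2^53, so this works for up to 15 digits.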

From this point on, there are three approaches to work with, listed next.

With ismember (memory efficient, but maybe not fast across all data sizes) -

[LocA,LocB] = ismember(list1c_f11num,list2c_f11num);
idx_list1 = find(LocA)
idx_list2 = LocB(LocA)

With intersect (could be slow) -

[~,idx_list1,idx_list2] = intersect(list1c_f11num,list2c_f11num)

With bsxfun (memory inefficient, but maybe fast for small to moderately sized inputs) -

[idx_list1,idx_list2] = find(bsxfun(@eq,list1c_f11num,list2c_f11num'))
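
The bsxfun call builds a numel(list1)-by-numel(list2) logical matrix that compares every pair, which is where the memory cost comes from. A minimal sketch with made-up numeric IDs (a and b stand in for list1c_f11num and list2c_f11num):

```matlab
% Hypothetical numeric IDs as column vectors; a contains a repeated value
a = [10730023; 10030010; 10730023];
b = [10730023; 40130019];

% matches(r,c) is true where a(r) equals b(c); size is 3-by-2 here
matches = bsxfun(@eq, a, b');

% Row and column subscripts of every matching pair
[idx_list1, idx_list2] = find(matches);
% idx_list1 is [1; 3]
% idx_list2 is [1; 1]
```

Unlike intersect, this returns every matching pair, including repeats. In MATLAB R2016b and later, a == b' gives the same matrix through implicit expansion.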
