
Pandas series case-insensitive matching and partial matching between values

I have the following operation, which adds a status column showing whether each string in a column of one dataframe is present in a specified column of another dataframe. It looks like this:

df_one['Status'] = np.where(df_one.A.isin(df_two.A), 'Matched','Unmatched')

This won't match if the string case is different. Is it possible to perform this operation while being case insensitive?

Also, is it possible to return 'Matched' when a value in df_one.A ends with the full string from df_two.A? e.g. df_one.A abcdefghijkl -> df_two.A ijkl = 'Matched'

You can do the first test by converting both columns to lowercase or uppercase (either works) inside the expression. Because you aren't assigning either column back to your DataFrames, the case conversion is only temporary:

df_one['Status'] = np.where(df_one.A.str.lower().isin(df_two.A.str.lower()),
                            'Matched', 'Unmatched')
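
As a self-contained illustration of the case-insensitive membership test above, here is a minimal sketch. The sample values in both frames are made up for demonstration and are not from the question; only the column name A and the 'Matched'/'Unmatched' labels follow the original.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (assumption, not from the original question).
df_one = pd.DataFrame({'A': ['Apple', 'BANANA', 'cherry']})
df_two = pd.DataFrame({'A': ['apple', 'Banana', 'kiwi']})

# Lowercase temporary copies; the stored columns keep their original case.
left = df_one['A'].str.lower()
right = df_two['A'].str.lower()

# isin() does exact membership, so comparing the lowercased copies
# makes the test case-insensitive.
df_one['Status'] = np.where(left.isin(right), 'Matched', 'Unmatched')
print(df_one)
```

If the data may contain non-ASCII text, str.casefold() can be swapped in for str.lower() on both sides for a more aggressive case normalization.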

You can perform your second test by checking whether each string in df_one.A ends with any of the strings in df_two.A, like so (assuming you still want a case-insensitive match):

# Lowercase the candidate suffixes once; str.endswith accepts a tuple of suffixes
suffixes = tuple(df_two.A.str.lower())
df_one['Endswith_Status'] = np.where(df_one.A.str.lower().apply(
                                      lambda x: x.endswith(suffixes)),
                                      'Matched', 'Unmatched')
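
The suffix test can be tried end to end with the sketch below. The sample rows are hypothetical (the first one mirrors the abcdefghijkl / ijkl example from the question), and the missing-value guard is an added assumption about how gaps should be treated: they are simply reported as 'Unmatched'.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the None row shows how missing values are handled.
df_one = pd.DataFrame({'A': ['abcdefghIJKL', 'xyz', 'MNOP', None]})
df_two = pd.DataFrame({'A': ['ijkl', 'nop']})

# Build the lowercase suffix tuple once, dropping missing values first.
suffixes = tuple(df_two['A'].dropna().str.lower())

# Python's str.endswith accepts a tuple and returns True if any suffix matches.
# The isinstance check skips missing values, which are not strings.
ends = df_one['A'].str.lower().apply(
    lambda s: isinstance(s, str) and s.endswith(suffixes))

df_one['Endswith_Status'] = np.where(ends, 'Matched', 'Unmatched')
print(df_one)
```

Building the tuple outside the lambda avoids lowercasing df_two.A again for every row of df_one.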

