Searching for a list of values in Pandas series and masking the matches

I am trying to search for a list of values within a Pandas series and write the matches to a new column.

Input:

vals = ['john', 'jane']

col1         
john doe
jane doe
billy j. 

Desired output:

col1          col2
john doe      john
jane doe      jane
billy j.      nan

I tried to avoid for loops and do this with Pandas methods, but could not get the result.

With the plain Python code below, I can print the matches but can't write them to the corresponding rows in col2. It is also inefficient for larger datasets, since it compares every value against every row.

for i in vals:
    for j in df.col1:
        if i in j:
            print("match\t",  i,'in:\t',j)
        else:
            print('-')

Output:
match    john in:    john doe
-
-
-
match    jane in:    jane doe
-

Any help would be appreciated.

Here's an option that also handles multiple values per row, joining them with commas:

import numpy as np
df['col2'] = df.col1.apply(lambda x: ','.join([v for v in vals if v in x])).replace('', np.nan)

Output:输出:

       col1  col2
0  john doe  john
1  jane doe  jane
2  billy j.   NaN
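
For reference, here is a self-contained sketch of the same idea that can be run as-is. The sample frame is rebuilt from the question, the fourth row ('john and jane') is added only to show the multi-match case, and find_matches is a hypothetical helper name, not part of pandas:

```python
import numpy as np
import pandas as pd

vals = ['john', 'jane']

# Sample data from the question, plus one extra row (an assumption)
# that contains both values.
df = pd.DataFrame({'col1': ['john doe', 'jane doe', 'billy j.', 'john and jane']})


def find_matches(text, candidates=vals):
    """Hypothetical helper: comma-join every candidate found in text."""
    hits = [v for v in candidates if v in text]
    # Return NaN directly when nothing matched, so no replace() is needed.
    return ','.join(hits) if hits else np.nan


df['col2'] = df['col1'].apply(find_matches)
print(df)
```

Note that `v in x` is a plain substring test, so 'john' would also match inside 'johnson'. If whole-word matching matters, a regular expression with word boundaries is probably the safer choice.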

Another option using pandas.Series.str.findall:

pat = '|'.join(vals)
df['col2'] = df.col1.str.findall(pat).apply(','.join).replace('', np.nan)
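
Two caveats apply to the pattern approach. The values are interpreted as regular expressions, so escaping them with re.escape is safer. And joining an empty list of matches yields an empty string, not NaN. The following is a minimal sketch under those assumptions; the column names first_match and all_matches are made up for illustration, and str.extract is shown as an alternative when only the first match per row is needed:

```python
import re

import numpy as np
import pandas as pd

vals = ['john', 'jane']
df = pd.DataFrame({'col1': ['john doe', 'jane doe', 'billy j.']})

# Escape each value so characters such as '.' are matched literally.
pat = '|'.join(re.escape(v) for v in vals)

# First match only: rows without a match come back as NaN.
df['first_match'] = df['col1'].str.extract(f'({pat})', expand=False)

# All matches, comma-joined; an empty join result is turned into NaN.
df['all_matches'] = (
    df['col1'].str.findall(pat).apply(','.join).replace('', np.nan)
)
print(df)
```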
