Searching for a list of values in a Pandas Series and masking the matches
I am trying to search for a list of values within a Pandas Series and write the matches to a new column.
Input:
vals = ['john', 'jane']
col1
john doe
jane doe
billy j.
Desired output:
col1 col2
john doe john
jane doe jane
billy j. nan
I tried to avoid for loops and do this with Pandas methods, but could not get the result.
With the plain Python code below, I can print the matches but can't write them to the corresponding rows in col2. Also, it is obviously not efficient for larger datasets.
for i in vals:
    for j in df.col1:
        if i in j:
            print("match\t", i, 'in:\t', j)
        else:
            print('-')
output:
match john in: john doe
-
-
-
match jane in: jane doe
-
Any help would be appreciated.
Here's an option that can also handle multiple values per row, joining them with commas:
import numpy as np
df['col2'] = df.col1.apply(lambda x: ','.join([v for v in vals if v in x])).replace('', np.nan)
Output:
col1 col2
0 john doe john
1 jane doe jane
2 billy j. NaN
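To see the multiple-match behaviour, here is a self-contained sketch of the same approach. The sample DataFrame is reconstructed from the question, and the fourth row ('john and jane') is a hypothetical addition that is not in the original data:

```python
import numpy as np
import pandas as pd

vals = ['john', 'jane']
# First three rows come from the question; the last row is a made-up
# example containing both search values.
df = pd.DataFrame({'col1': ['john doe', 'jane doe', 'billy j.', 'john and jane']})

# For each row, keep every value from vals that occurs as a substring,
# join the hits with commas, and turn the empty string (no hits) into NaN.
df['col2'] = (
    df['col1']
    .apply(lambda x: ','.join(v for v in vals if v in x))
    .replace('', np.nan)
)
print(df)
```

Note that `v in x` is a plain substring test, so 'john' would also match a row such as 'johnny'.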
Another option, using pandas.Series.str.findall:
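pat = '|'.join(vals)
# findall returns a list of matches per row; a row with no match joins to '', which is then replaced by NaN
df['col2'] = df.col1.str.findall(pat).apply(','.join).replace('', np.nan)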
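If only the first match per row is needed, `Series.str.extract` may be a simpler alternative, since it returns NaN directly for rows without a match. The following is a minimal sketch and is not part of the original answers. The sample DataFrame is reconstructed from the question, and `re.escape` is added as a precaution, on the assumption that `vals` could contain regex metacharacters:

```python
import re

import pandas as pd

vals = ['john', 'jane']
df = pd.DataFrame({'col1': ['john doe', 'jane doe', 'billy j.']})

# Build one alternation pattern with a single capture group: (john|jane).
# re.escape guards against special characters in the search values.
pat = '(' + '|'.join(map(re.escape, vals)) + ')'

# expand=False returns a Series holding the first match per row,
# or NaN where nothing matches.
df['col2'] = df['col1'].str.extract(pat, expand=False)
print(df)
```

Unlike the findall version, this keeps only the first hit in each row, so it fits the desired output in the question but not the multiple-match case.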