Using str.contains instead of .isin with pandas
If my goal is to see whether any values in one dataframe's column match values in another dataframe's column, I can use .isin like so:
import pandas as pd

df1 = pd.DataFrame({'name': ['Marc', 'Jake', 'Sam', 'Brad']})
df2 = pd.DataFrame({'IDs': ['Jake', 'John', 'Marc', 'Tony', 'Bob']})
print(df1.assign(In_df2=df1.name.isin(df2.IDs).astype(int)))
Output:
   name  In_df2
0  Marc       1
1  Jake       1
2   Sam       0
3  Brad       0
However, if I don't want an exact match and want to avoid looping, is there a way to substitute .isin with str.contains()? Something like this?
print(df1.assign(In_df2=df1.name.str.contains(df2.IDs).astype(int)))
Right now this returns:
TypeError: unhashable type: 'Series'
Thanks!
Use a regex like this:
pattern = fr"(?:{'|'.join(df2['IDs'])})"
df1['In_df2'] = df1['name'].str.contains(pattern).astype(int)
Output:
>>> df1
   name  In_df2
0  Marc       1
1  Jake       1
2   Sam       0
3  Brad       0
>>> pattern
'(?:Jake|John|Marc|Tony|Bob)'
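One caveat with joining the IDs directly: if any ID contained a regex metacharacter (for example "." or "+"), it would be interpreted as part of the pattern. The following is a minimal sketch of a more defensive variant, assuming the IDs are meant to be matched as literal substrings. The use of re.escape, dropna() and na=False is an addition for illustration and is not part of the answer above.

```python
import re

import pandas as pd

df1 = pd.DataFrame({'name': ['Marc', 'Jake', 'Sam', 'Brad']})
df2 = pd.DataFrame({'IDs': ['Jake', 'John', 'Marc', 'Tony', 'Bob']})

# Escape each ID so regex metacharacters are matched literally,
# and skip missing values before joining into one alternation.
pattern = '|'.join(re.escape(s) for s in df2['IDs'].dropna().astype(str))

# na=False treats a missing name as "no match" instead of returning NaN,
# which keeps the .astype(int) conversion from failing.
df1['In_df2'] = df1['name'].str.contains(pattern, regex=True, na=False).astype(int)

print(df1)
```

Because str.contains does a substring search, a name such as "Marcus" would also be flagged, since it contains "Marc". Note too that if df2 were empty, the joined pattern would be an empty string, which matches every row, so that case probably deserves a separate check.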