Find value in one column in another column with regex in pandas
I have a pandas dataframe with two columns of strings. I want to identify all rows where the string in the first column (s1) appears within the string in the second column (s2).
So if my columns were:
abc abcd*ef_gh
z1y xxyyzz
I want to keep the first row, but not the second.
The only approach I can think of is to iterate through each row and apply df.str.contains() to s2, using the contents of s1 as the matching pattern.
Is there a way to accomplish this that doesn't require iterating over the rows?
It is probably doable (for simple substring matching only, not regex) in a vectorised way, with the numpy chararray methods:
In [326]:
print(df)
    s1          s2
0  abc  abcd*ef_gh
1  z1y      xxyyzz
2  aaa   aaabbbsss
In [327]:
# .ix has been removed from pandas; .loc accepts the same boolean mask
print(df.loc[np.char.find(df.s2.values.astype(str),
                          df.s1.values.astype(str)) >= 0,
             's1'])
0    abc
2    aaa
Name: s1, dtype: object
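For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same idea. The variable names `positions`, `mask` and `matches` are my own, and the sketch assumes plain substring matching is enough: `np.char.find` does not interpret regex metacharacters, so the `*` in the sample data is treated literally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "s1": ["abc", "z1y", "aaa"],
    "s2": ["abcd*ef_gh", "xxyyzz", "aaabbbsss"],
})

# np.char.find works element-wise: for each i it returns the index of the
# first occurrence of s1[i] inside s2[i], or -1 when s1[i] does not occur.
positions = np.char.find(
    df["s2"].to_numpy().astype(str),
    df["s1"].to_numpy().astype(str),
)
mask = positions >= 0

# Keep only the rows where s1 was found inside s2.
matches = df.loc[mask]
print(matches)
```

Note that converting with `astype(str)` turns missing values into the text `nan`, so rows with missing data may need to be dropped first.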
The best I could come up with is to use apply instead of manual iteration:
>> df = pd.DataFrame({'x': ['abc', 'xyz'], 'y': ['1234', '12xyz34']})
>> df
     x        y
0  abc     1234
1  xyz  12xyz34
>> df.x[df.apply(lambda row: row.y.find(row.x) != -1, axis=1)]
1    xyz
Name: x, dtype: object
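As a variation on the row-wise idea, a list comprehension over the two columns tends to be lighter than `apply(..., axis=1)`, and it makes the difference between literal and regex matching easy to see. This is only a sketch: the third row (`a.c` / `abc`) is an illustrative addition that is not in the original data, and the names `literal` and `as_regex` are my own.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "x": ["abc", "xyz", "a.c"],      # third row added for illustration
    "y": ["1234", "12xyz34", "abc"],
})

pairs = list(zip(df["x"], df["y"]))

# Literal substring test: x must appear in y character for character.
literal = [x in y for x, y in pairs]

# Regex test: x is treated as a pattern, so "." matches any character.
as_regex = [re.search(x, y) is not None for x, y in pairs]

print(df[literal])   # rows where x occurs literally in y
print(df[as_regex])  # rows where x matches y as a regular expression
```

If the values in the first column are meant literally but might contain characters such as `*` or `.`, stick with the `in` test, or wrap the pattern in `re.escape` before passing it to `re.search`.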