I have a pandas dataframe with two columns of strings. I want to identify all rows where the string in the first column (s1) appears within the string in the second column (s2).
So if my columns were:
abc abcd*ef_gh
z1y xxyyzz
I want to keep the first row, but not the second.
The only approach I can think of is to iterate over the rows and apply str.contains() to s2, using the contents of s1 as the matching pattern.
Is there a way to accomplish this that doesn't require iterating over the rows?
This is probably doable in a vectorised way with the numpy.char string functions, though only for simple matching (literal substrings, no regular expressions):
In [326]:
print(df)
    s1          s2
0  abc  abcd*ef_gh
1  z1y      xxyyzz
2  aaa   aaabbbsss
In [327]:
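# np.char.find returns -1 wherever s1 is not found inside s2.
# .loc replaces .ix, which has been removed from current pandas.
print(df.loc[np.char.find(df.s2.values.astype(str),
                          df.s1.values.astype(str)) >= 0,
             's1'])
0    abc
2    aaa
Name: s1, dtype: object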
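For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same idea. The helper name contains_rowwise is made up for illustration and is not part of pandas or numpy; the sketch assumes both columns hold plain strings with no missing values.

```python
import numpy as np
import pandas as pd


def contains_rowwise(df, needle_col, haystack_col):
    """Return a boolean array: True where df[needle_col] occurs
    as a literal substring of df[haystack_col] in the same row.

    Hypothetical helper for illustration only."""
    haystack = df[haystack_col].to_numpy().astype(str)
    needle = df[needle_col].to_numpy().astype(str)
    # np.char.find works element-wise and returns -1 when there is no match.
    return np.char.find(haystack, needle) >= 0


df = pd.DataFrame({
    "s1": ["abc", "z1y", "aaa"],
    "s2": ["abcd*ef_gh", "xxyyzz", "aaabbbsss"],
})

mask = contains_rowwise(df, "s1", "s2")
kept = df[mask]
print(kept)
```

Because the match is literal, characters such as * in s2 are treated as ordinary text rather than as pattern syntax.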
The best I could come up with is to use apply instead of manual iteration:
>>> df = pd.DataFrame({'x': ['abc', 'xyz'], 'y': ['1234', '12xyz34']})
>>> df
     x        y
0  abc     1234
1  xyz  12xyz34
>>> df.x[df.apply(lambda row: row.y.find(row.x) != -1, axis=1)]
1    xyz
Name: x, dtype: object
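A related variant, not from the answer above, is to build the mask with a plain list comprehension over the two columns. It still loops in Python, so it is not vectorised, but it avoids constructing a row object for every row the way apply with axis=1 does. A minimal sketch, assuming both columns contain only strings:

```python
import pandas as pd

df = pd.DataFrame({"x": ["abc", "xyz"], "y": ["1234", "12xyz34"]})

# Pair up the two columns and test literal substring membership per row.
mask = [x in y for x, y in zip(df["x"], df["y"])]

# A list of booleans can be used directly as a row selector with .loc.
result = df.loc[mask, "x"]
print(result)
```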