Using regex in pandas to find values from one column within another column

Question

I have a pandas dataframe with two columns of strings. I want to identify all rows where the string in the first column (s1) appears within the string in the second column (s2).

So if my columns were:

abc    abcd*ef_gh
z1y    xxyyzz

I want to keep the first row, but not the second.

The only approach I can think of is to:

iterate through the dataframe rows
apply Series.str.contains() to s2, using the contents of s1 as the matching pattern

Is there a way to accomplish this that doesn't require iterating over the rows?

Answer 1

This is probably doable in a vectorised way (for simple substring matching only, not regex patterns) with NumPy's character-array functions such as np.char.find:

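In [326]:

# assumes: import numpy as np; import pandas as pd
print(df)
    s1          s2
0  abc  abcd*ef_gh
1  z1y      xxyyzz
2  aaa   aaabbbsss

In [327]:

# np.char.find returns the position of s1 inside s2, or -1 when it is absent
print(df.loc[np.char.find(df.s2.values.astype(str),
                          df.s1.values.astype(str)) >= 0,
             's1'])
0    abc
2    aaa
Name: s1, dtype: object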
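
The "simple matching only" caveat matters because np.char.find treats s1 as a literal substring, not as a regular expression. If s1 really is meant to be a pattern, one possible fallback is Python's re module applied row by row. The sketch below is not vectorised, and the helper name s1_matches_s2 and the sample row "a+b" are hypothetical, added only to show where the two interpretations differ:

```python
import re

import pandas as pd

df = pd.DataFrame({
    "s1": ["abc", "z1y", "a+b"],
    "s2": ["abcd*ef_gh", "xxyyzz", "xaab"],
})


def s1_matches_s2(frame, literal=False):
    """Return a list of booleans: is s1 found somewhere inside s2, row by row?

    With literal=True, regex metacharacters in s1 are escaped, so s1 is
    treated as a plain substring.
    """
    result = []
    for pattern, text in zip(frame["s1"], frame["s2"]):
        if literal:
            pattern = re.escape(pattern)
        result.append(re.search(pattern, text) is not None)
    return result


regex_mask = s1_matches_s2(df)                  # s1 interpreted as a regex
literal_mask = s1_matches_s2(df, literal=True)  # s1 interpreted as plain text

print(df.loc[regex_mask, "s1"])
print(df.loc[literal_mask, "s1"])
```

The third row is kept only under the regex interpretation, since "a+b" as a pattern matches "aab" but the literal text "a+b" does not occur in "xaab".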

Answer 2

The best I could come up with is to use apply instead of manual iteration:

>> import pandas as pd
>> df = pd.DataFrame({'x': ['abc', 'xyz'], 'y': ['1234', '12xyz34']})
>> df
     x        y
0  abc     1234
1  xyz  12xyz34

>> df.x[df.apply(lambda row: row.y.find(row.x) != -1, axis=1)]
1    xyz
Name: x, dtype: object
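
A possible variation on the same idea, assuming only literal substring matching is needed: pair the two columns with zip and test containment with Python's in operator. This still loops in Python, but it avoids building a row object for every row the way apply with axis=1 does. The names mask and matches are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({"x": ["abc", "xyz"], "y": ["1234", "12xyz34"]})

# Walk both columns in parallel and check whether x occurs inside y.
mask = [needle in haystack for needle, haystack in zip(df["x"], df["y"])]

# A list of booleans can be used directly as a row selector with .loc.
matches = df.loc[mask, "x"]
print(matches)
```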

Answer 1
2 2015-09-01 19:49:27

Answer 2
1 2015-09-01 19:36:36
