
Find value in one column in another column with regex in pandas

I have a pandas dataframe with two columns of strings. I want to identify all rows where the string in the first column (s1) appears within the string in the second column (s2).

So if my columns were:

abc    abcd*ef_gh
z1y    xxyyzz

I want to keep the first row, but not the second.

The only approach I can think of is to:

  1. iterate through the dataframe rows
  2. apply df.str.contains() to s2, using the contents of s1 as the matching pattern

Is there a way to accomplish this that doesn't require iterating over the rows?
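
For reference, the row-by-row approach described in the question could be sketched roughly as follows. This is only an illustration of the baseline, not code from the original post; the names keep and result are made up here, and the dataframe is rebuilt from the two example rows.

```python
import pandas as pd

# The two example rows from the question.
df = pd.DataFrame({"s1": ["abc", "z1y"], "s2": ["abcd*ef_gh", "xxyyzz"]})

keep = []  # hypothetical name: one boolean per row
for _, row in df.iterrows():
    # Wrap the single s2 value in a Series so that str.contains can be used.
    # By default str.contains treats the s1 value as a regular expression.
    hit = pd.Series([row["s2"]]).str.contains(row["s1"]).iloc[0]
    keep.append(bool(hit))

# Keep only the rows where s1 was found inside s2.
result = df[keep]
print(result)
```

This works, but it builds a temporary Series for every row, which is exactly the per-row overhead the question hopes to avoid.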

This can probably be done in a vectorised way (for simple matching only) with numpy's chararray methods:

In [326]:

print(df)
    s1          s2
0  abc  abcd*ef_gh
1  z1y      xxyyzz
2  aaa   aaabbbsss
In [327]:

print(df.loc[np.char.find(df.s2.values.astype(str),
                          df.s1.values.astype(str)) >= 0,
             's1'])
0    abc
2    aaa
Name: s1, dtype: object
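
To illustrate the "simple matching only" caveat, here is a small sketch contrasting np.char.find, which looks for a literal substring, with a per-row regular-expression search. The data and the names literal_mask and regex_mask are hypothetical and chosen only to show the difference; they do not come from the original answer.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical data: the first s1 value contains the regex metacharacter ".".
df = pd.DataFrame({"s1": ["a.c", "abc"], "s2": ["abcdef", "abcdef"]})

s1 = df["s1"].to_numpy().astype(str)
s2 = df["s2"].to_numpy().astype(str)

# Vectorised literal test: the text "a.c" does not occur in "abcdef".
literal_mask = np.char.find(s2, s1) >= 0

# Regex test, one row at a time: "." matches any character,
# so the pattern "a.c" matches the "abc" at the start of "abcdef".
regex_mask = np.array([re.search(p, s) is not None for p, s in zip(s1, s2)])

print(literal_mask)
print(regex_mask)
```

If the s1 values are meant as plain text, the literal test is the safer choice; if they are meant as patterns, the vectorised np.char.find call will not honour them.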

The best I could come up with is to use apply instead of manual iteration:

>> df = pd.DataFrame({'x': ['abc', 'xyz'], 'y': ['1234', '12xyz34']})
>> df
     x        y
0  abc     1234
1  xyz  12xyz34

>> df.x[df.apply(lambda row: row.y.find(row.x) != -1, axis=1)]
1    xyz
Name: x, dtype: object
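
A variation on the same idea, sketched here as an assumption rather than as part of the original answer, is to build the boolean mask with a plain list comprehension over the two columns. It still visits every row, but it avoids constructing a row object per iteration, which is usually cheaper than apply with axis=1. The names mask and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"x": ["abc", "xyz"], "y": ["1234", "12xyz34"]})

# Pair up the two columns and test literal substring membership row by row.
mask = [x in y for x, y in zip(df["x"], df["y"])]

# Select the x values whose text occurs inside the matching y value.
result = df.loc[mask, "x"]
print(result)
```

Like the apply version, this performs a literal substring test and does not interpret x as a regular expression.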


