
Pandas dataframe: Filter rows based on regex string search

I have a Pandas dataframe with 128 million rows, and I need an efficient way to filter its rows.

I need to keep every row that has "foo" in it. "foo" could be in any column; if any column of a row contains "foo", that row should be returned.

I did something like this:

final_rows = df[df['col1'].str.contains(string_to_search))] & df[df['col2' ].str.contains(string_to_search))] ..... etc.

but this did not work.

I am new to Pandas, so apologies if this is a very basic question.

You can use apply to run str.contains on each column, then combine the per-column results with any(axis=1) to get one boolean per row:

m=df[['col1',...]].apply(lambda x : x.str.contains(string_to_search)).any(axis=1)

final_rows=df[m]
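
For reference, here is a small self-contained sketch of the same idea. The sample data, the column list text_cols, and the na=False argument are my own assumptions for illustration and are not part of the original question; na=False makes missing values count as "no match" so the mask stays purely boolean.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "col1": ["foo bar", "baz", "qux", None],
    "col2": ["x", "a foo", "y", "z"],
    "n": [1, 2, 3, 4],
})
string_to_search = "foo"

# Assumed list of the text columns to search; .str only works on string data,
# so numeric columns such as "n" are left out.
text_cols = ["col1", "col2"]

# Build one boolean column per text column, then reduce across columns:
# a row is kept if the pattern matches in at least one of them.
m = df[text_cols].apply(
    lambda col: col.str.contains(string_to_search, na=False)
).any(axis=1)

final_rows = df[m]
print(final_rows)
```

str.contains treats the pattern as a regular expression by default; if the search string should be matched literally, passing regex=False avoids surprises with characters such as "." or "(".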
