
Pandas dataframe: Filter rows based on regex string search

I have a Pandas DataFrame with 128 million rows, and I need an efficient way to filter its rows.

I need to keep every row that contains "foo". "foo" could appear in any column: if any column of a row contains "foo", that row should be returned.

I did something like this:

final_rows = df[df['col1'].str.contains(string_to_search)] & df[df['col2'].str.contains(string_to_search)] ... etc.

but this did not work.

I am new to Pandas, so apologies if this is a very basic question.
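The attempt above has two problems. The & operator means "every column matches", whereas the goal is a match in any column, which calls for | (logical OR). And & is applied to two already-filtered DataFrames instead of to the boolean masks, which should be combined first and then used to index the DataFrame once. A minimal corrected sketch, assuming a small hypothetical DataFrame with string columns col1 and col2 standing in for the real data:

```python
import pandas as pd

# Hypothetical sample data standing in for the real 128-million-row frame
df = pd.DataFrame({
    "col1": ["foo bar", "baz", "qux", "spam"],
    "col2": ["alpha", "has foo", "beta", "gamma"],
})
string_to_search = "foo"

# Build one boolean mask per column, combine the masks with | (logical OR),
# and index the DataFrame once with the combined mask
mask = (
    df["col1"].str.contains(string_to_search)
    | df["col2"].str.contains(string_to_search)
)
final_rows = df[mask]
print(final_rows)
```

Writing one term per column gets unwieldy as the number of columns grows, which is what the approach below avoids.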

You can apply str.contains to each column and then combine the per-column results with any(axis=1):

# Boolean Series: True for each row where at least one listed column contains string_to_search
m = df[['col1', ...]].apply(lambda x: x.str.contains(string_to_search)).any(axis=1)

final_rows = df[m]
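A self-contained sketch of the apply/any approach above, using a small hypothetical DataFrame. The column names, sample values, and the extra arguments case=False and na=False are illustrative choices, not part of the original answer: na=False makes missing values count as non-matches instead of producing NaN in the mask, and case=False makes the match case-insensitive.

```python
import pandas as pd

# Hypothetical sample data; in practice df is the real DataFrame
df = pd.DataFrame({
    "col1": ["foo bar", "baz", None, "spam"],
    "col2": ["alpha", "has FOO", "beta", "gamma"],
    "n": [1, 2, 3, 4],  # numeric column, not searched
})
string_to_search = "foo"  # treated as a regular expression by str.contains

# Hypothetical list of the text columns to search; the .str accessor
# only works on string-like columns
text_cols = ["col1", "col2"]

# For each text column, test every cell against the pattern; na=False treats
# missing values as non-matches so every entry is True or False.
# any(axis=1) then marks a row True if at least one column matched.
m = df[text_cols].apply(
    lambda col: col.str.contains(string_to_search, case=False, na=False)
).any(axis=1)

final_rows = df[m]
print(final_rows)
```

By default str.contains interprets the search string as a regular expression. If "foo" is meant as a literal substring, passing regex=False avoids regex parsing, which is typically faster on very large frames.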
