Pandas - how to filter dataframe by regex comparisons on multiple column values

I have a dataframe like the following, where everything is formatted as a string:

df
  property  value  count
0   propAb   True     10
1   propAA  False     10
2   propAB   blah     10
3   propBb      3      8
4   propBA      4      7
5   propCa    100      4

I am trying to find a way to filter the dataframe by applying a series of regex-style rules to both the property and value columns together.

For example, some sample rules may be like the following:

  • "if property starts with 'propA' and value is not 'True', drop the row".

Another rule may be something more mathematical, like:

  • "if property starts with 'propB' and value < 4, drop the row".

Is there a way to accomplish something like this without having to iterate over all rows each time for every rule I want to apply?

You still have to apply each rule (how else?), but let pandas handle the rows. Also, instead of removing the rows that you do not like, keep the rows that you do. Here's an example of how the first two rules can be applied:

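rule1 = df.property.str.startswith('propA') & (df.value != 'True')
df = df[~rule1] # Keep everything that does NOT match
# value holds strings: convert to numbers first (pd is pandas; non-numeric entries become NaN)
value_num = pd.to_numeric(df.value, errors='coerce')
rule2 = df.property.str.startswith('propB') & (value_num < 4)
df = df[~rule2] # Keep everything that does NOT match

By the way, the second rule will not work as a plain df.value < 4 comparison because value is not a numeric column: comparing its strings with an integer raises a TypeError. Hence the pd.to_numeric conversion, which turns entries that are not numbers into NaN so that they never satisfy the < 4 test.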
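For reference, here is a minimal self-contained sketch of the same approach on the sample data from the question. It is only one way to organise this: the names drop_rules, drop_mask and filtered are made up for illustration, str.match is used to show a regex anchored at the start of the string, and the masks are combined so that the frame is filtered once rather than once per rule.

```python
import pandas as pd

# Sample data from the question; every column holds strings.
df = pd.DataFrame({
    "property": ["propAb", "propAA", "propAB", "propBb", "propBA", "propCa"],
    "value": ["True", "False", "blah", "3", "4", "100"],
    "count": ["10", "10", "10", "8", "7", "4"],
})

# Numeric view of `value`; entries that cannot be parsed become NaN,
# and NaN never satisfies a `<` comparison.
value_num = pd.to_numeric(df["value"], errors="coerce")

# Each rule is a boolean mask marking rows to DROP (hypothetical rule list).
# str.match anchors the regex at the start of each string.
drop_rules = [
    df["property"].str.match(r"propA") & (df["value"] != "True"),
    df["property"].str.match(r"propB") & (value_num < 4),
]

# OR the masks together, then keep the rows that no rule matched.
drop_mask = pd.concat(drop_rules, axis=1).any(axis=1)
filtered = df[~drop_mask]
print(filtered)
```

Adding another rule then only means appending one more mask to the list. pandas evaluates each mask over the whole column at once, so there is no explicit Python loop over rows.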

For the first one:

df = df.drop(df[df.property.str.startswith('propA') & (df.value != 'True')].index)

and the other one:

value_num = pd.to_numeric(df.value, errors='coerce') # value holds strings; pd is pandas
df = df.drop(df[df.property.str.startswith('propB') & (value_num < 4)].index)
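One caveat about dropping by index, sketched below on a shortened version of the sample data. drop() removes rows by index label, so it gives the same result as a boolean mask only when the index labels are unique. The variable names and the duplicated index are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "property": ["propAb", "propAA", "propBb", "propBA"],
    "value": ["True", "False", "3", "4"],
})

rule = df["property"].str.startswith("propA") & (df["value"] != "True")

by_drop = df.drop(df[rule].index)  # remove matching rows by index label
by_mask = df[~rule]                # keep non-matching rows directly
same_result = by_drop.equals(by_mask)

# Same data, but with a repeated index label (0 appears twice).
dup = df.copy()
dup.index = [0, 0, 1, 2]
dup_rule = dup["property"].str.startswith("propA") & (dup["value"] != "True")

dup_by_drop = dup.drop(dup[dup_rule].index)  # drops every row labelled 0
dup_by_mask = dup[~dup_rule]                 # drops only the matching row
print(same_result, len(dup_by_drop), len(dup_by_mask))
```

With a default RangeIndex the two forms are interchangeable. If the index may contain duplicates, the mask form is the safer choice.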
