Filter pandas dataframe by complex dynamic conditions

I need to filter a pandas dataframe based on conditions on multiple columns. The conditions come from a config dict like this:

config = {
    "PLANT_ID": ["KD"],
    "CO_CD": ["V", "R"]
}

In other words, a record should be kept if (PLANT_ID starts with KD) or (CO_CD starts with V or R). There can be more than two columns in the config, and more than two strings in each list.

I know I can use startswith and convert the list to tuples like this:

df.PLANT_ID.str.startswith(tuple(config['PLANT_ID']))
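To make the single-column check concrete, here is a minimal sketch on a made-up frame. The sample rows are hypothetical (only the column names and the config come from the question), and it assumes a pandas version where str.startswith accepts a tuple of prefixes (1.5 or later).

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "PLANT_ID": ["KD01", "AB02", "KD03", "ZZ04"],
    "CO_CD": ["X1", "V2", "R3", "Q4"],
})
config = {"PLANT_ID": ["KD"], "CO_CD": ["V", "R"]}

# str.startswith takes a tuple of literal prefixes and returns a boolean Series.
plant_mask = df["PLANT_ID"].str.startswith(tuple(config["PLANT_ID"]))
co_mask = df["CO_CD"].str.startswith(tuple(config["CO_CD"]))
```

Each mask covers one hard-coded column, which is exactly the part that still has to be driven by the config keys.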

But I somehow need to write this condition to dynamically pick the column names from the config dict.

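IIUC, you can craft a regex for each item in your initial dictionary, apply it to each column with str.match (which anchors the pattern at the start of the string; str.startswith only takes literal prefixes, not a regex), and aggregate with any:

import re
# one alternation of escaped prefixes per column, e.g. "V|R"
regex = {k: "|".join(map(re.escape, l)) for k, l in config.items()}

# one boolean column per config key, then OR across columns (missing values count as no match)
m = df[list(config)].apply(lambda c: c.str.match(regex[c.name], na=False)).any(axis=1)

df2 = df[m]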
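The approach above can be sketched end to end as follows. The helper name filter_by_prefixes and the sample rows are assumptions made for illustration, not part of the original question or answer.

```python
import re
import pandas as pd

def filter_by_prefixes(df, config):
    """Keep rows where any configured column starts with one of its prefixes.

    Hypothetical helper wrapping the regex approach from the answer.
    """
    # One alternation of escaped prefixes per column, e.g. ["V", "R"] -> "V|R".
    patterns = {col: "|".join(map(re.escape, prefixes))
                for col, prefixes in config.items()}
    # str.match anchors at the start of each string; na=False treats
    # missing values as non-matches instead of propagating NaN.
    mask = df[list(config)].apply(
        lambda col: col.str.match(patterns[col.name], na=False)
    ).any(axis=1)
    return df[mask]

# Made-up sample data, including a missing value and a prefix that
# appears in the middle of a string rather than at the start.
df = pd.DataFrame({
    "PLANT_ID": ["KD01", "AB02", "ZZ04", "AB05", "XKD6"],
    "CO_CD": ["X1", "V2", None, "R9", "AV"],
})
config = {"PLANT_ID": ["KD"], "CO_CD": ["V", "R"]}

result = filter_by_prefixes(df, config)
```

Adding another key to config extends the filter to that column without touching the function. Note that the columns are assumed to hold strings; a numeric column would need converting before using the .str accessor.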
