I need to filter down a pandas dataframe based on conditions for multiple columns. I got these conditions from a dict config file like this:
config = {
"PLANT_ID": ["KD"],
"CO_CD": ["V", "R"]
}
What this means is that I need to filter the dataset like this: if (PLANT_ID starts with KD) or (CO_CD starts with V or R), then I should keep that record. There can be more than two columns specified, and more than two strings in each list.
I know I can use startswith and convert the list to tuples like this:
df.PLANT_ID.str.startswith(tuple(config['PLANT_ID']))
But I somehow need to write this condition to dynamically pick the column names from the config dict.
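IIUC, you can craft a regex for each item in your initial dictionary, then apply it to each column using str.match (which only matches at the start of the string, whereas str.startswith treats its argument as a literal prefix, not a regex) and aggregate with any:
import re
# One alternation pattern per column, e.g. "V|R"; re.escape guards special characters
regex = {k: "|".join(map(re.escape, prefixes)) for k, prefixes in config.items()}
# Row-wise OR across the configured columns
m = df[list(config)].apply(lambda c: c.str.match(regex[c.name])).any(axis=1)
df2 = df[m]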
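As an alternative that avoids regex, the tuple form of str.startswith from the question can be applied per column in a loop over the config. The following is a minimal sketch, assuming a pandas version in which str.startswith accepts a tuple of prefixes (as in the question's snippet); the sample DataFrame and the name masks are made up for illustration.

```python
import pandas as pd

config = {
    "PLANT_ID": ["KD"],
    "CO_CD": ["V", "R"],
}

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "PLANT_ID": ["KD01", "AB02", "XY03", "KD04"],
    "CO_CD": ["A1", "V2", "B3", "R4"],
})

# One boolean column per configured column: True where the value
# starts with any of that column's prefixes (missing values count as False).
masks = pd.DataFrame({
    col: df[col].str.startswith(tuple(prefixes), na=False)
    for col, prefixes in config.items()
})

# Keep a row if at least one configured column matched (logical OR).
df2 = df[masks.any(axis=1)]
print(df2)
```

Because the prefixes are compared literally, no escaping is needed here. Switching any(axis=1) to all(axis=1) would instead require every configured column to match.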