Filter pandas dataframe by complex dynamic conditions

Question

I need to filter a pandas DataFrame based on conditions on multiple columns. The conditions come from a config dict like this:

config = {
    "PLANT_ID": ["KD"],
    "CO_CD": ["V", "R"]
}

This means a record should be kept if (PLANT_ID starts with KD) or (CO_CD starts with V or R). The config can specify more than two columns, and each list can contain more than two strings.

I know I can use startswith and convert the list to a tuple like this:

df.PLANT_ID.str.startswith(tuple(config['PLANT_ID']))

But I need to build this condition dynamically, picking the column names from the config dict.

Answer 1

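IIUC, you can craft a regex for each item in your initial dictionary, then apply it to each column using str.match (which anchors the pattern at the start of the string) and aggregate with any:

import re
# one alternation pattern per column, e.g. "V|R" for CO_CD
regex = {k: "|".join(map(re.escape, l)) for k, l in config.items()}

# str.match takes a regex (str.startswith does not); any(axis=1) ORs the columns per row
m = df[list(config)].apply(lambda c: c.str.match(regex[c.name])).any(axis=1)

df2 = df[m]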
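
If you would rather avoid regex altogether, the tuple form of str.startswith from the question can be applied per column as well. The following is only a minimal sketch: the sample DataFrame and the QTY column are made up for illustration, and it assumes the configured columns hold strings (missing values are treated as non-matches via na=False).

```python
import pandas as pd

# Hypothetical sample data; only the column names PLANT_ID and CO_CD
# come from the question.
df = pd.DataFrame({
    "PLANT_ID": ["KD01", "AB02", "XY03", "KD04"],
    "CO_CD": ["A1", "V2", "B3", "R4"],
    "QTY": [1, 2, 3, 4],
})

config = {
    "PLANT_ID": ["KD"],
    "CO_CD": ["V", "R"],
}

# One boolean Series per configured column: True where the value starts
# with any of that column's prefixes (missing values count as no match).
masks = [
    df[col].str.startswith(tuple(prefixes), na=False)
    for col, prefixes in config.items()
]

# OR the per-column masks together row by row.
mask = pd.concat(masks, axis=1).any(axis=1)

df2 = df[mask]
print(df2)
```

With this sample data, rows 0, 1 and 3 are kept: the first and last match on PLANT_ID, the second on CO_CD. Adding another column or prefix to config needs no change to the filtering code.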
