Filter a pandas dataframe on multiple columns for partial string match, using values from a dict

Question

I need to filter a dataframe on multiple columns, using values taken from a dict:

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminderDataFiveYear.csv')
filters_raw = {'continent': {'filterTerm': 'Asi', 'column': {'rowType': 'filter', 'key': 'continent', 'name': 'continent', 'editable': True, 'sortable': True, 'resizable': True, 'filterable': True, 'width': 147, 'left': 60}}, 'gdpPercap': {'filterTerm': '9', 'column': {'rowType': 'filter', 'key': 'gdpPercap', 'name': 'gdpPercap', 'editable': True, 'sortable': True, 'resizable': True, 'filterable': True, 'width': 147, 'left': 354}}, 'lifeExp': {'filterTerm': '4', 'column': {'rowType': 'filter', 'key': 'lifeExp', 'name': 'lifeExp', 'editable': True, 'sortable': True, 'resizable': True, 'filterable': True, 'width': 147, 'left': 501}}, 'pop': {'filterTerm': '3', 'column': {'rowType': 'filter', 'key': 'pop', 'name': 'pop', 'editable': True, 'sortable': True, 'resizable': True, 'filterable': True, 'width': 147, 'left': 648}}, 'year': {'filterTerm': '2007', 'column': {'rowType': 'filter', 'key': 'year', 'name': 'year', 'editable': True, 'sortable': True, 'resizable': True, 'filterable': True, 'width': 147, 'left': 795}}, 'country': {'filterTerm': 'af', 'column': {'rowType': 'filter', 'key': 'country', 'name': 'country', 'editable': True, 'sortable': True, 'resizable': True, 'filterable': True, 'width': 147, 'left': 207}}}
filters = {i:filters_raw[i]['filterTerm'] for i in filters_raw.keys()}

To get exact matches from a dict, I can do the following, based on this answer (Filter a pandas dataframe using values from a dict):

dff = df.loc[(df[list(filters)] == pd.Series(filters)).all(axis=1)]

How can I filter the same way without being limited to exact matches, so that rows also match when the value from the dict is contained as a substring of the dataframe value?

The desired output is a dataframe containing only the rows that satisfy all the conditions simultaneously. With the filters above:

dff

continent  country      gdpPercap    lifeExp  pop       year
Asia       Afghanistan  974.5803384  43.828   31889923  2007

Answer 1

Have a look at pandas.Series.str.contains, which accepts a regular expression. There are also other string-handling functions under the .str accessor that may be better tailored to what you need.
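As a minimal sketch of that suggestion, the per-column str.contains checks can be combined into a single mask. The small frame below is illustrative sample data standing in for the gapminder CSV, and regex=False is an assumption that the filter terms should be treated as plain substrings rather than patterns.

```python
import pandas as pd

# Illustrative sample rows standing in for the gapminder CSV (an assumption)
df = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Japan'],
    'continent': ['Asia', 'Europe', 'Asia'],
    'year': [2007, 2007, 2007],
    'lifeExp': [43.828, 76.423, 82.603],
})
filters = {'continent': 'Asi', 'country': 'af', 'year': '2007', 'lifeExp': '4'}

# Start with every row selected, then AND in one substring test per filtered column
mask = pd.Series(True, index=df.index)
for column, term in filters.items():
    # astype(str) lets numeric columns be searched as text
    mask &= df[column].astype(str).str.contains(term, case=False, regex=False)

dff = df[mask]
print(dff)
```

Looping over filters.items() means only the columns named in the dict are tested, so the dict does not need an entry for every column.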

Answer 2

One solution is to use pd.Series.str.startswith to find the strings that match the ones in filters.

You can create a mask for those rows this way:

# Lower-case string copy of df; each column is tested against its own filter term
mask = (df.astype(str).apply(lambda x: x.str.lower())
          .apply(lambda x: x.str.startswith(filters[x.name].lower()), axis=0)
          .all(axis=1))

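Basically, you convert the original dataframe to lower-case strings and then go column by column, checking which elements start with the filter string for that column (e.g. filters['continent']). Finally, all(axis=1) sets the mask to True only for rows where every cell starts with its column's filter term. Note that startswith only matches at the beginning of each value; to match the term anywhere in the value, x.str.contains can be used in the same place.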

The result will be:

df[mask]

        country  year         pop continent  lifeExp   gdpPercap
11  Afghanistan  2007  31889923.0      Asia   43.828  974.580338
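
The mask above looks up filters[x.name] for every column, so it relies on filters having an entry for each column of df. A hedged variant, assuming some columns may have no filter term, selects only the filtered columns first. The sample frame and the shortened filters dict below are illustrative and not taken from the question.

```python
import pandas as pd

# Illustrative sample rows (hypothetical stand-in for the gapminder CSV)
df = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Japan'],
    'continent': ['Asia', 'Europe', 'Asia'],
    'year': [2002, 2007, 2007],
    'pop': [25268405, 3600523, 127467972],
})
# Only two of the four columns carry a filter term here
filters = {'continent': 'asi', 'year': '2007'}

# Restrict to the filtered columns before the column-wise startswith check,
# so columns without a filter term are ignored instead of raising KeyError
subset = df[list(filters)].astype(str).apply(lambda col: col.str.lower())
per_column = subset.apply(lambda col: col.str.startswith(filters[col.name].lower()))
mask = per_column.all(axis=1)

print(per_column)  # one boolean column per filtered column
print(df[mask])    # rows where every filtered column matched
```

Printing per_column before reducing it with all(axis=1) makes it easy to see which filter term excluded a given row.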

Hope it helps.

2 answers

solution1
0 2018-10-28 14:14:27

solution2
0 ACCEPTED 2018-10-28 19:50:26
