I need to filter a dataframe on multiple values from a dict:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminderDataFiveYear.csv')
filters_raw = {'continent': {'filterTerm': 'Asi', 'column': {'rowType': 'filter', 'key': 'continent', 'name': 'continent', 'editable': True, 'sortable': True, 'resizable': True, 'filterable': True, 'width': 147, 'left': 60}}, 'gdpPercap': {'filterTerm': '9', 'column': {'rowType': 'filter', 'key': 'gdpPercap', 'name': 'gdpPercap', 'editable': True, 'sortable': True, 'resizable': True, 'filterable': True, 'width': 147, 'left': 354}}, 'lifeExp': {'filterTerm': '4', 'column': {'rowType': 'filter', 'key': 'lifeExp', 'name': 'lifeExp', 'editable': True, 'sortable': True, 'resizable': True, 'filterable': True, 'width': 147, 'left': 501}}, 'pop': {'filterTerm': '3', 'column': {'rowType': 'filter', 'key': 'pop', 'name': 'pop', 'editable': True, 'sortable': True, 'resizable': True, 'filterable': True, 'width': 147, 'left': 648}}, 'year': {'filterTerm': '2007', 'column': {'rowType': 'filter', 'key': 'year', 'name': 'year', 'editable': True, 'sortable': True, 'resizable': True, 'filterable': True, 'width': 147, 'left': 795}}, 'country': {'filterTerm': 'af', 'column': {'rowType': 'filter', 'key': 'country', 'name': 'country', 'editable': True, 'sortable': True, 'resizable': True, 'filterable': True, 'width': 147, 'left': 207}}}
filters = {key: spec['filterTerm'] for key, spec in filters_raw.items()}
To get exact matches using a dict, I can do this, based on this answer (Filter a pandas dataframe using values from a dict):
dff = df.loc[(df[list(filters)] == pd.Series(filters)).all(axis=1)]
But I want to filter the same way without being limited to exact matches: a row should also match when the value from the dict is contained as a substring of the value in the dataframe. How would I do that?
The desired output is a dataframe containing only the rows that satisfy all the conditions simultaneously. With the filters above:
dff
Asia Afghanistan 974.5803384 43.828 31889923 2007
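A minimal sketch of why the exact-match line cannot produce this output, using a small hand-made frame (three columns, rows chosen for illustration, not loaded from the CSV); `exact_match` is a hypothetical helper wrapping the line above. `df[list(filters)] == pd.Series(filters)` aligns the Series index with the selected column labels, so each column is compared with its own filter value. Partial terms like 'Asi' never equal 'Asia', and string terms like '2007' never equal the integer 2007.

```python
import pandas as pd

# Hand-made toy frame standing in for the gapminder data (illustrative rows only).
df = pd.DataFrame({
    'country': ['Afghanistan', 'Pakistan', 'Albania', 'Afghanistan'],
    'continent': ['Asia', 'Asia', 'Europe', 'Asia'],
    'year': [2007, 2007, 2007, 2002],
})

def exact_match(frame, terms):
    # Hypothetical helper: the Series index lines up with the selected columns,
    # so each column is compared element-wise with its own term.
    return frame.loc[(frame[list(terms)] == pd.Series(terms)).all(axis=1)]

# Full values of the right type: the Afghanistan 2007 row matches.
print(exact_match(df, {'continent': 'Asia', 'year': 2007, 'country': 'Afghanistan'}))

# Partial, string-typed terms like the ones in filters: nothing matches.
print(exact_match(df, {'continent': 'Asi', 'year': '2007', 'country': 'af'}))
```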
Have a look at pandas.Series.str.contains, which accepts a regular expression. There are also other string-handling methods under Series.str that may be better tailored to what you need.
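A minimal sketch of that suggestion, assuming the same kind of dict of filter terms and a small hand-made frame in place of the CSV; `contains_mask` is a hypothetical helper name. Each filtered column is converted to strings and tested with `str.contains`, and the per-column results are combined with `&` so a row survives only if every condition holds.

```python
import pandas as pd

# Hand-made toy frame standing in for the gapminder data (illustrative rows only).
df = pd.DataFrame({
    'country': ['Afghanistan', 'Pakistan', 'Albania', 'Afghanistan'],
    'continent': ['Asia', 'Asia', 'Europe', 'Asia'],
    'year': [2007, 2007, 2007, 2002],
})

def contains_mask(frame, terms, case=False):
    # Hypothetical helper: start with every row selected,
    # then AND in one substring condition per filtered column.
    mask = pd.Series(True, index=frame.index)
    for col, term in terms.items():
        # astype(str) lets numeric columns be searched as text;
        # regex=False treats the term as a literal substring, not a pattern.
        mask &= frame[col].astype(str).str.contains(term, case=case, regex=False)
    return mask

filters = {'continent': 'Asi', 'country': 'af', 'year': '2007'}
print(df[contains_mask(df, filters)])              # only the Afghanistan 2007 row
print(df[contains_mask(df, {'country': 'stan'})])  # 'stan' anywhere in the name
```

Passing `regex=False` is a design choice here: with the default `regex=True`, a term containing characters such as `.` or `(` would be interpreted as a pattern.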
One solution is to use pd.Series.str.startswith to find strings that begin with the terms in filters.
You can create a mask for those rows this way:
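mask = (
    df.astype(str)
    .apply(lambda x: x.str.lower())  # every cell as a lower-case string
    .apply(lambda x: x.str.startswith(filters[x.name].lower()), axis=0)  # per column: does the cell start with that column's filter?
    .all(axis=1)  # True only where every column matched
)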
Basically, you convert the original dataframe to lower-case strings, then go column by column checking which elements start with the filter string for that column (i.e. filters['continent'] for the continent column). Finally, the mask is True only for rows where every cell starts with its column's filter term.
The result will be:
df[mask]
country year pop continent lifeExp gdpPercap
11 Afghanistan 2007 31889923.0 Asia 43.828 974.580338
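The mask above looks up filters[x.name] for every column of df, so it assumes each column has an entry in filters (true in the question, where all six do); otherwise it raises KeyError. A minimal variant, assuming a small hand-made frame in place of the CSV and a hypothetical helper name `startswith_mask`, converts and tests only the filtered columns:

```python
import pandas as pd

# Hand-made toy frame standing in for the gapminder data (illustrative rows only).
df = pd.DataFrame({
    'country': ['Afghanistan', 'Pakistan', 'Albania', 'Afghanistan'],
    'continent': ['Asia', 'Asia', 'Europe', 'Asia'],
    'year': [2007, 2007, 2007, 2002],
})

def startswith_mask(frame, terms):
    # Hypothetical helper: only the filtered columns are lower-cased strings,
    # so a column without an entry in `terms` is ignored rather than raising KeyError.
    subset = frame[list(terms)].astype(str).apply(lambda col: col.str.lower())
    # DataFrame.apply works column by column by default; col.name is the column label.
    matches = subset.apply(lambda col: col.str.startswith(terms[col.name].lower()))
    # A row is kept only if every filtered column matched.
    return matches.all(axis=1)

filters = {'continent': 'Asi', 'country': 'af', 'year': '2007'}
print(df[startswith_mask(df, filters)])
```

Unlike the str.contains approach, this only matches prefixes, so a term such as 'stan' would not select 'Afghanistan'.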
Hope this helps.