简体   繁体   中英

How to filter pandas dataframe rows based on dictionary keys and values?

I have a dataframe and a dictionary in Python as shown below and I need to filter the dataframe based on the dictionary. As you see, the keys and values of the dictionary are two columns of the dataframe. I want to have a subset of dataframe which contains the keys and values of dictionary plus other columns.

df :

Customer_ID Category Type Delivery
40275 Book Buy True
40275 Software Sell False
40275 Video Game Sell False
40275 Cell Phone Sell False
39900 CD/DVD Sell True
39900 Book Buy True
39900 Software Sell True
35886 Cell Phone Sell False
35886 Video Game Buy False
35886 CD/DVD Sell False
35886 Software Sell False
40350 Software Sell True
28129 Software Buy False

And dictionary is:

d = {
 40275: ['Book','Software'],
 39900: ['Book'],
 35886: ['Software'],
 40350: ['Software'],
 28129: ['Software']
 }

And I need the following dataframe:

Customer_ID Category Type Delivery
40275 Book Buy True
40275 Software Sell False
39900 Book Buy True
35886 Software Sell False
40350 Software Sell True
28129 Software Buy False

We can set_index to the Customer_ID and Category columns then build a list of tuples from the dictionary d and reindex the DataFrame to include only the rows which match the list of tuples, then reset_index to restore the columns:

new_df = df.set_index(['Customer_ID', 'Category']).reindex(
    [(k, v) for k, lst in d.items() for v in lst]
).reset_index()

new_df :

   Customer_ID  Category  Type  Delivery
0        40275      Book   Buy      True
1        40275  Software  Sell     False
2        39900      Book   Buy      True
3        35886  Software  Sell     False
4        40350  Software  Sell      True
5        28129  Software   Buy     False

*Note this only works if the MultiIndex is unique (like the shown example). It will also add rows if the dictionary does not represent a subset of the DataFrame's MultiIndex (which may or may not be the desired behaviour).


Setup:

import pandas as pd

d = {
    40275: ['Book', 'Software'],
    39900: ['Book'],
    35886: ['Software'],
    40350: ['Software'],
    28129: ['Software']
}

df = pd.DataFrame({
    'Customer_ID': [40275, 40275, 40275, 40275, 39900, 39900, 39900, 35886,
                    35886, 35886, 35886, 40350, 28129],
    'Category': ['Book', 'Software', 'Video Game', 'Cell Phone', 'CD/DVD',
                 'Book', 'Software', 'Cell Phone', 'Video Game', 'CD/DVD',
                 'Software', 'Software', 'Software'],
    'Type': ['Buy', 'Sell', 'Sell', 'Sell', 'Sell', 'Buy', 'Sell', 'Sell',
             'Buy', 'Sell', 'Sell', 'Sell', 'Buy'],
    'Delivery': [True, False, False, False, True, True, True, False, False,
                 False, False, True, False]
})

You can use df.merge with df.append :

In [444]: df1 = pd.DataFrame.from_dict(d, orient='index', columns=['Cat1', 'Cat2']).reset_index()

In [449]: res = df.merge(df1[['index', 'Cat1']], left_on=['Customer_ID', 'Category'], right_on=['index', 'Cat1']).drop(['index', 'Cat1'], 1)

In [462]: res = res.append(df.merge(df1[['index', 'Cat2']], left_on=['Customer_ID', 'Category'], right_on=['index', 'Cat2']).drop(['index', 'Cat2'], 1)).sort_values('Customer_ID', ascending=False)

In [463]: res
Out[463]: 
   Customer_ID  Category  Type  Delivery
3        40350  Software  Sell      True
0        40275      Book   Buy      True
0        40275  Software  Sell     False
1        39900      Book   Buy      True
2        35886  Software  Sell     False
4        28129  Software   Buy     False

Flatten the dictionary and create a new dataframe, then inner merge df with the new dataframe

df.merge(pd.DataFrame([{'Customer_ID': k, 'Category': i} 
                       for k, v in d.items() for i in v]))

   Customer_ID  Category  Type  Delivery
0        40275      Book   Buy      True
1        40275  Software  Sell     False
2        39900      Book   Buy      True
3        35886  Software  Sell     False
4        40350  Software  Sell      True
5        28129  Software   Buy     False

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM