How to filter pandas dataframe rows based on dictionary keys and values?

I have a dataframe and a dictionary in Python, shown below, and I need to filter the dataframe based on the dictionary. The keys and values of the dictionary correspond to two columns of the dataframe (Customer_ID and Category). I want the subset of the dataframe whose rows match the dictionary's key/value pairs, keeping the other columns as well.

df:

Customer_ID  Category    Type  Delivery
40275        Book        Buy   True
40275        Software    Sell  False
40275        Video Game  Sell  False
40275        Cell Phone  Sell  False
39900        CD/DVD      Sell  True
39900        Book        Buy   True
39900        Software    Sell  True
35886        Cell Phone  Sell  False
35886        Video Game  Buy   False
35886        CD/DVD      Sell  False
35886        Software    Sell  False
40350        Software    Sell  True
28129        Software    Buy   False

And the dictionary is:

d = {
 40275: ['Book','Software'],
 39900: ['Book'],
 35886: ['Software'],
 40350: ['Software'],
 28129: ['Software']
 }

And I need the following dataframe:

Customer_ID  Category  Type  Delivery
40275        Book      Buy   True
40275        Software  Sell  False
39900        Book      Buy   True
35886        Software  Sell  False
40350        Software  Sell  True
28129        Software  Buy   False

We can set_index on the Customer_ID and Category columns, build a list of (key, value) tuples from the dictionary d, reindex the DataFrame so it keeps only the rows matching those tuples, and then reset_index to restore the columns:

new_df = df.set_index(['Customer_ID', 'Category']).reindex(
    [(k, v) for k, lst in d.items() for v in lst]
).reset_index()

new_df:

   Customer_ID  Category  Type  Delivery
0        40275      Book   Buy      True
1        40275  Software  Sell     False
2        39900      Book   Buy      True
3        35886  Software  Sell     False
4        40350  Software  Sell      True
5        28129  Software   Buy     False

*Note: this only works if the MultiIndex is unique (as in the example shown). It will also add rows (filled with NaN) if the dictionary is not a subset of the DataFrame's MultiIndex, which may or may not be the desired behaviour.
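If either caveat is a problem, one possible workaround is a boolean mask built with MultiIndex.isin instead of reindex. The sketch below uses a reduced, made-up frame: the duplicate (40275, 'Book') row and the 99999 key are assumptions added only to exercise both cases, and are not part of the original data.

```python
import pandas as pd

# Hypothetical data: a duplicated (Customer_ID, Category) pair and a
# dictionary key (99999) that does not occur in the frame at all.
d = {40275: ['Book', 'Software'], 39900: ['Book'], 99999: ['Book']}

df = pd.DataFrame({
    'Customer_ID': [40275, 40275, 40275, 39900, 39900],
    'Category': ['Book', 'Book', 'Software', 'Book', 'CD/DVD'],
    'Type': ['Buy', 'Sell', 'Sell', 'Buy', 'Sell'],
})

# Flatten the dictionary into (Customer_ID, Category) tuples.
pairs = [(k, v) for k, lst in d.items() for v in lst]

# Build a MultiIndex from the two columns and test membership row by row.
mask = pd.MultiIndex.from_frame(df[['Customer_ID', 'Category']]).isin(pairs)

# Boolean indexing keeps duplicates and never introduces new rows.
filtered = df[mask]
print(filtered)
```

Because this only selects existing rows, the original index labels are preserved and pairs missing from the frame are silently ignored.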


Setup:

import pandas as pd

d = {
    40275: ['Book', 'Software'],
    39900: ['Book'],
    35886: ['Software'],
    40350: ['Software'],
    28129: ['Software']
}

df = pd.DataFrame({
    'Customer_ID': [40275, 40275, 40275, 40275, 39900, 39900, 39900, 35886,
                    35886, 35886, 35886, 40350, 28129],
    'Category': ['Book', 'Software', 'Video Game', 'Cell Phone', 'CD/DVD',
                 'Book', 'Software', 'Cell Phone', 'Video Game', 'CD/DVD',
                 'Software', 'Software', 'Software'],
    'Type': ['Buy', 'Sell', 'Sell', 'Sell', 'Sell', 'Buy', 'Sell', 'Sell',
             'Buy', 'Sell', 'Sell', 'Sell', 'Buy'],
    'Delivery': [True, False, False, False, True, True, True, False, False,
                 False, False, True, False]
})

You can use df.merge with pd.concat (on pandas older than 2.0, df.append also works; it was removed in 2.0). Note that this assumes at most two categories per customer:

In [444]: df1 = pd.DataFrame.from_dict(d, orient='index', columns=['Cat1', 'Cat2']).reset_index()

In [449]: res = df.merge(df1[['index', 'Cat1']], left_on=['Customer_ID', 'Category'], right_on=['index', 'Cat1']).drop(columns=['index', 'Cat1'])

In [462]: res = pd.concat([res, df.merge(df1[['index', 'Cat2']], left_on=['Customer_ID', 'Category'], right_on=['index', 'Cat2']).drop(columns=['index', 'Cat2'])]).sort_values('Customer_ID', ascending=False)

In [463]: res
Out[463]: 
   Customer_ID  Category  Type  Delivery
3        40350  Software  Sell      True
0        40275      Book   Buy      True
0        40275  Software  Sell     False
1        39900      Book   Buy      True
2        35886  Software  Sell     False
4        28129  Software   Buy     False

Flatten the dictionary into a new dataframe, then inner merge df with it:

df.merge(pd.DataFrame([{'Customer_ID': k, 'Category': i} 
                       for k, v in d.items() for i in v]))

   Customer_ID  Category  Type  Delivery
0        40275      Book   Buy      True
1        40275  Software  Sell     False
2        39900      Book   Buy      True
3        35886  Software  Sell     False
4        40350  Software  Sell      True
5        28129  Software   Buy     False
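As a variation on the same idea, the dictionary can also be flattened with Series.explode rather than a list comprehension. This is only a sketch on a reduced copy of the data; the names pairs and result are illustrative, and the explicit on= and how= arguments just spell out what the merge above does by default.

```python
import pandas as pd

d = {40275: ['Book', 'Software'], 39900: ['Book'], 35886: ['Software']}

# Reduced copy of the question's data, for illustration only.
df = pd.DataFrame({
    'Customer_ID': [40275, 40275, 40275, 39900, 39900, 35886, 35886],
    'Category': ['Book', 'Software', 'Video Game', 'CD/DVD', 'Book',
                 'Cell Phone', 'Software'],
    'Type': ['Buy', 'Sell', 'Sell', 'Sell', 'Buy', 'Sell', 'Sell'],
})

# One row per (Customer_ID, Category) pair: the Series holds one list per
# key, and explode() expands each list element into its own row.
pairs = (
    pd.Series(d)
    .explode()
    .rename_axis('Customer_ID')
    .reset_index(name='Category')
)

# Inner merge keeps only the rows of df whose pair appears in `pairs`.
result = df.merge(pairs, on=['Customer_ID', 'Category'], how='inner')
print(result)
```

Unlike the reindex approach, an inner merge keeps duplicate rows in df and drops dictionary pairs that have no match.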
