
Filter a pandas DataFrame Based on Another DataFrame's Column Values

df1:

Id   Country  Product
1    India    cotton
2    germany  shoes
3    algeria  bags

df2:

id   Country  Product  Qty   Sales
1    India    cotton   25    635
2    India    cotton   65    335
3    India    cotton   96    455
4    India    cotton   78    255
5    germany  shoes    25    635
6    germany  shoes    65    458
7    germany  shoes    96    455
8    germany  shoes    69    255
9    algeria  bags     25    635
10   algeria  bags     89    788
11   algeria  bags     96    455
12   algeria  bags     78    165

I need to filter df2 based on the Country and Product columns from df1 and create a new DataFrame for each combination. For example, df1 contains 3 unique (Country, Product) combinations, so there would be 3 DataFrames.

Output:

df_India_Cotton :

id   Country  Product  Qty   Sales
1    India    cotton   25    635
2    India    cotton   65    335
3    India    cotton   96    455
4    India    cotton   78    255

df_germany_Product:

id   Country  Product  Qty   Sales
1    germany  shoes    25    635
2    germany  shoes    65    458
3    germany  shoes    96    455
4    germany  shoes    69    255

df_algeria_Product:

id  Country  Product  Qty   Sales
1   algeria  bags     25    635
2   algeria  bags     89    788
3   algeria  bags     96    455
4   algeria  bags     78    165

I can also filter out each of these DataFrames with basic subsetting in pandas:

df2[(df2.Country == 'India') & (df2.Product == 'cotton')]

That would solve the problem for one combination, but there could be many unique combinations of Country and Product in my df1.

You can create a dictionary and store all the DataFrames in it. See the code below:

d = {}
for i in range(len(df1)):
    country = df1.Country.iloc[i]
    product = df1.Product.iloc[i]
    # Key each filtered frame as "<Country>_<Product>", e.g. "India_cotton"
    d[country + '_' + product] = df2[(df2.Country == country) & (df2.Product == product)]

You can then access each DataFrame by its key, like this:

d['India_cotton'] will give:

id   Country  Product  Qty   Sales
1    India    cotton   25    635
2    India    cotton   65    335
3    India    cotton   96    455
4    India    cotton   78    255
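
As a possible alternative to the explicit index loop, the same dictionary can be sketched with an inner merge followed by a groupby. This is only an illustration under assumptions: the sample frames below are shortened, the extra france/wine row is made up to show a non-matching pair being dropped, and the names keys and matched are hypothetical.

```python
import pandas as pd

# Shortened sample data; the france/wine row is invented for illustration.
df1 = pd.DataFrame({
    "Country": ["India", "germany", "algeria"],
    "Product": ["cotton", "shoes", "bags"],
})
df2 = pd.DataFrame({
    "Country": ["India", "India", "germany", "germany", "algeria", "algeria", "france"],
    "Product": ["cotton", "cotton", "shoes", "shoes", "bags", "bags", "wine"],
    "Qty": [25, 65, 25, 65, 25, 89, 10],
    "Sales": [635, 335, 635, 458, 635, 788, 100],
})

keys = ["Country", "Product"]

# An inner merge keeps only the df2 rows whose (Country, Product)
# pair also appears in df1.
matched = df2.merge(df1[keys].drop_duplicates(), on=keys, how="inner")

# Build one DataFrame per pair, keyed like "India_cotton".
d = {
    f"{country}_{product}": group.reset_index(drop=True)
    for (country, product), group in matched.groupby(keys)
}

print(d["India_cotton"])
```

The merge does the matching in one vectorised step, so the Python-level loop only runs once per group rather than once per filter.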

Try creating two groupbys. Use the first to select from the second:

import pandas as pd

selector_df = pd.DataFrame(data=
                           {
                               'Country':'india germany algeria'.split(),
                               'Product':'cotton shoes bags'.split()
                           })

details_df = pd.DataFrame(data=
                         {
                            'Country':'india india india india germany germany germany germany algeria algeria algeria algeria'.split(),
                            'Product':'cotton cotton cotton cotton shoes shoes shoes shoes bags bags bags bags'.split(),
                            'qty':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
                         })

selectorgroups = selector_df.groupby(by=['Country', 'Product'])
datagroups = details_df.groupby(by=['Country', 'Product'])
for tag, group in selectorgroups:
    print(tag)
    try:
        print(datagroups.get_group(tag))
    except KeyError:
        print('tag does not exist in datagroup')
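
The KeyError branch above fires whenever a selector pair has no exact match in the data, and that includes spellings that differ only in capitalisation (for instance 'india' versus 'India'). If that is a concern, one option is to compare normalised copies of the key columns. The sketch below assumes lower-casing and stripping whitespace is an acceptable normalisation for your data; the helper name normalised and the sample rows are hypothetical.

```python
import pandas as pd

# Small made-up sample with a capitalisation mismatch between the frames.
df1 = pd.DataFrame({"Country": ["india", "germany"],
                    "Product": ["cotton", "shoes"]})
df2 = pd.DataFrame({
    "Country": ["India", "India", "germany", "algeria"],
    "Product": ["cotton", "cotton", "shoes", "bags"],
    "Qty": [25, 65, 25, 25],
})

keys = ["Country", "Product"]

def normalised(frame):
    """Return lower-cased, whitespace-stripped copies of the key columns."""
    return frame[keys].apply(lambda col: col.str.strip().str.lower())

norm1 = normalised(df1)
norm2 = normalised(df2)

# Set of wanted (country, product) pairs taken from df1.
wanted = set(zip(norm1["Country"], norm1["Product"]))

# Keep the df2 rows whose normalised pair is in the wanted set;
# the original spelling in df2 is left untouched.
mask = [pair in wanted for pair in zip(norm2["Country"], norm2["Product"])]
filtered = df2[mask]

print(filtered)
```

The filtered frame can then be split per pair with either of the approaches shown above.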
