Filter pandas DataFrame Based on Other DataFrame Column Values
df1:
Id Country Product
1 India cotton
2 germany shoes
3 algeria bags
df2:
id Country Product Qty Sales
1 India cotton 25 635
2 India cotton 65 335
3 India cotton 96 455
4 India cotton 78 255
5 germany shoes 25 635
6 germany shoes 65 458
7 germany shoes 96 455
8 germany shoes 69 255
9 algeria bags 25 635
10 algeria bags 89 788
11 algeria bags 96 455
12 algeria bags 78 165
I need to filter df2 based on the Country and Product columns from df1 and create a new DataFrame for each combination. For example, df1 contains 3 unique (Country, Product) combinations, so the number of resulting DataFrames would be 3.
Output:
df_India_Cotton :
id Country Product Qty Sales
1 India cotton 25 635
2 India cotton 65 335
3 India cotton 96 455
4 India cotton 78 255
df_germany_Product:
id Country Product Qty Sales
1 germany shoes 25 635
2 germany shoes 65 458
3 germany shoes 96 455
4 germany shoes 69 255
df_algeria_Product:
id Country Product Qty Sales
1 algeria bags 25 635
2 algeria bags 89 788
3 algeria bags 96 455
4 algeria bags 78 165
I can also filter out these DataFrames with basic subsetting in pandas:
df2[(df2.Country == 'India') & (df2.Product == 'cotton')]
That would solve the problem, but there could be many unique combinations of Country and Product in df1, so writing each filter by hand does not scale.
You can create a dictionary and save all the DataFrames in it. Check the code below:
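d = {}
for i in range(len(df1)):
    # Build a key such as 'India_cotton' from the i-th row of df1
    name = df1.Country.iloc[i] + '_' + df1.Product.iloc[i]
    # Keep only the df2 rows that match this Country/Product pair
    d[name] = df2[(df2.Country == df1.Country.iloc[i]) & (df2.Product == df1.Product.iloc[i])]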
You can then access each DataFrame by its key, as shown below:
d['India_cotton'] will give:
id Country Product Qty Sales
1 India cotton 25 635
2 India cotton 65 335
3 India cotton 96 455
4 India cotton 78 255
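As an alternative to looping over row positions, the same dictionary can be built with a merge followed by a groupby. This is only a sketch: it rebuilds the sample tables from the question, assumes the Country and Product values are spelled identically in both frames (the sample uses 'India' in both), and the names `pairs`, `matched`, and `frames` are made up for illustration.

```python
import pandas as pd

# Sample data mirroring the tables in the question.
df1 = pd.DataFrame({
    "Id": [1, 2, 3],
    "Country": ["India", "germany", "algeria"],
    "Product": ["cotton", "shoes", "bags"],
})
df2 = pd.DataFrame({
    "id": range(1, 13),
    "Country": ["India"] * 4 + ["germany"] * 4 + ["algeria"] * 4,
    "Product": ["cotton"] * 4 + ["shoes"] * 4 + ["bags"] * 4,
    "Qty": [25, 65, 96, 78, 25, 65, 96, 69, 25, 89, 96, 78],
    "Sales": [635, 335, 455, 255, 635, 458, 455, 255, 635, 788, 455, 165],
})

# Unique (Country, Product) pairs that should be kept.
pairs = df1[["Country", "Product"]].drop_duplicates()

# An inner merge keeps only the df2 rows whose pair appears in df1.
matched = df2.merge(pairs, on=["Country", "Product"], how="inner")

# One DataFrame per pair, keyed as "Country_Product".
frames = {
    f"{country}_{product}": group
    for (country, product), group in matched.groupby(["Country", "Product"])
}

print(sorted(frames))
print(frames["India_cotton"])
```

Because the merge is exact, rows whose spelling or letter case differs between the two frames are silently dropped, so it may be worth normalizing the key columns first if that is a risk in your data.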
Try creating two groupbys. Use the first to select from the second:
import pandas as pd

selector_df = pd.DataFrame(data={
    'Country': 'india germany algeria'.split(),
    'Product': 'cotton shoes bags'.split()
})
details_df = pd.DataFrame(data={
    'Country': 'india india india india germany germany germany germany algeria algeria algeria algeria'.split(),
    'Product': 'cotton cotton cotton cotton shoes shoes shoes shoes bags bags bags bags'.split(),
    'qty': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
})

selectorgroups = selector_df.groupby(by=['Country', 'Product'])
datagroups = details_df.groupby(by=['Country', 'Product'])

for tag, group in selectorgroups:
    print(tag)
    try:
        print(datagroups.get_group(tag))
    except KeyError:
        print('tag does not exist in datagroup')
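If the groups need to be kept rather than printed, the two-groupby idea can be wrapped in a small helper that returns a dictionary. The following is a sketch under assumptions: `split_by_selector` is a hypothetical name, not a pandas function, and the extra 'france'/'wine' row is invented here only to show what happens when a selector pair has no matching rows.

```python
import pandas as pd

def split_by_selector(selector_df, details_df, keys):
    """Return {tag: sub-DataFrame} for each key combination in selector_df.

    Combinations that do not occur in details_df are skipped.
    """
    selectorgroups = selector_df.groupby(by=keys)
    datagroups = details_df.groupby(by=keys)
    result = {}
    for tag, _ in selectorgroups:
        try:
            result[tag] = datagroups.get_group(tag)
        except KeyError:
            # No rows in details_df for this combination.
            continue
    return result

selector_df = pd.DataFrame({
    "Country": ["india", "germany", "algeria", "france"],  # 'france' is made up
    "Product": ["cotton", "shoes", "bags", "wine"],
})
details_df = pd.DataFrame({
    "Country": ["india"] * 4 + ["germany"] * 4 + ["algeria"] * 4,
    "Product": ["cotton"] * 4 + ["shoes"] * 4 + ["bags"] * 4,
    "qty": list(range(1, 13)),
})

result = split_by_selector(selector_df, details_df, ["Country", "Product"])
for tag, frame in result.items():
    print(tag, len(frame))
```

With two key columns, each dictionary key is a `(Country, Product)` tuple, so a single group is retrieved with, for example, `result[('india', 'cotton')]`.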