Create new column based on number of rows matching value in another dataframe
I want to create new columns in df1, one per year, holding the number of rows in df2 in which each fruit appears for that year.
Expected output of df1:
No | Fruit_Name | 2018 | 2019 | 2020
1 | Apple | 2 | 1 | 0
2 | Banana | 0 | 0 | 1
3 | Cherries | 0 | 0 | 1
df1
No | Fruit_Name
1 | Apple
2 | Banana
3 | Cherries

df2
year | farmer | fruit_farmed
2018 | John | Apple
2019 | Timo | Apple
2020 | Eva | Cherries
2020 | Frey | Banana
2018 | Ali | Apple
The code that doesn't work:
i=0
for i in range(3):
    df1['2018'] = len(df2.loc[df2['fruit_farmed'] == df1['Fruit_Name'][i]])
    df1['2019'] = len(df2.loc[df2['fruit_farmed'] == df1['Fruit_Name'][i]])
    df1['2020'] = len(df2.loc[df2['fruit_farmed'] == df1['Fruit_Name'][i]])
    i=i+1
Output:
No Fruit_Name 2018 2019 2020
0 1 Apple 1 1 1
1 2 Banana 1 1 1
2 3 Cherries 1 1 1
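For comparison, here is a minimal corrected version of the loop itself, a sketch assuming df1 and df2 are built from the tables above. The original assigns one scalar to an entire column on every pass (so only the last fruit's count survives) and never filters by year; here each cell is counted with a mask on both fruit and year.

```python
import pandas as pd

# Sample data reconstructed from the tables in the question
df1 = pd.DataFrame({"No": [1, 2, 3], "Fruit_Name": ["Apple", "Banana", "Cherries"]})
df2 = pd.DataFrame({
    "year": [2018, 2019, 2020, 2020, 2018],
    "farmer": ["John", "Timo", "Eva", "Frey", "Ali"],
    "fruit_farmed": ["Apple", "Apple", "Cherries", "Banana", "Apple"],
})

# Build one list per year: for each fruit in df1, count the df2 rows
# matching BOTH that fruit and that year, then assign the whole list at once.
for year in [2018, 2019, 2020]:
    df1[str(year)] = [
        int(((df2["fruit_farmed"] == fruit) & (df2["year"] == year)).sum())
        for fruit in df1["Fruit_Name"]
    ]

print(df1)
```

This is quadratic in the number of fruits and years, so the vectorized answers below are preferable for larger data.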
You can try crosstab, then join the result back onto df1:
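import pandas as pd
# count rows per (fruit, year) pair: fruits as rows, years as columns
s = pd.crosstab(df2.fruit_farmed, df2.year)
# put the rows in df1's order; a fruit missing from df2 gets 0 instead of NaN
s = s.reindex(df1.Fruit_Name, fill_value=0)
s.index = df1.index
df1 = df1.join(s)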
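A self-contained sketch of the crosstab approach, assuming the data from the question. The extra "Durian" row in df1 is hypothetical, added only to show that reindex(..., fill_value=0) yields 0 for a fruit that never appears in df2. The integer year labels produced by crosstab are converted to strings to match the expected column names.

```python
import pandas as pd

# df1 from the question plus a hypothetical "Durian" row absent from df2
df1 = pd.DataFrame({"No": [1, 2, 3, 4],
                    "Fruit_Name": ["Apple", "Banana", "Cherries", "Durian"]})
df2 = pd.DataFrame({
    "year": [2018, 2019, 2020, 2020, 2018],
    "farmer": ["John", "Timo", "Eva", "Frey", "Ali"],
    "fruit_farmed": ["Apple", "Apple", "Cherries", "Banana", "Apple"],
})

# Rows: fruit, columns: year, cells: number of matching rows in df2
s = pd.crosstab(df2["fruit_farmed"], df2["year"])
# Align to df1's fruit order; "Durian" is not in df2, so its row is all 0
s = s.reindex(df1["Fruit_Name"], fill_value=0)
# Reuse df1's index so join lines the rows up positionally
s.index = df1.index
# Year labels are ints; make them strings like '2018'
s.columns = s.columns.astype(str)

out = df1.join(s)
print(out)
```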
Another way is to group by fruit_farmed and year, then unstack year.
import pandas as pd
df2 = pd.DataFrame([[2018, 'John', 'Apple'], [2019, 'Timo', 'Apple'],
                    [2020, 'Eva', 'Cherries'], [2020, 'Frey', 'Banana'],
                    [2018, 'Ali', 'Apple']],
                   columns=['year', 'farmer', 'fruit_farmed'])
# count non-null farmer values per (fruit, year), pivot year into columns; missing pairs become 0
df1 = df2.groupby(['fruit_farmed', 'year']).count().unstack('year').reset_index().fillna(0)
# rename the columns
df1.columns = ['fruit_farmed', '2018', '2019', '2020']
print(df1)
fruit_farmed 2018 2019 2020
0 Apple 2.0 1.0 0.0
1 Banana 0.0 0.0 1.0
2 Cherries 0.0 0.0 1.0
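A variant of the groupby answer, offered as a sketch rather than the answerer's exact method: size() counts rows per group directly, and unstack with fill_value=0 keeps the counts as integers instead of the 2.0/1.0 floats that NaN plus fillna produces. DataFrame.join with on="Fruit_Name" then attaches the counts to the original df1, keeping its No column.

```python
import pandas as pd

df1 = pd.DataFrame({"No": [1, 2, 3], "Fruit_Name": ["Apple", "Banana", "Cherries"]})
df2 = pd.DataFrame({
    "year": [2018, 2019, 2020, 2020, 2018],
    "farmer": ["John", "Timo", "Eva", "Frey", "Ali"],
    "fruit_farmed": ["Apple", "Apple", "Cherries", "Banana", "Apple"],
})

# size() gives a Series indexed by (fruit_farmed, year);
# unstack moves year into columns and fills absent pairs with 0
counts = df2.groupby(["fruit_farmed", "year"]).size().unstack("year", fill_value=0)
counts.columns = counts.columns.astype(str)

# Match df1's Fruit_Name column against the fruit index of counts
out = df1.join(counts, on="Fruit_Name")
print(out)
```

If df1 could contain a fruit absent from df2, its year columns would come back as NaN here; chaining .fillna(0) on those columns handles that case.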