I want to add new columns to df1, one per year, holding the number of rows in df2 in which each fruit appears for that year.
Expected output of df1:
No | Fruit_Name | 2018 | 2019 | 2020
1 | Apple | 2 | 1 | 0
2 | Banana | 0 | 0 | 1
3 | Cherries | 0 | 0 | 1
df1
No | Fruit_Name
1 | Apple
2 | Banana
3 | Cherries
df2
year | farmer | fruit_farmed
2018 | John | Apple
2019 | Timo | Apple
2020 | Eva | Cherries
2020 | Frey | Banana
2018 | Ali | Apple
The code that doesn't work:
i=0
for i in range(3):
    df1['2018'] = len(df2.loc[df2['fruit_farmed'] == df1['Fruit_Name'][i]])
    df1['2019'] = len(df2.loc[df2['fruit_farmed'] == df1['Fruit_Name'][i]])
    df1['2020'] = len(df2.loc[df2['fruit_farmed'] == df1['Fruit_Name'][i]])
    i=i+1
Output:
No Fruit_Name 2018 2019 2020
0 1 Apple 1 1 1
1 2 Banana 1 1 1
2 3 Cherries 1 1 1
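You can try pd.crosstab, then join the result to df1:
# count rows of df2 per (fruit, year) pair
s = pd.crosstab(df2.fruit_farmed, df2.year)
# put the rows in the same order as df1.Fruit_Name
s = s.reindex(df1.Fruit_Name)
# reuse df1's index so that join aligns row by row
s.index = df1.index
df1 = df1.join(s)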
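For a self-contained check, the crosstab approach can be sketched as below with the sample data from the question. The names `counts` and `result` are illustrative, and two details are additions that the answer itself does not include: `fill_value=0` in `reindex` (so a fruit that never appears in df2 would get 0 instead of NaN) and converting the year column labels to strings to match the expected output.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "No": [1, 2, 3],
    "Fruit_Name": ["Apple", "Banana", "Cherries"],
})
df2 = pd.DataFrame({
    "year": [2018, 2019, 2020, 2020, 2018],
    "farmer": ["John", "Timo", "Eva", "Frey", "Ali"],
    "fruit_farmed": ["Apple", "Apple", "Cherries", "Banana", "Apple"],
})

# Count rows of df2 for every (fruit, year) combination
counts = pd.crosstab(df2["fruit_farmed"], df2["year"])

# Align to the fruit order of df1; fill_value=0 is an addition (assumption)
counts = counts.reindex(df1["Fruit_Name"], fill_value=0)

# Use string labels such as '2018' for the new columns (assumption)
counts.columns = counts.columns.astype(str)

# Reuse df1's index so that join aligns row by row
counts.index = df1.index
result = df1.join(counts)
print(result)
```

Keeping the result in a separate variable leaves df1 untouched, which makes it easier to rerun the snippet while experimenting.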
Another way is to group by fruit_farmed and year, count the rows, and then unstack year.
import pandas as pd
df2 = pd.DataFrame([[2018,'John','Apple'],[2019,'Timo','Apple'],
                    [2020,'Eva','Cherries'],[2020,'Frey','Banana'],
                    [2018,'Ali','Apple']],
                   columns=['year','farmer','fruit_farmed'])
# count rows per (fruit, year), pivot the years into columns, fill missing pairs with 0
df1 = df2.groupby(['fruit_farmed','year']).count().unstack('year').reset_index().fillna(0)
# rename the columns
df1.columns = ['fruit_farmed','2018','2019','2020']
print(df1)
Output:
  fruit_farmed  2018  2019  2020
0        Apple   2.0   1.0   0.0
1       Banana   0.0   0.0   1.0
2     Cherries   0.0   0.0   1.0
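The counts above are floats because the missing (fruit, year) pairs become NaN during `unstack` and are only filled afterwards. A minimal sketch of a variant, assuming integer counts are wanted and that they should be attached to the original df1 from the question: it uses `size()` with `unstack(fill_value=0)` and then a left merge on `Fruit_Name`. The names `counts` and `result` are illustrative and not part of the answer.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "No": [1, 2, 3],
    "Fruit_Name": ["Apple", "Banana", "Cherries"],
})
df2 = pd.DataFrame({
    "year": [2018, 2019, 2020, 2020, 2018],
    "farmer": ["John", "Timo", "Eva", "Frey", "Ali"],
    "fruit_farmed": ["Apple", "Apple", "Cherries", "Banana", "Apple"],
})

# size() counts rows per group; fill_value=0 keeps the counts as integers
counts = (
    df2.groupby(["fruit_farmed", "year"])
       .size()
       .unstack("year", fill_value=0)
)

# Use string labels such as '2018' for the year columns (assumption)
counts.columns = counts.columns.astype(str)

# Attach the counts to df1 by matching Fruit_Name against the fruit index
result = df1.merge(counts, left_on="Fruit_Name", right_index=True, how="left")
print(result)
```

With a left merge, a fruit listed in df1 but absent from df2 would end up with NaN in the year columns, so a follow-up `fillna(0)` on those columns may be needed for other data.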