I created a DataFrame:
A1 A2 A3 A4
0 cccc xx 6 5
1 aaaa yy 8 0
2 aaaa xx 15 0
3 bbbb xx 21 4
4 bbbb xx 26 0
5 cccc yy 33 2
6 aaaa xx 44 1
7 cccc xx 48 2
8 aaaa yy 58 0
9 cccc yy 59 5
10 bbbb yy 77 0
11 bbbb yy 99 0
I then used crosstab() with the command below to create a new DataFrame:
df5 = pd.crosstab(df4['A1'], df4['A2'], margins=False, values=df4['A3'],
                  dropna=False, aggfunc='mean').reset_index().fillna(0)
This works properly. It gives me the following output:
A2 A1 xx yy
0 aaaa 29.5 33.0
1 bbbb 23.5 88.0
2 cccc 27.0 46.0
Now I want to store these mean values back into the DataFrame df4. How can I do it? I want to replace the entries of A4 that contain 0 with the corresponding mean of A3 from the crosstab() result (df5). I want output as follows:
A1 A2 A3 A4
0 aaaa xx 15 29.5
1 aaaa xx 44 1.0
2 aaaa yy 8 33.0
3 aaaa yy 58 33.0
4 bbbb xx 21 4.0
5 bbbb xx 26 23.5
6 bbbb yy 77 88.0
7 bbbb yy 99 88.0
8 cccc xx 6 5.0
9 cccc xx 48 2.0
mask + groupby + transform
Ignoring the unnecessary reordering and removal of some rows in your desired output, you can use mask with groupby:
group_mean = df4.groupby(['A1', 'A2'])['A3'].transform('mean')
df4['A4'] = df4['A4'].mask(df4['A4'] == 0, group_mean)
print(df4)
A1 A2 A3 A4
0 cccc xx 6 5.0
1 aaaa yy 8 33.0
2 aaaa xx 15 29.5
3 bbbb xx 21 4.0
4 bbbb xx 26 23.5
5 cccc yy 33 2.0
6 aaaa xx 44 1.0
7 cccc xx 48 2.0
8 aaaa yy 58 33.0
9 cccc yy 59 5.0
10 bbbb yy 77 88.0
11 bbbb yy 99 88.0
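If you would rather reuse the crosstab() table itself, as the question asks, one possible sketch is to look up each row's (A1, A2) cell in that table and pass the result to mask. This assumes the crosstab is kept with A1 as its index (no reset_index()), and the names means and row_means are only illustrative. The DataFrame is rebuilt from the values shown in the question so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the question's DataFrame from the values shown in the post.
df4 = pd.DataFrame({
    "A1": ["cccc", "aaaa", "aaaa", "bbbb", "bbbb", "cccc",
           "aaaa", "cccc", "aaaa", "cccc", "bbbb", "bbbb"],
    "A2": ["xx", "yy", "xx", "xx", "xx", "yy",
           "xx", "xx", "yy", "yy", "yy", "yy"],
    "A3": [6, 8, 15, 21, 26, 33, 44, 48, 58, 59, 77, 99],
    "A4": [5, 0, 0, 4, 0, 2, 1, 2, 0, 5, 0, 0],
})

# Mean of A3 per (A1, A2); A1 stays as the index, A2 values are the columns.
means = pd.crosstab(df4["A1"], df4["A2"], values=df4["A3"], aggfunc="mean")

# For every row of df4, find the position of its A1 label in the crosstab
# index and of its A2 label in the crosstab columns, then pick that cell.
row_pos = means.index.get_indexer(df4["A1"])
col_pos = means.columns.get_indexer(df4["A2"])
row_means = pd.Series(means.to_numpy()[row_pos, col_pos], index=df4.index)

# Replace only the zeros in A4 with the looked-up group mean.
df4["A4"] = df4["A4"].mask(df4["A4"] == 0, row_means)
print(df4)
```

This should give the same A4 column as the groupby + transform version above. The transform approach is shorter; the lookup is mainly useful if the crosstab table is already needed for something else.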