Pandas: how to replace values with the mean in a Python data frame using multiple grouped columns
I want to replace values in a data frame with the mean, grouping by multiple columns. Below is a snapshot of the dataframe:
    Current Loan Amount   DateTime  Day  Month  Year
0                611314   1-Jan-92    1    Jan    92
1                266662   2-Jan-92    2    Jan    92
2                153494   3-Jan-92    3    Jan    92
3                176242   4-Jan-92    4    Jan    92
4                321992   5-Jan-92    5    Jan    92
5                202928   6-Jan-92    6    Jan    92
6                621786   7-Jan-92    7    Jan    92
7                266794   8-Jan-92    8    Jan    92
8                202466   9-Jan-92    9    Jan    92
9                266288  10-Jan-92   10    Jan    92
10               121110  11-Jan-92   11    Jan    92
11               258104  12-Jan-92   12    Jan    92
12               161722  13-Jan-92   13    Jan    92
13               753016  14-Jan-92   14    Jan    92
14               444664  15-Jan-92   15    Jan    92
15               172282  16-Jan-92   16    Jan    92
16               275440  17-Jan-92   17    Jan    92
17               218834  18-Jan-92   18    Jan    92
18                    0  19-Jan-92   19    Jan    92
19                    0  20-Jan-92   20    Jan    92
I need to replace the 0 values with the mean of the Current Loan Amount for the same year and the same month.
I tried different methods. The one below does give me the mean, but it does not change the dataframe, and it removes the rest of the columns:
data = data_loan.groupby(['Year', 'Month'])

def replace(group):
    mask = (group == 0)
    group[mask] = group[~mask].mean()
    return group

new_data = data.transform(replace)
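One possible way to keep the attempt above close to its original shape is to apply the function to the single numeric column and assign the result back, since `transform` returns a new object rather than modifying `data_loan`. The sketch below is only an illustration: the small `data_loan` frame and the helper name `replace_zeros` are made up for the example and are not part of the original post.

```python
import pandas as pd

# Hypothetical miniature of data_loan (values invented for illustration)
data_loan = pd.DataFrame({
    "Current Loan Amount": [100, 300, 0, 50, 0],
    "Month": ["Jan", "Jan", "Jan", "Feb", "Feb"],
    "Year": [92, 92, 92, 92, 92],
})

def replace_zeros(s):
    # s holds one group's values of a single column (a Series);
    # zeros are swapped for the mean of the nonzero values in that group
    return s.mask(s == 0, s[s != 0].mean())

col = "Current Loan Amount"
# Select one column before transform, then write the result back,
# so the other columns of data_loan are left untouched
data_loan[col] = data_loan.groupby(["Year", "Month"])[col].transform(replace_zeros)
```

Selecting the column first means the function never sees the string columns, and assigning back is what actually updates the dataframe.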
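import numpy as np

# Treat zeros as missing values so they are excluded from the mean
data_loan['Current Loan Amount'] = data_loan['Current Loan Amount'].replace(0, np.nan)
# Fill each missing value with the mean of its (Year, Month) group
data_loan['Current Loan Amount'] = data_loan.groupby(['Year', 'Month'])['Current Loan Amount'].transform(lambda x: x.fillna(x.mean()))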
This replaces each 0 with the mean of the nonzero values in its group.
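As a rough check of this approach, here is a self-contained sketch on a tiny frame. The data values are invented for the example, and the column is assumed to be named "Current Loan Amount" as in the question's snapshot.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the asker's full dataset)
data_loan = pd.DataFrame({
    "Current Loan Amount": [611314, 266662, 0, 400, 0, 600],
    "Month": ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "Year": [92, 92, 92, 92, 92, 92],
})
col = "Current Loan Amount"

# Step 1: turn zeros into NaN so mean() skips them
data_loan[col] = data_loan[col].replace(0, np.nan)

# Step 2: fill each NaN with the mean of its (Year, Month) group
data_loan[col] = (
    data_loan.groupby(["Year", "Month"])[col]
    .transform(lambda x: x.fillna(x.mean()))
)

print(data_loan)
```

Here the Jan zero becomes 438988.0 (the mean of 611314 and 266662) and the Feb zero becomes 500.0. Note that if every value in a group is 0, that group's mean is NaN, so those rows would stay NaN and need separate handling.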