Replacing the missing value with different values on the same column in a pandas DataFrame
A B C D
1 2010 one 0 0
2 2020 one 2 4
3 2007 two 0 8
4 2010 one 8 4
5 2020 four 6 12
6 2007 three 0 14
7 2006 four 7 14
8 2010 two 10 12
I need to replace each 0 in column C with the average of the C values for the same year (column A). For example, the 2010 C value would be 9 (the mean of 8 and 10). What is the best way to do this? I have over 10,000 rows.
You can use replace to change the 0's to np.nan in column C, and then use fillna with map to fill in the yearly averages:
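import numpy as np
# Treat 0 as missing so it is excluded from the yearly means.
# Assign back instead of using inplace=True on df.C, which may not update df in recent pandas versions.
df['C'] = df['C'].replace(0, np.nan)
# Fill each missing value with the mean of C for that row's year (column A).
df['C'] = df['C'].fillna(df['A'].map(df.groupby('A')['C'].mean()))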
print(df)
A B C D
0 2010 one 9.0 0
1 2020 one 2.0 4
2 2007 two NaN 8
3 2010 one 8.0 4
4 2020 four 6.0 12
5 2007 three NaN 14
6 2006 four 7.0 14
7 2010 two 10.0 12
2007 is still NaN because that year has no values other than 0's in the initial data.
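For reference, here is a self-contained sketch that rebuilds the sample data from the question and produces the same result. It swaps the map step for groupby(...).transform('mean'), which is a different but equivalent way to line up the yearly means with the rows. The assumption that column A holds integers is mine, based on how the table prints.

```python
import numpy as np
import pandas as pd

# Sample data from the question (A is assumed to be an integer year column).
df = pd.DataFrame({
    "A": [2010, 2020, 2007, 2010, 2020, 2007, 2006, 2010],
    "B": ["one", "one", "two", "one", "four", "three", "four", "two"],
    "C": [0, 2, 0, 8, 6, 0, 7, 10],
    "D": [0, 4, 8, 4, 12, 14, 14, 12],
})

# Treat 0 as missing so it does not pull the yearly means down.
c = df["C"].replace(0, np.nan)

# Mean of C per year, broadcast back to one value per row (NaN is skipped).
yearly_mean = c.groupby(df["A"]).transform("mean")

# Fill only the missing entries; years with nothing but 0's stay NaN.
df["C"] = c.fillna(yearly_mean)
print(df)
```

Both steps are vectorized, so this should be comfortable for 10,000 rows.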
Here is how I think I would do it. The code below is pseudo-code.
1: Find the average for each year and put it in a dict, keyed by the year values as they appear in column A.
my_year_dict = {2020: xxx, 2021: xxx}
2: Use apply with a lambda function over the rows, so that both the year (A) and the value (C) are available.
df['New C Col'] = df.apply(lambda row: my_year_dict[row['A']] if row['C'] == 0 else row['C'], axis=1)
Hope it can be a start!
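A runnable sketch of the two steps above, using the C and A columns from the question. The helper name fill_zero and the choice to leave NaN for a year that has no non-zero values (2007 here) are my assumptions, not part of the original answer.

```python
import pandas as pd

# Columns A and C from the question's sample data.
df = pd.DataFrame({
    "A": [2010, 2020, 2007, 2010, 2020, 2007, 2006, 2010],
    "C": [0, 2, 0, 8, 6, 0, 7, 10],
})

# Step 1: mean of the non-zero C values for each year, stored in a dict.
my_year_dict = df[df["C"] != 0].groupby("A")["C"].mean().to_dict()

# Step 2: replace a 0 with its year's mean; keep every other value as-is.
def fill_zero(row):  # hypothetical helper name
    if row["C"] == 0:
        # A year with only 0's has no entry in the dict, so fall back to NaN.
        return my_year_dict.get(row["A"], float("nan"))
    return row["C"]

df["New C Col"] = df.apply(fill_zero, axis=1)
print(df)
```

Row-wise apply is usually slower than the vectorized replace/fillna approach, though at around 10,000 rows the difference is unlikely to matter much.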