Python DataFrame: filling NaN values using information from other columns
I tried to solve this problem on my own, but unfortunately I haven't made much progress, and I would really appreciate any help.
My current DataFrame contains 3 columns: 2 complete columns and one column with some missing values, denoted as NaN.
df
Out[18]:
  x1  x2   x3
0  A   1  2.0
1  B   0  NaN
2  A   0  1.0
3  A   1  2.0
4  A   0  NaN
5  B   1  1.0
6  A   1  1.0
7  B   0  2.0
8  B   0  2.0
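For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: the dtypes are assumed from the printed output (strings for 'x1', integers for 'x2', floats for 'x3').

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; x3 is float because it holds NaN.
df = pd.DataFrame({
    "x1": ["A", "B", "A", "A", "A", "B", "A", "B", "B"],
    "x2": [1, 0, 0, 1, 0, 1, 1, 0, 0],
    "x3": [2.0, np.nan, 1.0, 2.0, np.nan, 1.0, 1.0, 2.0, 2.0],
})
print(df)
```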
I would like to fill the missing values in 'x3' with the median of 'x3' within each group defined by 'x1' and 'x2'.
groupby_df = df.groupby(['x1', 'x2'])['x3'].median()
groupby_df
Out[22]:
x1  x2
A   0     1.0
    1     2.0
B   0     2.0
    1     1.0
So, for instance, the NaN value corresponding to (B, 0) would be replaced by 2 and the one for (A, 0) by 1. Unfortunately, I can't figure out this part. Is there an elegant "DataFrame way" of filling in the NaN values with the computed median using groupby?
Thank you.
Use fillna inside groupby:
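# transform returns a result aligned to df's original index, so it can be assigned back directly
df['x3'] = df.groupby(['x1', 'x2'])['x3'].transform(lambda x: x.fillna(x.median()))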
df
Out[928]:
  x1  x2   x3
0  A   1  2.0
1  B   0  2.0
2  A   0  1.0
3  A   1  2.0
4  A   0  1.0
5  B   1  1.0
6  A   1  1.0
7  B   0  2.0
8  B   0  2.0
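A possible variant of the same idea, shown here as a self-contained sketch rather than as the only way to do it: compute the per-group median once with transform('median'), which broadcasts it back to every row, and pass that to fillna. The frame is rebuilt from the question's printed output, and the name group_median is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt from the question (dtypes assumed from the printout).
df = pd.DataFrame({
    "x1": ["A", "B", "A", "A", "A", "B", "A", "B", "B"],
    "x2": [1, 0, 0, 1, 0, 1, 1, 0, 0],
    "x3": [2.0, np.nan, 1.0, 2.0, np.nan, 1.0, 1.0, 2.0, 2.0],
})

# Median of x3 within each (x1, x2) group, broadcast back to every row.
# NaN values are skipped when the median is computed.
group_median = df.groupby(["x1", "x2"])["x3"].transform("median")

# Only the NaN entries are replaced; existing values are left untouched.
df["x3"] = df["x3"].fillna(group_median)
print(df)
```

Note that if a group has no non-missing 'x3' values at all, its median is itself NaN, so those rows stay NaN and need a separate fallback.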