How to pd.fillna(mean()) according to a column value which changes?
I have the following dataframe:
data/hora
2017-08-18 09:22:33 22162 NaN 65.9 NaN NaN
2017-10-03 11:08:26 22162 NaN 60.5 NaN NaN
2018-02-17 01:45:24 22162 NaN 69.7 NaN NaN
2018-02-17 01:45:55 74034 NaN 67.5 NaN NaN
2018-02-17 01:46:29 74034 NaN 65.4 NaN NaN
2018-02-17 01:47:20 74034 NaN 63.3 NaN NaN
2018-02-17 01:48:35 74034 NaN 61.3 NaN NaN
2018-02-17 01:49:08 17448 NaN 63.4 NaN NaN
2018-02-17 01:49:31 17448 NaN 65.5 NaN NaN
2018-02-17 01:49:55 17448 NaN 67.6 NaN NaN
I want to fill each NaN with the mean of its column. However, this value changes as the 'Machine' changes; there are three machine values. Therefore, I need a fillna that changes according to the Machine column value.
I tried:
for i in df:
    if i.isin(df.loc[df['Machine'] == '22162']):
        df.fillna(df.loc[df['Machine'] == '22162'].mean)
    elif i.isin(df.loc[df['Machine'] == '17448']):
        df.fillna(df.loc[df['Machine'] == '17448'].mean)
    elif i.isin(df.loc[df['Machine'] == '74034']):
        df.fillna(df.loc[df['Machine'] == '74034'].mean)
But it didn't work.
Thanks!
It's a bit all over the place and hard-coded, but it should work. I named the NaN columns ['A', 'C', 'D']:
data hora machine A B C D
0 2017-08-18 09:22:33 22162 NaN 65.9 NaN NaN
1 2017-10-03 11:08:26 22162 NaN 60.5 NaN NaN
2 2018-02-17 01:45:24 22162 NaN 69.7 NaN NaN
3 2018-02-17 01:45:55 74034 NaN 67.5 NaN NaN
4 2018-02-17 01:46:29 74034 NaN 65.4 NaN NaN
5 2018-02-17 01:47:20 74034 NaN 63.3 NaN NaN
6 2018-02-17 01:48:35 74034 NaN 61.3 NaN NaN
7 2018-02-17 01:49:08 17448 NaN 63.4 NaN NaN
8 2018-02-17 01:49:31 17448 NaN 65.5 NaN NaN
9 2018-02-17 01:49:55 17448 NaN 67.6 NaN NaN
columns = ['A', 'C', 'D']
# Fill each NaN column with the per-machine mean of column B
for clm in columns:
    df[clm] = df[clm].fillna(df.machine.map(df.groupby('machine')['B'].mean().to_dict()))
Results in:
data hora machine A B C D
0 2017-08-18 09:22:33 22162 65.366667 65.9 65.366667 65.366667
1 2017-10-03 11:08:26 22162 65.366667 60.5 65.366667 65.366667
2 2018-02-17 01:45:24 22162 65.366667 69.7 65.366667 65.366667
3 2018-02-17 01:45:55 74034 64.375000 67.5 64.375000 64.375000
4 2018-02-17 01:46:29 74034 64.375000 65.4 64.375000 64.375000
5 2018-02-17 01:47:20 74034 64.375000 63.3 64.375000 64.375000
6 2018-02-17 01:48:35 74034 64.375000 61.3 64.375000 64.375000
7 2018-02-17 01:49:08 17448 65.500000 63.4 65.500000 65.500000
8 2018-02-17 01:49:31 17448 65.500000 65.5 65.500000 65.500000
9 2018-02-17 01:49:55 17448 65.500000 67.6 65.500000 65.500000
Probably not the best way, but it gets the job done.
This is how I've solved my problem:
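For comparison, here is a minimal sketch of a variant that fills each column with its own per-machine mean (rather than the mean of B), using groupby with transform. The sample values in column A are made up for illustration and are not from the question; the column names follow the answer above. Note that a group whose values are all NaN has no mean, so those cells stay NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: values in "A" are invented for illustration.
df = pd.DataFrame({
    "machine": [22162, 22162, 22162, 74034, 74034, 17448],
    "A": [1.0, np.nan, 3.0, np.nan, 10.0, np.nan],
    "B": [65.9, 60.5, 69.7, 67.5, 65.4, 63.4],
})

value_cols = ["A", "B"]

# transform("mean") returns a frame shaped like df[value_cols], where every
# row holds the mean of its own machine group for that column.
group_means = df.groupby("machine")[value_cols].transform("mean")

# fillna aligns on index and columns, so each NaN gets its group's mean.
filled = df.copy()
filled[value_cols] = df[value_cols].fillna(group_means)

print(filled)
```

Machine 17448 has only NaN in column A here, so that cell is left as NaN; a fallback such as the overall column mean would need a second fillna.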
grupo = df.groupby(df["Machine"])
cada_maquina = list(grupo)
for i in range(3):
    cada_maquina[i][1].fillna(cada_maquina[i][1].mean(), inplace=True)
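As far as I can tell, the loop above fills the per-machine sub-frames stored in cada_maquina, not df itself, since iterating a groupby yields separate objects. A rough sketch of the same idea that avoids the hard-coded range(3) and glues the pieces back together with pd.concat could look like this; the columns X and Y and their values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: "X" and "Y" stand in for the real columns.
df = pd.DataFrame({
    "Machine": [22162, 22162, 74034, 74034],
    "X": [1.0, np.nan, np.nan, 4.0],
    "Y": [65.9, 60.5, 67.5, 65.4],
})

parts = []
for machine, sub in df.groupby("Machine"):
    # Fill each sub-frame with its own column means; returns a new frame.
    parts.append(sub.fillna(sub.mean(numeric_only=True)))

# Recombine the groups and restore the original row order.
result = pd.concat(parts).sort_index()

print(result)
```

This works for any number of machines, and df is left untouched, so the filled data lives in result.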
Thank you very much for every comment! :D