How do I use a for loop in pandas to fill missing values in one column based on a condition in another column?

Question

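import numpy as np
import pandas as pd

# Missing readings are stored as np.nan so that mean() and isnull() treat them as missing
weather_train = pd.DataFrame({
    'site_id': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    'air_temperature': [25, 22, np.nan, 28, np.nan, 30, 45, np.nan, 50, np.nan, 24]
})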

When site_id is 0, I need to calculate the mean air_temperature for site_id 0 and then use that mean to fill in the missing air_temperature values for site_id 0.
Then, when site_id is 1, I need to calculate the mean air_temperature for site_id 1 and use it to fill in the missing air_temperature values for site_id 1.

I have to do the same process for cloud_coverage.

Can anyone help me write a for loop in pandas for this?

Answer 1

No need for loops. Simply use groupby().transform() to compute the per-group mean inline, wrapped in a conditional numpy.where:

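import numpy as np

# Where air_temperature is missing, take the mean of its site_id group; otherwise keep the value
weather_train['air_temperature'] = np.where(pd.isnull(weather_train['air_temperature']),
                                            weather_train.groupby(['site_id'])['air_temperature'].transform('mean'),
                                            weather_train['air_temperature'])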
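
For the cloud_coverage part of the question, here is a minimal self-contained sketch of one way to extend the same idea. It is not the only way to do it: it swaps numpy.where for Series.fillna, and the only loop runs over the column names rather than over rows or sites. The cloud_coverage values below are made up for illustration, since the question does not show them.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question; the cloud_coverage values are
# hypothetical and only serve to illustrate the second column.
weather_train = pd.DataFrame({
    'site_id': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    'air_temperature': [25, 22, np.nan, 28, np.nan, 30, 45, np.nan, 50, np.nan, 24],
    'cloud_coverage': [2, np.nan, 4, 6, np.nan, 8, np.nan, 1, 3, np.nan, 5],
})

# Loop over the columns to fill: each gap receives the mean of the
# non-missing values that share the same site_id.
for col in ['air_temperature', 'cloud_coverage']:
    site_mean = weather_train.groupby('site_id')[col].transform('mean')
    weather_train[col] = weather_train[col].fillna(site_mean)

print(weather_train)
```

If the real data holds missing values as strings such as 'NaN', one option is to convert each column first with pd.to_numeric(weather_train[col], errors='coerce'), which turns unparseable entries into real NaN values before the fill.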

Solution 1
1 2020-03-11 01:48:44
