
Fill NaN with the max value of each group in Python

DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({"sym": ["a", "a", "aa", "aa", "aa", "a", "ab", "ab", "ab"],
                   "id_h": [2.1, 2.2, 2.5, 3.1, 2.5, 3.8, 2.5, 5, 6],
                   "pm_h": [np.nan, 2.3, np.nan, 2.8, 2.7, 3.7, 2.4, 4.9, np.nan]})

I want to fill the NaN values in pm_h with the maximum id_h value of each "sym" group, i.e. (a, aa, ab).

Required output:

df1 = pd.DataFrame({"sym": ["a", "a", "aa", "aa", "aa", "a", "ab", "ab", "ab"],
                    "id_h": [2.1, 2.2, 2.5, 3.1, 2.5, 3.8, 2.5, 5, 6],
                    "pm_h": [3.8, 2.3, 3.1, 2.8, 2.7, 3.7, 2.4, 4.9, 6]})

Use Series.fillna together with GroupBy.transform('max'). transform returns a new Series holding each row's group maximum, with the same index as the original, so fillna can match it up row by row:

df['pm_h'] = df['pm_h'].fillna(df.groupby('sym')['id_h'].transform('max'))
print (df)
  sym  id_h  pm_h
0   a   2.1   3.8
1   a   2.2   2.3
2  aa   2.5   3.1
3  aa   3.1   2.8
4  aa   2.5   2.7
5   a   3.8   3.7
6  ab   2.5   2.4
7  ab   5.0   4.9
8  ab   6.0   6.0
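
To make the intermediate step visible, here is a minimal self-contained sketch that splits the one-liner in two. The names group_max and filled are only illustrative and do not appear in the original answer; the data is the same as in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sym": ["a", "a", "aa", "aa", "aa", "a", "ab", "ab", "ab"],
    "id_h": [2.1, 2.2, 2.5, 3.1, 2.5, 3.8, 2.5, 5, 6],
    "pm_h": [np.nan, 2.3, np.nan, 2.8, 2.7, 3.7, 2.4, 4.9, np.nan],
})

# Per-row maximum of id_h within each "sym" group.
# transform keeps the original index, unlike .max(), which returns one row per group.
group_max = df.groupby("sym")["id_h"].transform("max")

# fillna aligns on the index, so only the NaN positions in pm_h
# take the corresponding group maximum; existing values are untouched.
filled = df["pm_h"].fillna(group_max)

print(pd.DataFrame({"sym": df["sym"], "group_max": group_max, "filled": filled}))
```

Because group_max has one value per row rather than one per group, no merge or map back onto df is needed before calling fillna.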


 