How to assign value to multi-value in pandas

I want to replace every zero value in SkinThickness with the mean SkinThickness of the patients who fall in the same Age range.

So I grouped the data frame by age class to get the mean SkinThickness for each age range.

The goal is to replace every zero in the SkinThickness column with the corresponding mean computed from the age grouping.

ageSkinMean = df_clean.groupby("Age_Class")["SkinThickness"].mean()
>>> ageSkinMean

Age_Class
21-22 years     82.163399
23-25 years    103.171429
26-30 years     91.170254
31-38 years     80.133028
39-47 years     73.685851
48-58 years     89.130233
60+ years       40.899160
Name: Insulin, dtype: float64

Currently I'm running the inefficient code below, which takes too long because it uses iterrows():

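import time

start = time.time()
# Visit only the rows whose SkinThickness is 0, one row at a time (slow).
for i, val in df_clean[df_clean.SkinThickness == 0].iterrows():
    age = val.iloc[7]  # value at column position 7, as in the original val[7]
    # Note: with strict "<", an age equal to a boundary (e.g. 22) falls into
    # the next bucket, which may not match the Age_Class labels.
    if age < 22:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean.iloc[0]
    elif age < 25:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean.iloc[1]
    elif age < 30:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean.iloc[2]
    elif age < 38:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean.iloc[3]
    elif age < 47:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean.iloc[4]
    elif age < 58:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean.iloc[5]
    else:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean.iloc[6]
print(time.time() - start)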

Is there a pandas optimization that would make this block of code run faster?

You can use the pandas transform function to replace the zero values in SkinThickness with the group means:

    age_skin_thickness_mean = df_clean.groupby('Age_Class')['SkinThickness'].mean()

    def replace_with_mean_thickness(row):
        row['SkinThickness'] = age_skin_thickness_mean[row['Age_Class']]
        return row

    df_clean.loc[df_clean['SkinThickness'] == 0] = df_clean.loc[df_clean['SkinThickness'] == 0].transform(replace_with_mean_thickness, axis=1)

Every row in df_clean that had SkinThickness == 0 will now have SkinThickness equal to the mean of its age group.
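Since transform with axis=1 still calls a Python function once per row, a fully vectorized variant may be faster on large frames. The following is only a sketch under assumptions: the small df_clean built here is made-up stand-in data, and the names age_skin_mean and is_zero are illustrative, not taken from the question. It uses a boolean mask plus Series.map to look up each row's group mean by its Age_Class label.

```python
import pandas as pd

# Toy stand-in for df_clean; the values are invented for illustration only.
df_clean = pd.DataFrame({
    "Age_Class": ["21-22 years", "21-22 years", "23-25 years",
                  "23-25 years", "23-25 years"],
    "SkinThickness": [20.0, 0.0, 30.0, 0.0, 36.0],
})

# Per-group mean, computed the same way as in the question (zeros included).
age_skin_mean = df_clean.groupby("Age_Class")["SkinThickness"].mean()

# Boolean mask selecting the rows that need to be filled.
is_zero = df_clean["SkinThickness"] == 0

# Map each masked row's Age_Class label to its group mean and assign
# all of them in a single vectorized step.
df_clean.loc[is_zero, "SkinThickness"] = (
    df_clean.loc[is_zero, "Age_Class"].map(age_skin_mean)
)

print(df_clean)
```

Because map matches on the Age_Class label, there is no need to index the means by position or to repeat the age thresholds in an if/elif chain.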
