How to assign value to multi-value in pandas
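I want to replace every zero value in SkinThickness with the mean SkinThickness of the patients in the same Age range.
So I grouped the dataframe by age class to get the mean SkinThickness for each age range. The goal is to assign each zero value in the SkinThickness column the corresponding mean computed from that grouping.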
ageSkinMean = df_clean.groupby("Age_Class")["SkinThickness"].mean()
>>> ageSkinMean
Age_Class
21-22 years 82.163399
23-25 years 103.171429
26-30 years 91.170254
31-38 years 80.133028
39-47 years 73.685851
48-58 years 89.130233
60+ years 40.899160
Name: Insulin, dtype: float64
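For anyone reproducing this, a column like Age_Class can be derived from a numeric Age column with pd.cut. The sketch below is only an illustration: the sample ages are made up, and the bin edges are an assumption inferred from the class labels in the output above, not taken from the original data preparation.

```python
import pandas as pd

# Hypothetical ages; the real data lives in df_clean.
df = pd.DataFrame({"Age": [21, 22, 24, 30, 35, 47, 50, 63]})

# Bin edges are an assumption inferred from the class labels above.
edges = [20, 22, 25, 30, 38, 47, 58, 200]
labels = ["21-22 years", "23-25 years", "26-30 years", "31-38 years",
          "39-47 years", "48-58 years", "60+ years"]

# right=True (the default) makes each bin include its upper edge, e.g. (20, 22].
df["Age_Class"] = pd.cut(df["Age"], bins=edges, labels=labels)
print(df)
```

With these assumed edges, an age of 59 would also fall into the "60+ years" class, so the last edge may need adjusting for real data.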
At the moment the code I am running is inefficient: it uses iterrows() and takes a very long time.
import time

start = time.time()
# Visit every row whose SkinThickness is 0; val[7] is read by position (presumably the Age column).
for i, val in df_clean[df_clean.SkinThickness == 0].iterrows():
    if val[7] < 22:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean[0]
    elif val[7] < 25:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean[1]
    elif val[7] < 30:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean[2]
    elif val[7] < 38:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean[3]
    elif val[7] < 47:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean[4]
    elif val[7] < 58:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean[5]
    else:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean[6]
print(time.time() - start)
I would like to know whether there is a pandas optimization for this kind of code block that would make it run faster.
You can use the pandas transform function to replace the zero SkinThickness values with the group mean:
age_skin_thickness_mean = df_clean.groupby('Age_Class')['SkinThickness'].mean()
def replace_with_mean_thickness(row):
    row['SkinThickness'] = age_skin_thickness_mean[row['Age_Class']]
    return row

df_clean.loc[df_clean['SkinThickness'] == 0] = df_clean.loc[df_clean['SkinThickness'] == 0].transform(replace_with_mean_thickness, axis=1)
Now every row in df_clean that had SkinThickness == 0 has its SkinThickness set to the mean of its age group.
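Since transform with axis=1 still calls a Python function once per row, a fully vectorized variant may be faster on large frames. The following is a minimal sketch, not the answer's own method: it uses groupby(...).transform("mean") to get a per-row group mean and a boolean mask to overwrite only the zeros. The small df here is made-up sample data standing in for df_clean.

```python
import pandas as pd

# Made-up sample standing in for df_clean.
df = pd.DataFrame({
    "Age_Class": ["21-22 years", "21-22 years", "21-22 years",
                  "60+ years", "60+ years"],
    "SkinThickness": [20.0, 0.0, 40.0, 0.0, 30.0],
})

# Group mean broadcast back to every row, aligned with df's index.
# Zeros are included in the mean, matching the code above.
group_mean = df.groupby("Age_Class")["SkinThickness"].transform("mean")

# Overwrite only the rows where SkinThickness is 0.
zero_mask = df["SkinThickness"] == 0
df.loc[zero_mask, "SkinThickness"] = group_mean[zero_mask]
print(df)
```

If the zeros stand for missing measurements, it is probably better to keep them out of the mean, for example by replacing 0 with NaN before grouping and then filling the NaN values with the group mean.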