How to assign values to multiple rows in pandas
I want to replace every zero value in SkinThickness with the mean SkinThickness of the patients who fall in the same Age range.
So I grouped the data frame by Age to get the mean of SkinThickness for each age range.
The goal is to replace every zero value in the SkinThickness column with the corresponding mean computed from the age grouping.
ageSkinMean = df_clean.groupby("Age_Class")["SkinThickness"].mean()
>>> ageSkinMean
Age_Class
21-22 years 82.163399
23-25 years 103.171429
26-30 years 91.170254
31-38 years 80.133028
39-47 years 73.685851
48-58 years 89.130233
60+ years 40.899160
Name: Insulin, dtype: float64
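The question does not show how Age_Class was built. One plausible way, sketched here under the assumption that the frame has a numeric Age column, is pd.cut with bin edges chosen to match the labels in the output above. The sample ages, the thickness values, and the bin edges are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real df_clean is not shown in the question.
df_clean = pd.DataFrame({
    "Age": [21, 22, 24, 29, 35, 45, 50, 63],
    "SkinThickness": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
})

# Bin edges are assumptions chosen to match the labels printed above.
# Each interval is closed on the right, e.g. (22, 25] -> "23-25 years".
bins = [20, 22, 25, 30, 38, 47, 58, 120]
labels = ["21-22 years", "23-25 years", "26-30 years", "31-38 years",
          "39-47 years", "48-58 years", "60+ years"]
df_clean["Age_Class"] = pd.cut(df_clean["Age"], bins=bins, labels=labels)

# observed=False keeps every label in the result, even an empty one.
ageSkinMean = df_clean.groupby("Age_Class", observed=False)["SkinThickness"].mean()
print(ageSkinMean)
```

With right-closed bins, an age of exactly 22 lands in "21-22 years", which is worth comparing against whatever thresholds are used elsewhere in the code.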
Currently I'm running the inefficient code below, which takes too long because it uses iterrows():
import time

start = time.time()
# Visit every row whose SkinThickness is 0; val[7] is read positionally
# (presumably the Age column) and ageSkinMean[k] is the k-th group mean.
for i, val in df_clean[df_clean.SkinThickness == 0].iterrows():
    if val[7] < 22:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean[0]
    elif val[7] < 25:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean[1]
    elif val[7] < 30:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean[2]
    elif val[7] < 38:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean[3]
    elif val[7] < 47:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean[4]
    elif val[7] < 58:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean[5]
    else:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean[6]
print(time.time() - start)
I wonder whether pandas offers a way to make this block of code run faster.
You can use the pandas transform function to replace the zero SkinThickness values with the group means:
# Mean SkinThickness per age class, indexed by the Age_Class label.
age_skin_thickness_mean = df_clean.groupby('Age_Class')['SkinThickness'].mean()

def replace_with_mean_thickness(row):
    # Look up the mean for this row's age class.
    row['SkinThickness'] = age_skin_thickness_mean[row['Age_Class']]
    return row

df_clean.loc[df_clean['SkinThickness'] == 0] = df_clean.loc[df_clean['SkinThickness'] == 0].transform(replace_with_mean_thickness, axis=1)
All rows in df_clean that had SkinThickness == 0 will now have SkinThickness equal to the mean value of their age group.
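A row-wise transform with axis=1 still calls a Python function once per row. A fully vectorized variant is sketched below, assuming df_clean has the Age_Class and SkinThickness columns from the question. The small sample frame is hypothetical, and treating zeros as missing before averaging is a deliberate change from the approach above, which includes the zeros in each group mean.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for df_clean.
df_clean = pd.DataFrame({
    "Age_Class": ["21-22 years", "21-22 years", "21-22 years",
                  "23-25 years", "23-25 years", "23-25 years"],
    "SkinThickness": [20.0, 0.0, 30.0, 0.0, 40.0, 20.0],
})

# Treat zeros as missing so they do not drag the group means down
# (an assumption; drop this step to keep zeros in the mean).
thickness = df_clean["SkinThickness"].replace(0, np.nan)

# Group mean broadcast back to every row, aligned to the original index.
group_mean = thickness.groupby(df_clean["Age_Class"]).transform("mean")

# Fill only the missing (formerly zero) entries with their group mean.
df_clean["SkinThickness"] = thickness.fillna(group_mean)
print(df_clean)
```

Because groupby(...).transform("mean") returns a Series aligned to the original index, no explicit loop or per-row lookup is needed.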