Impute column value via mode based on another column
I have a dataset that contains n columns. Those include a column for the Birth City and another for the Birth Country. What I want to do is, for each Birth Country, take the mode of the Birth City and use it to fill the missing values in the Birth City column.
I tried the following code, but nothing changed.
df["Birth City"]= df.groupby('Birth Country')['Birth City'].transform(lambda x: x.fillna(x.mode()))
df[df["Birth City"].isnull()]
After executing the above code, I still get the same missing Birth City values.
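You need to replace x.mode() with x.mode()[0] inside the fillna call, i.e. use x.fillna(x.mode()[0]) instead of x.fillna(x.mode()). Returning x.mode()[0] on its own would overwrite every city in the group with the mode, not only the missing ones.
df["Birth City"] = df.groupby('Birth Country')['Birth City'].transform(lambda x: x.fillna(x.mode()[0]))
df[df["Birth City"].isnull()]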
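The fix can be tried on a small frame. This is a minimal sketch, not a definitive implementation: the sample rows and the helper name fill_with_group_mode are made up for illustration, and only the column names come from the question. It also guards against a country whose cities are all missing, where mode() returns an empty Series and indexing it with [0] would raise.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Birth Country": ["US", "US", "US", "FR", "FR", "FR", "JP"],
    "Birth City": ["Boston", "Boston", np.nan, "Paris", np.nan, "Paris", np.nan],
})

def fill_with_group_mode(s):
    """Fill missing values in one group with that group's first mode."""
    modes = s.mode()      # Series of modes; missing values are ignored by default
    if modes.empty:       # the group has no observed city at all
        return s          # nothing to impute from, leave the group unchanged
    return s.fillna(modes.iloc[0])

df["Birth City"] = (
    df.groupby("Birth Country")["Birth City"].transform(fill_with_group_mode)
)
print(df)
```

In this sample the US and FR rows get their country's most frequent city, while the JP row stays missing because there is no observed city to take a mode from.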
In your example you are using transform on the grouped column, which applies lambda x: x.fillna(x.mode()) to each group. That lambda replaces x with x.fillna(x.mode()) (not with x.mode()), which is meant to be the Series with its missing values filled. However, x.mode() is also a Series: it holds the modes in sorted order, indexed from 0. When fillna is given a Series, it fills by matching index labels, so only missing rows whose label happens to appear in that index get a value. Passing the first mode as a scalar, x.mode()[0], fills every missing value in the group.
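The difference between passing a Series and passing a scalar to fillna can be seen on a single group. The following is a small sketch under stated assumptions: the values and the index labels 10 to 13 are invented, and the labels are chosen so that none of them coincides with the 0-based index of the Series returned by mode().

```python
import numpy as np
import pandas as pd

# Hypothetical group; labels 10..13 do not overlap with the index of mode().
x = pd.Series(["Paris", np.nan, "Paris", np.nan], index=[10, 11, 12, 13])

modes = x.mode()
print(modes.index.tolist())    # index of the modes Series, starting at 0

# fillna with a Series matches on index labels, so nothing is filled here.
unchanged = x.fillna(modes)
print(unchanged.isna().sum())  # number of values still missing

# fillna with a scalar fills every missing entry.
filled = x.fillna(modes.iloc[0])
print(filled.tolist())
```

Using modes.iloc[0] rather than modes[0] selects the first mode by position, which is equivalent here because mode() returns a default 0-based index.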