Impute column value via mode based on another column

I have a dataset that contains n columns. Those include a column for the Birth City and another for the Birth Country. What I want to do is, based on the Birth Country, get the mode of the Birth City and use it to fill the missing values in the Birth City column.

I tried the following code, but nothing is affected.

df["Birth City"]= df.groupby('Birth Country')['Birth City'].transform(lambda x: x.fillna(x.mode()))
df[df["Birth City"].isnull()]

After executing the above code, I still get the same missing Birth City values.

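You need to replace x.fillna(x.mode()) with x.fillna(x.mode()[0]) in your code, so that fillna receives a single value instead of a Series. Keeping the fillna call matters: returning x.mode()[0] on its own would overwrite every city in the group, not only the missing ones.

df["Birth City"] = df.groupby('Birth Country')['Birth City'].transform(lambda x: x.fillna(x.mode()[0]))
df[df["Birth City"].isnull()]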
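
One case the one-liner does not cover is a country whose cities are all missing: x.mode() is then empty and x.mode()[0] raises an error. The following is a minimal sketch of one way to guard against that, assuming a small made-up DataFrame (the sample rows and the helper name fill_with_group_mode are hypothetical and not part of the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real dataset has more columns.
df = pd.DataFrame({
    "Birth Country": ["France", "France", "France", "Japan", "Japan", "Peru"],
    "Birth City": ["Paris", "Paris", np.nan, "Tokyo", np.nan, np.nan],
})


def fill_with_group_mode(cities: pd.Series) -> pd.Series:
    """Fill missing values in one group with that group's most frequent value."""
    modes = cities.mode()  # missing values are ignored by default
    if modes.empty:
        # Every city for this country is missing, so there is nothing to impute from.
        return cities
    return cities.fillna(modes.iloc[0])


df["Birth City"] = (
    df.groupby("Birth Country")["Birth City"].transform(fill_with_group_mode)
)

# Rows that are still missing (here, only the country with no known city).
still_missing = df[df["Birth City"].isnull()]
```

When several cities tie for most frequent, mode() returns them in sorted order, so this sketch picks the first one alphabetically; whether that is acceptable depends on the data.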

In your example, Series.transform applies lambda x: x.fillna(x.mode()) to each group, so each group x is replaced with x.fillna(x.mode()), that is, the same Series with its missing values filled. However, x.mode() is itself a Series (the modes in sorted order, labelled 0, 1, ...), and fillna aligns a Series argument on index labels, so only rows whose labels happen to match get filled. You therefore need a single value, x.mode()[0].
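
The alignment behaviour can be seen on a single group in isolation. This is only an illustrative sketch: the Series x and its row labels 10 to 13 are made up to mimic a group taken from the middle of a larger DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical group; labels 10..13 stand in for row positions in a larger DataFrame.
x = pd.Series(["Paris", "Paris", "Lyon", np.nan], index=[10, 11, 12, 13])

modes = x.mode()                   # Series of modes, labelled 0, 1, ...
aligned = x.fillna(modes)          # label 13 does not exist in modes, so it stays missing
scalar = x.fillna(modes.iloc[0])   # a single value fills every missing entry

print(modes.index.tolist())        # labels of the modes Series
print(aligned.isna().sum())        # count of values still missing after the aligned fill
print(scalar.isna().sum())         # count of values still missing after the scalar fill
```

Using modes.iloc[0] here is equivalent to x.mode()[0] for a non-empty result; it just makes the positional lookup explicit.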

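Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.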

 
© 2020-2024 STACKOOM.COM