How do I impute missing values using pandas?

I am trying to impute missing values as the mean of the other values in the column; however, my code has no effect. Does anyone know what I may be doing wrong? Thanks!

My code:

from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer = imputer.fit(x[:, 1:3])
x[:, 1:3] = imputer.transform(x[:, 1:3])
print(dataset)

Output:

   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
4  Germany  40.0      NaN       Yes
5   France  35.0  58000.0       Yes
6    Spain   NaN  52000.0        No
7   France  48.0  79000.0       Yes
8  Germany  50.0  83000.0        No
9   France  37.0  67000.0       Yes

You can do the following, assuming df is your dataset:

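# Note: Imputer was removed in scikit-learn 0.22, so this needs an older release
from sklearn.preprocessing import Imputer

# Replace each NaN with the mean of its column (axis=0)
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)

# Fit on the two numeric columns and write the result back into df
df[['Age', 'Salary']] = imputer.fit_transform(df[['Age', 'Salary']])

print(df)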

   Country        Age        Salary Purchased
0   France  44.000000  72000.000000        No
1    Spain  27.000000  48000.000000       Yes
2  Germany  30.000000  54000.000000        No
3    Spain  38.000000  61000.000000        No
4  Germany  40.000000  63777.777778       Yes
5   France  35.000000  58000.000000       Yes
6    Spain  38.777778  52000.000000        No
7   France  48.000000  79000.000000       Yes
8  Germany  50.000000  83000.000000        No
9   France  37.000000  67000.000000       Yes
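
On scikit-learn 0.22 and later the Imputer class is no longer available, and SimpleImputer from sklearn.impute takes its place. Below is a minimal sketch of the same mean imputation under that assumption. The DataFrame is rebuilt from the table in the question only so the example runs on its own; in practice df would be whatever you already loaded.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Data copied from the question's output (rebuilt here for a self-contained example)
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# SimpleImputer always works column-wise, so there is no axis argument
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit_transform returns a new array; assigning it back is what updates df
df[["Age", "Salary"]] = imputer.fit_transform(df[["Age", "Salary"]])

print(df)
```

The important part is the assignment back into df: the imputer never modifies the DataFrame by itself.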

You're assigning an Imputer object to the variable imputer:

imputer = Imputer(missing_values ='NaN', strategy = 'mean', axis = 0)

You then call the fit() function on your Imputer object, and then the transform() function.
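
To make the two calls concrete, here is a small sketch of what each one does. It uses SimpleImputer (the replacement for Imputer in scikit-learn 0.22 and later) rather than the class from the question, and the array X is just the Age and Salary columns typed in by hand, so treat the names as illustrative.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Age and Salary values from the question, as a plain NumPy array
X = np.array([
    [44.0, 72000.0], [27.0, 48000.0], [30.0, 54000.0], [38.0, 61000.0],
    [40.0, np.nan], [35.0, 58000.0], [np.nan, 52000.0], [48.0, 79000.0],
    [50.0, 83000.0], [37.0, 67000.0],
])

imputer = SimpleImputer(strategy="mean")  # missing_values defaults to np.nan

# fit() only learns one statistic (here the mean) per column
imputer.fit(X)
column_means = imputer.statistics_

# transform() returns a new, filled array and leaves X as it was
X_filled = imputer.transform(X)

print(column_means)
print(int(np.isnan(X).sum()))         # NaN count in the input
print(int(np.isnan(X_filled).sum()))  # NaN count in the returned array
```

Since transform() hands back a new array, the result has to be stored somewhere and then printed; the input is not changed in place.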

Then you print the dataset variable, and I'm not sure where it comes from. Did you mean to print the Imputer object, or the result of one of those calls instead?
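
One plausible explanation, and this is only a guess since the question doesn't show how x was created, is that x was pulled out of dataset with something like dataset.iloc[:, :-1].values. With mixed column types that produces a separate NumPy array, so filling x never touches dataset. The sketch below reproduces that situation with a hand-built dataset and a NumPy stand-in for the imputer step, then shows a pandas-only fix with fillna.

```python
import numpy as np
import pandas as pd

# Data copied from the question's output (rebuilt here for a self-contained example)
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})
cols = ["Age", "Salary"]

# ASSUMPTION: x was extracted like this; mixed dtypes make .values a new object array
x = dataset.iloc[:, :-1].values

# Stand-in for the imputer step: fill the NaNs in x with the column means
numeric = x[:, 1:3].astype(float)
col_means = np.nanmean(numeric, axis=0)
x[:, 1:3] = np.where(np.isnan(numeric), col_means, numeric)

# x is filled, but dataset still holds its original NaNs
missing_after_editing_x = int(dataset[cols].isna().sum().sum())
print(missing_after_editing_x)

# pandas-only fix: fill each column's NaNs with that column's mean, in dataset itself
dataset[cols] = dataset[cols].fillna(dataset[cols].mean())
print(dataset)
```

If that guess is right, printing x instead of dataset in the original code would have shown the imputed values.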
