Linear regression in scikit-learn

Question

I started learning machine learning in Python using Pandas and Sklearn. I tried to use the LinearRegression().fit method:

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split 
house_data = pd.read_csv(r"C:\Users\yassine\Desktop\ml\OC-tp-ML\house_data.csv")
y = house_data[["price"]] 
x = house_data[["surface","arrondissement"]] 
X = house_data.iloc[:, 1:3].values  
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=1)
model = LinearRegression()
model.fit(x_train, y_train)

When I run the code, I get this message:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Can you help me, please?

Answer 1

The error means that the data passed to fit contains missing (NaN) or infinite values, which LinearRegression does not accept. Machine learning models may require you to impute the data as part of your data cleaning process. Linear regression cares a lot about the predicted values (yhat), so I usually start by imputing the mean. If you aren't comfortable imputing the missing data, you can drop the observations that contain NaN, provided only a small proportion of your observations contain NaN.

Imputing the mean can look like this:

df = df.fillna(df.mean())

Filling with zero can look like this:

df = df.fillna(0)

Imputing a custom value can look like this:

df = df.fillna(my_func(args))

Dropping the rows altogether can look like this:

df = df.dropna()
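
To make the difference between these options concrete, here is a minimal sketch on a toy DataFrame. The values are invented for illustration and only stand in for house_data.csv, whose actual contents are not shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (assumption: not the real house_data.csv)
df = pd.DataFrame({
    "price": [1000.0, 1500.0, np.nan, 2000.0],
    "surface": [20.0, np.nan, 40.0, 50.0],
})

# Count the missing values in each column before deciding what to do
missing_per_column = df.isna().sum()

# Option 1: replace each NaN with the mean of its column
mean_filled = df.fillna(df.mean())

# Option 2: drop every row that contains at least one NaN
dropped = df.dropna()

print(missing_per_column)
print(mean_filled)
print(dropped)
```

Mean imputation keeps all four rows, while dropna keeps only the rows that were complete to begin with, so checking the counts first helps decide which trade-off is acceptable.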

Converting inf to NaN ahead of time, so that the methods above also catch it, can look like this:

df = df.replace([np.inf, -np.inf], np.nan)
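
Putting the pieces together for the code in the question, a sketch of the whole flow might look like the following. The inline DataFrame is a made-up stand-in for house_data.csv (an assumption, since the file is not available here), and dropping rows is just one of the options above; fillna could be used instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for house_data.csv, with a few bad values on purpose
house_data = pd.DataFrame({
    "price": [1330.0, 1400.0, 904.0, 955.0, 2545.0, np.nan, 1560.0, 1960.0],
    "surface": [37.0, 32.0, 26.0, 30.0, 70.0, 24.0, np.inf, 50.0],
    "arrondissement": [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, np.nan],
})

# Turn infinities into NaN, then drop every row with a missing value.
# Cleaning features and target together keeps the rows aligned.
columns = ["price", "surface", "arrondissement"]
clean = house_data[columns].replace([np.inf, -np.inf], np.nan).dropna()

x = clean[["surface", "arrondissement"]]
y = clean["price"]

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=1
)

model = LinearRegression()
model.fit(x_train, y_train)  # no ValueError: the inputs are all finite now
predictions = model.predict(x_test)
print(predictions)
```

The cleaning step runs before train_test_split so that x and y lose the same rows; cleaning them separately could leave the two with different lengths.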

Answer 1
3 2018-12-13 16:20:43