Linear regression in scikit-learn

I started learning machine learning in Python using Pandas and Sklearn. I tried to use the LinearRegression().fit method:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
house_data = pd.read_csv(r"C:\Users\yassine\Desktop\ml\OC-tp-ML\house_data.csv")
y = house_data[["price"]]
x = house_data[["surface", "arrondissement"]]
X = house_data.iloc[:, 1:3].values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=1)
model = LinearRegression()
model.fit(x_train, y_train)

When I run the code, I get this message:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Can you help me, please?

Machine learning models may require you to impute missing data as part of your data cleaning process. Linear regression cares a lot about the yhat (the predicted values), so I usually start with imputing the mean. If you aren't comfortable imputing the missing data, you can drop the observations that contain NaN, provided they make up only a small proportion of your observations.
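Before choosing between imputing and dropping, it can help to see how much of the data is actually affected. A minimal sketch, assuming a small made-up DataFrame that borrows the question's column names (the values are invented for illustration, not taken from house_data.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for house_data; the values are made up.
df = pd.DataFrame({
    "price": [1500.0, 2100.0, np.nan, 1800.0],
    "surface": [40.0, np.inf, 55.0, 48.0],
    "arrondissement": [1.0, 2.0, 3.0, np.nan],
})

# Number of NaN values per column.
nan_counts = df.isna().sum()

# Number of +/-inf values per column (isna() does not flag inf).
inf_counts = np.isinf(df).sum()

# Share of rows that contain at least one NaN or inf.
bad_rows = (df.isna() | np.isinf(df)).any(axis=1)
bad_share = bad_rows.mean()

print(nan_counts)
print(inf_counts)
print(bad_share)
```

If the share of affected rows is small, dropping them loses little; if it is large, imputing is probably the safer starting point.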

Imputing the mean can look like this:

df = df.fillna(df.mean())

Imputing to zero can look like this:

df = df.fillna(0)

Imputing to a custom result can look like this:

df = df.fillna(my_func(args))
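Here my_func(args) is a placeholder for whatever produces the fill value. As one possible illustration, fillna also accepts a dict of per-column values, so each column can get its own rule. The sketch below assumes the median for surface and the most frequent value for arrondissement; both choices and the sample values are assumptions made for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical data; column names follow the question, values are made up.
df = pd.DataFrame({
    "surface": [40.0, np.nan, 55.0, 48.0],
    "arrondissement": [1.0, 2.0, 2.0, np.nan],
})

# fillna accepts a dict mapping column name -> fill value.
fill_values = {
    # Median of the observed surfaces (NaN is skipped by default).
    "surface": df["surface"].median(),
    # Most frequent district number; mode() returns a Series, take its first entry.
    "arrondissement": df["arrondissement"].mode().iloc[0],
}
df = df.fillna(fill_values)

print(df)
```

A district number is really a category, so a most-frequent fill may be easier to defend there than a mean.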

Dropping altogether can look like this:

df = df.dropna()

To have inf caught by these methods, convert it to NaN first. Note that replace returns a new DataFrame, so assign the result:

df = df.replace([np.inf, -np.inf], np.nan)
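Putting the pieces together for the question's setup might look like the sketch below. It assumes synthetic data in place of house_data.csv (the numbers are generated, not real), and it uses scikit-learn's SimpleImputer inside a pipeline rather than df.fillna. That substitution is not part of the answer above; it is one way to make sure the column means are computed from the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for house_data.csv; all values are generated.
rng = np.random.default_rng(0)
surface = rng.uniform(20, 120, size=40)
arrondissement = rng.integers(1, 5, size=40).astype(float)
price = 30 * surface + 100 * arrondissement + rng.normal(0, 50, size=40)
house_data = pd.DataFrame(
    {"price": price, "surface": surface, "arrondissement": arrondissement}
)

# Inject a few bad values to reproduce the kind of input that raises the ValueError.
house_data.loc[3, "surface"] = np.nan
house_data.loc[7, "arrondissement"] = np.inf
house_data.loc[11, "price"] = np.nan

# Turn inf into NaN so the NaN-handling steps below catch it.
house_data = house_data.replace([np.inf, -np.inf], np.nan)

# Rows without a target cannot be used for fitting, so drop those.
house_data = house_data.dropna(subset=["price"])

X = house_data[["surface", "arrondissement"]]
y = house_data["price"]
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# Mean-impute the features, then fit the regression.
model = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
model.fit(x_train, y_train)

predictions = model.predict(x_test)
score = model.score(x_test, y_test)  # R^2 on the held-out rows
print(score)
```

Calling df.fillna(df.mean()) before the split would also avoid the error; the pipeline version just keeps the test rows out of the mean calculation.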


 