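Trying to understand an example script on ML
I am working through an example script on machine learning, "Common pitfalls in the interpretation of coefficients of linear models", but I can't understand some of the steps. The script begins like this: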
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_openml
survey = fetch_openml(data_id=534, as_frame=True)
# We identify features `X` and targets `y`: the column WAGE is our
# target variable (i.e., the variable which we want to predict).
X = survey.data[survey.feature_names]
X.describe(include="all")
X.head()
# Our target for prediction is the wage.
y = survey.target.values.ravel()
survey.target.head()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
train_dataset = X_train.copy()
train_dataset.insert(0, "WAGE", y_train)
_ = sns.pairplot(train_dataset, kind='reg', diag_kind='kde')
My question is about these lines:
y = survey.target.values.ravel()
survey.target.head()
If we check survey.target.head() immediately after these lines, the output is
Out[36]:
0 5.10
1 4.95
2 6.67
3 4.00
4 7.50
Name: WAGE, dtype: float64
How does the model know that WAGE is the target variable? Doesn't it have to be declared explicitly?
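The line survey.target.values.ravel() is meant to flatten the array, but it is not necessary in this example. survey.target is a pandas Series (a one-dimensional labeled array, comparable to a single DataFrame column), and survey.target.values is a numpy array. You can use either one for the train/test split, because survey.target holds only one column.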
type(survey.target)
pandas.core.series.Series
type(survey.target.values)
numpy.ndarray
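To make the point about ravel() concrete, here is a minimal sketch that uses a small hand-made Series as a stand-in for survey.target (the values are copied from the output in the question; the variable names are only illustrative). It shows that .values on a Series is already one-dimensional, so ravel() changes nothing, whereas a one-column DataFrame gives a 2-D array that ravel() does flatten.

```python
import numpy as np
import pandas as pd

# Stand-in for survey.target: a Series named WAGE (values from the question's output).
target = pd.Series([5.10, 4.95, 6.67, 4.00, 7.50], name="WAGE")

as_array = target.values           # 1-D numpy array
flattened = as_array.ravel()       # still 1-D: there is nothing to flatten

# A one-column DataFrame behaves differently: its values are 2-D (rows x 1).
one_col_frame = target.to_frame()
frame_values = one_col_frame.values
frame_flat = frame_values.ravel()  # flattened down to 1-D

print(as_array.shape, flattened.shape)
print(frame_values.shape, frame_flat.shape)
```

So ravel() only matters when the target arrives as a 2-D column, which is not the case here.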
If we use survey.target directly, you can see that the regression still works:
y = survey.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
train_dataset = X_train.copy()
train_dataset.insert(0, "WAGE", y_train)
sns.pairplot(train_dataset, kind='reg', diag_kind='kde')
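As for how WAGE ends up as the target: no model is involved at that point. The split into survey.data and survey.target is made by the loader. For fetch_openml, the target_column parameter defaults to 'default-target', meaning the target column is taken from the dataset's metadata on OpenML; passing a column name there would declare it explicitly. The sketch below illustrates the same data/target split with the bundled diabetes dataset, chosen here only because it loads without a network connection; it is not part of the original script.

```python
from sklearn.datasets import load_diabetes

bunch = load_diabetes(as_frame=True)

# The loader has already separated the features from the target.
print(type(bunch.data).__name__)    # container type of the feature columns
print(type(bunch.target).__name__)  # container type of the target column
print(bunch.target.name)            # name the loader gave the target column

# bunch.frame holds features and target together in one DataFrame,
# with the target as the last column.
feature_cols = list(bunch.data.columns)
all_cols = list(bunch.frame.columns)
print(all_cols[-1])
```

The model itself only learns which variable is the target when you pass it as the second argument to fit.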
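Suppose you have another dataset, for example iris, and want to regress petal width on the remaining measurements. You would use square brackets [] to select the columns of the DataFrame: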
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
dat = load_iris(as_frame=True).frame
X = dat[['sepal length (cm)','sepal width (cm)','petal length (cm)']]
y = dat[['petal width (cm)']]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
LR = LinearRegression()
LR.fit(X_train,y_train)
plt.scatter(x=y_test,y=LR.predict(X_test))
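One detail in the snippet above: dat[['petal width (cm)']] uses double brackets, so y is a one-column DataFrame and LR.predict returns a 2-D array. Single brackets give a Series and 1-D predictions. A small sketch of that variant follows; it also reports R² on the held-out split as a quick sanity check (the names model, pred and r2 are illustrative).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

dat = load_iris(as_frame=True).frame
X = dat[["sepal length (cm)", "sepal width (cm)", "petal length (cm)"]]
y = dat["petal width (cm)"]  # single brackets: a Series, not a DataFrame

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)      # 1-D array, one prediction per test row
r2 = model.score(X_test, y_test)  # coefficient of determination on the test split
print(pred.shape, round(r2, 3))
```

Either form of y works for fitting; the Series form simply keeps the predictions one-dimensional.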