How to predict data from a scikit-learn toy dataset

Question

I am studying machine learning and trying to analyze the scikit-learn diabetes toy dataset. In this case, I want to change the default Bunch object to a pandas DataFrame object. I tried using the argument as_frame=True, and it did change the object type to DataFrame.

After that, I trained the model, and the problem appears when I try to plot it:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

dataset = datasets.load_diabetes(as_frame=True)

X = dataset.data
y = dataset.target

y = y.to_frame()

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=42)

regressor = linear_model.LinearRegression()
regressor.fit(X_train, y_train)

plt.scatter(X_train, y_train, color='blue')
plt.plot(X_train, regressor.predict(X_test), color='red')

The problem occurs when I try to plot with matplotlib: with as_frame=True, data is a DataFrame object and target is a Series.

Traceback (most recent call last):
  File "C:/Users/Kelvin/OneDrive/Documents/analytics/diabetes-sklearn/test.py", line 19, in <module>
    plt.scatter(X_train, y_train, color='blue')
  File "C:\Users\Kelvin\OneDrive\Desktop\analytics\lib\site-packages\matplotlib\pyplot.py", line 3037, in scatter
    __ret = gca().scatter(
  File "C:\Users\Kelvin\OneDrive\Desktop\analytics\lib\site-packages\matplotlib\__init__.py", line 1352, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "C:\Users\Kelvin\OneDrive\Desktop\analytics\lib\site-packages\matplotlib\axes\_axes.py", line 4478, in scatter
    raise ValueError("x and y must be the same size")
ValueError: x and y must be the same size

So my question is: is there a way to get the whole dataset as a DataFrame, just like the one we get from pd.read_csv()?

Answer 1

That is already a DataFrame. You are getting the error because you are plotting X_train against y_train, and X_train has multiple columns while y_train has only one, so x and y are not the same size.

If you want the dataset in a CSV file, you can use this code:
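One way to get a plot that works is to fit and plot a single feature, so that x and y have the same number of values. The sketch below is only an illustration: the choice of the bmi column is an assumption (any one of the ten feature columns would do), and it scatters the test points rather than the training points so the fitted line and the points share the same x values.

```python
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

dataset = datasets.load_diabetes(as_frame=True)

# Keep one feature only; double brackets keep X as a one-column DataFrame.
# "bmi" is picked here purely as an example.
X = dataset.data[["bmi"]]
y = dataset.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42
)

regressor = linear_model.LinearRegression()
regressor.fit(X_train, y_train)

# Predict on the same rows that are plotted, so sizes match.
y_pred = regressor.predict(X_test)

plt.scatter(X_test["bmi"], y_test, color="blue")
plt.plot(X_test["bmi"], y_pred, color="red")
# Call plt.show() here to display the figure interactively.
```

With a single feature the predictions all lie on one straight line, so the red line draws correctly even though the x values are not sorted.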

X.to_csv('train_data.csv')

This saves the dataset to a CSV file in your working directory. You can then use pd.read_csv on train_data.csv.
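If the goal is a single DataFrame holding both the features and the target, the Bunch returned with as_frame=True also has a frame attribute that can be written out and read back. The sketch below is a hedged illustration: it writes to an in-memory buffer instead of train_data.csv so that it leaves no file behind, and it passes index=False because to_csv otherwise writes the row index, which pd.read_csv would read back as an extra "Unnamed: 0" column.

```python
import io

import pandas as pd
from sklearn import datasets

dataset = datasets.load_diabetes(as_frame=True)

# dataset.frame holds the ten feature columns plus a "target" column.
df = dataset.frame

# Round-trip through CSV; a file path such as "train_data.csv" works the same way.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df_from_csv = pd.read_csv(buffer)

# Split back into features and target when needed.
X = df_from_csv.drop(columns="target")
y = df_from_csv["target"]
```

After this, df_from_csv behaves like any DataFrame loaded from a CSV file, and X and y can be passed to train_test_split as before.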

Answer 1, posted 2021-04-28 14:24:32
