How do I load a dataframe in Python sklearn?

I did some computations in an IPython Notebook and ended up with a dataframe df that isn't saved anywhere yet. In the same IPython Notebook, I want to work with this dataframe using sklearn.

df is a dataframe with 4 columns: id (string), value (int), rated (bool), score (float). I am trying to determine what influences the score the most, just like in this example. There they load a standard dataset, but I want to use my own dataframe from the notebook instead.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from matplotlib import pyplot as plt

plt.rcParams.update({'figure.figsize': (12.0, 8.0)})
plt.rcParams.update({'font.size': 14})

dataset = df
X = pd.DataFrame(dataset.data, columns=dataset.feature_names)
y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=12)

But I get AttributeError: 'DataFrame' object has no attribute 'data'.

OK, some clarification first: in the example, what the load_boston() function returns is not obvious from the code, since they just import it and call it. Whatever it returns (a scikit-learn Bunch object) has an attribute called "data".

They use this line:

X = pd.DataFrame(boston.data, columns=boston.feature_names)

to create a dataframe. Your situation is different: you already have a dataframe, and dataframes don't have a "data" attribute. Hence the error you're getting: 'DataFrame' object has no attribute 'data'.
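To see the difference concretely, here is a small sketch. load_boston() was removed in scikit-learn 1.2, so it uses load_diabetes() instead, which (like load_boston) returns a Bunch exposing .data, .target and .feature_names. The two-row DataFrame is made-up toy data, used only to show that a DataFrame has no such attribute.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# load_diabetes() returns a sklearn.utils.Bunch: a dict-like container
# whose .data is a NumPy array of features and .target holds the labels.
bunch = load_diabetes()
print(type(bunch).__name__)   # Bunch
print(bunch.data.shape)       # (442, 10)

# The pattern from the example: wrap the raw array in a DataFrame.
X = pd.DataFrame(bunch.data, columns=bunch.feature_names)

# A DataFrame (hypothetical toy data here) has no .data attribute,
# which is exactly why dataset.data raised AttributeError.
df = pd.DataFrame({"value": [1, 2], "score": [0.5, 0.7]})
print(hasattr(df, "data"))     # False
print(hasattr(bunch, "data"))  # True
```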

What you need is simply:

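from sklearn.model_selection import train_test_split

# Features: every column except the target (keeping 'score' in X would leak the answer)
X = df.drop(columns=['score'])
y = df['score']
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=12)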
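For the goal stated in the question (what influences score the most), here is a minimal end-to-end sketch. The eight-row df is invented purely as a stand-in for the notebook's dataframe; the string id column is dropped along with the target because RandomForestRegressor only accepts numeric input; and impurity-based feature_importances_ is just one possible importance measure, not necessarily what the linked example uses.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the notebook's df (same four columns and dtypes).
df = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "value": [3, 1, 4, 1, 5, 9, 2, 6],
    "rated": [True, False, True, True, False, True, False, False],
    "score": [3.1, 1.2, 4.0, 1.5, 4.8, 9.3, 2.2, 5.9],
})

# Features: drop the target and the non-numeric identifier,
# then cast the int/bool columns to one numeric dtype.
X = df.drop(columns=["id", "score"]).astype(float)
y = df["score"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=12
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Impurity-based importances: one value per feature column, normalized to sum to 1.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

With real data, the sorted Series gives a first indication of which columns drive score; the matplotlib import in the question could then plot it, e.g. with importances.plot.barh().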

Or, if you need only some of the columns from your dataframe:

# set data (the string 'id' column would need encoding before fitting most estimators)
list_of_columns = ['value', 'rated']
X = df[list_of_columns]
# set target
target_column = 'score'
y = df[target_column]
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=12)
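Selecting with a list of names gives a DataFrame, while a single name gives a Series, and train_test_split keeps those types, so column names survive the split. A minimal sketch on invented toy data with the question's column layout; 'value' and 'rated' are used as the example features because the string 'id' column would need encoding before most estimators could use it.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data with the question's column layout.
df = pd.DataFrame({
    "id": list("abcdefgh"),
    "value": [3, 1, 4, 1, 5, 9, 2, 6],
    "rated": [True, False, True, True, False, True, False, False],
    "score": [3.1, 1.2, 4.0, 1.5, 4.8, 9.3, 2.2, 5.9],
})

list_of_columns = ["value", "rated"]
X = df[list_of_columns]  # list of names -> DataFrame
y = df["score"]          # single name -> Series

# 8 rows with test_size=0.25 -> 2 test rows, 6 training rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=12
)

print(type(X_train).__name__, X_train.shape)  # DataFrame (6, 2)
print(type(y_train).__name__, y_train.shape)  # Series (6,)
print(list(X_test.columns))                   # ['value', 'rated']
```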
