How do I find the X_train indices in the main dataset?

Question

In Python, we can split a dataset into training and test sets with scikit-learn's train_test_split function:

X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True, test_size=0.3)

My question is: how can we find the indices of X_train or y_train in the original dataset?

Suppose we obtain predictions as follows:

prediction = model.predict(X_test)

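Also, how can we find the indices of the predictions?

I ask because when I get inaccurate results, I want to inspect the values of each row.

In other words, data is the main dataset and subset is a subset of data: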

data = array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
subset = array([2, 4, 5, 6])

How can I find the indices of subset within data?

Answer 1

As documented, sklearn.model_selection.train_test_split is a quick utility that wraps sklearn.model_selection.ShuffleSplit:

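import numpy as np
from sklearn.model_selection import ShuffleSplit, train_test_split

# X and y are not shown in the original answer; these values are an assumption
# chosen to be consistent with the output printed below.
X = np.arange(10).reshape(5, 2)
y = np.arange(5)

x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=1, test_size=1)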
x_train
array([[2, 3],
       [8, 9],
       [0, 1],
       [6, 7]])

ShuffleSplit, in contrast, yields the index sets of the split:

train_ind, test_ind = next(ShuffleSplit(random_state=1).split(X, y))
X[train_ind]
array([[2, 3],
       [8, 9],
       [0, 1],
       [6, 7]])

So you can use the train_ind and/or test_ind produced by ShuffleSplit; with the same random_state and test size, they give the same split as train_test_split.
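As an alternative to calling ShuffleSplit directly, the sketch below passes a row-index array through train_test_split together with X and y, relying on the fact that train_test_split accepts any number of arrays and splits them all the same way. The toy data, the DecisionTreeClassifier model, and the names indices, idx_train, idx_test, wrong_rows and subset_idx are assumptions made for illustration and do not appear in the question. The last lines address the data/subset example with np.isin, which assumes the values in data are unique and returns positions in data order.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy data (assumed for illustration): 20 rows, 2 features, alternating labels.
X = np.arange(40).reshape(20, 2)
y = np.arange(20) % 2

# Split a row-index array alongside X and y so the original positions survive.
indices = np.arange(len(X))
X_train, X_test, y_train, y_test, idx_train, idx_test = train_test_split(
    X, y, indices, shuffle=True, test_size=0.3, random_state=1
)

# idx_train and idx_test are positions in the original X.
assert np.array_equal(X[idx_train], X_train)
assert np.array_equal(X[idx_test], X_test)

# prediction[i] corresponds to original row idx_test[i].
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
prediction = model.predict(X_test)

# Original row indices of the misclassified test samples.
wrong_rows = idx_test[prediction != y_test]
print(wrong_rows)
print(X[wrong_rows], y[wrong_rows])

# The data/subset example from the question: positions of subset values in data.
data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
subset = np.array([2, 4, 5, 6])
subset_idx = np.flatnonzero(np.isin(data, subset))
print(subset_idx)  # [2 4 5 6]
```

If X is a pandas DataFrame, the index usually carries through the split already, so X_train.index and X_test.index may be enough without a separate index array.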

Solution 1
0 2019-12-18 02:06:14
