How do KMeans and Logistic Regression interact with the MNIST dataset in a Pipeline class?

Question

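I am working through the book "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow". One of its classification approaches for the MNIST dataset uses KMeans as a preprocessing step and then a LogisticRegression model to perform the classification.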

from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X_digits, y_digits = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X_digits, y_digits, random_state=42)

pipeline = Pipeline([
    ("kmeans", KMeans(random_state=42)),
    ("log_reg", LogisticRegression(multi_class="ovr", solver="lbfgs", max_iter=5000, random_state=42)),
])

param_grid = dict(kmeans__n_clusters=range(45, 50))
grid_clf = GridSearchCV(pipeline, param_grid, cv=3, verbose=2)
grid_clf.fit(X_train, y_train)

predict = grid_clf.predict(X_test)

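The output of grid_clf.predict(X_test) consists of the original digits (0-9), not the clusters created by the KMeans step in the pipeline. My question is: how does predict() tie its predictions back to the original labels of the dataset?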

Answer 1

Setting the grid search aside, the code

pipeline = Pipeline([
    ("kmeans", KMeans(n_clusters=45)),
    ("log_reg", LogisticRegression()),
])
pipeline.fit(X_train, y_train)

is equivalent to:

kmeans = KMeans(n_clusters=45)
log_reg = LogisticRegression()
new_X_train = kmeans.fit_transform(X_train)
log_reg.fit(new_X_train, y_train)

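So KMeans is used only to transform the training data. The original data with 64 features is turned into data with 45 features, where each feature is the distance from a data point to one of the 45 cluster centers. This transformed data is then used, together with the original training labels, to fit LogisticRegression. Because the classifier is trained on y_train, it predicts digit labels, not cluster indices.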
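To make the transformation step concrete, here is a minimal sketch that assumes the same load_digits split as in the question. The n_init=10 argument and the names X_train_dist and manual_dist are my own additions for illustration. It checks that the KMeans output has one column per cluster and that each column is the Euclidean distance to the corresponding cluster center.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X_digits, y_digits = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X_digits, y_digits, random_state=42)

# n_init is set explicitly here (an assumption) so the call behaves the same
# across scikit-learn versions with different defaults.
kmeans = KMeans(n_clusters=45, n_init=10, random_state=42)

# fit_transform returns the distance of every sample to every cluster center.
X_train_dist = kmeans.fit_transform(X_train)
print(X_train.shape, "->", X_train_dist.shape)

# Recompute the same distances by hand to confirm what the columns mean.
diff = X_train[:, None, :] - kmeans.cluster_centers_[None, :, :]
manual_dist = np.linalg.norm(diff, axis=2)
print(np.allclose(X_train_dist, manual_dist, atol=1e-5))
```

Note that y_train never enters the KMeans step: the cluster assignments are discarded, and only the distance features are passed on.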

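Prediction works the same way: the test data is first transformed by KMeans, and LogisticRegression is then applied to the transformed data to predict the labels. So, instead of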

predict = pipeline.predict(X_test)

you could use:

predict = log_reg.predict(kmeans.transform(X_test))
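As a quick sanity check of this equivalence, the sketch below fits the pipeline once and then repeats the prediction by hand with the pipeline's own fitted steps, which are available through named_steps. The max_iter, n_init, and random_state values and the variable names are assumptions chosen for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X_digits, y_digits = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X_digits, y_digits, random_state=42)

pipeline = Pipeline([
    ("kmeans", KMeans(n_clusters=45, n_init=10, random_state=42)),
    ("log_reg", LogisticRegression(max_iter=5000, random_state=42)),
])
pipeline.fit(X_train, y_train)

# What pipeline.predict does internally: transform, then predict.
fitted_kmeans = pipeline.named_steps["kmeans"]
fitted_log_reg = pipeline.named_steps["log_reg"]
manual_pred = fitted_log_reg.predict(fitted_kmeans.transform(X_test))
pipeline_pred = pipeline.predict(X_test)

print(np.array_equal(pipeline_pred, manual_pred))
# The classifier's classes come from y_train, so they are the digits.
print(fitted_log_reg.classes_)
```

The classes_ attribute shows where the digit labels come from: LogisticRegression learns them from y_train during fit, regardless of how many clusters KMeans produced.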
