Multinomial logistic softmax regression with SGD

Question

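I am trying to build a model from scratch that can classify MNIST images (handwritten digits). The model needs to output a list of probabilities, each representing how likely the input image is to be a particular digit.

This is the code I have so far: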

from sklearn.datasets import load_digits
import numpy as np


def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)


digits = load_digits()

features = digits.data
targets = digits.target

train_count = int(0.8 * len(features))
train_x = features[: train_count]
train_y = targets[: train_count]

test_x = features[train_count:]
test_y = targets[train_count:]

bias = np.random.rand()
weights = np.random.rand(len(features[0]))
rate = 0.02

for i in range(1000):
    for i, sample in enumerate(train_x):

        prod = np.dot(sample, weights) - bias
        soft = softmax(prod)
        predicted = np.argmax(soft) + 1

        error = predicted - train_y[i]
        weights -= error * rate * sample
        bias -= rate * error
        # print(error)

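I am trying to build the model so that it uses stochastic gradient descent, but I am a little confused about what should be passed to the softmax function. I understand that it expects a vector of numbers, but what I am used to (from building a small NN) is that the model produces a single number, which is then passed to an activation function, which in turn produces the prediction. Here I feel like I am missing a step, and I don't know what it is.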

Answer 1

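In the simplest implementation, your last layer (right before softmax) should indeed output a 10-dimensional vector, one value per digit, which softmax squashes into [0, 1] so that the entries sum to 1. This means weights should be a matrix of shape [features, 10] and bias should be a [10] vector.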
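
A minimal sketch of that forward pass for a single sample, assuming 64 features (the 8x8 images returned by load_digits in your code). The random sample is only a stand-in for a real image, and the small-normal initialisation is my own choice, not something your code requires:

```python
import numpy as np


def softmax(logits):
    # Subtracting the max does not change the result but avoids overflow in exp.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


n_features, n_classes = 64, 10  # assumption: 8x8 pixels, digits 0..9
rng = np.random.default_rng(0)

weights = rng.normal(scale=0.01, size=(n_features, n_classes))  # [features, 10]
bias = np.zeros(n_classes)                                      # [10]

sample = rng.random(n_features)    # stand-in for one flattened image
logits = sample @ weights + bias   # shape (10,): one score per digit
probs = softmax(logits)            # shape (10,): probabilities summing to 1
predicted = int(np.argmax(probs))  # already a digit in 0..9, so no "+ 1"
```

Note that the vector handed to softmax is the 10 logits, not the 64 pixel products, which is the step that seems to be missing in the question.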

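In addition, you should one-hot encode your train_y labels, i.e. convert each item of train_y into a [0, 0, ..., 1, ..., 0] vector with the 1 at the index of the correct digit. train_y then has shape [size, 10].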
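
One possible way to do the encoding with plain NumPy is sketched below. The helper name one_hot is hypothetical, and it assumes the labels are integers in the range 0..9:

```python
import numpy as np


def one_hot(labels, n_classes=10):
    """Turn integer labels of shape (size,) into a (size, n_classes) 0/1 matrix."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((len(labels), n_classes))
    # For row i, set the column given by labels[i] to 1.
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded


example_labels = np.array([3, 0, 9])  # made-up labels for illustration
encoded = one_hot(example_labels)
print(encoded.shape)
```

With targets in this form, the per-sample error becomes the vector difference between the softmax output and the one-hot row, rather than a difference of two class indices.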

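Take a look at the logistic regression example. It is written in TensorFlow, but the model is likely to be similar to yours: it uses 784 features (all 28x28 pixels), one-hot encoding for the labels, and a single layer. It also uses mini-batches to speed up learning.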
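
For reference, here is a rough NumPy-only sketch of the same idea: softmax regression trained with mini-batch SGD on the cross-entropy loss. It is not the TensorFlow example itself. The function names, the hyperparameters (rate, epochs, batch_size) and the synthetic three-class data are all assumptions made for illustration:

```python
import numpy as np


def softmax(logits):
    # Row-wise softmax with the max subtracted for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def train_softmax_regression(x, y, n_classes, rate=0.1, epochs=50,
                             batch_size=32, seed=0):
    rng = np.random.default_rng(seed)
    n_samples, n_features = x.shape
    weights = np.zeros((n_features, n_classes))
    bias = np.zeros(n_classes)
    targets = np.eye(n_classes)[y]  # one-hot labels, shape (n_samples, n_classes)

    for _ in range(epochs):
        order = rng.permutation(n_samples)  # reshuffle every epoch
        for start in range(0, n_samples, batch_size):
            batch = order[start:start + batch_size]
            probs = softmax(x[batch] @ weights + bias)
            # Gradient of the mean cross-entropy w.r.t. the logits.
            error = (probs - targets[batch]) / len(batch)
            weights -= rate * (x[batch].T @ error)
            bias -= rate * error.sum(axis=0)
    return weights, bias


def predict(x, weights, bias):
    return np.argmax(x @ weights + bias, axis=1)


# Synthetic, well-separated three-class data (illustration only).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
labels = rng.integers(0, 3, size=300)
points = centers[labels] + rng.normal(scale=0.5, size=(300, 2))

weights, bias = train_softmax_regression(points, labels, n_classes=3)
accuracy = np.mean(predict(points, weights, bias) == labels)
print(f"training accuracy: {accuracy:.2f}")
```

The same function should accept train_x and train_y from the question with n_classes=10, although scaling the pixel values (for example dividing by 16 for load_digits) and lowering rate may be needed for stable training.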

Solution 1
1 2017-10-24 16:09:12
