
No Hidden Layer Neural Network Doesn't Equal Logistic Regression

In theory, a neural network with no hidden layer should be the same as logistic regression; however, we get wildly varied results. What makes this even more bewildering is that the test case is incredibly basic, yet the neural network fails to learn.
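For reference, both models compute the same function, a single sigmoid over an affine combination of the two inputs, trained on binary cross-entropy (the negative log-likelihood of the logistic model):

p(y = 1 | x) = sigmoid(w_1 * x_1 + w_2 * x_2 + b)

so any difference in the learned decision boundary should come from the optimization rather than from the model itself.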

sklearn logistic regression

tensorflow no-hidden-layer neural network

We have attempted to choose the parameters of both models to be as similar as possible (same number of epochs, no L2 penalty, same loss function, no additional optimizations such as momentum, etc.). The sklearn logistic regression consistently finds the correct decision boundary, with minimal variation. The tensorflow neural network is highly variable, and it looks like the bias is 'struggling' to train.

The code to reproduce this issue is included below. An ideal solution would have the tensorflow decision boundary very similar to the logistic regression decision boundary.

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Conv1D, Dense, Flatten, Input, Concatenate, Dropout
from tensorflow.keras import Sequential, Model
from tensorflow.keras.optimizers import SGD

import matplotlib.pyplot as plt

%matplotlib inline

from sklearn.linear_model import LogisticRegression


X = np.array([[1, 1],
              [2, 2]])
y = np.array([0, 1])

# Unregularized logistic regression trained with SAG; the tiny tolerance
# effectively forces the solver to use the full 300 iterations.
model = LogisticRegression(penalty = 'none',
                           solver='sag',
                           max_iter = 300,
                           tol = 1e-100)
model.fit(X, y)

model.score(X, y)

model.coef_.flatten()[1]

model.intercept_

# Decision boundary: w_1 * x + w_2 * y + b = 0  =>  y = -(w_1 / w_2) * x - b / w_2
w_1 = model.coef_.flatten()[0]
w_2 = model.coef_.flatten()[1]
b = model.intercept_
n = np.linspace(0, 3, 10000, endpoint=False)
x_n = -w_1 / w_2 * n - b / w_2

plt.scatter(X[:, 0], X[:, 1], c = y)
plt.plot(n, x_n)
plt.gca().set_aspect('equal')
plt.show()

X = np.array([[1, 1],
              [2, 2]])
y = np.array([0, 1])

# Plain SGD with no momentum, to match the sklearn setup as closely as possible.
optimizer = SGD(learning_rate=0.01,
                momentum = 0.0,
                nesterov = False,
                name = 'SGD')

# A single sigmoid unit over the two inputs: structurally the same model
# as the logistic regression above.
inputs = Input(shape = (2,), name='inputs')
outputs = Dense(1, activation = 'sigmoid', name = 'outputs')(inputs)

model = Model(inputs = inputs, outputs = outputs, name = 'model')
model.compile(loss = 'bce', optimizer = optimizer, metrics = ['AUC', 'accuracy'])
model.fit(X, y, epochs = 100, verbose=False)

print(model.evaluate(X, y))

# Recover the learned weights and bias and plot the boundary the same way as above.
weights, bias = model.layers[1].get_weights()
weights = weights.flatten()

w_1 = weights[0]
w_2 = weights[1]
b = bias
n = np.linspace(0, 3, 10000, endpoint=False)
x_n = -w_1 / w_2 * n - b / w_2

plt.scatter(X[:, 0], X[:, 1], c = y)
plt.plot(n, x_n)
plt.grid()
plt.gca().set_aspect('equal')

plt.show()

A simple way to determine whether this is actually a bug is to let the number of epochs in your perceptron go to some arbitrarily large number (say, 5000). You'll notice that the decision boundary approaches that of your logistic regression model.
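A minimal sketch of that check, assuming the same setup as in the question (the exact boundary will still vary a little with the random weight initialization):

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import SGD

X = np.array([[1, 1],
              [2, 2]])
y = np.array([0, 1])

# Same single-sigmoid model as in the question, trained 50x longer.
inputs = Input(shape=(2,), name='inputs')
outputs = Dense(1, activation='sigmoid', name='outputs')(inputs)
model = Model(inputs=inputs, outputs=outputs)
model.compile(loss='binary_crossentropy', optimizer=SGD(learning_rate=0.01))
model.fit(X, y, epochs=5000, verbose=False)

weights, bias = model.layers[1].get_weights()
w_1, w_2 = weights.flatten()
b = bias[0]

# The learned boundary should now sit close to the sklearn one.
n = np.linspace(0, 3, 10000, endpoint=False)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.plot(n, -w_1 / w_2 * n - b / w_2)
plt.gca().set_aspect('equal')
plt.show()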

The natural question is why LR needs fewer iterations to reach a near-optimal decision boundary. For strongly convex functions (as in your example), SAG enjoys much faster convergence than SGD. Thus, it takes SGD longer to converge to a 'globally good' solution (though not many iterations to converge to a locally good one).
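A rough way to see this on the same two points, sketched with scikit-learn (SGDClassifier stands in here for plain constant-step SGD on the logistic loss; the loss='log_loss' spelling and the penalty='none' argument vary by scikit-learn version):

import numpy as np
from sklearn.linear_model import LogisticRegression, SGDClassifier

X = np.array([[1, 1], [2, 2]])
y = np.array([0, 1])

# Give both solvers the same budget of 100 passes over the data.
sag = LogisticRegression(penalty = 'none',   # penalty=None in newer scikit-learn
                         solver='sag', max_iter = 100, tol = 1e-100)
sag.fit(X, y)

sgd = SGDClassifier(loss='log_loss',         # logistic loss ('log' in older releases)
                    alpha=0.0,               # regularization strength of zero
                    learning_rate='constant', eta0=0.01,
                    max_iter=100, tol=None, shuffle=False)
sgd.fit(X, y)

# After the same number of epochs, SAG's coefficients should have moved much
# further toward a good boundary than plain SGD's.
print('SAG:', sag.coef_.flatten(), sag.intercept_)
print('SGD:', sgd.coef_.flatten(), sgd.intercept_)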
