
Neural network for square (x^2) approximation

I made a simple model that should figure out the relationship between input and output numbers, in this case, x and x squared. The code in Python:

import numpy as np
import tensorflow as tf

# Have TensorFlow log only error messages.
tf.logging.set_verbosity(tf.logging.ERROR)

features = np.array([-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8,
                    9, 10], dtype = float)
labels = np.array([100, 81, 64, 49, 36, 25, 16, 9, 4, 1, 0, 1, 4, 9, 16, 25, 36, 49, 64,
                    81, 100], dtype = float)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(units = 1, input_shape = [1])
])

model.compile(loss = "mean_squared_error", optimizer = tf.keras.optimizers.Adam(0.0001))
model.fit(features, labels, epochs = 50000, verbose = False)
print(model.predict([4, 11, 20]))

I tried a different number of units, adding more layers, and even using the relu activation function, but the results were always wrong. It works with other relationships like x and 2x. What is the problem here?

The problem is that x*x is a very different beast than a*x.

Please note what a usual "neural network" does: it stacks y = f(W*x + b) a few times, never multiplying x with itself. Therefore, you'll never get a perfect reconstruction of x*x, unless you set f(x) = x*x or similar.

What you can get is an approximation within the range of values presented during training (and perhaps a very little bit of extrapolation). Anyway, I'd recommend working with a smaller range of values; it will make the problem easier to optimize.

And on a philosophical note: in machine learning, I find it more useful to think in terms of good/bad rather than correct/wrong. Especially with regression, you cannot get the result "right" unless you have the exact model, in which case there is nothing to learn.


There actually are some NN architectures that multiply f(x) with g(x), most notably LSTMs and Highway networks. But even these have one or both of f(x), g(x) bounded (by a logistic sigmoid or tanh), so they are unable to model x*x fully.


Since there is some misunderstanding expressed in comments, let me emphasize a few points:

  1. You can approximate your data.
  2. To do well in any sense, you do need a hidden layer (see the quick check after this list).
  3. But more data is not necessary, though if you cover the space, the model will fit more closely; see desernaut's answer.
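
To see why point 2 matters, here is a quick check of my own (not part of the original answer): without a hidden layer, the single linear unit in the question can at best fit the least-squares line through the parabola, which by symmetry is just a constant near the mean of the targets.

import numpy as np

x = np.arange(-10, 11, dtype=float)
y = x ** 2

# best achievable fit for a single linear unit y_hat = w*x + b on this data
w, b = np.polyfit(x, y, 1)
print(w, b)                            # w ~ 0, b ~ 36.7 (the mean of y)
print(np.mean((w * x + b - y) ** 2))   # irreducible MSE of any purely linear model

No amount of training will push the single-unit model below that error, which is why the predictions in the question look so far off.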

As an example, here is a result from a model with a single hidden layer of 10 units with tanh activation, trained by SGD with learning rate 1e-3 for 15k iterations to minimize the MSE of your data. Best of five runs:

(Plot: predictions of this simple NN, trained on the OP's data, against the true x*x curve)

Here is the full code to reproduce the result. Unfortunately, I cannot install Keras/TF in my current environment, but I hope the PyTorch code is accessible :-)

#!/usr/bin/env python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# the OP's data: integers -10..10 and their squares
X = torch.tensor([range(-10, 11)]).float().view(-1, 1)
Y = X * X

# a single hidden layer of 10 tanh units
model = nn.Sequential(
    nn.Linear(1, 10),
    nn.Tanh(),
    nn.Linear(10, 1)
)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_func = nn.MSELoss()
for _ in range(15000):
    optimizer.zero_grad()
    pred = model(X)
    loss = loss_func(pred, Y)
    loss.backward()
    optimizer.step()

# evaluate on a denser grid, slightly beyond the training range
x = torch.linspace(-12, 12, steps=200).view(-1, 1)
y = model(x)
f = x * x

# red dots: model predictions; blue line: the true x*x
plt.plot(x.detach().view(-1).numpy(), y.detach().view(-1).numpy(), 'r.', linestyle='None')
plt.plot(x.detach().view(-1).numpy(), f.detach().view(-1).numpy(), 'b')
plt.show()

You are making two very basic mistakes:

  • Your ultra-simple model (a single-layer network with a single unit) hardly qualifies as a neural network at all, let alone a "deep learning" one (as your question is tagged)
  • Similarly, your dataset (just 20 samples) is also ultra-small

It is certainly understood that neural networks need to be of some complexity if they are to solve problems even as "simple" as x*x; and where they really shine is when fed with large training datasets.

The methodology when trying to solve such function approximations is not to just list the (few possible) inputs and then feed them to the model along with the desired outputs; remember, NNs learn through examples, not through symbolic reasoning. And the more examples the better. What we usually do in similar cases is to generate a large number of examples, which we subsequently feed to the model for training.

Having said that, here is a rather simple demonstration of a 3-layer neural network in Keras for approximating the function x*x, using as input 10,000 random numbers generated in [-50, 50]:

import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras import regularizers
import matplotlib.pyplot as plt

model = Sequential()
model.add(Dense(8, activation='relu', kernel_regularizer=regularizers.l2(0.001), input_shape = (1,)))
model.add(Dense(8, activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(1))

model.compile(optimizer=Adam(),loss='mse')

# generate 10,000 random numbers in [-50, 50], along with their squares
x = np.random.random((10000,1))*100-50
y = x**2

# fit the model, keeping 2,000 samples as validation set
hist = model.fit(x,y,validation_split=0.2,
             epochs= 15000,
             batch_size=256)

# check some predictions:
print(model.predict([4, -4, 11, 20, 8, -5]))
# result:
[[ 16.633354]
 [ 15.031291]
 [121.26833 ]
 [397.78638 ]
 [ 65.70035 ]
 [ 27.040245]]

Well, not that bad! Remember that NNs are function approximators: we should expect them neither to exactly reproduce the functional relationship nor to "know" that the results for 4 and -4 should be identical.

Let's generate some new random data in [-50, 50] (remember, for all practical purposes, these are unseen data for the model) and plot them along with the original ones to get a more general picture:

plt.figure(figsize=(14,5))
plt.subplot(1,2,1)
p = np.random.random((1000,1))*100-50 # new random data in [-50, 50]
plt.plot(p,model.predict(p), '.')
plt.xlabel('x')
plt.ylabel('prediction')
plt.title('Predictions on NEW data in [-50,50]')

plt.subplot(1,2,2)
plt.xlabel('x')
plt.ylabel('y')
plt.plot(x,y,'.')
plt.title('Original data')
plt.show()

Result:

(Plot, two panels: predictions on NEW data in [-50, 50]; original data)

Well, it arguably does look like a good approximation indeed...

You could also take a look at this thread for a sine approximation.

The last thing to keep in mind is that, although we did get a decent approximation even with our relatively simple model, what we should not expect is extrapolation, i.e. good performance outside [-50, 50]; for details, see my answer in Is deep learning bad at fitting simple non linear functions outside training scope?
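
To see that failure directly, here is a small follow-up sketch (my addition, not part of the original answer) that reuses the model and imports from the snippet above and queries it well outside the training range:

# query the model trained above well outside the [-50, 50] training range
p_out = np.linspace(-100, 100, 400).reshape(-1, 1)

plt.figure()
plt.plot(p_out, p_out ** 2, label='true x**2')
plt.plot(p_out, model.predict(p_out), '.', label='model prediction')
plt.axvspan(-50, 50, alpha=0.1, label='training range')
plt.legend()
plt.show()

Outside the training range the predictions typically flatten out or drift roughly linearly instead of following the parabola.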

My answer is a bit different. For the trivial case x*x, you can just write your own activation function that takes in x and outputs x*x. This answers the question above, "how to build a NN that calculates x*x?". But this may violate the "spirit" of the question.
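
As a minimal sketch of that idea (my own code, not from the question; the function name square_activation is just illustrative): Keras accepts a callable as an activation, so a Dense layer with a single weight followed by a hard-coded squaring activation computes (w*x)**2, and w only needs to reach +/-1 to reproduce x*x, including outside the training range.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation

def square_activation(z):
    # element-wise square; usable wherever Keras accepts an activation callable
    return z * z

model = Sequential([
    Dense(1, use_bias=False, input_shape=(1,)),   # a single learnable weight w
    Activation(square_activation),                # output is (w*x)**2
])
model.compile(optimizer='adam', loss='mse')

x = np.arange(-10, 11, dtype=float).reshape(-1, 1)
model.fit(x, x ** 2, epochs=2000, verbose=0)
print(model.predict(np.array([[4.0], [11.0], [20.0]])))   # close to 16, 121, 400 once w is near +/-1

Note that this only works because the right non-linearity is hard-coded into the network; it is closer to curve fitting with a known functional form than to learning.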

I mention this because sometimes you want to perform a non-trivial operation like
(x --> exp[A * x*x] * sinh[ 1/sqrt( log(k * x) ) ]). You could write an activation function for this, but the back-propagation operation would be hellish and impenetrable to another developer.

And suppose you also want the function
(x --> exp[A * x*x] * cosh[ 1/sqrt( log(k * x) ) ]).
Writing another stand-alone activation function would just be wasteful.

For this reason, you might want to build a library of activation functions with atomic operations like z*z, exp(z), sinh(z), cosh(z), sqrt(z), log(z). These activation functions would be applied one at a time, with the help of auxiliary network layers consisting of passthrough (i.e. no-op) nodes.
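
A rough sketch of what such a library could look like (again my own illustration; the names and the particular composition are made up): keep the atomic operations as plain callables and apply them one at a time via Activation layers, so each step stays easy to inspect and differentiate.

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as K

# a tiny "library" of atomic activations
atomic = {
    'square': lambda z: z * z,
    'exp': K.exp,
    'sqrt': K.sqrt,
    'log': K.log,
}

# e.g. composing exp(c * (a*x + b)**2 + d) out of atomic steps:
model = Sequential([
    Dense(1, input_shape=(1,)),      # a*x + b
    Activation(atomic['square']),    # (a*x + b)**2
    Dense(1),                        # c * (...) + d
    Activation(atomic['exp']),       # exp(...)
])
model.compile(optimizer='adam', loss='mse')

(Here I use ordinary Dense layers between the atomic steps rather than strictly no-op passthrough nodes, so each stage also gets a learnable scale and shift.)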


 