
Multiple parameters recovery using Deep Learning

As a simplified version of my actual research problem, let's say I have a second-order polynomial function y = a x^2 + b x + c and I want to use a deep neural network to predict the parameters a, b and c given the variable x and the value of the function y. The variable x and the parameters a, b, c are drawn from a uniform distribution on the range [0, 1].

When I try to train the network using different architectures, cost functions and hyperparameter combinations among the most commonly used, I always get the same issue: the train and test losses rapidly converge to a value significantly higher than 0, then start to fluctuate in a strange way, and the predictions are not accurate (see the figures below as a general example; the predictions for b are similar, c is slightly better but still not satisfactory). This happens even if I set a higher momentum or a lower learning rate. I also get the same issue if I try to recover one parameter at a time.

[figures: train/test loss curves and predicted vs. true values for parameter a]

As an example, here is the PyTorch code I used for my first test (4 fully connected layers, the first 3 followed by ReLU; MSELoss; RMSprop optimizer with learning rate 1e-4 and momentum 0.9).

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

class PRNet(nn.Module):
    def __init__(self, input_size, output_size):
        super(PRNet, self).__init__()
        self.input_size = input_size
        self.fc1   = nn.Linear(self.input_size, 32)
        self.relu1 = nn.ReLU()
        self.fc2   = nn.Linear(32, 64)
        self.relu2 = nn.ReLU()
        self.fc3   = nn.Linear(64, 64)
        self.relu3 = nn.ReLU()
        self.fc4   = nn.Linear(64, output_size)

    def forward(self, x):
        output = self.fc1(x)
        output = self.relu1(output)
        output = self.fc2(output)
        output = self.relu2(output)
        output = self.fc3(output)
        output = self.relu3(output)
        output = self.fc4(output)
        return output

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

var_x    = np.random.rand(100000)
pars_abc = np.random.rand(3, 100000)
func_y   = pars_abc[0] * var_x**2 + pars_abc[1] * var_x + pars_abc[2]

data = np.vstack((var_x, func_y)).T
parameters = pars_abc.T

X = torch.Tensor(data).to(device).float()
y = torch.Tensor(parameters).to(device).float()

train_size       = int(0.8 * len(data))
batch_size       = 100
train_dataset    = TensorDataset(X[:train_size], y[:train_size])
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=False)

prnet = PRNet(X.shape[1], 3).to(device)

loss_function = nn.MSELoss()
optimizer = torch.optim.RMSprop(prnet.parameters(), lr=1e-4, momentum=0.9)

num_epochs  = 25
num_batches = train_size // batch_size

# Arrays to record the per-batch losses for plotting
train_loss_plot = np.zeros((num_epochs, num_batches))
test_loss_plot  = np.zeros((num_epochs, num_batches))

for epoch in range(num_epochs):
    print(f'Starting epoch {epoch+1}')

    for i, batch in enumerate(train_dataloader):
        inputs, targets = batch

        optimizer.zero_grad()

        outputs = prnet(inputs)
        train_loss = loss_function(outputs, targets)

        # Evaluate on the held-out data without tracking gradients
        # (X and y are already on the device)
        with torch.no_grad():
            test_outputs = prnet(X[train_size:])
            test_loss = loss_function(test_outputs, y[train_size:])

        train_loss_plot[epoch, i] = train_loss.item()
        test_loss_plot[epoch, i]  = test_loss.item()

        train_loss.backward()
        optimizer.step()

What could be the cause of this issue? Are the features not representative enough? Do I need a custom loss more suitable for this problem?

During training, when a model's loss starts fluctuating, the most probable cause for such a pattern is that the learning rate is too high for the weights to settle at the required values.

Consider this example. Suppose a parameter (weight) in your model, initialized with a value of 0.1, needs to reach a value of 0.00423, and the learning rate is set to 0.001.

Now, let's assume that the parameter has reached a value of 0.004 after a few epochs of training. Gradient descent will try to increase the value towards the target, but since the step size is on the order of 0.001, the parameter will overshoot to 0.005. Since the value is now too high, gradient descent will try to decrease it, which changes the parameter back to 0.004, and thus a fluctuation pattern begins.
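As a toy illustration of this (hypothetical numbers, and treating each update as a fixed-size step of lr in the gradient direction, which is roughly what RMSprop's gradient normalization produces), the parameter ends up bouncing around the target forever:

target, w, lr = 0.00423, 0.1, 0.001
history = []
for step in range(200):
    grad = 2 * (w - target)            # gradient of the loss (w - target)**2
    w -= lr * (1 if grad > 0 else -1)  # fixed-size step of lr, as with a normalized update
    history.append(round(w, 5))
print(history[-6:])  # [0.005, 0.004, 0.005, 0.004, 0.005, 0.004] -- never reaches 0.00423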

Using a small learning rate will not solve this issue: if the learning rate is too small, the model learns too slowly and might not converge at all. What you are probably looking for is a variable learning rate policy for your training. With such a policy, you begin with a large learning rate so that the model learns quickly; later, when the model parameters get close to their target values, the learning rate decreases automatically so that the parameters can get as close as possible to the targets. Such policies are called learning rate schedulers.

PyTorch provides several learning rate schedulers to choose from; you can look them up in the documentation.

I suggest going for the ReduceLROnPlateau scheduler. It lets you set a threshold, a patience and a factor: whenever your model's loss has not improved by more than the threshold for the specified number of epochs, it decreases the learning rate by the factor.

https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.ReduceLROnPlateau.html#torch.optim.lr_scheduler.ReduceLROnPlateau
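A minimal sketch of how it could plug into the training loop above (the scheduler settings here are illustrative assumptions, not tuned values):

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=5, threshold=1e-4)

for epoch in range(num_epochs):
    for inputs, targets in train_dataloader:
        optimizer.zero_grad()
        loss = loss_function(prnet(inputs), targets)
        loss.backward()
        optimizer.step()

    # Step the scheduler once per epoch with the metric it should monitor
    with torch.no_grad():
        val_loss = loss_function(prnet(X[train_size:]), y[train_size:])
    scheduler.step(val_loss)

With these settings, patience=5 means the learning rate is only reduced after 5 epochs without sufficient improvement, and factor=0.1 divides it by 10 each time.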
