
Pytorch: weights not updating

I'm trying to train a model, but it doesn't work because the weights aren't updating when I call the following:

    self.optimizer = Adam(self.PPO.parameters(), lr=0.1, eps=epsilon)
    total_loss = Variable(policy_loss + 0.5*value_loss - entropy_loss.mean() * 0.01, requires_grad=True)

    self.optimizer.zero_grad()
    (total_loss * 10).backward()
    self.optimizer.step()

When I print the weights, they're all the same (the loss isn't zero, and the learning rate is set to 0.1), and when I compare them (even with clone() called on each param) the comparison always returns True. Total loss has a grad_fn attribute too... The optimizer is created in the constructor of my agent class.
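For anyone checking the same symptom, a minimal, self-contained way to verify whether a step changes the weights is to clone the parameters before the update and compare them afterwards with torch.equal (the linear model and dummy loss below are stand-ins, not the PPO networks from the post):

    import torch
    import torch.nn as nn
    from torch.optim import Adam

    # Stand-in model and loss, not the poster's PPO networks: snapshot the parameters
    # with clone() before the update, then compare them after optimizer.step().
    model = nn.Linear(4, 2)
    optimizer = Adam(model.parameters(), lr=0.1)

    before = [p.detach().clone() for p in model.parameters()]

    loss = model(torch.randn(8, 4)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    for prev, p in zip(before, model.parameters()):
        print(torch.equal(prev, p.detach()))   # False means that parameter actually moved

If every comparison prints True here as well, the gradients are not reaching the parameters, which is exactly the situation described above.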

My code is based on this repository: https://github.com/andreiliphd/tennis-ppo/blob/master/agent.py

This is my agent constructor:

    def __init__(self, PPO, learning_rate, epsilon, discount_rate, entropy_coefficient, ppo_clip, gradient_clip,
                 rollout_length, tau):
        self.PPO = PPO
        self.learning_rate = learning_rate
        self.epsilon = epsilon
        self.discount_rate = discount_rate
        self.entropy_coefficient = entropy_coefficient
        self.ppo_clip = 0.2
        self.gradient_clip = 5
        self.rollout_length = rollout_length
        self.tau = tau
        self.optimizer = Adam(self.PPO.actor.parameters(), lr=0.1, eps=epsilon)
        self.device = torch.device('cpu')

This is my PPO class, which creates two networks (an actor and a critic), each with a forward function and some hidden layers:

class PPO(nn.Module):

    def __init__(self, state_shape, action_num, mlp_layers, device=torch.device('cpu')):
        super(PPO, self).__init__()
        self.state_shape = state_shape
        self.action_num = action_num
        self.mlp_layers = mlp_layers
        self.device = torch.device('cpu')

        layer_dims = [np.prod(self.state_shape)] + self.mlp_layers
        self.actor = PPO_Network(state_shape, action_num, layer_dims, True)
        self.actor = self.actor.to(device)
        self.critic = PPO_Network(state_shape, 1, layer_dims, False)
        self.critic = self.critic.to(device)
        self.to(device)
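PPO_Network itself is not shown in the post. Purely as an assumption, inferred from the two constructor calls above (the class name, argument order, and the trailing actor/critic flag are guesses), it could be a small MLP along these lines:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical sketch of PPO_Network - NOT the poster's code. The signature
    # (state_shape, output_num, layer_dims, is_actor) is inferred from the calls above;
    # the softmax head for the actor is an assumption.
    class PPO_Network(nn.Module):
        def __init__(self, state_shape, output_num, layer_dims, is_actor):
            super().__init__()
            self.is_actor = is_actor
            dims = [int(d) for d in layer_dims] + [output_num]
            self.layers = nn.ModuleList(
                [nn.Linear(dims[i], dims[i + 1]) for i in range(len(dims) - 1)]
            )

        def forward(self, state):
            x = state.reshape(state.shape[0], -1)   # flatten, matching np.prod(state_shape)
            for layer in self.layers[:-1]:
                x = F.relu(layer(x))
            x = self.layers[-1](x)
            return F.softmax(x, dim=-1) if self.is_actor else x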

Any indication of why this is happening, and what I am overlooking, is very welcome. :) I can give more info or code if needed.

I fixed this :) It was very stupid: it was because I converted one of the values returned by my networks to numpy, and then converted it back to a tensor. It took me a while to realize because of some messy code.
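To illustrate the cause with a generic demo (not the original code): pushing a network output through numpy and back produces a brand-new tensor with no grad_fn, so anything computed from it is disconnected from the graph and backward() can no longer reach the weights.

    import torch
    import torch.nn as nn

    # Generic illustration of the bug above, not the original code: a numpy round trip
    # strips the autograd history from the tensor.
    model = nn.Linear(4, 1)
    out = model(torch.randn(3, 4))
    print(out.grad_fn)                               # e.g. <AddmmBackward0 ...>, still in the graph

    roundtrip = torch.tensor(out.detach().numpy())   # to numpy and back
    print(roundtrip.grad_fn)                         # None, the link to the graph is gone

    # A loss built from `roundtrip` cannot propagate gradients to model.parameters(),
    # even if it is re-wrapped with requires_grad=True.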
