
How to Record Variables in Pytorch Without Breaking Gradient Computation?

I am trying to implement something along the lines of policy gradient training. However, I want to manipulate the rewards (e.g., take the discounted future sum and apply other differentiable operations) before doing backpropagation.

Consider the manipulate function, defined to compute the reward-to-go:

import numpy as np
import torch as T   # `T` is assumed to be the torch module, since T.as_tensor is used below

def manipulate(reward_pool):
    n = len(reward_pool)
    R = np.zeros_like(reward_pool)   # pre-allocated buffer for the rewards-to-go
    for i in reversed(range(n)):     # accumulate from the last step backwards
        R[i] = reward_pool[i] + (R[i+1] if i+1 < n else 0)
    return T.as_tensor(R)
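As an aside, if the rewards themselves are tensors that require grad, the same reward-to-go can be computed without writing into a pre-allocated buffer, for example with a reversed cumulative sum. A minimal sketch (the name manipulate_rtg is made up for illustration, and it assumes reward_pool is a list of scalar tensors):

import torch

def manipulate_rtg(reward_pool):
    # Stacking keeps each reward's autograd history, and flip/cumsum are
    # out-of-place operations, so the gradient graph is preserved.
    rewards = torch.stack(list(reward_pool))
    return torch.flip(torch.cumsum(torch.flip(rewards, dims=[0]), dim=0), dims=[0])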

I tried to store the rewards in a list:

# pseudocode
reward_pool = [0 for i in range(batch_size)]   # pre-allocated list of slots

for k in range(batch_size):
    act = net(state)
    state, reward = env.step(act)
    reward_pool[k] = reward                    # overwrite the k-th slot

R = manipulate(reward_pool)
R.backward()
optimizer.step()

It seems that the in-place operation breaks the gradient computation, and the code gives me the error: one of the variables needed for gradient computation has been modified by an inplace operation
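For context, this class of error is raised when a tensor that autograd saved for the backward pass is later modified in place. A minimal, self-contained illustration (not taken from the question's code):

import torch

a = torch.randn(3, requires_grad=True)
b = a.exp()          # exp() saves its output for the backward pass
b += 1               # in-place update bumps b's version counter
b.sum().backward()   # RuntimeError: one of the variables needed for gradient
                     # computation has been modified by an inplace operation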

I also tried initializing an empty tensor first and storing the rewards in it, but the in-place operation was still the problem: a view of a leaf Variable that requires grad is being used in an in-place operation.
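That second message is what indexing assignment into a pre-allocated leaf tensor produces. A minimal sketch (assuming the pool tensor was created with requires_grad=True):

import torch

pool = torch.zeros(4, requires_grad=True)   # pre-allocated leaf tensor
reward = torch.tensor(1.0)
pool[0] = reward   # RuntimeError: a view of a leaf Variable that requires grad
                   # is being used in an in-place operation.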

I am new to PyTorch. Does anyone know the correct way to record the rewards in this case?

Edit: solution found

Simply initialize an empty pool (a list) at each iteration and append each newly computed reward to it, i.e.

reward_pool = []                    # fresh, empty list every iteration

for k in range(batch_size):
    act = net(state)
    state, reward = env.step(act)
    reward_pool.append(reward)      # append instead of assigning into a slot

R = manipulate(reward_pool)
R.backward()
optimizer.step()

The problem was due to assigning into an existing object. Just initialize an empty pool (a list) for every iteration and append the newly computed rewards to it, as shown above.

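To make the whole pattern concrete, here is a self-contained sketch that runs end to end. The policy network, env_step, and target reward function are toy stand-ins invented only so the snippet executes (the real ones come from the question's setup), the rewards are made a differentiable function of the action as the question implies, and backward() is called on a sum because it needs a scalar:

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins for the question's `net` and `env`.
net = nn.Linear(4, 2)
target = torch.tensor([0.5, -0.5])

def env_step(act):
    new_state = torch.randn(4)
    reward = -((act - target) ** 2).sum()   # differentiable w.r.t. `net`'s output
    return new_state, reward

def manipulate(reward_pool):
    # Reward-to-go built from out-of-place ops, so the graph stays intact.
    running = torch.zeros(())
    rtg = []
    for r in reversed(reward_pool):
        running = running + r
        rtg.append(running)
    rtg.reverse()
    return torch.stack(rtg)

optimizer = torch.optim.SGD(net.parameters(), lr=1e-2)
batch_size = 8
state = torch.randn(4)

reward_pool = []                       # fresh list each iteration
for k in range(batch_size):
    act = net(state)
    state, reward = env_step(act)
    reward_pool.append(reward)         # append; no pre-allocated slots

R = manipulate(reward_pool)
R.sum().backward()                     # scalar loss for backward()
optimizer.step()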
