
How to properly optimize shared network between actor and critic?

I am building an actor-critic reinforcement learning algorithm to solve an environment. I want to use a single encoder to extract features from my environment's states.

When I share the encoder between the actor and the critic, my network is unable to learn anything:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
  def __init__(self, state_dim):
    super(Encoder, self).__init__()

    self.l1 = nn.Linear(state_dim, 512)

  def forward(self, state):
    a = F.relu(self.l1(state))
    return a

class Actor(nn.Module):
  def __init__(self, state_dim, action_dim, max_action):
    super(Actor, self).__init__()

    self.l1 = nn.Linear(state_dim, 128)
    self.l3 = nn.Linear(128, action_dim)

    self.max_action = max_action

  def forward(self, state):
    a = F.relu(self.l1(state))
    # a = F.relu(self.l2(a))
    a = torch.tanh(self.l3(a)) * self.max_action
    return a

class Critic(nn.Module):
  def __init__(self, state_dim, action_dim):
    super(Critic, self).__init__()

    self.l1 = nn.Linear(state_dim + action_dim, 128)
    self.l3 = nn.Linear(128, 1)

  def forward(self, state, action):
    state_action = torch.cat([state, action], 1)

    q = F.relu(self.l1(state_action))
    # q = F.relu(self.l2(q))
    q = self.l3(q)
    return q

However, when I use separate encoders for the actor and the critic, it learns properly:

class Actor(nn.Module):
  def __init__(self, state_dim, action_dim, max_action):
    super(Actor, self).__init__()

    self.l1 = nn.Linear(state_dim, 400)
    self.l2 = nn.Linear(400, 300)
    self.l3 = nn.Linear(300, action_dim)

    self.max_action = max_action

  def forward(self, state):
    a = F.relu(self.l1(state))
    a = F.relu(self.l2(a))
    a = torch.tanh(self.l3(a)) * self.max_action
    return a

class Critic(nn.Module):
  def __init__(self, state_dim, action_dim):
    super(Critic, self).__init__()

    self.l1 = nn.Linear(state_dim + action_dim, 400)
    self.l2 = nn.Linear(400, 300)
    self.l3 = nn.Linear(300, 1)

  def forward(self, state, action):
    state_action = torch.cat([state, action], 1)

    q = F.relu(self.l1(state_action))
    q = F.relu(self.l2(q))
    q = self.l3(q)
    return q

I am pretty sure the cause is the optimizers. In the shared-encoder version I define them as follows:

self.actor_optimizer = optim.Adam(list(self.actor.parameters()) +
                                  list(self.encoder.parameters()))
self.critic_optimizer = optim.Adam(list(self.critic.parameters()) +
                                   list(self.encoder.parameters()))

In the separate-encoder version it is simply:

self.actor_optimizer = optim.Adam(self.actor.parameters())
self.critic_optimizer = optim.Adam(self.critic.parameters())

Both optimizers have to be stepped in the actor-critic algorithm.

How can I combine the two optimizers so that the encoder is optimized correctly?

I am not sure how exactly you are sharing the encoder.

However, I would suggest creating a single instance of the encoder and passing it to both the actor and the critic:

encoder_net = Encoder(state_dim)
actor = Actor(encoder_net, state_dim, action_dim, max_action)
critic = Critic(encoder_net, state_dim)

During the forward pass, first pass the state batch through the encoder and then through the rest of the network, for example:

class Encoder(nn.Module):
    def __init__(self, state_dim):
        super(Encoder, self).__init__()

        self.l1 = nn.Linear(state_dim, 512)

    def forward(self, state):
        a = F.relu(self.l1(state))
        return a

class Actor(nn.Module):
    def __init__(self, encoder, state_dim, action_dim, max_action):
        super(Actor, self).__init__()
        self.encoder = encoder

        self.l1 = nn.Linear(512, 128)
        self.l3 = nn.Linear(128, action_dim)

        self.max_action = max_action

    def forward(self, state):
        state = self.encoder(state)
        a = F.relu(self.l1(state))
        # a = F.relu(self.l2(a))
        a = torch.tanh(self.l3(a)) * self.max_action
        return a

class Critic(nn.Module):
    def __init__(self, encoder, state_dim):
        super(Critic, self).__init__()
        self.encoder = encoder

        self.l1 = nn.Linear(512, 128)
        self.l3 = nn.Linear(128, 1)

    def forward(self, state):
        state = self.encoder(state)

        q = F.relu(self.l1(state))
        # q = F.relu(self.l2(q))
        q = self.l3(q)
        return q

Note: the critic network is now a function approximator for the state-value function V(s) rather than for the state-action value function Q(s,a).
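
If you want to keep a state-action value critic Q(s,a) as in your original code, a minimal sketch (assuming the same 512-dimensional encoder output; QCritic is just an illustrative name, not from your post) is to concatenate the action with the encoded state after the encoder:

class QCritic(nn.Module):
    def __init__(self, encoder, action_dim):
        super(QCritic, self).__init__()
        self.encoder = encoder  # shared Encoder instance

        # 512 encoder features plus the action vector
        self.l1 = nn.Linear(512 + action_dim, 128)
        self.l3 = nn.Linear(128, 1)

    def forward(self, state, action):
        # encode the raw state, then append the action
        h = self.encoder(state)
        h = torch.cat([h, action], dim=1)

        q = F.relu(self.l1(h))
        q = self.l3(q)
        return q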

With this implementation you can perform the optimization without passing the encoder parameters to the optimizers explicitly:

self.actor_optimizer = optim.Adam(self.actor.parameters())
self.critic_optimizer = optim.Adam(self.critic.parameters())

This works because the encoder is registered as a submodule of both networks, so its parameters are already included in self.actor.parameters() and self.critic.parameters().
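
For completeness, here is a minimal sketch of how one update step could look with this setup. The losses are placeholders for whatever your algorithm actually computes (e.g. a TD error for the critic and a policy objective for the actor); only the optimizer handling is the point here:

import torch.optim as optim

# both parameter lists already include the shared encoder's parameters,
# because the encoder is a registered submodule of each network
actor_optimizer = optim.Adam(actor.parameters())
critic_optimizer = optim.Adam(critic.parameters())

def update_step(critic_loss, actor_loss):
    # critic update: gradients flow into the critic head and the shared encoder
    critic_optimizer.zero_grad()
    critic_loss.backward()
    critic_optimizer.step()

    # actor update: gradients flow into the actor head and the shared encoder
    actor_optimizer.zero_grad()
    actor_loss.backward()
    actor_optimizer.step()

Note that because the encoder's parameters appear in both optimizers, each optimizer keeps its own Adam statistics for them; if you prefer, you could instead give the encoder its own dedicated optimizer.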

Hope this helps! :)
