
How does Deep Reinforcement Learning remove the need to map or explore every state-action pair for an agent?

I am interested in using Deep Reinforcement Learning to teach an AI how to play a game, where the AI knows the model of the game from the start (so would I use model-based deep reinforcement learning?).

However, the number of possible state-action combinations is very large, and I can't map out every pair. I have heard that Deep Reinforcement Learning is a solution for very large state spaces, but I'm not sure how the neural network can be trained to choose an action in any (future) state if it hasn't experienced every possible state yet.

Could anyone please clarify this?

OK, a couple of things here.

First, connect() is designed to wait for a while before timing out in case the server is busy. You can adjust the timeout length, although I can't remember exactly how to do that off the top of my head.

Second, your code will find a server, but how do you know it's the server you are looking for? It could be some other application that happens to be listening on the same port. Unless you are scanning for any server at all, you'll need some verification to be sure of who you are talking to on the other end.

Finally, assuming you are writing both the client and the server, a better solution would be to have the client send out a broadcast/multicast message and have the server (or servers, if there is more than one) listen for and respond to that message. The client then just waits a specified period of time for responses to figure out where the server(s) are.
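For what it's worth, the client side of that broadcast idea could look roughly like the sketch below. It assumes POSIX sockets rather than Winsock, and the probe text and the function names `open_discovery_socket`, `send_probe` and `wait_for_reply` are invented for illustration; they are not from the original post.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define DISCOVERY_PROBE "WHO_IS_SERVER?"   /* hypothetical probe text */

/* Create a UDP socket that is allowed to send to a broadcast address. */
int open_discovery_socket(void) {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) return -1;
    int on = 1;
    if (setsockopt(sock, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on)) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}

/* Send one probe datagram to dest_ip:port, e.g. "255.255.255.255". */
int send_probe(int sock, const char *dest_ip, unsigned short port) {
    struct sockaddr_in dest;
    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(port);
    if (inet_pton(AF_INET, dest_ip, &dest.sin_addr) != 1) return -1;
    ssize_t n = sendto(sock, DISCOVERY_PROBE, strlen(DISCOVERY_PROBE), 0,
                       (struct sockaddr *)&dest, sizeof(dest));
    return n < 0 ? -1 : 0;
}

/* Wait up to timeout_sec for one reply and record the sender in *from.
   Returns the reply length, 0 on timeout, or -1 on error. */
int wait_for_reply(int sock, int timeout_sec, char *buf, size_t buflen,
                   struct sockaddr_in *from) {
    fd_set rset;
    FD_ZERO(&rset);
    FD_SET(sock, &rset);
    struct timeval tv = { timeout_sec, 0 };
    int n = select(sock + 1, &rset, NULL, NULL, &tv);
    if (n <= 0) return n;
    socklen_t len = sizeof(*from);
    ssize_t got = recvfrom(sock, buf, buflen, 0, (struct sockaddr *)from, &len);
    return got < 0 ? -1 : (int)got;
}
```

A client would call `send_probe` once and then call `wait_for_reply` in a loop until it returns 0; each sender address it collects is a candidate server. The server side would bind a UDP socket to the same port and answer each probe with `sendto`.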

I will do some research to find out whether Winsock supports asynchronous I/O.

Is the server's IP address so random that you need to do this every time? I have not done any socket programming in a long time, but with timeouts and such, this might not get much better.

Other options:

  • How about a configuration file on a network share containing the IP address? It could be rewritten whenever the server starts up.
  • Make the server's IP address static, and hard-code it or put it in a configuration file.
  • Look it up via the machine's DNS or NetBIOS name.

Most forms of approximation in machine learning also lead to generalisation: the ability to give better-than-guesswork estimates for a target variable when presented with a previously unseen example.

Outside of RL, when training a neural network or other function approximator on a dataset, achieving this generalisation is the most common goal. This is the reason for cross-validation and test datasets: they measure how well the model has learned to generalise.

Deep RL, when exploring a very large state/action space, relies on this generalisation effect in order to learn effectively.

It can still be hard for an approximator to generalise well in board games, where a very small difference in state can lead to radically different results. Hence self-play learning systems like AlphaZero use complex architectures and significant compute resources to gain large amounts of experience (millions of games) in a short time. This still falls short of brute-forcing all possible states by many orders of magnitude, so these systems still rely heavily on generalisation.
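As a toy illustration of that generalisation effect, here is a minimal sketch in C. It is not deep RL: a two-parameter linear approximator stands in for the neural network, and the names `LinearValue`, `predict`, `sgd_epoch` and `fit` are made up for this example. The only point is that an approximator fitted on some states can return a sensible estimate for a state it never saw.

```c
#include <stddef.h>

/* Linear value estimate V(s) ~= w * s + b (a stand-in for a neural net). */
typedef struct {
    double w;
    double b;
} LinearValue;

double predict(const LinearValue *m, double s) {
    return m->w * s + m->b;
}

/* One pass of stochastic gradient descent on the squared error. */
void sgd_epoch(LinearValue *m, const double *states, const double *targets,
               size_t n, double lr) {
    for (size_t i = 0; i < n; i++) {
        double err = predict(m, states[i]) - targets[i];
        m->w -= lr * err * states[i];
        m->b -= lr * err;
    }
}

/* Fit the approximator to the visited states only. */
LinearValue fit(const double *states, const double *targets, size_t n,
                int epochs, double lr) {
    LinearValue m = { 0.0, 0.0 };
    for (int e = 0; e < epochs; e++) {
        sgd_epoch(&m, states, targets, n, lr);
    }
    return m;
}
```

For example, fitting on the states 0.0, 0.2, ..., 1.0 with targets 2s + 1 and then calling `predict` for s = 0.5, which was never in the training set, should give an estimate close to 2. A real game needs a far richer approximator and far more data, but the mechanism is the same.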

If you know the server is on the subnet, why not send a broadcast message with the local reception port number as the message data? The server can then simply listen for this message and connect back to that port, or send its own config data back to that port so the client can connect directly. This way, you only need to send one message instead of looping over 256 IP addresses.
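A minimal sketch of what "the port number as the message data" might look like on the wire, assuming a two-byte payload in network byte order. That format and the helper names `encode_port` and `decode_port` are assumptions made for illustration; the post does not specify a message layout.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define PORT_MSG_LEN 2   /* assumed payload: one 16-bit port, big-endian */

/* Client side: pack the port it is listening on into the broadcast payload. */
void encode_port(uint16_t port, unsigned char out[PORT_MSG_LEN]) {
    uint16_t wire = htons(port);
    memcpy(out, &wire, sizeof(wire));
}

/* Server side: recover the port from a received payload.
   Returns 0 on success, -1 if the payload has the wrong length. */
int decode_port(const unsigned char *msg, size_t len, uint16_t *port) {
    if (len != PORT_MSG_LEN) return -1;
    uint16_t wire;
    memcpy(&wire, msg, sizeof(wire));
    *port = ntohs(wire);
    return 0;
}
```

The server already gets the client's IP address from `recvfrom`, so the decoded port is all it needs to connect back.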

I've done this in the past with great success, back in the "everybody has port 139 open" days.

I found that using multiple threads (sadly, I used about 500, but it was a one-off and just for fun) and pinging the server before attempting a connection allowed me to get through several thousand IPs per second.

I still have the source code (C++); if you would like to check it out, just leave me a message.

Also, why on earth would it ever be necessary to scan IPs? Even if the address is dynamic, you should be able to look it up by host name. See gethostbyname() or getaddrinfo().
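A minimal sketch of the host-name lookup with getaddrinfo(), assuming POSIX headers and IPv4 only. The wrapper name `resolve_ipv4` is made up for this example; on Windows the same call is available through Winsock, with different headers.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Resolve host to its first IPv4 address and write it as dotted text
   into out. Returns 0 on success, -1 on failure. */
int resolve_ipv4(const char *host, char *out, size_t outlen) {
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 only, to keep the sketch short */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, NULL, &hints, &res) != 0 || res == NULL) {
        return -1;
    }

    struct sockaddr_in *sin = (struct sockaddr_in *)res->ai_addr;
    const char *ok = inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen);
    freeaddrinfo(res);
    return ok ? 0 : -1;
}
```

Calling `resolve_ipv4("myserver", buf, sizeof(buf))`, where `myserver` is a placeholder host name, fills `buf` with an address that can go straight into connect(), so no scan is needed.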

I see you are using Windows. But if you are using Linux, you can write a connect function with a timeout by combining non-blocking sockets and select():

#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

// Connect with a timeout in seconds (0 waits indefinitely).
// Returns 0 on success, or -1 with errno set on failure or timeout.
int connect_with_timeout(int sock, struct sockaddr *addr, int size_addr, int timeout) {
#if defined(__linux__)
    int result = -1;
    int saved_errno = 0;

    // Set the socket to non-blocking I/O
    int flags = fcntl(sock, F_GETFL, 0);
    if (flags == -1 || fcntl(sock, F_SETFL, flags | O_NONBLOCK) == -1) {
        return -1;
    }

    // connect() returns at once; EINPROGRESS means the attempt is under way
    if (connect(sock, addr, size_addr) == 0) {
        result = 0;
    } else if (errno != EINPROGRESS) {
        saved_errno = errno;
    } else {
        fd_set wset;
        FD_ZERO(&wset);
        FD_SET(sock, &wset);

        struct timeval tval;
        tval.tv_sec = timeout;
        tval.tv_usec = 0;

        // The socket becomes writable once the connection attempt finishes
        int n = select(sock + 1, NULL, &wset, NULL, timeout ? &tval : NULL);
        if (n == 0) {
            saved_errno = ETIMEDOUT;
        } else if (n < 0) {
            saved_errno = errno;
        } else {
            // Writable does not mean connected: SO_ERROR holds the real outcome
            int error = 0;
            socklen_t len = sizeof(error);
            if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &error, &len) < 0) {
                saved_errno = errno;
            } else if (error != 0) {
                saved_errno = error;
            } else {
                result = 0;
            }
        }
    }

    // Restore the original (blocking) socket flags on every path
    if (fcntl(sock, F_SETFL, flags) == -1 && result == 0) {
        saved_errno = errno;
        result = -1;
    }
    if (result != 0) {
        errno = saved_errno;
    }
    return result;
#else
    return connect(sock, addr, size_addr);
#endif
}


 