How does Deep Reinforcement Learning remove the need to map or explore every state, action pair for an agent?

Question

I am interested in using Deep Reinforcement Learning to teach an AI how to play a game, where the AI knows the model of the game at the start (so I would use model-based deep reinforcement learning?).

However, the number of possible state-action combinations is very large, and I can't map out every pair. I have heard that Deep Reinforcement Learning is a solution for very large state spaces, but I'm not sure how exactly the neural net can be trained to choose an action in any (future) state if it hasn't experienced every possible state yet.

Could anyone please provide clarification on this subject?

Answer 1

OK, a couple things here.

First, connect is designed to wait for a bit before timing out in case the server is busy. You can adjust the timeout length, although I can't remember exactly how to do that off the top of my head.

Second, your code will find a server, but how do you know it's the server you are looking for? It could be some other app that just happens to be listening on the same port. Unless you are scanning for any server at all rather than one in particular, you'll need to do some verification to be sure of who you are talking to on the other end.

Finally, assuming you are writing both the client and the server, a better solution would be to have the client send out a broadcast/multicast message and have the server (or servers if there are more than one) listen for and respond to that message. The client then just waits some specified period of time for responses to figure out where the server(s) are.

Answer 2

I would do some research to determine whether Winsock supports asynchronous I/O.

Answer 3

Is the server IP address so random that you need to do this each time? I have not done any socket programming in a long time, but with timeouts and such this might not get much better.

Other options:

Put a configuration file containing the IP address on a network share. The server could rewrite it whenever it starts up.
Make the server's IP address static and hard-code it or put it in a configuration file.
Look the machine up via its DNS or NetBIOS name.

Answer 4

Most forms of approximation in machine learning also lead to generalisation - the ability to give better-than-guesswork estimates for a target variable when presented with a previously unseen example.

Outside of RL, when training a neural network or other function approximator on a dataset, achieving this generalisation is usually the main goal. This is why cross-validation and test datasets are used: they measure how well the model has learned to generalise.

Deep RL, when exploring a very large state/action space, relies on this generalisation effect in order to learn effectively.

It can still be hard for an approximator to generalise well in board games, where a very small difference in state can lead to radically different results. Hence self-play learning systems like AlphaZero use complex architectures and significant compute resources to gain large amounts of experience (millions of games) in a small amount of time. This still falls short of brute-forcing all possible states by many orders of magnitude, so it still relies heavily on generalisation.
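
To make the generalisation point concrete, here is a minimal sketch in C. It is not deep RL: it swaps the neural network for a two-weight linear approximator and uses a made-up target function, true_value, in place of returns observed during play. All names and constants are assumptions chosen for illustration. The approximator is trained on only every 10th state, yet it can still estimate the value of states it never visited.

```c
#define NUM_STATES 1000

/* Hypothetical ground-truth value of a state; it stands in for the
   returns an agent would observe and is used only to build targets. */
static double true_value(int s) {
    return 1.0 + 2.0 * ((double)s / NUM_STATES);
}

/* Linear approximator: v(s) = w[0] + w[1] * x, with x = s / NUM_STATES. */
static double predict(const double w[2], int s) {
    double x = (double)s / NUM_STATES;
    return w[0] + w[1] * x;
}

/* Stochastic gradient descent on the squared error, visiting only
   every 10th state, so 90% of the states are never seen in training. */
static void train(double w[2], int epochs, double lr) {
    for (int e = 0; e < epochs; e++) {
        for (int s = 0; s < NUM_STATES; s += 10) {
            double x = (double)s / NUM_STATES;
            double err = true_value(s) - predict(w, s);
            w[0] += lr * err;      /* gradient step for the bias weight */
            w[1] += lr * err * x;  /* gradient step for the slope weight */
        }
    }
}

/* Train a fresh approximator, then return its absolute error on state s. */
static double error_on_unseen_state(int s) {
    double w[2] = {0.0, 0.0};
    train(w, 200, 0.1);
    double err = predict(w, s) - true_value(s);
    return err < 0 ? -err : err;
}
```

Calling error_on_unseen_state(457) evaluates a state that was never part of training (457 is not a multiple of 10), and the error should be close to zero because the two learned weights cover the whole state space. A real game is far less smooth than this toy target, which is why systems like AlphaZero need much richer approximators and much more experience.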

Answer 5

If you know the server is on the subnet, why not send a broadcast message with the local reception port number as the message data? Then the server can simply listen for this message and connect back to that port, or send its own config data back to that port so the client can connect directly. This way, you only need to send one message out instead of looping over 256 IP addresses.
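
The client side of this idea could be sketched roughly as follows, using POSIX sockets (on Windows the Winsock equivalents would be needed). DISCOVERY_PORT, the 2-byte message layout, and the function names are assumptions made up for this example, not part of any existing protocol.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define DISCOVERY_PORT 45454 /* hypothetical port the server listens on */

/* Pack the client's reception port into a 2-byte message (network byte order). */
static void encode_reply_port(uint16_t port, unsigned char msg[2]) {
    uint16_t be = htons(port);
    memcpy(msg, &be, sizeof be);
}

/* Server side: recover the reception port from a received message. */
static uint16_t decode_reply_port(const unsigned char msg[2]) {
    uint16_t be;
    memcpy(&be, msg, sizeof be);
    return ntohs(be);
}

/* Broadcast one discovery datagram; returns 0 on success, -1 on error. */
static int broadcast_discovery(uint16_t reply_port) {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) {
        return -1;
    }

    /* Sending to a broadcast address requires SO_BROADCAST. */
    int on = 1;
    if (setsockopt(sock, SOL_SOCKET, SO_BROADCAST, &on, sizeof on) < 0) {
        close(sock);
        return -1;
    }

    struct sockaddr_in dest;
    memset(&dest, 0, sizeof dest);
    dest.sin_family = AF_INET;
    dest.sin_port = htons(DISCOVERY_PORT);
    dest.sin_addr.s_addr = htonl(INADDR_BROADCAST); /* 255.255.255.255 */

    unsigned char msg[2];
    encode_reply_port(reply_port, msg);
    ssize_t sent = sendto(sock, msg, sizeof msg, 0,
                          (struct sockaddr *)&dest, sizeof dest);
    close(sock);
    return sent == (ssize_t)sizeof msg ? 0 : -1;
}
```

The client would call broadcast_discovery() with the port it is listening on and then wait a fixed time for servers to connect back or reply. The server binds a UDP socket to DISCOVERY_PORT, reads the datagram with recvfrom(), and uses decode_reply_port() together with the sender address to reach the client.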

Answer 6

I've done this in the past with great success, back in the "everybody has port 139 open" days.

I found that using multiple threads (sadly, I used about 500, but it was a one-time shot and just for fun) and pinging the server before attempting a connection allowed me to traverse several thousand IPs per second.

I still have the source code (C++). If you would like to check it out, just leave me a message.

Also, why on earth would it ever be necessary to scan IPs? Even if the address is dynamic, you should be able to look it up by host name. See gethostbyname() or getaddrinfo().
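
A host-name lookup with getaddrinfo() might look roughly like this on a POSIX system. The helper name resolve_ipv4 is made up for this sketch, and it only returns the first IPv4 address found.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Resolve a host name (or numeric address) to its first IPv4 address,
   written as dotted-decimal text into out. Returns 0 on success, -1 on failure. */
static int resolve_ipv4(const char *host, char *out, size_t out_len) {
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* IPv4 only, to keep the sketch short */
    hints.ai_socktype = SOCK_STREAM; /* avoid one result per socket type */

    if (getaddrinfo(host, NULL, &hints, &res) != 0) {
        return -1;
    }

    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    const char *text = inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)out_len);

    freeaddrinfo(res);
    return text != NULL ? 0 : -1;
}
```

With a buffer of INET_ADDRSTRLEN bytes, resolve_ipv4("myserver", buf, sizeof buf) would fill buf with the server's current address, where "myserver" is a placeholder host name. This replaces the whole scan with a single lookup, provided the machine is registered in DNS or a hosts file.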

Answer 7

I see you are using Windows, but on Linux you can create a connect function with a timeout by combining non-blocking sockets and select:

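#if defined(__linux__)
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#endif

// Returns 0 on success, or -1 with errno set on failure or timeout (timeout is in seconds)
int connect_with_timeout(int sock, struct sockaddr *addr, int size_addr, int timeout) {
#if defined(__linux__)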
    int             error = 0;
    fd_set          rset;
    fd_set          wset;
    int             n;

    // set the socket as nonblocking IO
    int flags = fcntl (sock, F_GETFL, 0);
    fcntl(sock, F_SETFL, flags | O_NONBLOCK);

    errno = 0;

    // we connect, but it will return soon
    n = connect(sock, addr, size_addr);

    if(n < 0) { 
        if (errno != EINPROGRESS) {
            return -1;
        }
    } else if (n == 0) {
        goto done;
    }

    FD_ZERO(&rset);
    FD_ZERO(&wset);
    FD_SET(sock, &rset);
    FD_SET(sock, &wset);

    struct timeval tval;
    tval.tv_sec = timeout;
    tval.tv_usec = 0;

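    // Wait in select() until the connect attempt completes or the timeout expires
    // (a timeout of 0 means wait indefinitely)
    n = select(sock + 1, &rset, &wset, 0, timeout ? &tval : 0);
    if (n < 0) {
        return -1;
    }
    if (n == 0) {
        errno = ETIMEDOUT;
        return -1;
    }

    if (FD_ISSET(sock, &rset) || FD_ISSET(sock, &wset)) {
        socklen_t len = sizeof(error);
        if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &error, &len) < 0) {
            return -1;
        }
        // SO_ERROR holds the outcome of the non-blocking connect
        if (error != 0) {
            errno = error;
            return -1;
        }
    } else {
        return -1;
    }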

done:
    // We change the socket options back to blocking IO
    if (fcntl(sock, F_SETFL, flags) == -1) {
        return -1;
    }
    return 0;
#else
    return connect(sock, addr, size_addr);
#endif
}

solution1
3 2009-02-17 20:48:21

solution2
1 2009-02-17 20:47:41

solution3
1 2009-02-17 20:49:39

solution4
0 2021-12-29 11:24:15

solution5
0 2009-02-17 21:00:18

solution6
0 2009-02-17 21:20:17

solution7
0 2009-02-17 21:49:40
