简体   繁体   中英

Non-Blocking NIO SocketChannel causes IOException timeout on read?

I'm still learning NIO and came up with a simple Ping/Pong application. My client is written in NIO. The server is the telnet-endpoint of Memcached. The client is constantly issuing stats\\r\\n and the server responds with some statistics.

This setup is working in a non-blocking way with SocketChannel and Selector.select()

The part that I do not fully understand is the error case where the connection between client and server get in limbo state or the server went away without closing the connection (During testing I simulate that with a iptables DROP rule):

I end up (on OSX and Linux) with the following stacktrace on the read operation after a successful select() operation:

java.io.IOException: Operation timed out
  at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_73]
  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.8.0_73]
  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_73]
  at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.8.0_73]
  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_73]

The code in use looks something like this:

    Selector selector = Selector.open();
    SocketChannel memcachedClient = SocketChannel.open(MEMCACHED_HOST);
    memcachedClient.configureBlocking(false);

    int ops = memcachedClient.validOps();
    memcachedClient.register(selector, ops, null);

    while (true) {
        LOGGER.info("Masterloop - Start");
        //selector.select();
        selector.select(1 * 1000);
        LOGGER.debug("Masterloop - After select");

        Set<SelectionKey> selectedKeys = selector.selectedKeys();
        Iterator<SelectionKey> selectedKeysIterator = selectedKeys.iterator();
        LOGGER.debug("Found Selectors {}", selectedKeys.size());

        while (selectedKeysIterator.hasNext()) {
            SelectionKey key = selectedKeysIterator.next();

            LOGGER.info("Key: w={} r={} v={} - {}",
                    new Object[] { key.isWritable(), key.isReadable(), key.isValid(), key.toString() });

            if (key.isReadable()) {
                read(key);
            } else if (key.isWritable()) {
                write(key);
            }

            selectedKeysIterator.remove();
        }
    }

What is strange to me are the two observed behaviors depending on the usage of selector.select() and selector.select(timeout) :

Selector.select() blocks and fails

Without a timeout, the Selector.select() call blocks for ~50 seconds after I let all packages DROP (or unplug the network cable) and comes back with a single key:

2016-07-12 11:51:55 INFO  App:88 - Masterloop - Start
2016-07-12 11:52:49 DEBUG App:92 - Masterloop - After select
2016-07-12 11:52:49 DEBUG App:96 - Found Selectors 1
2016-07-12 11:52:49 INFO  App:101 - Key: w=false r=true v=true - sun.nio.ch.SelectionKeyImpl@2e817b38

and I end up with the stacktrace above in my read() method:

private static void read(SelectionKey key) throws IOException {
    SocketChannel socketChannel = (SocketChannel) key.channel();

    ByteBuffer readBuffer = ByteBuffer.allocate(8192);

    int numRead;
    try {
        numRead = socketChannel.read(readBuffer); // Exception is thrown here
    } catch (IOException e) {
        LOGGER.warn(e.getMessage(), e);
        key.cancel();
        socketChannel.close();
        return;
    }
// do something with readBuffer
key.interestOps(SelectionKey.OP_WRITE);
}

Selector.select(timeout) returns but throws an IOException later, too

With an timeout (ie 1000 ms), the Selector.select(1000) call returns after I let all packages DROP (or unplug the network cable) and with zero keys. That happens constantly as my while -loop continues to run. Again after ~45 seconds the then-active loop iteration returns a single, readable key that ends again with an IOException on read()

2016-07-12 11:54:24 INFO  App:113 - ================
2016-07-12 11:54:24 INFO  App:88 - Masterloop - Start
2016-07-12 11:54:25 DEBUG App:92 - Masterloop - After select
2016-07-12 11:54:25 DEBUG App:96 - Found Selectors 0
2016-07-12 11:54:25 INFO  App:113 - ================
2016-07-12 11:54:25 INFO  App:88 - Masterloop - Start
2016-07-12 11:54:26 DEBUG App:92 - Masterloop - After select
2016-07-12 11:54:26 DEBUG App:96 - Found Selectors 0
2016-07-12 11:54:26 INFO  App:113 - ================
...
2016-07-12 11:55:16 INFO  App:113 - ================
2016-07-12 11:55:16 INFO  App:88 - Masterloop - Start
2016-07-12 11:55:17 DEBUG App:92 - Masterloop - After select
2016-07-12 11:55:17 DEBUG App:96 - Found Selectors 0
2016-07-12 11:55:17 INFO  App:113 - ================
2016-07-12 11:55:17 INFO  App:88 - Masterloop - Start
2016-07-12 11:55:18 DEBUG App:92 - Masterloop - After select
2016-07-12 11:55:18 DEBUG App:96 - Found Selectors 1
2016-07-12 11:55:18 INFO  App:101 - Key: w=false r=true v=true - sun.nio.ch.SelectionKeyImpl@2e817b38
2016-07-12 11:55:18 WARN  App:30 - Operation timed out
java.io.IOException: Operation timed out
  at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_73]
  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.8.0_73]
  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_73]
  at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.8.0_73]
  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_73]

Question

I somewhat understand the behavior in general: It is impossible to judge if the server went away or the connection is still active without actively probing it.

Looking at Wireshark I see a bunch of resubmission (which makes sense as the server never ACKed my client's request) and after the 8th resubmission the IOException happens. But how can I influence this timeout? Is that only on the TCP level? If so which settings (kernel?) are these? I played with some settings but they seem to work only together with KEEP_ALIVE . Can't I get a feedback from NIO earlier (say 10 seconds) that something went wrong?

Thanks!

  1. The operation that has timed out is a prior send. This is delivered to the read() immediately, not after a timeout.
  2. You aren't distinguishing between select success (returns > 0) and select timeout (returns zero).
  3. When you get (1), the select will have returned > 0 with the affected channel marked as readable: however what is there to be read is not data but the error. (It can also be end of stream, if the peer has disconnected.)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM