
Java threaded socket connection timeouts

I have to make simultaneous TCP socket connections to multiple machines every x seconds in order to fetch something like a status update packet.

I use a Callable class: each instance runs as a future task that connects to one machine, sends a query packet, and receives a reply, which is returned to the main thread that creates all the Callable objects.

My socket connection class is:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketAddress;
import java.net.UnknownHostException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.Callable;

public class ClientConnect implements Callable<String> {
    Connection con = null;
    Statement st = null;
    ResultSet rs = null;
    String hostipp, hostnamee;

    ClientConnect(String hostname, String hostip) {
        hostnamee = hostname;
        hostipp = hostip;
    }

    @Override
    public String call() throws Exception {
        return GetData();
    }

    private String GetData() {
        Socket so = new Socket();
        SocketAddress sa = null;
        PrintWriter out = null;
        BufferedReader in = null;
        try {
            sa = new InetSocketAddress(InetAddress.getByName(hostipp), 2223);
        } catch (UnknownHostException e1) {
            e1.printStackTrace();
        }
        try {
            // Connect with a 10-second timeout.
            so.connect(sa, 10000);

            // Send the query packet and read the one-line reply.
            out = new PrintWriter(so.getOutputStream(), true);
            out.println("\1IDC_UPDATE\1");
            in = new BufferedReader(new InputStreamReader(so.getInputStream()));
            String[] response = in.readLine().split("\1");
            out.close();
            in.close();
            so.close();
            so = null;

            // Check that the third field of the reply is numeric.
            try {
                Integer.parseInt(response[2]);
            } catch (NumberFormatException e) {
                System.out.println("Number format exception");
                return hostnamee + "|-1";
            }

            return hostnamee + "|" + response[2];
        } catch (IOException e) {
            // Any I/O failure, including a connect timeout, is reported as -1.
            try {
                if (out != null) out.close();
                if (in != null) in.close();
                so.close();
                so = null;
                return hostnamee + "|-1";
            } catch (IOException e1) {
                return hostnamee + "|-1";
            }
        }
    }
}

And this is how I create the thread pool in my main class:

private void StartThreadPool()
{
    ExecutorService pool = Executors.newFixedThreadPool(30);
    List<Future<String>> list = new ArrayList<Future<String>>();
    for (Map.Entry<String, String> entry : pc_nameip.entrySet()) 
    {
        Callable<String> worker = new ClientConnect(entry.getKey(),entry.getValue());
        Future<String> submit = pool.submit(worker);
        list.add(submit);
    }
    for (Future<String> future : list) {
        try {
            String threadresult;
            threadresult = future.get();
            //........ PROCESS DATA HERE!..........//
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        }
    }       
}

The pc_nameip map contains (hostname, hostip) pairs, and for every entry I create a ClientConnect object.

My problem is that when my list of machines contains, let's say, 10 PCs (most of which are not alive), I get a lot of timeout exceptions on the PCs that are alive, even though my timeout limit is set to 10 seconds.

If I force the list to contain a single working PC, I have no problem. The timeouts are pretty random, and I have no clue what's causing them.

All machines are on a local network. The remote servers were also written by me (in C/C++) and have been working in another setup for more than 2 years without any problems.

Am I missing something, or could it be an OS network restriction problem? I am testing this code on Windows XP SP3. Thanks in advance!



UPDATE:

After creating two new server machines and keeping one old server that was getting a lot of timeouts, I have the following results:

For 100 thread runs over 20 minutes :

NEW_SERVER1 : 99 successful connections/ 1 timeouts
NEW_SERVER2 : 94 successful connections/ 6 timeouts
OLD_SERVER  : 57 successful connections/ 43 timeouts

Other info:

  • I experienced a JRE crash (EXCEPTION_ACCESS_VIOLATION (0xc0000005)) once and had to restart the application.
  • I noticed that while the app was running, my network connection was struggling as I browsed the internet. I have no idea if this is expected, but I don't think running at most 15 threads is that much.

So, first of all, my old servers had some kind of problem. I have no idea what it was, since my new servers were created from the same OS image.

Secondly, although the timeout percentage has dropped dramatically, I still think it is unusual to get even one timeout on a small LAN like ours. But this could be a problem in the server application.

Finally, my view is that, apart from the old server's problem (I still cannot believe I lost so much time on that!), there must be either a server app bug or a JDK-related bug (since I experienced that JRE crash).

PS: I use Eclipse as my IDE, and my JRE is the latest.

If any of the above rings a bell, please comment. Thank you.

-----EDIT-----

Could it be that PrintWriter and/or BufferedReader are not actually thread safe?

----NEW EDIT 09 Sep 2013----

After re-reading all the comments, and thanks to @Gray and his comment:

When you run multiple servers does the first couple work and the rest of them timeout? Might be interesting to put a small sleep in your fork loop (like 10 or 100ms) to see if it works that way.

I rearranged the tree list of hosts/IPs and got some really strange results. It seems that if an alive host is placed at the top of the tree list, and is therefore the first to start a socket connection, it has no problem connecting and receiving packets without any delay or timeout.

On the contrary, if an alive host is placed at the bottom of the list, with several dead hosts before it, it takes too long to connect, and with my previous timeout of 10 seconds it failed to connect. But after changing the timeout to 60 seconds (thanks to @EJP) I realised that no timeouts occur at all!

It just takes too long to connect (more than 20 seconds on some occasions). Something is blocking new socket connections, and it isn't that the hosts or the network are too busy to respond.
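One way to tell a slow connect apart from a failed one is to time each attempt individually. The following is only a minimal sketch, assuming Java 7 or later; ConnectTimer and Attempt are hypothetical names that do not appear in the code above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ConnectTimer {
    /** Outcome of one timed connection attempt (illustrative holder class). */
    static final class Attempt {
        final boolean connected;
        final long elapsedMillis;

        Attempt(boolean connected, long elapsedMillis) {
            this.connected = connected;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /**
     * Tries to connect once and reports how long the attempt took.
     * The measured time also covers name resolution (if host is not an
     * IP literal) and closing the socket.
     */
    static Attempt timeConnect(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        boolean ok;
        try (Socket so = new Socket()) {
            so.connect(new InetSocketAddress(host, port), timeoutMillis);
            ok = true;
        } catch (IOException e) {
            // Refused, timed out, unresolved host, and so on.
            ok = false;
        }
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        return new Attempt(ok, elapsed);
    }
}
```

Logging the elapsed time per host, together with the order in which the tasks were submitted, would show whether the delay grows with the number of dead hosts ahead of an alive one.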

I have some debug data here, if you would like to take a look: http://pastebin.com/2m8jDwKL

You could simply check for availability before you connect to the socket. There is an answer that provides a somewhat hackish workaround: https://stackoverflow.com/a/10145643/1809463

Process p1 = java.lang.Runtime.getRuntime().exec("ping -c 1 " + ip);
int returnVal = p1.waitFor();
boolean reachable = (returnVal==0);

by jayunit100

ping is a common program on both Unix and Windows, so the approach should work on either, although the count flag differs: -c is the Unix form, and Windows uses -n.
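A slightly more portable variant of the ping check could pick the count flag from the os.name system property and bound the wait. This is a rough sketch, assuming Java 8 or later; PingCheck, pingCommand and isReachable are hypothetical names, and the exit code of ping is not a fully reliable signal on every platform.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.TimeUnit;

final class PingCheck {
    /** Builds the ping argument list for the given os.name value. */
    static List<String> pingCommand(String osName, String ip) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        // Windows counts echo requests with -n; Unix-like systems use -c.
        return Arrays.asList("ping", windows ? "-n" : "-c", "1", ip);
    }

    /** Returns true if a single ping exits with status 0 within timeoutSeconds. */
    static boolean isReachable(String ip, long timeoutSeconds) {
        try {
            // Passing an argument list avoids building a shell command string.
            Process p = new ProcessBuilder(pingCommand(System.getProperty("os.name"), ip))
                    .redirectErrorStream(true)
                    .start();
            if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                p.destroyForcibly();   // ping did not finish in time
                return false;
            }
            return p.exitValue() == 0;
        } catch (IOException e) {
            return false;              // ping could not be started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

For example, PingCheck.isReachable(hostip, 2) could be called before submitting a ClientConnect task, skipping hosts that do not answer.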

My problem is that when my list of machines contains, let's say, 10 PCs (most of which are not alive), I get a lot of timeout exceptions on the PCs that are alive, even though my timeout limit is set to 10 seconds.

So as I understand the problem, if you have (for example) 10 PCs in your map and 1 is alive and the other 9 are not online, all 10 connections time out. If you just put the 1 alive PC in the map, it shows up as fine.

This points to some sort of concurrency problem, but I can't see it. I would have thought that there was some sort of shared data that was not being locked or something. I see your test code is using Statement and ResultSet. Maybe there is a database connection that is being shared without locking or something? Can you try just returning the result string and printing it out?

Less likely is some sort of network or firewall configuration, but the idea that one failed connection would cause another to fail is just strange. Maybe try running your program on one of the servers or from another computer?

If I try your test code, it seems to work fine. Here's the source code for my test class. It has no problems contacting a combination of online and offline hosts.

Lastly, some quick comments about your code:

  • You should close the streams, readers, and sockets in a finally block. Check my test class for a better pattern there.
  • You should return a small Result class instead of passing back a String that then has to be parsed.
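The two comments above could be combined roughly as follows. This is only a sketch under a few assumptions: it uses try-with-resources (Java 7+) in place of an explicit finally block, it adds a read timeout that the original code does not have, and StatusResult, StatusQuery and parseField are hypothetical names. It is not the test class mentioned above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.Callable;

/** Hypothetical result holder returned instead of a "host|value" string. */
final class StatusResult {
    final String hostname;
    final int value;   // -1 means the query failed

    StatusResult(String hostname, int value) {
        this.hostname = hostname;
        this.value = value;
    }
}

final class StatusQuery implements Callable<StatusResult> {
    private final String hostname;
    private final String hostIp;
    private final int port;

    StatusQuery(String hostname, String hostIp, int port) {
        this.hostname = hostname;
        this.hostIp = hostIp;
        this.port = port;
    }

    @Override
    public StatusResult call() {
        // try-with-resources closes the socket and streams on every path.
        try (Socket so = new Socket()) {
            so.connect(new InetSocketAddress(hostIp, port), 10000);
            so.setSoTimeout(10000);   // assumption: also bound the read
            try (PrintWriter out = new PrintWriter(so.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(so.getInputStream()))) {
                out.println("\1IDC_UPDATE\1");
                return new StatusResult(hostname, parseField(in.readLine()));
            }
        } catch (IOException e) {
            return new StatusResult(hostname, -1);
        }
    }

    /** Extracts the third \1-separated field as an int, or -1 if it is missing or not numeric. */
    static int parseField(String line) {
        if (line == null) return -1;            // peer closed without replying
        String[] parts = line.split("\1");
        if (parts.length < 3) return -1;
        try {
            return Integer.parseInt(parts[2]);
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```

The main thread would then collect Future<StatusResult> objects and read hostname and value directly, with no string splitting on its side.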

Hope this helps.

After a lot of reading and experimentation, I will have to answer my own question (if I am allowed to, of course).

Java just can't handle multiple concurrent socket connections without adding a big performance overhead, at least on a Core2Duo / 4GB RAM / Windows XP machine.

Creating multiple concurrent socket connections to remote hosts (using, of course, the code I posted) creates some kind of resource bottleneck or blocking situation whose cause I still don't know.

If you try to connect to 20 hosts simultaneously, and a lot of them are disconnected, then you cannot guarantee a "fast" connection to the alive ones. You will get connected, but it could be after 20-25 seconds, meaning that you'll have to set the socket timeout to something like 60 seconds (not acceptable for my application).

If an alive host is lucky enough to start its connection attempt first (bearing in mind that the concurrency is not absolute, since the for loop still submits tasks sequentially), then it will probably connect very quickly and get a response.

If it is unlucky, the socket.connect() method will block for some time, depending on how many hosts before it will eventually time out.

After adding a small sleep (100 ms) between the pool.submit(worker) calls, I realised that it makes some difference: I connect faster to the "unlucky" hosts. But if the list of dead hosts grows, the results are almost the same.
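The staggered submission described above might look something like this. It is a sketch only: StaggeredPool and runStaggered are hypothetical names, the bounded Future.get and the pool shutdown are additions that the original StartThreadPool does not have, and the per-task timeout is applied to each future in turn, so the total wait can exceed it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class StaggeredPool {
    /**
     * Submits the tasks with a pause between submissions, then collects
     * the results in submission order. A task that fails or does not
     * finish in time contributes the fallback value instead.
     */
    static <T> List<T> runStaggered(List<Callable<T>> tasks, int threads,
                                    long staggerMillis, long timeoutMillis,
                                    T fallback) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<T> results = new ArrayList<>();
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (int i = 0; i < tasks.size(); i++) {
                if (i > 0) Thread.sleep(staggerMillis);   // the "small sleep"
                futures.add(pool.submit(tasks.get(i)));
            }
            for (Future<T> f : futures) {
                try {
                    results.add(f.get(timeoutMillis, TimeUnit.MILLISECONDS));
                } catch (ExecutionException | TimeoutException e) {
                    f.cancel(true);
                    results.add(fallback);
                }
            }
        } catch (InterruptedException e) {
            // Restore the flag and return whatever was collected so far.
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();   // do not leak a pool on every polling cycle
        }
        return results;
    }
}
```

With the ClientConnect tasks, a call could look like runStaggered(workers, 30, 100, 60000, "unknown|-1"), where workers is the list of Callable<String> objects built from pc_nameip.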

If I edit my host list and place a previously "unlucky" host at the top (before the dead hosts), all problems disappear...

So, for some reason the socket.connect() method creates a form of bottleneck when many of the hosts to connect to are not alive. Whether it is a JVM problem, an OS limitation, or bad coding on my side, I have no clue...

I will try a different coding approach, and hopefully tomorrow I will post some feedback.

PS: This answer made me think of my problem: https://stackoverflow.com/a/4351360/2025271
