C#: Why does the WebClient time out most of the time when it is invoked through a thread?

I am working on a project that uses a timed web client. The class structure is like this.

Controller => main supervisor of the classes Form1, SourceReader, ReportWriter, UrlFileReader, HTTPWorker, TimedWebClient.

HTTPWorker is the class that gets the page source when a URL is given. TimedWebClient is the class that handles the timeout of the WebClient. Here is the code.

class TimedWebClient : WebClient
{
    int Timeout; 

    public TimedWebClient()
    {
        this.Timeout = 5000;
    }


    protected override WebRequest GetWebRequest(Uri address)
    {
        var objWebRequest = base.GetWebRequest(address);
        objWebRequest.Timeout = this.Timeout;
        return objWebRequest;
    }
}

In HTTPWorker I have

TimedWebClient wclient = new TimedWebClient();
wclient.Proxy = WebRequest.GetSystemWebProxy();
wclient.Headers["Accept"] = "application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*";
wclient.Headers["User-Agent"] = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDC)";
// DownloadData returns the raw response bytes, so the variable must be byte[], not string.
byte[] pagesource = wclient.DownloadData(requestUrl);
UTF8Encoding objUTF8 = new UTF8Encoding();
responseData = objUTF8.GetString(pagesource);

I have handled exceptions there. In Form1 I have a background controller and a urllist.

First implementation:

First I took one URL at a time and gave it to the only Controller object to process. That worked fine, but because it is sequential it took a long time when the list was large.

Second implementation:

Then in the DoWork handler of the BackgroundWorker I made seven controllers and seven threads. Each controller has its own HTTPWorker object. But now it throws exceptions saying "timedout".

Below is the code in Form1.cs backgroundWorker1_DoWork.

private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
    bool done = false;

    while (!backgroundWorker1.CancellationPending && !done)
    {
        int iterator = 1;
        int tempiterator = iterator;
        Controller[] cntrlrarray = new Controller[numofcontrollers];
        Thread[] threadarray = new Thread[numofcontrollers];
        int cntrlcntr = 0;
        for ( cntrlcntr = 0; cntrlcntr < numofcontrollers; cntrlcntr++)
        {
            cntrlrarray[cntrlcntr] = new Controller();

        }
        cntrlcntr = 0; 
        for (iterator = 1; iterator <= this.urlList.Count; iterator++)
        {
            int assignedthreads = 0;

            for (int threadcounter = 0; threadcounter < numofcontrollers; threadcounter++)
            {
                cntrlcntr = threadcounter;
                threadarray[threadcounter] = new Thread(() => cntrlrarray[cntrlcntr].Process(iterator - 1));
                threadarray[threadcounter].Name = this.urlList[iterator - 1];
                threadarray[threadcounter].Start();
                backgroundWorker1.ReportProgress(iterator);
                assignedthreads++;

                if (iterator == this.urlList.Count)
                {
                    break;
                }
                else
                {
                    iterator++;
                }

            }

            for (int threadcounter = 0; threadcounter < assignedthreads; threadcounter++)
            {
                cntrlcntr = threadcounter;
                threadarray[threadcounter].Join();

            }
            if (iterator == this.urlList.Count)
            {
                break;
            }
            else
            {
                iterator--;
            }

        }
        done = true;
    }
}

What is the reason and the solution for this? Apologies for being too lengthy. Thank you in advance.

The sky... it's full of Threads! Seriously, though - don't use this many threads. That's what asynchronous I/O is for. If you're using .NET 4.5, this is very easy to do using async/await; otherwise it takes a bit of boilerplate code, but it's still far preferable to this.
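As a rough illustration of the async/await approach mentioned above, here is a minimal sketch, not the original poster's code. The names `AsyncDownloader`, `DownloadAllAsync`, `maxConcurrency` and the 30 second timeout are assumptions made for this example. It uses `HttpClient` (available from .NET 4.5) and a `SemaphoreSlim` to cap the number of requests in flight; the optional `fetch` delegate exists only so the throttling logic can be exercised without a network.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncDownloader
{
    // One shared HttpClient so that connections can be reused across requests.
    private static readonly HttpClient Client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30) // assumed value, tune as needed
    };

    // Downloads every distinct URL with at most maxConcurrency requests in flight.
    // Returns a map from URL to page source (or an error message on failure).
    public static async Task<Dictionary<string, string>> DownloadAllAsync(
        IEnumerable<string> urls,
        int maxConcurrency,
        Func<string, Task<string>> fetch = null)
    {
        fetch = fetch ?? (url => Client.GetStringAsync(url));
        var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = urls.Distinct().Select(async url =>
        {
            await gate.WaitAsync(); // wait for a free slot without blocking a thread
            try
            {
                return new KeyValuePair<string, string>(url, await fetch(url));
            }
            catch (Exception ex) // request failures and timeouts end up here
            {
                return new KeyValuePair<string, string>(url, "ERROR: " + ex.Message);
            }
            finally
            {
                gate.Release();
            }
        }).ToList();

        var results = await Task.WhenAll(tasks);
        return results.ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```

From an async method this could be called as `var pages = await AsyncDownloader.DownloadAllAsync(urlList, 7);`. No threads are created explicitly; the waiting happens in the I/O layer.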

With that out of the way, the number of TCP connections is quite limited by default. Even if there were a use for having 1000 downloads at once (and there probably isn't, since you're sharing bandwidth), you simply can't create and drop TCP connections willy-nilly - there's a limit on open TCP connections (anywhere from 5 to 20, unless you're on a server). You can change this, but it's usually preferred to do things differently. See this entry. This might also be a problem if this application is not running alone (which it probably isn't, given that you wouldn't have such a problem on server Windows). For example, torrent clients often bump into the half-open connection limit (a half-open connection is one that is still waiting for the end of the initial TCP handshake), which would of course be detrimental to your application.
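The link in the paragraph above is not preserved, so it is unclear which setting it pointed to. One related limit that can be changed from code on .NET Framework is `ServicePointManager.DefaultConnectionLimit`, which caps concurrent connections per host for `WebClient`/`HttpWebRequest` (the default for client applications is 2). The sketch below is only an illustration under that assumption; the class name `ConnectionSetup` and the value 7 (matching the seven threads in the question) are made up for this example.

```csharp
using System.Net;

public static class ConnectionSetup
{
    // Call once at startup, before the first request is created.
    public static void Configure()
    {
        // Assumed value: allow as many connections per host as there are workers.
        ServicePointManager.DefaultConnectionLimit = 7;
    }
}
```

As the answer says, raising limits is usually less preferable than opening fewer connections in the first place.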

Now, even if you keep under this limit, there's also a fixed number of outbound and inbound ports to use when communicating. This is a problem when you quickly open and close TCP connections, because TCP keeps the connection alive in the background for about 4 minutes (to make sure no wrong packets arrive at the port, which could be reused in the meantime). This means that if you create enough connections in this time interval, you're going to "starve" your port pool, and every new TCP connection will be denied (so your browser will temporarily stop working, etc.).

Next, a 5 second timeout is pretty low. Really. Imagine that it would take a second to complete a handshake (that's a ping of ~300ms, which is still within the realm of reasonable internet response). Suddenly, you've got a new connection, which has to wait for the other handshakes to finish, and it might take a few seconds just for that. And that's still just the initiation of the connection. Then there's the DNS lookup, and the response of the HTTP server itself... 5 seconds is a low timeout.
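To make the timeout easier to raise, the question's `TimedWebClient` could take the value as a constructor argument. This is a sketch only; the class name `ConfigurableTimedWebClient`, the property name `TimeoutMilliseconds` and the 30 second default are assumptions for illustration, not values from the answer.

```csharp
using System;
using System.Net;

public class ConfigurableTimedWebClient : WebClient
{
    // Timeout applied to each request, in milliseconds.
    public int TimeoutMilliseconds { get; private set; }

    // 30000 ms is an assumed default, chosen to be more generous than 5000 ms.
    public ConfigurableTimedWebClient(int timeoutMilliseconds = 30000)
    {
        TimeoutMilliseconds = timeoutMilliseconds;
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        if (request != null)
        {
            request.Timeout = TimeoutMilliseconds;
        }
        return request;
    }
}
```

Usage would then look like `var wclient = new ConfigurableTimedWebClient(20000);`, with the rest of the HTTPWorker code unchanged.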

In short, it's not the multi-threading - it's the massive number of (useless) connections you're opening. Also, for URLs on a single website, you should look into Keep-Alive connections - they can reuse the already opened TCP connection, which significantly mitigates this problem.

Now, to get deeper into this. You're starting and destroying threads needlessly. Instead, it would be a better idea to have a URL queue and several consumer threads that take their input from the queue. This way, you'll only have those 7 (or whatever the number) threads, which poll the queue as long as there's something in it, and that saves a lot of system resources (and improves your performance). I'm thinking that the Thread.Join you're doing might also have something to do with your issues. Even though you're running the thing in a background worker, it just might be possible that something strange is happening in there.
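The queue-plus-consumers idea could be sketched roughly as follows. This is one possible shape, assuming .NET 4.0 or later for `BlockingCollection<T>`; the names `UrlQueueRunner`, `Run`, `workerCount` and `process` are hypothetical, and `process` stands in for whatever `Controller.Process` does with a single URL.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class UrlQueueRunner
{
    // Processes every URL using a fixed number of long-lived worker threads.
    public static void Run(IEnumerable<string> urls, int workerCount, Action<string> process)
    {
        using (var queue = new BlockingCollection<string>())
        {
            var workers = new Thread[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = new Thread(() =>
                {
                    // Blocks while the queue is empty; the loop ends once the
                    // queue has been marked complete and fully drained.
                    foreach (var url in queue.GetConsumingEnumerable())
                    {
                        try
                        {
                            process(url);
                        }
                        catch (Exception ex)
                        {
                            // One failed URL should not kill the worker.
                            Console.Error.WriteLine(url + ": " + ex.Message);
                        }
                    }
                });
                workers[i].IsBackground = true;
                workers[i].Start();
            }

            foreach (var url in urls)
            {
                queue.Add(url);
            }
            queue.CompleteAdding(); // tell the workers that no more URLs will arrive

            // Each worker is joined exactly once, after all work has been handed out.
            foreach (var worker in workers)
            {
                worker.Join();
            }
        }
    }
}
```

Inside `backgroundWorker1_DoWork` this might be called as `UrlQueueRunner.Run(this.urlList, 7, url => { /* process one URL */ });`. The threads are created once and each one receives its URL from the queue, rather than through variables captured from the enclosing loop.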

