Dns.BeginGetHost… methods blocking

So I want to make a lot of DNS queries.

I create thousands of Tasks from the BeginGetHostEntry/EndGetHostEntry async pair:

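using System;
using System.Net;
using System.Threading.Tasks;

// Wrap the APM Begin/EndGetHostEntry pair in a Task<IPHostEntry>.
var lookupTask = Task.Factory.FromAsync
   ( Dns.BeginGetHostEntry,
     (Func<IAsyncResult, IPHostEntry>) Dns.EndGetHostEntry,
     "google.com",
     null   // no state object
   );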

I then call Task.WaitAll to wait for everything to complete. I'm seeing the number of ThreadPool threads increase drastically in response to my requests. If I force the ThreadPool's minimum thread count to 500 (ThreadPool.SetMinThreads), the workload is consumed considerably faster. All of this points to blocking in the Dns asynchronous implementation.
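To reproduce the behaviour described above, a minimal harness along these lines could be used. It assumes C# 7+ (value tuples) and a hypothetical helper class `DnsBatchRepro`; the host list is whatever names you want to test. It reuses the same `FromAsync` wrapping and simply times a `Task.WaitAll` over the whole batch.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

// Hypothetical helper name; a repro sketch, not a benchmark suite.
public static class DnsBatchRepro
{
    // Same FromAsync wrapping of the Begin/EndGetHostEntry pair as in the question.
    public static Task<IPHostEntry> LookupAsync(string host) =>
        Task.Factory.FromAsync(
            Dns.BeginGetHostEntry,
            (Func<IAsyncResult, IPHostEntry>)Dns.EndGetHostEntry,
            host,
            null);

    // Starts one lookup per host, blocks until all of them finish, and returns
    // the results together with the wall-clock time the whole batch took.
    public static (IPHostEntry[] Results, TimeSpan Elapsed) Run(string[] hosts)
    {
        var stopwatch = Stopwatch.StartNew();
        Task<IPHostEntry>[] tasks = hosts.Select(h => LookupAsync(h)).ToArray();
        Task.WaitAll(tasks); // blocks the calling thread until every lookup completes
        stopwatch.Stop();
        return (tasks.Select(t => t.Result).ToArray(), stopwatch.Elapsed);
    }
}
```

On .NET Core 3.0 and later, reading `ThreadPool.ThreadCount` before and after `Run` is one way to watch the thread growth the question mentions.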

If I replace Dns with a managed DNS client, I can consume the same workload with only 1 or 2 ThreadPool threads, with the CPU virtually idle.

The thing is, the Dns implementation is absolutely core to many networking APIs (HttpWebRequest, WebClient, HttpClient), and they all seem to be affected by this issue. If I resolve DNS with a third-party library, make HTTP requests using the IP address as the host in the URI, and then set the Host header back to the original host name, I get blistering performance compared with anything involving System.Net.Dns.
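A sketch of that workaround, written against `HttpClient`'s `HttpRequestMessage` rather than `HttpWebRequest`. The helper name `IpAddressedRequest` is hypothetical, the `IPAddress` is assumed to come from whichever third-party resolver you use, and the sketch assumes plain `http://` URLs with an IPv4 address; HTTPS brings certificate and SNI concerns it does not address.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical helper name; assumes http:// and an IPv4 address.
public static class IpAddressedRequest
{
    // Builds a GET request whose URI points at an already-resolved address, so
    // connecting needs no name lookup, while the Host header keeps the original
    // name (plus port, if non-default) for name-based virtual hosting.
    public static HttpRequestMessage Build(Uri original, IPAddress resolved)
    {
        Uri target = new UriBuilder(original) { Host = resolved.ToString() }.Uri;
        var request = new HttpRequestMessage(HttpMethod.Get, target);
        request.Headers.Host = original.Authority; // e.g. "example.com" or "example.com:8080"
        return request;
    }
}
```

Sending it is then ordinary `HttpClient` usage, e.g. `await client.SendAsync(IpAddressedRequest.Build(new Uri("http://example.com/"), ip))`.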

What's going on here? Have I missed something or is the System.Net.Dns implementation really that bad?

System.Net.Dns uses the Windows gethostbyname function for DNS queries and doesn't really have asynchronous functions at all. The BeginGetHostEntry function is basically just a wrapper that invokes the synchronous GetHostEntry on the thread pool.
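To make that claim concrete, this is roughly what "a wrapper for a synchronous GetHostEntry on the thread pool" looks like. It is an illustration under that assumption, not the framework's actual source, and `SyncOverAsyncDns` is a hypothetical name.

```csharp
using System.Net;
using System.Threading.Tasks;

// Hypothetical name; illustrates the sync-over-async pattern described above.
public static class SyncOverAsyncDns
{
    // The "async" method only queues the blocking call to the thread pool, so
    // every in-flight lookup holds a pool thread until the OS resolver returns.
    // Once the pool's minimum thread count is exceeded, new threads are added
    // gradually, and the remaining lookups wait in the queue.
    public static Task<IPHostEntry> GetHostEntryAsync(string host) =>
        Task.Run(() => Dns.GetHostEntry(host));
}
```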

The last time I had this problem with slow, synchronous DNS lookups, I eventually just used a large ThreadPool to get the job done, since not a single built-in Windows or .NET DNS function supports proper (parallel) asynchronous execution.
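The "large ThreadPool" approach amounts to raising the pool's minimum thread counts, as the question did with 500. A minimal sketch with `PoolSizing` as a hypothetical helper name; the 500/500 values in the test are only the figure from the question, not a recommendation.

```csharp
using System;
using System.Threading;

// Hypothetical helper name.
public static class PoolSizing
{
    // Raises (never lowers) the pool's minimum worker and I/O completion thread
    // counts. Up to the minimum, the pool creates threads on demand; beyond it,
    // threads are injected gradually. Returns SetMinThreads' result, which is
    // false if the runtime rejects the values (e.g. above the pool maximum).
    public static bool EnsureMinThreads(int workers, int ioThreads)
    {
        ThreadPool.GetMinThreads(out int currentWorkers, out int currentIo);
        return ThreadPool.SetMinThreads(
            Math.Max(currentWorkers, workers),
            Math.Max(currentIo, ioThreads));
    }
}
```

Calling it once at startup, before the batch of lookups begins, is the usual place for it.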

This may not be a whole answer but:

The DNS resolution within .NET opens a connection to the DNS server, asks a question and closes it. The examples for the managed DNS client you linked clearly show that this library makes a connection and then, while it remains open, lets you ask many questions, just like running this under DOS/Unix:

nslookup -

>hostname1
>hostname2
...

Opening that connection can often take a while. By making multiple calls over an already open connection, you don't have to repeat the reverse lookup on yourself and on the server, and all the other overhead the connection to the DNS server goes through when it first connects. For example, if the first DNS server on my list is busy, my machine often takes a while to fall back to a different server that is available. If you hit that delay on each and every lookup through the .NET library, you would see long waits, many threads would be needed, and the CPU load would of course go up while really doing very little.

The implementation isn't "bad"; it's just not designed for large batch jobs. Unless there are calls I've missed as well.

I don't have a dataset of 1000 URLs to test your code with, and requesting the same URL repeatedly should just hit the cache rather than my network's DNS server. So please comment on whether this works once you've tested it.

My recommendation for testing this (or any other hypothesis) would be to create a test dataset of 1000 URLs you want to resolve and number them. Then set up some logging (e.g. log4net or similar) and write out a statement when each DNS resolution task finishes, including the index of the completed task. I believe you will see these 1000 tasks complete more or less sequentially, or at least in groups of 2-8 results at a time, with the groups themselves completing one after another.
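One way to set up that experiment without committing to a particular logging library: number the hosts, start every lookup, and record the index and elapsed time as each completes. `CompletionLog` is a hypothetical name, `Console.WriteLine` stands in for log4net, and `resolve` is whatever async lookup you are testing (for example the `FromAsync`-wrapped `Dns` call from the question).

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper name; Console output stands in for log4net.
public static class CompletionLog
{
    // Starts one lookup per numbered host and records (index, elapsed ms) the
    // moment each finishes, successfully or not. The returned array is in
    // completion order.
    public static (int Index, long ElapsedMs)[] Run<T>(
        string[] hosts, Func<string, Task<T>> resolve)
    {
        var log = new ConcurrentQueue<(int Index, long ElapsedMs)>();
        long start = Stopwatch.GetTimestamp(); // static timestamp: safe to read from any thread

        Task[] watchers = hosts
            .Select((host, index) => resolve(host).ContinueWith(
                _ =>
                {
                    long ms = (Stopwatch.GetTimestamp() - start) * 1000 / Stopwatch.Frequency;
                    log.Enqueue((index, ms));
                    Console.WriteLine($"task {index} done at {ms} ms");
                },
                TaskContinuationOptions.ExecuteSynchronously))
            .ToArray();

        Task.WaitAll(watchers);
        return log.ToArray();
    }
}
```

With the real resolver, e.g. `CompletionLog.Run(hosts, h => Dns.GetHostEntryAsync(h))`, the order and timestamps of the returned entries show whether results arrive all at once, in small groups, or strictly one after another.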

The reason for that is connection management. Internally, .NET only allows so many concurrent connections to the same endpoint. If you open 1000 connections to your DNS server, only a few will succeed at a time; the rest have to wait until earlier connections are closed before they can establish another connection to that same endpoint (your DNS server).

There are normally good reasons for this limitation. But for something like DNS, where the amounts of data are small and the requests relatively cheap to service, I'd be comfortable raising the limit to, say, 100-200 simultaneous DNS requests.
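The `connectionManagement` setting shown next applies to the `ServicePoint` connections used by `HttpWebRequest`/`WebClient`. If the goal is simply to keep roughly 100-200 lookups in flight from your own code, whichever resolver you use, a `SemaphoreSlim` gate is one common way to do that. A minimal sketch, assuming C# 8+ and a hypothetical `ThrottledLookups` helper:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper name: caps how many lookups are outstanding at once.
public static class ThrottledLookups
{
    // Runs `resolve` for every host, never letting more than `maxConcurrent`
    // calls be in flight, and returns the results in input order.
    public static async Task<T[]> RunAsync<T>(
        string[] hosts, Func<string, Task<T>> resolve, int maxConcurrent)
    {
        using var gate = new SemaphoreSlim(maxConcurrent, maxConcurrent);

        async Task<T> RunOne(string host)
        {
            await gate.WaitAsync().ConfigureAwait(false); // wait for a free slot
            try
            {
                return await resolve(host).ConfigureAwait(false);
            }
            finally
            {
                gate.Release(); // hand the slot to the next waiting lookup
            }
        }

        return await Task.WhenAll(hosts.Select(h => RunOne(h))).ConfigureAwait(false);
    }
}
```

With `System.Net.Dns` this bounds how many pool threads are parked in lookups at once; it does not make the underlying calls any less blocking.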

You can open up this limitation with this configuration:

<configuration>
  <system.net>
    <connectionManagement>
      <add address="*" maxconnection="100"/>
    </connectionManagement>
  </system.net>
</configuration>

MSDN for System.Net.ConnectionManagement
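If you would rather not edit app.config, the wildcard entry above has a programmatic counterpart in `ServicePointManager.DefaultConnectionLimit`. Like the config setting, it governs `ServicePoint` connections (used by `HttpWebRequest` and `WebClient`), so whether it changes anything for `System.Net.Dns` is exactly the hypothesis to test. `ConnectionLimits` is a hypothetical helper, and on recent .NET versions these APIs may produce an obsoletion warning (SYSLIB0014).

```csharp
using System.Net;

// Hypothetical helper name.
public static class ConnectionLimits
{
    // Programmatic counterpart of <add address="*" maxconnection="..."/>.
    // It affects ServicePoint objects created after the call, so run it at
    // startup before the first request.
    public static void RaiseDefault(int maxConnections) =>
        ServicePointManager.DefaultConnectionLimit = maxConnections;
}
```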

You can specify a particular endpoint address (URL or IP) and the maximum number of connections to that address. Some load-testing applications just use the wildcard * with 65535 to open it right up for everything.
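For a single endpoint rather than the `*` wildcard, a per-address entry corresponds roughly to setting `ConnectionLimit` on that address's `ServicePoint`. A sketch with a hypothetical `EndpointLimits` helper and a placeholder URI in the test:

```csharp
using System;
using System.Net;

// Hypothetical helper name.
public static class EndpointLimits
{
    // Looks up (or creates) the ServicePoint for the given URI and raises its
    // connection limit; the returned object is the one whose limit changed.
    public static ServicePoint Raise(Uri endpoint, int maxConnections)
    {
        ServicePoint point = ServicePointManager.FindServicePoint(endpoint);
        point.ConnectionLimit = maxConnections;
        return point;
    }
}
```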

I suspect the managed DNS implementation either reuses the same connection to the DNS server or has some internal configuration like the above.

Some more details you might include in your question: whether you are querying a local DNS server on the same physical network, a DNS server from your ISP, or a public DNS server like OpenDNS. The configuration of those specific DNS servers may impose their own limitations (ISPs may rate-limit; I don't know).

In normal usage, an asynchronous DNS lookup doesn't usually improve performance, because the code needs the answer before it can continue. Going parallel gains nothing there. This only becomes a real issue when you want to look up many host names at once.

For why it's a bit slow, and how to improve performance, see this SO question and its answers: GetHostEntry is very slow
