
.NET sockets vs C++ sockets at high performance

My question is to settle an argument with my co-workers on C++ vs C#.

We have implemented a server that receives a large number of UDP streams. This server was developed in C++ using asynchronous sockets and overlapped I/O with completion ports. We use 5 completion ports with 5 threads. This server easily handles a throughput of 500 Mbps on a gigabit network without any packet loss or errors (we didn't push our tests beyond 500 Mbps).

We have tried to re-implement the same kind of server in C# and we have not been able to reach the same incoming throughput. We use asynchronous receives via the ReceiveAsync method and a pool of SocketAsyncEventArgs objects to avoid the overhead of creating a new object for every receive call. Each SocketAsyncEventArgs has a buffer assigned to it, so we do not need to allocate memory for every receive. The pool is very, very large, so we can queue more than 100 receive requests. This server cannot handle an incoming throughput of more than 240 Mbps. Over that limit, we lose packets in our UDP streams.
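For reference, the receive pattern described above looks roughly like this (a minimal sketch, not our actual code; the class name, pool size, and buffer size are illustrative):

```csharp
using System.Net;
using System.Net.Sockets;

class UdpReceiver
{
    const int PendingReceives = 100;  // we keep > 100 receives queued
    const int BufferSize = 65535;     // maximum UDP datagram size

    readonly Socket _socket;

    public UdpReceiver(int port)
    {
        _socket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        _socket.Bind(new IPEndPoint(IPAddress.Any, port));

        for (int i = 0; i < PendingReceives; i++)
        {
            var args = new SocketAsyncEventArgs();
            args.SetBuffer(new byte[BufferSize], 0, BufferSize); // buffer assigned once, reused forever
            args.Completed += OnReceiveCompleted;
            PostReceive(args);
        }
    }

    void PostReceive(SocketAsyncEventArgs args)
    {
        // ReceiveAsync returns false when the operation completed synchronously
        // (Completed will not fire); loop rather than recurse so a burst of
        // synchronous completions cannot grow the stack.
        while (!_socket.ReceiveAsync(args))
            Handle(args);
    }

    void OnReceiveCompleted(object sender, SocketAsyncEventArgs args)
    {
        Handle(args);
        PostReceive(args); // re-queue the same object: no per-receive allocation
    }

    void Handle(SocketAsyncEventArgs args)
    {
        if (args.SocketError == SocketError.Success)
            ProcessPacket(args.Buffer, args.Offset, args.BytesTransferred);
    }

    void ProcessPacket(byte[] buffer, int offset, int count)
    {
        // application-specific packet handling
    }
}
```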

My question is this: should I expect the same performance using C++ sockets and C# sockets? My opinion is that the performance should be the same, provided memory is managed correctly in .NET.

Side question: would anybody know a good article/reference explaining how .NET sockets use I/O completion ports under the hood?

would anybody know a good article/reference explaining how .NET sockets use I/O completion ports under the hood?

I suspect the only reference would be the implementation (i.e. Reflector or another assembly decompiler). With that you will find that all asynchronous IO goes through an IO completion port, with callbacks being processed in the IO thread pool (which is separate from the normal worker thread pool).
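The split between the two pools is visible through the ThreadPool API; this small program (illustrative, not from the question's code) prints the limits of each:

```csharp
using System;
using System.Threading;

class PoolInfo
{
    static void Main()
    {
        int worker, ioCompletion;

        // The CLR maintains two separate pools: worker threads and
        // I/O completion port threads. Asynchronous socket callbacks
        // are dispatched on the latter.
        ThreadPool.GetMaxThreads(out worker, out ioCompletion);
        Console.WriteLine("max worker threads:        {0}", worker);
        Console.WriteLine("max IO completion threads: {0}", ioCompletion);

        ThreadPool.GetMinThreads(out worker, out ioCompletion);
        Console.WriteLine("min worker threads:        {0}", worker);
        Console.WriteLine("min IO completion threads: {0}", ioCompletion);
    }
}
```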

use 5 completion ports

I would expect to use a single completion port processing all the IO into a single pool of threads, with the threads of that pool servicing completions (assuming you are doing any other IO, including disk, asynchronously as well).

Multiple completion ports would make sense if you have some form of prioritisation going on.

My question is this: should I expect the same performance using C++ sockets and C# sockets?

Yes or no, depending on how narrowly you define the "using ... sockets" part. In terms of the operations from the start of the asynchronous operation until the completion is posted to the completion port I would expect no significant difference (all the processing is in the Win32 API or Windows kernel).

However, the safety that the .NET runtime provides will add some overhead: buffer lengths will be checked, delegates validated, etc. If the application is CPU-bound then this is likely to make a difference, and at the extreme a small difference can easily add up.

Also, the .NET version will occasionally pause for GC (.NET 4.5 does asynchronous collection, so this will get better in the future). There are techniques to minimise the accumulation of garbage (e.g. reuse objects rather than creating them; make use of structs while avoiding boxing).
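A sketch of those two ideas, plus the .NET 4.5 GC latency setting (all names illustrative; GCLatencyMode.SustainedLowLatency requires .NET 4.5):

```csharp
using System.Runtime;

// A struct stored in a PacketInfo[] lives inline in the array: reading and
// writing elements creates no objects and no boxing (an object[] would box).
struct PacketInfo
{
    public int Length;
    public long Timestamp;
}

static class GcTuning
{
    public static void Configure()
    {
        // .NET 4.5: prefer background collections and avoid blocking
        // full collections where possible during the server's hot path.
        GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
    }
}
```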

In the end, if the C++ version works and is meeting your performance needs, why port?

You can't do a straight port of the code from C++ to C# and expect the same performance. .NET does a lot more than C++ when it comes to memory management (GC) and making sure that your code is safe (bounds checks, etc.).

I would allocate one large buffer for all IO operations (for instance, 65535 × 500 = 32,767,500 bytes) and then assign a chunk of it to each SocketAsyncEventArgs (and likewise for send operations). Memory is cheaper than CPU. Use a buffer manager / factory to provide chunks for all connections and IO operations (the Flyweight pattern). Microsoft does this in their Async example.
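A minimal sketch of that buffer manager (the class and method names are illustrative, not from Microsoft's sample):

```csharp
using System.Collections.Generic;
using System.Net.Sockets;

class BufferManager
{
    readonly byte[] _buffer;                 // one large allocation, e.g. 65535 * 500 bytes
    readonly int _chunkSize;
    readonly Stack<int> _freeOffsets = new Stack<int>();

    public BufferManager(int chunkSize, int chunkCount)
    {
        _chunkSize = chunkSize;
        _buffer = new byte[chunkSize * chunkCount];
        for (int i = 0; i < chunkCount; i++)
            _freeOffsets.Push(i * chunkSize);
    }

    // Point the args at a free slice of the shared buffer: no new allocation
    // per operation. Assumes a free chunk is available (a sketch, no checks).
    public void Assign(SocketAsyncEventArgs args)
    {
        args.SetBuffer(_buffer, _freeOffsets.Pop(), _chunkSize);
    }

    // Return the slice when the args object is retired.
    public void Release(SocketAsyncEventArgs args)
    {
        _freeOffsets.Push(args.Offset);
        args.SetBuffer(null, 0, 0);
    }
}
```

A side benefit of the single large array is that overlapped IO pins it as one block, instead of pinning hundreds of small buffers scattered across the GC heap, which causes fragmentation.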

Both the Begin/End and the Async methods use IO completion ports in the background. The latter don't need to allocate objects for each operation, which boosts performance.
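The difference is visible in the shape of the two APIs (a contrived side-by-side, assuming a bound UDP socket):

```csharp
using System;
using System.Net;
using System.Net.Sockets;

class Contrast
{
    static void Main()
    {
        var socket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        socket.Bind(new IPEndPoint(IPAddress.Loopback, 9000));

        // Begin/End: every call allocates an IAsyncResult (plus captured
        // state) behind the scenes, all garbage once EndReceive returns.
        var buf1 = new byte[65535];
        socket.BeginReceive(buf1, 0, buf1.Length, SocketFlags.None,
            ar => { int n = ((Socket)ar.AsyncState).EndReceive(ar); }, socket);

        // *Async: the SocketAsyncEventArgs is created once and can be
        // re-queued indefinitely, so the steady state allocates nothing.
        var buf2 = new byte[65535];
        var args = new SocketAsyncEventArgs();
        args.SetBuffer(buf2, 0, buf2.Length);
        args.Completed += (s, e) => { /* e.BytesTransferred bytes received */ };
        socket.ReceiveAsync(args);

        Console.ReadLine(); // keep the process alive while receives are pending
    }
}
```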

My guess is that you're not seeing the same performance because .NET and C++ are actually doing different things. Your C++ code may not be as safe and may skip bounds checking. Also, are you simply measuring the ability to receive the packets, without any processing? Or does your throughput figure include packet processing time? If so, the code you wrote to process the packets may not be as efficient.
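One way to separate the two is to count bytes in the receive handler and do nothing else, so the measurement excludes processing cost (a hypothetical harness, not from the question):

```csharp
using System;
using System.Threading;

class ThroughputMeter
{
    static long _bytes;
    static Timer _timer;   // kept in a field so the timer isn't collected

    // Call this from the receive completion handler instead of real processing.
    public static void OnPacket(int byteCount)
    {
        Interlocked.Add(ref _bytes, byteCount);
    }

    // Print the raw receive rate once per second.
    public static void StartReporting()
    {
        _timer = new Timer(_ =>
        {
            long bytes = Interlocked.Exchange(ref _bytes, 0);
            Console.WriteLine("{0:F1} Mbps", bytes * 8 / 1000000.0);
        }, null, 1000, 1000);
    }
}
```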

I'd suggest using a profiler to check where the most time is being spent and trying to optimize that. The actual socket code should be quite performant.
