
Performance gains in re-writing C# code in C/C++

Original: 2010-11-17 11:03:45 3 9 c#/ c++/ c/ multithreading/ performance

I wrote part of a program that does some heavy work with strings in C#. I initially chose C# not only because it was easier to use .NET's data structures, but also because I need to use this program to analyse some 2-3 million text records in a database, and it is much easier to connect to databases using C#.

There was a part of the program that was slowing down the whole code, and I decided to rewrite it in C using pointers to access every character in the string. Now the part of the code that took some 119 seconds to analyse 10,000,000 strings in C# takes the C code only 5 seconds! Performance is a priority, so I am considering rewriting the whole program in C, compiling it into a DLL (something which I didn't know how to do when I started writing the program) and using DllImport from C# to use its methods to work with the database strings.

Given that rewriting the whole program will take some time, and since using DllImport to work with C#'s strings requires marshalling and such things, my question is: will the performance gains from the C DLL's faster string handling outweigh the performance hit of having to repeatedly marshal strings to access the C DLL from C#?

9 answers

One option is to rewrite the C code as unsafe C#, which should have roughly the same performance and won't incur any interop penalty.

First, profile your code. You might find some real headsmacker that speeds the C# code up greatly.

Second, writing the code in C using pointers is not really a fair comparison. If you are going to use pointers, why not write it in assembly language and get real performance? (Not really, just reductio ad absurdum.) A better comparison for native code would be to use std::string. That way you still get a lot of help from the string class and C++ exception-safety.

Given that you have to read 2-3 million records from the DB to do this work, I very much doubt that the time spent cracking the strings is going to outweigh the elapsed time taken to load the data from the DB. So, consider instead how to structure your code so that you can begin string processing while the DB load is in progress.

If you use a SqlDataReader (say) to load the rows sequentially, it should be possible to batch up N rows as fast as possible and hand them off to a separate thread for the post-processing that is your current headache and the reason for this question. If you are on .NET 4.0, this is simplest to do using the Task Parallel Library, and System.Collections.Concurrent could also be useful for collating results between the threads.

This approach should mean that neither the DB latency nor the string processing is a show-stopping bottleneck, because they happen in parallel. This applies even if you are on a single-processor machine, because your app can process strings while it's waiting for the next batch of data to come back from the DB over the network. If you find string processing is the slowest, use more threads (i.e. Tasks) for that. If the DB is the bottleneck, then you have to look at external means to improve its performance - DB hardware or schema, network infrastructure. If you need some results in hand before processing more data, TPL allows dependencies to be created between Tasks and the coordinating thread.

My point is that I doubt it's worth the pain of re-engineering the entire app in native C or whatever. There are lots of ways to skin this cat.

There's no reason to write in C rather than C++, and there is no such language as "C/C++".

The performance implications of marshalling are fairly simple. If you have to marshal every string individually, then your performance is gonna suck. If you can marshal all ten million strings in one call, then marshalling isn't gonna make any difference at all. P/Invoke is not the fastest operation in the world, but if you only invoke it a few times, it's not really gonna matter.
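As a rough illustration of the "one call" idea, the native side could expose a batched entry point rather than a per-string one. This is only a sketch under assumptions: the names analyse_batch and count_digits, and the digit-counting task itself, are hypothetical and not from the question. The export decoration needed to build a DLL and the matching DllImport declaration on the C# side are deliberately left out.

```c
#include <stddef.h>

/* Hypothetical per-string worker: counts the ASCII digits in one
   NUL-terminated string. Stands in for the real analysis. */
static size_t count_digits(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (*s >= '0' && *s <= '9') {
            ++n;
        }
    }
    return n;
}

/* Hypothetical batched entry point: the caller crosses the
   managed/native boundary once and hands over `count` strings.
   results[i] receives the result for strings[i].
   Returns the number of strings processed. */
size_t analyse_batch(const char *const *strings, size_t count, size_t *results)
{
    for (size_t i = 0; i < count; ++i) {
        results[i] = count_digits(strings[i]);
    }
    return count;
}
```

With an interface shaped like this, the fixed cost of each transition is paid once per batch instead of once per string. How large a batch is practical depends on how the strings are marshalled, which is worth measuring rather than assuming.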

It might be easier to re-write your core application in C++ and then use C++/CLI to merge it with the C# database end.

There are some pretty good answers here already, especially @Steve Townsend's.

However, I felt it worth underlining a key point: There is intrinsically no reason why C code "will be faster" than C# code. That idea is a myth. Under the bonnet they both produce machine code that runs on the same CPU. As long as you don't ask the C# to do more work than the C, then it can perform just as well.

By switching to C, you forced yourself to be more frugal (you avoided using high-level features like managed strings, bounds-checking, garbage collection, exception handling, etc., and simply treated your strings as blocks of raw bytes). If you applied these low-level techniques to your C# code (i.e. treating your data as raw blocks of bytes as you did in C), you would find much less difference in the speed.
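To make the "raw bytes" point concrete, here is a minimal sketch in C of the kind of pointer-based scan the question describes. The asker's real code isn't shown, so the task (counting whitespace-separated words) and the name count_words are assumptions chosen purely for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical example: count whitespace-separated words in a
   NUL-terminated string by walking it with a pointer.
   No allocation, no copying, no per-character bounds check. */
size_t count_words(const char *s)
{
    size_t words = 0;
    int in_word = 0;

    for (const char *p = s; *p != '\0'; ++p) {
        /* Cast to unsigned char: passing a negative char value
           to isspace() is undefined behaviour. */
        if (isspace((unsigned char)*p)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first character of a new word */
            ++words;
        }
    }
    return words;
}
```

Nothing in this loop is specific to C: the same single pass with no intermediate strings can be written in C# over a char array or with unsafe pointers, which is why the language alone is unlikely to explain the whole gap.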

For example: Last week I re-wrote (in C#) a class that a junior had written (also in C#). I achieved a 25x speed improvement over the original code by applying the same approach I would use if I were writing it in C (i.e. thinking about performance). I achieved the same speedup you're claiming without having to change to a different language at all.

Finally, just because an isolated case can be made 24x faster, it does not mean you can make your whole program 24x faster across the board by porting it all to C. As Steve said, profile it to work out where it's slow, and expend your effort only where it'll provide significant benefits. If you blindly convert to C you'll probably find you've spent a lot of time making some already-working code a lot less maintainable.

(PS My viewpoint comes from 29 years' experience writing assembler, C, C++, and C# code, and understanding that the language is just a tool for generating machine code - in the case of C# vs C++ vs C, it is primarily the programmer's skill, not the language used, that determines whether the code will run quickly or slowly. C/C++ programmers tend to be better than C# programmers because they have to be - C# allows you to be lazy and get the code written quickly, while C/C++ make you do more work and the code takes longer to write. But a good programmer can get great performance out of C#, and a poor programmer can wrest abysmal performance out of C/C++.)

With strings being immutable in .NET, I have no doubt that an optimised C implementation will outperform an optimised C# implementation - no doubt!

P/Invoke does incur an overhead, but if you implement the bulk of the logic in C and only expose a very granular API to C#, I believe you are in much better shape.

At the end of the day, writing an implementation in C means taking longer - but it will give you better performance if you are prepared for the extra development cost.

Make yourself familiar with mixed assemblies - this is better than Interop. Interop is a fast-track way to deal with native libs, but mixed assemblies perform better.
Mixed assemblies on MSDN
As usual the main thing is testing and measuring...

For concatenation of long strings or multiple strings always use StringBuilder. What not everybody knows is that StringBuilder can be used not only to make concatenating strings faster, but also for insertion, removal and replacement of characters.

If this isn't fast enough for you, you can use char or byte arrays instead of strings and operate on these. When you are done with the manipulation you can convert the array back to a string.

There is also the option in C# to use unsafe code to get a pointer to a string and modify the otherwise immutable string, but I wouldn't really recommend this.

As others have already said, you can use managed C++ (C++/CLI) to nicely interoperate between .NET and native code.

Would you mind showing us the code? Maybe there are other options for optimizing.

When you start to optimize a program at a late stage (the application was written without optimization in mind) then you have to identify the bottlenecks.

Profiling is the first step to see where all those CPU cycles are going.

Just keep in mind that C# profilers will only profile your .NET application - neither the IIS server implemented in the kernel nor the network stack.

And this can be an invisible bottleneck that outweighs, by several orders of magnitude, whatever you are focusing on when trying to make progress.

You may think that you have no influence on IIS, implemented as a kernel driver - and you are right.

But you can do without it - and save a lot of time and money.

Put your talent where it can make the difference - not where you are forced to run with your feet tied together.

The inherent differences are usually given as 2x less CPU, 5x memory. In practice, few people are good enough at C or C++ to gain the benefits.

There's additional gain for skimping on Unicode support, but only you can know your application well enough to know if that's safe.
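As a hedged illustration of what "skimping on Unicode" can look like, the sketch below lower-cases text byte by byte, which is only correct for 7-bit ASCII, and pairs it with a check that tells you whether that shortcut is safe for a given buffer. Both function names are hypothetical, and whether your records really are ASCII-only is something only your data can answer.

```c
#include <stddef.h>

/* Returns 1 if every byte in buf[0..len) is 7-bit ASCII, else 0.
   Use this to decide whether the byte-wise shortcut below is safe. */
int is_all_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] > 0x7F) {
            return 0;
        }
    }
    return 1;
}

/* Lower-cases A-Z in place, one byte at a time. This ignores
   locale and multi-byte encodings entirely, so it is only
   correct when the data is known to be ASCII. */
void ascii_lower_in_place(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] >= 'A' && buf[i] <= 'Z') {
            buf[i] = (unsigned char)(buf[i] + ('a' - 'A'));
        }
    }
}
```

If is_all_ascii ever returns 0 on real input, the shortcut would silently mishandle that record, so a Unicode-aware fallback path would still be needed.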

Use the profiler first, and make sure you are not I/O bound.