
String.Replace() vs. StringBuilder.Replace()

I have a string in which I need to replace markers with values from a dictionary. It has to be as efficient as possible. Doing a loop with string.Replace is just going to consume memory (strings are immutable, remember). Would StringBuilder.Replace() be any better, since it was designed for string manipulation?

I was hoping to avoid the expense of RegEx, but if that is going to be more efficient then so be it.

Note: I don't care about code complexity, only how fast it runs and the memory it consumes.

Average stats: 255-1024 characters in length, 15-30 keys in the dictionary.

Using the RedGate profiler with the following code:

using System;
using System.Collections.Generic;
using System.Text;

class Program
    {
        static string data = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz";
        static Dictionary<string, string> values;

        static void Main(string[] args)
        {
            Console.WriteLine("Data length: " + data.Length);
            values = new Dictionary<string, string>()
            {
                { "ab", "aa" },
                { "jk", "jj" },
                { "lm", "ll" },
                { "yz", "zz" },
                { "ef", "ff" },
                { "st", "uu" },
                { "op", "pp" },
                { "x", "y" }
            };

            StringReplace(data);
            StringBuilderReplace1(data);
            StringBuilderReplace2(new StringBuilder(data, data.Length * 2));

            Console.ReadKey();
        }

        private static void StringReplace(string data)
        {
            foreach(string k in values.Keys)
            {
                data = data.Replace(k, values[k]);
            }
        }

        private static void StringBuilderReplace1(string data)
        {
            StringBuilder sb = new StringBuilder(data, data.Length * 2);
            foreach (string k in values.Keys)
            {
                sb.Replace(k, values[k]);
            }
        }

        private static void StringBuilderReplace2(StringBuilder data)
        {
            foreach (string k in values.Keys)
            {
                data.Replace(k, values[k]);
            }
        }
    }
  • String.Replace = 5.843ms
  • StringBuilder.Replace #1 = 4.059ms
  • StringBuilder.Replace #2 = 0.461ms

String length = 1456

StringBuilder #1 creates the StringBuilder inside the method while #2 does not, so the overall cost will most likely end up the same, since you're just moving that work out of the method. If you start with a StringBuilder instead of a string, then #2 might be the way to go.

As for memory, according to the RedGate memory profiler there is nothing to worry about until you get into MANY replace operations, at which point StringBuilder wins overall.

This may be of help: https://docs.microsoft.com/en-us/archive/blogs/debuggingtoolbox/comparing-regex-replace-string-replace-and-stringbuilder-replace-which-has-better-performance

The short answer appears to be that String.Replace is faster, although it may have a larger impact on your memory footprint / garbage collection overhead.

Yes, StringBuilder will give you gains in both speed and memory (basically because it won't create a new string instance each time you manipulate it; StringBuilder always operates on the same object). Here is an MSDN link with some details.

Would stringbuilder.replace be any better [than String.Replace]

Yes, a lot better. And if you can estimate an upper bound for the new string (it looks like you can) then it will probably be fast enough.

When you create it like:

  var sb = new StringBuilder(inputString, pessimisticEstimate);

then the StringBuilder will not have to re-allocate its buffer.
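As a minimal sketch of that idea, the helper below sizes the StringBuilder once and then applies every dictionary entry to it. The class name `PreSizedReplace`, the method `ReplaceAll`, and the `growthFactor` parameter are hypothetical names chosen for illustration, and the factor of 2 is only a guess at a pessimistic upper bound, not a measured value.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class PreSizedReplace
{
    // Applies each key -> value replacement in turn on a single StringBuilder.
    // growthFactor is an assumed, pessimistic estimate of how much the text can grow.
    public static string ReplaceAll(string input, IDictionary<string, string> values, int growthFactor = 2)
    {
        int capacity = Math.Max(16, input.Length * growthFactor);
        var sb = new StringBuilder(input, capacity);

        foreach (KeyValuePair<string, string> pair in values)
        {
            // StringBuilder.Replace throws on an empty key, so keys are assumed non-empty.
            sb.Replace(pair.Key, pair.Value);
        }

        // The only string allocated for the result is this final one.
        return sb.ToString();
    }
}
```

Note that the replacements still run one after another, so a later key can match text produced by an earlier replacement, exactly as in the String.Replace loop.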

My two cents here: I just wrote a couple of lines of code to test how each method performs and, as expected, the result is "it depends".

For longer strings Regex seems to perform better; for shorter strings, String.Replace does. StringBuilder.Replace does not seem very useful, and if used wrongly it could be lethal from a GC perspective (I tried sharing one instance of StringBuilder).

Check my StringReplaceTests GitHub repo.
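For reference, one common way to use Regex for dictionary-driven replacement is to build a single alternation from the keys and look up each match in the dictionary. This is a sketch under my own assumptions, not the code from the repo mentioned above; `RegexReplaceSketch`, `BuildPattern` and `ReplaceAll` are hypothetical names, and the dictionary is assumed to be non-empty with non-empty keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexReplaceSketch
{
    // Builds one pattern such as "ab|yz|x" from the dictionary keys.
    // Build it once and reuse it; constructing a Regex is relatively expensive.
    public static Regex BuildPattern(IDictionary<string, string> values)
    {
        // Longer keys first, so "abc" is preferred over "ab" at the same position.
        IEnumerable<string> alternatives = values.Keys
            .OrderByDescending(k => k.Length)
            .Select(k => Regex.Escape(k));
        return new Regex(string.Join("|", alternatives), RegexOptions.Compiled);
    }

    // Scans the input once; every match is swapped for its dictionary value.
    public static string ReplaceAll(string input, Regex pattern, IDictionary<string, string> values)
    {
        return pattern.Replace(input, match => values[match.Value]);
    }
}
```

Unlike a loop of Replace calls, this never re-scans text it has already replaced, so the output can differ from the sequential approach when one key's replacement contains another key.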

The problem with @DustinDavis' answer is that it repeatedly operates on the same string. Unless you're planning on doing a back-and-forth type of manipulation, you really should have separate objects for each manipulation case in this kind of test.

I decided to create my own test because I found some conflicting answers all over the Web, and I wanted to be completely sure. The program I am working on deals with a lot of text (files with tens of thousands of lines in some cases).

So here's a quick method you can copy and paste and see for yourself which is faster. You may have to create your own text file to test, but you can easily copy and paste text from anywhere and make a large enough file for yourself:

using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Windows;   // MessageBox (WPF)

void StringReplace_vs_StringBuilderReplace( string file, string word1, string word2 )
{
    using( FileStream fileStream = new FileStream( file, FileMode.Open, FileAccess.Read ) )
    using( StreamReader streamReader = new StreamReader( fileStream, Encoding.UTF8 ) )
    {
        string text = streamReader.ReadToEnd(),
               @string = text;
        StringBuilder @StringBuilder = new StringBuilder( text );
        int iterations = 10000;

        // Time string.Replace, swapping the two words back and forth.
        Stopwatch watch1 = Stopwatch.StartNew();
        for( int i = 0; i < iterations; i++ )
            if( i % 2 == 0 ) @string = @string.Replace( word1, word2 );
            else @string = @string.Replace( word2, word1 );
        watch1.Stop();
        double stringMilliseconds = watch1.ElapsedMilliseconds;

        // Time StringBuilder.Replace on the same workload.
        Stopwatch watch2 = Stopwatch.StartNew();
        for( int i = 0; i < iterations; i++ )
            if( i % 2 == 0 ) @StringBuilder = @StringBuilder.Replace( word1, word2 );
            else @StringBuilder = @StringBuilder.Replace( word2, word1 );
        watch2.Stop();
        // Read the second stopwatch here, not the first one.
        double StringBuilderMilliseconds = watch2.ElapsedMilliseconds;

        MessageBox.Show( string.Format( "string.Replace: {0}\nStringBuilder.Replace: {1}",
                                        stringMilliseconds, StringBuilderMilliseconds ) );
    }
}

In my runs, string.Replace() was faster by about 20% every time when swapping out 8-10 letter words. Try it for yourself if you want your own empirical evidence.

Converting data from a String to a StringBuilder and back will take some time. If one is only performing a single replace operation, this time may not be recouped by the efficiency improvements inherent in StringBuilder. On the other hand, if one converts a string to a StringBuilder, then performs many Replace operations on it, and converts it back at the end, the StringBuilder approach is apt to be faster.

Rather than running 15-30 replace operations on the entire string, it might be more efficient to use something like a trie data structure to hold your dictionary. Then you can loop through your input string once to do all your searching/replacing.
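A rough sketch of the trie idea follows, assuming the longest key should win when several keys match at the same position. `TrieReplacer` and its members are hypothetical names. This is a plain trie walk restarted at every position, not a full Aho-Corasick automaton, so the worst case is proportional to the input length times the longest key length.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

sealed class TrieReplacer
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public string Replacement;   // non-null only where a key ends
    }

    private readonly Node root = new Node();

    // Builds the trie once from the dictionary; reuse the instance across inputs.
    public TrieReplacer(IDictionary<string, string> values)
    {
        foreach (KeyValuePair<string, string> pair in values)
        {
            if (pair.Key.Length == 0) continue;   // an empty key can never match
            Node node = root;
            foreach (char c in pair.Key)
            {
                Node next;
                if (!node.Children.TryGetValue(c, out next))
                {
                    next = new Node();
                    node.Children.Add(c, next);
                }
                node = next;
            }
            node.Replacement = pair.Value;
        }
    }

    // Walks the input once from left to right, emitting replacements as keys are found.
    public string Replace(string input)
    {
        var sb = new StringBuilder(input.Length * 2);
        int i = 0;
        while (i < input.Length)
        {
            Node node = root;
            string best = null;
            int bestLength = 0;

            // Follow the trie as far as the input allows, remembering the longest key seen.
            for (int j = i; j < input.Length; j++)
            {
                Node next;
                if (!node.Children.TryGetValue(input[j], out next)) break;
                node = next;
                if (node.Replacement != null)
                {
                    best = node.Replacement;
                    bestLength = j - i + 1;
                }
            }

            if (best != null) { sb.Append(best); i += bestLength; }
            else { sb.Append(input[i]); i++; }
        }
        return sb.ToString();
    }
}
```

As with any single-pass approach, replaced text is never scanned again, which is a behavioural difference from chaining Replace calls.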

It will depend a lot on how many of the markers are present in a given string on average.

Performance of searching for a key is likely to be similar between StringBuilder and String, but StringBuilder will win if you have to replace many markers in a single string.

If you only expect one or two markers per string on average, and your dictionary is small, I would just go for String.Replace.

If there are many markers, you might want to define a custom syntax to identify markers, e.g. enclosing them in braces with a suitable escaping rule for a literal brace. You can then implement a parsing algorithm that iterates through the characters of the string once, recognizing and replacing each marker that it finds. Or use a regex.
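A minimal sketch of such a parser is shown below. The syntax is an assumption made for illustration: markers look like `{name}`, while `{{` and `}}` stand for literal braces. `BraceMarkerParser` and `Expand` are hypothetical names, and copying unknown markers through unchanged is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class BraceMarkerParser
{
    // Single pass over the template: copies plain text, unescapes doubled braces,
    // and swaps each {name} marker for its dictionary value.
    public static string Expand(string template, IDictionary<string, string> values)
    {
        var sb = new StringBuilder(template.Length * 2);
        int i = 0;
        while (i < template.Length)
        {
            char c = template[i];
            bool doubled = i + 1 < template.Length && template[i + 1] == c;

            if ((c == '{' || c == '}') && doubled)
            {
                sb.Append(c);   // "{{" or "}}" is an escaped literal brace
                i += 2;
            }
            else if (c == '{')
            {
                int close = template.IndexOf('}', i + 1);
                if (close < 0) throw new FormatException("Unclosed marker at index " + i);

                string name = template.Substring(i + 1, close - i - 1);
                string value;
                if (values.TryGetValue(name, out value)) sb.Append(value);
                else sb.Append(template, i, close - i + 1);   // unknown marker: keep as written
                i = close + 1;
            }
            else
            {
                sb.Append(c);   // ordinary character (a lone '}' is also kept as-is)
                i++;
            }
        }
        return sb.ToString();
    }
}
```

Because each marker name is looked up directly in the dictionary, the cost depends on the template length rather than on the number of keys.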
