Memory Efficiency and Performance of String.Replace .NET Framework

 string str1 = "12345ABC...\\...ABC100000"; 
 // Hypothetically huge string of 100000 + Unicode Chars
 str1 = str1.Replace("1", string.Empty);
 str1 = str1.Replace("22", string.Empty);
 str1 = str1.Replace("656", string.Empty);
 str1 = str1.Replace("77ABC", string.Empty);

 // ... this replace anti-pattern might continue for up to 50 consecutive lines of code.

 str1 = str1.Replace("ABCDEFGHIJD", string.Empty);

I have inherited some code that does the same as the snippet above. It takes a huge string and replaces (removes) constant smaller strings from the large string.

I believe this is a very memory-intensive process, given that a new large immutable string is allocated for each replace and then left to await collection by the GC.

1. What is the fastest way of replacing these values, ignoring memory concerns?

2. What is the most memory-efficient way of achieving the same result?

I am hoping that these are the same answer!

Practical solutions that fit somewhere in between these goals are also appreciated.

Assumptions:

  • All replacements are constant and known in advance
  • Underlying characters do contain some unicode [non-ascii] chars

All characters in a .NET string are "unicode chars". Do you mean they're non-ASCII? That shouldn't make any odds, unless you run into composition issues, e.g. an "e + acute accent" not being replaced when you try to replace an "e acute".
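To make the composition issue concrete, here is a minimal sketch. It assumes that normalizing both strings to Form C before replacing is acceptable for your data; the class and method names (CompositionDemo, RemoveNormalized) are made up for illustration.

```csharp
using System;
using System.Text;

static class CompositionDemo
{
    // Hypothetical helper: normalizes both strings to Form C before replacing,
    // so that 'e' + U+0301 is treated the same as the precomposed U+00E9.
    public static string RemoveNormalized(string input, string target)
    {
        string normalizedInput = input.Normalize(NormalizationForm.FormC);
        string normalizedTarget = target.Normalize(NormalizationForm.FormC);
        return normalizedInput.Replace(normalizedTarget, string.Empty);
    }

    static void Main()
    {
        string decomposed = "cafe\u0301"; // 'e' followed by COMBINING ACUTE ACCENT
        string precomposed = "\u00E9";    // the single-character "e acute"

        // String.Replace compares UTF-16 code units, so nothing is removed here.
        Console.WriteLine(decomposed.Replace(precomposed, string.Empty).Length); // 5

        // After normalization the two spellings match and the character is removed.
        Console.WriteLine(RemoveNormalized(decomposed, precomposed)); // caf
    }
}
```

Note that normalizing allocates another copy of the large string, so it is only worth doing if the input can actually contain decomposed sequences.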

You could try using a regular expression with Regex.Replace, or StringBuilder.Replace. Here's sample code doing the same thing with both:

using System;
using System.Text;
using System.Text.RegularExpressions;

class Test
{
    static void Main(string[] args)
    {
        string original = "abcdefghijkl";

        Regex regex = new Regex("a|c|e|g|i|k", RegexOptions.Compiled);

        string removedByRegex = regex.Replace(original, "");
        string removedByStringBuilder = new StringBuilder(original)
            .Replace("a", "")
            .Replace("c", "")
            .Replace("e", "")
            .Replace("g", "")
            .Replace("i", "")
            .Replace("k", "")
            .ToString();

        Console.WriteLine(removedByRegex);
        Console.WriteLine(removedByStringBuilder);
    }
}

I wouldn't like to guess which is more efficient - you'd have to benchmark with your specific application. The regex way may be able to do it all in one pass, but that pass will be relatively CPU-intensive compared with each of the many replaces in StringBuilder.
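For longer literals such as the ones in the question, the alternation can be built from the list rather than typed by hand. The following is only a sketch: the helper name BuildRemovalRegex is invented, and ordering longer literals first is my assumption about the desired precedence when one literal is a prefix of another.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexRemover
{
    // Hypothetical helper: escapes each literal and joins them into one alternation.
    public static Regex BuildRemovalRegex(params string[] literals)
    {
        // Longer literals go first so they win over a shorter literal sharing a prefix.
        string pattern = string.Join("|",
            literals.OrderByDescending(s => s.Length).Select(s => Regex.Escape(s)));
        return new Regex(pattern, RegexOptions.Compiled | RegexOptions.CultureInvariant);
    }

    static void Main()
    {
        Regex remover = BuildRemovalRegex("1", "22", "656", "77ABC");
        Console.WriteLine(remover.Replace("12345-22-656-77ABC-X", string.Empty)); // 2345---X
    }
}
```

Be aware that one regex pass is not always equivalent to a chain of Replace calls: in the chain, removing one literal can bring two fragments together and create a new match for a later literal, which a single pass never sees.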

If you want to be really fast, and I mean really fast, you'll have to look beyond the StringBuilder and just write well-optimized code.

One thing your computer doesn't like to do is branching. If you can write a replace method which operates on a fixed array (char *) and doesn't branch, you get great performance.

The replace operation searches for a sequence of characters and, if it finds such a substring, replaces it. In effect you copy the string and perform the find and replace while doing so.

You'll rely on these functions for picking the index of some buffer to read/write. The goal is to perform the replace such that when nothing has to change you write junk instead of branching.

You should be able to complete this without a single if statement, and remember to use unsafe code. Otherwise you'll be paying for index checking on every element access.

unsafe
{
    fixed( char * p = myStringBuffer )
    {
        // Do fancy string manipulation here
    }
}
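As a rough illustration of "write junk instead of branching", here is a sketch that removes a single character from a string. It is not the code the answer refers to; RemoveChar is an invented name, it handles only one-character targets, and it must be compiled with unsafe code enabled (AllowUnsafeBlocks). Whether the JIT really emits branch-free machine code for it is an assumption you would need to verify on your runtime.

```csharp
using System;

static class BranchlessRemove
{
    // Hypothetical sketch: copies every char unconditionally, then advances the
    // write position by 0 or 1 depending on whether the char should be kept.
    public static unsafe string RemoveChar(string input, char target)
    {
        char[] buffer = new char[input.Length];
        int written = 0;

        fixed (char* src = input)
        fixed (char* dst = buffer)
        {
            for (int i = 0; i < input.Length; i++)
            {
                char c = src[i];
                dst[written] = c;          // always write; a dropped char is overwritten later
                bool keep = c != target;
                written += *(byte*)&keep;  // reads the bool as 0 or 1, no if statement
            }
        }

        return new string(buffer, 0, written);
    }

    static void Main()
    {
        Console.WriteLine(RemoveChar("a1b1c", '1')); // abc
    }
}
```

The write index never runs ahead of the read index, so the unconditional write always stays inside the buffer. Extending this to multi-character patterns is where the substantial amount of work mentioned in this answer comes in.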

I've written code like this in C# for fun and seen significant performance improvements, almost a 300% speed-up for find and replace. While the .NET BCL (base class library) performs quite well, it is riddled with branching constructs and exception handling, and this will slow down your code if you use the built-in stuff. Also, these optimizations, while perfectly sound, are not performed by the JIT compiler, and you'll have to run the code as a release build without any debugger attached to be able to observe the massive performance gain.

I could provide you with more complete code, but it is a substantial amount of work. However, I can guarantee you that it will be faster than anything else suggested so far.

1. What is the fastest way of replacing these values, ignoring memory concerns?

The fastest way is to build a custom component that's specific to your use case. As of .NET 4.6, there's no class in the BCL designed for multiple string replacements.

If you NEED something fast out of the BCL, StringBuilder is the fastest BCL component for simple string replacement. The source code can be found here: it's pretty efficient for replacing a single string. Only use Regex if you really need the pattern-matching power of regular expressions. It's slower and a little more cumbersome, even when compiled.

2. What is the most memory efficient way of achieving the same result?

The most memory-efficient way is to perform a filtered stream copy from the source to the destination (explained below). Memory consumption will be limited to your buffer; however, this will be more CPU-intensive. As a rule of thumb, you're going to trade CPU performance for memory consumption.

Technical Details

String replacements are tricky. Even when performing a string replacement in a mutable memory space (such as with StringBuilder), it's expensive. If the replacement string is a different length than the original string, you're going to be relocating every character following the replacement string to keep the whole string contiguous. This results in a LOT of memory writes, and even in the case of StringBuilder, causes you to rewrite most of the string in memory on every call to Replace.

So what is the fastest way to do string replacements? Write the new string in a single pass: don't let your code go back and re-write anything. Writes are more expensive than reads. You're going to have to code this yourself for best results.
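A minimal sketch of the single-pass idea for the removal case in the question could look like the following. It is not the template class described next; RemoveAll is an invented name, and it simply tries each pattern at the current position, which is fine for a handful of patterns but is an assumption, not an optimized search.

```csharp
using System;
using System.Text;

static class SinglePassRemover
{
    // Hypothetical sketch: reads the source once and writes each kept char exactly once.
    public static string RemoveAll(string source, string[] patterns)
    {
        var output = new StringBuilder(source.Length);
        int i = 0;

        while (i < source.Length)
        {
            int matchedLength = 0;
            foreach (string pattern in patterns)
            {
                // Ordinal comparison of the pattern against the source at position i.
                if (pattern.Length > 0 &&
                    string.CompareOrdinal(source, i, pattern, 0, pattern.Length) == 0)
                {
                    matchedLength = pattern.Length;
                    break;
                }
            }

            if (matchedLength > 0)
                i += matchedLength;          // skip the match, nothing is written
            else
                output.Append(source[i++]);  // copy one char to the output
        }

        return output.ToString();
    }

    static void Main()
    {
        string[] patterns = { "77ABC", "656", "22", "1" };
        Console.WriteLine(RemoveAll("12345-22-656-77ABC-X", patterns)); // 2345---X
    }
}
```

Patterns are tried in array order, so put longer patterns first if one is a prefix of another. Unlike a chain of Replace calls, text that only becomes a match after an earlier removal is left alone.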

High-Memory Solution

The class I've written generates strings based on templates. I place tokens ($ReplaceMe$) in a template to mark the places where I want to insert a string later. I use it in cases where XmlWriter is too onerous for XML that's largely static and repetitive, and I need to produce large XML (or JSON) data streams.

The class works by slicing the template up into parts and placing each part into a numbered dictionary. Parameters are also enumerated. The order in which the parts and parameters are inserted into a new string is stored in an integer array. When a new string is generated, the parts and parameters are picked from the dictionary and used to create the new string.

It's neither fully optimized nor bulletproof, but it works great for generating very large data streams from templates.

Low-Memory Solution

You'll need to read small chunks from the source string into a buffer, search the buffer using an optimized search algorithm, and then write the new string to the destination stream / string. There are a lot of potential caveats here, but it would be memory-efficient and a better solution for source data that's dynamic and can't be cached, such as whole-page translations or source data that's too large to reasonably cache. I don't have a sample solution for this handy.
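Since no sample is given for the low-memory approach, here is a rough sketch of what a filtered copy might look like for the removal case. It is my own illustration under simplifying assumptions: it reads one char at a time through a TextReader, keeps a window no longer than the longest pattern, and uses a plain comparison instead of an optimized search algorithm. The names StreamingFilter and CopyWithout are invented.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamingFilter
{
    // Hypothetical sketch: copies source to destination, dropping every pattern match.
    // Memory use is bounded by the longest pattern, not by the size of the input.
    public static void CopyWithout(TextReader source, TextWriter destination, string[] patterns)
    {
        int maxLength = 1;
        foreach (string p in patterns) maxLength = Math.Max(maxLength, p.Length);

        var window = new StringBuilder(maxLength);
        while (true)
        {
            // Top up the window so the longest pattern can be matched at its start.
            int next;
            while (window.Length < maxLength && (next = source.Read()) != -1)
                window.Append((char)next);

            if (window.Length == 0) break; // input exhausted and window drained

            int matched = MatchLengthAtStart(window, patterns);
            if (matched > 0)
            {
                window.Remove(0, matched);      // drop the match
            }
            else
            {
                destination.Write(window[0]);   // emit one char and slide the window
                window.Remove(0, 1);
            }
        }
    }

    // Returns the length of the first pattern found at the start of the window, or 0.
    private static int MatchLengthAtStart(StringBuilder window, string[] patterns)
    {
        foreach (string p in patterns)
        {
            if (p.Length == 0 || p.Length > window.Length) continue;
            int k = 0;
            while (k < p.Length && window[k] == p[k]) k++;
            if (k == p.Length) return p.Length;
        }
        return 0;
    }

    static void Main()
    {
        var output = new StringWriter();
        CopyWithout(new StringReader("12345-22-656-77ABC-X"), output,
                    new[] { "77ABC", "656", "22", "1" });
        Console.WriteLine(output.ToString()); // 2345---X
    }
}
```

Because the window slides across the whole input, a match is never lost at a chunk boundary, which is one of the caveats mentioned above. In real use the reader and writer would typically be a StreamReader and StreamWriter over files or network streams.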

Sample Code

Desired Results

<DataTable source='Users'>
  <Rows>
    <Row id='25' name='Administrator' />
    <Row id='29' name='Robert' />
    <Row id='55' name='Amanda' />
  </Rows>
</DataTable>

Template

<DataTable source='$TableName$'>
  <Rows>
    <Row id='$0$' name='$1$'/>
  </Rows>
</DataTable>

Test Case

using System;
using System.IO;

class Program
{
  static string[,] _users =
  {
    { "25", "Administrator" },
    { "29", "Robert" },
    { "55", "Amanda" },
  };

  static StringTemplate _documentTemplate = new StringTemplate(@"<DataTable source='$TableName$'><Rows>$Rows$</Rows></DataTable>");
  static StringTemplate _rowTemplate = new StringTemplate(@"<Row id='$0$' name='$1$' />");
  static void Main(string[] args)
  {
    _documentTemplate.SetParameter("TableName", "Users");
    _documentTemplate.SetParameter("Rows", GenerateRows);

    Console.WriteLine(_documentTemplate.GenerateString(4096));
    Console.ReadLine();
  }

  private static void GenerateRows(StreamWriter writer)
  {
    for (int i = 0; i <= _users.GetUpperBound(0); i++)
      _rowTemplate.GenerateString(writer, _users[i, 0], _users[i, 1]);
  }
}

StringTemplate Source

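using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public class StringTemplate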
{
  private string _template;
  private string[] _parts;
  private int[] _tokens;
  private string[] _parameters;
  private Dictionary<string, int> _parameterIndices;
  private string[] _replaceGraph;
  private Action<StreamWriter>[] _callbackGraph;
  private bool[] _graphTypeIsReplace;

  public string[] Parameters
  {
    get { return _parameters; }
  }

  public StringTemplate(string template)
  {
    _template = template;
    Prepare();
  }

  public void SetParameter(string name, string replacement)
  {
    int index = _parameterIndices[name] + _parts.Length;
    _replaceGraph[index] = replacement;
    _graphTypeIsReplace[index] = true;
  }

  public void SetParameter(string name, Action<StreamWriter> callback)
  {
    int index = _parameterIndices[name] + _parts.Length;
    _callbackGraph[index] = callback;
    _graphTypeIsReplace[index] = false;
  }

  private static Regex _parser = new Regex(@"\$(\w{1,64})\$", RegexOptions.Compiled);
  private void Prepare()
  {
    _parameterIndices = new Dictionary<string, int>(64);
    List<string> parts = new List<string>(64);
    List<object> tokens = new List<object>(64);
    int param_index = 0;
    int part_start = 0;

    foreach (Match match in _parser.Matches(_template))
    {
      if (match.Index > part_start)
      {
        //Add Part
        tokens.Add(parts.Count);
        parts.Add(_template.Substring(part_start, match.Index - part_start));
      }


      //Add Parameter
      var param = _template.Substring(match.Index + 1, match.Length - 2);
      if (!_parameterIndices.TryGetValue(param, out param_index))
        _parameterIndices[param] = param_index = _parameterIndices.Count;
      tokens.Add(param);

      part_start = match.Index + match.Length;
    }

    //Add last part, if it exists.
    if (part_start < _template.Length)
    {
      tokens.Add(parts.Count);
      parts.Add(_template.Substring(part_start, _template.Length - part_start));
    }

    //Set State
    _parts = parts.ToArray();
    _tokens = new int[tokens.Count];

    int index = 0;
    foreach (var token in tokens)
    {
      var parameter = token as string;
      if (parameter == null)
        _tokens[index++] = (int)token;
      else
        _tokens[index++] = _parameterIndices[parameter] + _parts.Length;
    }

    _parameters = _parameterIndices.Keys.ToArray();
    int graphlen = _parts.Length + _parameters.Length;
    _callbackGraph = new Action<StreamWriter>[graphlen];
    _replaceGraph = new string[graphlen];
    _graphTypeIsReplace = new bool[graphlen];

    for (int i = 0; i < _parts.Length; i++)
    {
      _graphTypeIsReplace[i] = true;
      _replaceGraph[i] = _parts[i];
    }
  }

  public void GenerateString(Stream output)
  {
    var writer = new StreamWriter(output);
    GenerateString(writer);
    writer.Flush();
  }

  public void GenerateString(StreamWriter writer)
  {
    //Resolve graph
    foreach(var token in _tokens)
    {
      if (_graphTypeIsReplace[token])
        writer.Write(_replaceGraph[token]);
      else
        _callbackGraph[token](writer);
    }
  }

  public void SetReplacements(params string[] parameters)
  {
    // Numeric parameter names ($0$, $1$, ...) are positional: $n$ receives parameters[n].
    int index;
    for (int i = 0; i < _parameters.Length; i++)
    {
      if (Int32.TryParse(_parameters[i], out index) && index < parameters.Length)
        SetParameter(_parameters[i], parameters[index]);
    }
  }

  public string GenerateString(int bufferSize = 1024)
  {
    using (var ms = new MemoryStream(bufferSize))
    {
      GenerateString(ms);
      ms.Position = 0;
      using (var reader = new StreamReader(ms))
        return reader.ReadToEnd();
    }
  }

  public string GenerateString(params string[] parameters)
  {
    SetReplacements(parameters);
    return GenerateString();
  }

  public void GenerateString(StreamWriter writer, params string[] parameters)
  {
    SetReplacements(parameters);
    GenerateString(writer);
  }
}

StringBuilder: http://msdn.microsoft.com/en-us/library/2839d5h5.aspx

The performance of the Replace operation itself should be roughly the same as string.Replace, and according to Microsoft no garbage should be produced.

Here's a quick benchmark...

        Stopwatch s = new Stopwatch();
        s.Start();
        string replace = source;
        replace = replace.Replace("$TS$", tsValue);
        replace = replace.Replace("$DOC$", docValue);
        s.Stop();

        Console.WriteLine("String.Replace:\t\t" + s.ElapsedMilliseconds);

        s.Reset();

        s.Start();
        StringBuilder sb = new StringBuilder(source);
        sb = sb.Replace("$TS$", tsValue);
        sb = sb.Replace("$DOC$", docValue);
        string output = sb.ToString();
        s.Stop();

        Console.WriteLine("StringBuilder.Replace:\t\t" + s.ElapsedMilliseconds);

I didn't see much difference on my machine (string.Replace took 85 ms and StringBuilder.Replace took 80 ms), and that was against about 8 MB of text in "source"...

StringBuilder sb = new StringBuilder("Hello string");
sb.Replace("string", String.Empty);
Console.WriteLine(sb);  

StringBuilder, a mutable string.

Here is my benchmark:

using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

internal static class MeasureTime
{
    internal static TimeSpan Run(Action func, uint count = 1)
    {
        if (count <= 0)
        {
            throw new ArgumentOutOfRangeException("count", "Must be greater than zero");
        }

        long[] arr_time = new long[count];
        Stopwatch sw = new Stopwatch();
        for (uint i = 0; i < count; i++)
        {
            sw.Start();
            func();
            sw.Stop();
            arr_time[i] = sw.ElapsedTicks;
            sw.Reset();
        }

        return new TimeSpan(count == 1 ? arr_time.Sum() : Convert.ToInt64(Math.Round(arr_time.Sum() / (double)count)));
    }
}

public class Program
{
    public static string RandomString(int length)
    {
        Random random = new Random();
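        // Caveat: this pool is upper-case only, while the patterns below are lower-case,
        // so no replacement ever matches and the timings measure scanning only.
        const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";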
        return new String(Enumerable.Range(1, length).Select(_ => chars[random.Next(chars.Length)]).ToArray());
    }

    public static void Main()
    {
        string rnd_str = RandomString(500000);
        Regex regex = new Regex("a|c|e|g|i|k", RegexOptions.Compiled);
        TimeSpan ts1 = MeasureTime.Run(() => regex.Replace(rnd_str, "!!!"), 10);
        Console.WriteLine("Regex time: {0:hh\\:mm\\:ss\\:fff}", ts1);
        
        StringBuilder sb_str = new StringBuilder(rnd_str);
        TimeSpan ts2 = MeasureTime.Run(() => sb_str.Replace("a", "").Replace("c", "").Replace("e", "").Replace("g", "").Replace("i", "").Replace("k", ""), 10);
        Console.WriteLine("StringBuilder time: {0:hh\\:mm\\:ss\\:fff}", ts2);
        
        TimeSpan ts3 = MeasureTime.Run(() => rnd_str.Replace("a", "").Replace("c", "").Replace("e", "").Replace("g", "").Replace("i", "").Replace("k", ""), 10);
        Console.WriteLine("String time: {0:hh\\:mm\\:ss\\:fff}", ts3);

        char[] ch_arr = {'a', 'c', 'e', 'g', 'i', 'k'};
        TimeSpan ts4 = MeasureTime.Run(() => new String((from c in rnd_str where !ch_arr.Contains(c) select c).ToArray()), 10);
        Console.WriteLine("LINQ time: {0:hh\\:mm\\:ss\\:fff}", ts4);
    }

}

Regex time: 00:00:00:008

StringBuilder time: 00:00:00:015

String time: 00:00:00:005

LINQ can't process rnd_str (Fatal Error: Memory usage limit was exceeded)

String.Replace is fastest

I've ended up in this thread a couple of times now, and I wasn't totally convinced by the previous answers, since some of the benchmarks were done with a Stopwatch, which gives some indication but is not very robust.

My use case is that I have a string that might be quite big, i.e. the HTML output of a website. I need to replace a number of placeholders inside this string (about 10, max 20) with values.

I created a BenchmarkDotNet test to get some robust data around this, and here is what I found:

TL;DR:

  • Don't use String.Replace if performance/memory is a concern.
  • Regex.Replace is the fastest but uses slightly more memory than StringBuilder.Replace. A compiled regex is fastest if you intend to reuse the same pattern; for a low number of uses, a non-compiled Regex instance is cheaper to create.
  • If you only care about memory consumption and are fine with slower execution, use StringBuilder.Replace.

Results from the test:

|                Method | ItemsToReplace |       Mean |     Error |    StdDev |   Gen 0 |  Gen 1 | Gen 2 | Allocated |
|---------------------- |--------------- |-----------:|----------:|----------:|--------:|-------:|------:|----------:|
|         StringReplace |              3 |  21.493 us | 0.1182 us | 0.1105 us |  3.6926 | 0.0305 |     - |  18.96 KB |
|  StringBuilderReplace |              3 |  35.383 us | 0.1341 us | 0.1119 us |  2.5024 |      - |     - |  13.03 KB |
|          RegExReplace |              3 |  19.620 us | 0.1252 us | 0.1045 us |  3.4485 | 0.0305 |     - |  17.75 KB |
| RegExReplace_Compiled |              3 |   4.573 us | 0.0318 us | 0.0282 us |  2.7084 | 0.0610 |     - |  13.91 KB |
|         StringReplace |             10 |  74.273 us | 0.7900 us | 0.7390 us | 12.2070 | 0.1221 |     - |  62.75 KB |
|  StringBuilderReplace |             10 | 115.322 us | 0.5820 us | 0.5444 us |  2.6855 |      - |     - |  13.84 KB |
|          RegExReplace |             10 |  24.121 us | 0.1130 us | 0.1002 us |  4.4250 | 0.0916 |     - |  22.75 KB |
| RegExReplace_Compiled |             10 |   8.601 us | 0.0298 us | 0.0279 us |  3.6774 | 0.1221 |     - |  18.92 KB |
|         StringReplace |             20 | 150.193 us | 1.4508 us | 1.3571 us | 24.6582 | 0.2441 |     - | 126.89 KB |
|  StringBuilderReplace |             20 | 233.984 us | 1.1707 us | 1.0951 us |  2.9297 |      - |     - |   15.3 KB |
|          RegExReplace |             20 |  28.699 us | 0.1179 us | 0.1045 us |  4.8218 | 0.0916 |     - |  24.79 KB |
| RegExReplace_Compiled |             20 |  12.672 us | 0.0599 us | 0.0560 us |  4.0894 | 0.1221 |     - |  20.95 KB |

So my conclusions are:

  • Regex.Replace is the way to go for fast execution and reasonable memory usage. Use a compiled, shared instance to speed it up.
  • StringBuilder has the lowest memory footprint but is a lot slower than Regex.Replace. I would only use it if memory is the only thing that matters.

The code for the benchmark looks like this:

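using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes; // assumes a BenchmarkDotNet version that keeps the exporter attributes in this namespace

[MemoryDiagnoser]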
[HtmlExporter]
[PlainExporter]
[RPlotExporter]
public class String_Replace
{
    private Dictionary<string, string> _valuesToReplace = new Dictionary<string, string>()
    {
        {"foo","fooRep" },
        {"bar","barRep" },
        {"lorem","loremRep" },
        {"ipsum","ipsumRep" },
        {"x","xRep" },
        {"y","yRep" },
        {"z","zRep" },
        {"yada","yadaRep" },
        {"old","oldRep" },
        {"new","newRep" },

        {"enim","enimRep" },
        {"amet","ametRep" },
        {"sit","sitRep" },
        {"convallis","convallisRep" },
        {"vehicula","vehiculaRep" },
        {"suspendisse","suspendisseRep" },
        {"accumsan","accumsanRep" },
        {"suscipit","suscipitRep" },
        {"ligula","ligulaRep" },
        {"posuere","posuereRep" }
    };

    private Regex _regexCompiled;

    private string GetText_With_3_Tags()
    {
        return @"<html>
        <body>
        Lorem ipsum dolor sit [foo], consectetur [bar] elit. Proin nulla quam, faucibus a ligula quis, posuere commodo elit. Nunc at tincidunt elit. Sed ipsum ex, accumsan sed viverra sit amet, tincidunt id nibh. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Nam interdum ex eget blandit lacinia. Nullam a tortor id sapien fringilla pellentesque vel ac purus. Fusce placerat dapibus tortor id luctus. Aenean in lacinia neque. Fusce quis ultrices odio. Nam id leo neque.

Etiam erat lorem, tincidunt volutpat odio at, finibus pharetra felis. Sed magna enim, accumsan at convallis a, aliquet eu quam. Vestibulum faucibus tincidunt ipsum et lacinia. Sed cursus ut arcu a commodo. Integer euismod eros at efficitur sollicitudin. In quis magna non orci sollicitudin condimentum. Fusce sed lacinia lorem, nec varius erat. In quis odio viverra, pharetra ex ac, hendrerit ante. Mauris congue enim et tellus sollicitudin pulvinar non sit amet tortor. Suspendisse at ex pharetra, semper diam ut, molestie velit. Cras lacinia urna neque, sit amet laoreet ex venenatis nec. Mauris at leo massa.

Aliquam mollis ultrices mi, sit amet venenatis enim rhoncus nec. Integer sit amet lectus tempor, finibus nisl quis, sodales ante. Curabitur suscipit dolor a dignissim consequat. Nulla eget vestibulum massa. Nam fermentum congue velit a placerat. Vivamus bibendum ex velit, id auctor ipsum bibendum eu. Praesent id gravida dui. Curabitur sollicitudin lobortis purus ac tempor. Sed felis enim, ornare et est egestas, blandit tincidunt lacus. Ut commodo dignissim augue, eget bibendum augue facilisis non.

Ut tortor neque, dignissim sit amet [lorem] ut, facilisis sit amet quam. Nullam et leo ut est congue vehicula et accumsan dolor. Aliquam erat dolor, eleifend a ipsum et, maximus suscipit ipsum. Nunc nec diam ex. Praesent suscipit aliquet condimentum. Nulla sodales lobortis fermentum. Maecenas ut laoreet sem. Ut id pulvinar urna, vel gravida lacus. Integer nunc urna, euismod eget vulputate sit amet, pharetra nec velit. Donec vel elit ac dolor varius faucibus tempus sed tortor. Donec metus diam, condimentum sit amet odio at, cursus cursus risus. Interdum et malesuada fames ac ante ipsum primis in faucibus. Nullam maximus tellus id quam consequat vestibulum. Curabitur rutrum eros tellus, eget commodo mauris sollicitudin a. In dignissim non est at pretium. Nunc bibendum pharetra dui ac ullamcorper.

Sed rutrum vehicula pretium. Morbi eu felis ante. Aliquam vel mauris at felis tempus dictum ac a justo. Suspendisse ultricies nisi turpis, non sagittis magna porttitor venenatis. Aliquam ac risus quis leo semper viverra non ac nunc. Phasellus lacinia augue sed libero elementum, at interdum nunc posuere. Duis lacinia rhoncus urna eget scelerisque. Morbi ullamcorper tempus bibendum. Proin at est eget nibh dignissim bibendum. Fusce imperdiet ut urna nec mattis. Aliquam massa mauris, consequat tristique magna non, sodales tempus massa. Ut lobortis risus rhoncus, molestie mi vitae, accumsan enim. Quisque dapibus libero elementum lectus dignissim, non finibus lacus lacinia.
        </p><p>
        </body>
        </html>";
    }

    
    private string GetText_With_10_Tags()
    {
          return @"<html>
        <body>
        Lorem ipsum dolor sit [foo], consectetur [bar] elit. Proin nulla quam, faucibus a ligula quis, posuere commodo elit. Nunc at tincidunt elit. Sed ipsum ex, accumsan sed viverra sit amet, tincidunt id nibh. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Nam interdum ex eget blandit lacinia. Nullam a tortor id sapien fringilla pellentesque vel ac purus. Fusce placerat dapibus tortor id luctus. Aenean in lacinia neque. Fusce quis ultrices odio. Nam id leo neque.

Etiam erat lorem, tincidunt volutpat odio at, finibus pharetra felis. Sed magna enim, [z] at convallis a, aliquet eu quam. Vestibulum faucibus tincidunt ipsum et lacinia. Sed cursus ut arcu a commodo. Integer euismod eros at efficitur sollicitudin. In quis magna non orci sollicitudin condimentum. Fusce sed lacinia lorem, nec varius erat. In quis odio viverra, pharetra ex ac, hendrerit ante. Mauris congue enim et tellus sollicitudin pulvinar non sit amet tortor. Suspendisse at ex pharetra, semper diam ut, molestie velit. Cras lacinia urna neque, sit amet laoreet ex venenatis nec. Mauris at leo massa.

Aliquam mollis ultrices mi, sit amet venenatis enim rhoncus nec. Integer sit amet [y] tempor, finibus nisl quis, sodales ante. Curabitur suscipit dolor a dignissim consequat. Nulla eget vestibulum massa. Nam fermentum congue velit a placerat. Vivamus bibendum ex velit, id auctor ipsum bibendum eu. Praesent id gravida dui. Curabitur sollicitudin lobortis purus ac tempor. Sed felis enim, ornare et est egestas, blandit tincidunt lacus. Ut commodo dignissim augue, eget bibendum augue facilisis non.

Ut tortor neque, dignissim sit amet [lorem] ut, [ipsum] sit amet quam. [x] et leo ut est congue [new] et accumsan dolor. Aliquam erat dolor, eleifend a ipsum et, maximus suscipit ipsum. Nunc nec diam ex. Praesent suscipit aliquet condimentum. Nulla sodales lobortis fermentum. Maecenas ut laoreet sem. Ut id pulvinar urna, vel gravida lacus. Integer nunc urna, euismod eget vulputate sit amet, pharetra nec velit. Donec vel elit ac dolor varius faucibus tempus sed tortor. Donec metus diam, condimentum sit amet odio at, cursus cursus risus. Interdum et malesuada fames ac ante ipsum primis in faucibus. Nullam maximus tellus id quam consequat vestibulum. Curabitur rutrum eros tellus, eget commodo mauris sollicitudin a. In dignissim non est at pretium. Nunc bibendum pharetra dui ac ullamcorper.

Sed rutrum vehicula pretium. Morbi eu felis ante. Aliquam vel [old] at felis [yada] dictum ac a justo. Suspendisse ultricies nisi turpis, non sagittis magna porttitor venenatis. Aliquam ac risus quis leo semper viverra non ac nunc. Phasellus lacinia augue sed libero elementum, at interdum nunc posuere. Duis lacinia rhoncus urna eget scelerisque. Morbi ullamcorper tempus bibendum. Proin at est eget nibh dignissim bibendum. Fusce imperdiet ut urna nec mattis. Aliquam massa mauris, consequat tristique magna non, sodales tempus massa. Ut lobortis risus rhoncus, molestie mi vitae, accumsan enim. Quisque dapibus libero elementum lectus dignissim, non finibus lacus lacinia.
        </p><p>
        </body>
        </html>";
    }

    private string GetText_With_20_Tags()
    {
           return @"<html>
        <body>
        Lorem ipsum dolor sit [foo], consectetur [bar] elit. Proin nulla [convallis], faucibus a [vehicula] quis, posuere commodo elit. Nunc at tincidunt elit. Sed ipsum ex, accumsan sed viverra sit amet, tincidunt id nibh. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Nam interdum ex eget blandit lacinia. Nullam a tortor id sapien fringilla pellentesque vel ac purus. Fusce placerat dapibus tortor id luctus. Aenean in lacinia neque. Fusce quis ultrices odio. Nam id leo neque.

Etiam erat lorem, tincidunt [posuere] odio at, finibus pharetra felis. Sed magna enim, [z] at convallis a, [enim] eu quam. Vestibulum faucibus tincidunt ipsum et lacinia. Sed cursus ut arcu a commodo. Integer euismod eros at efficitur sollicitudin. In quis magna non orci sollicitudin condimentum. Fusce sed lacinia lorem, nec varius erat. In quis odio viverra, pharetra ex ac, hendrerit ante. Mauris congue enim et tellus sollicitudin pulvinar non sit amet tortor. Suspendisse at ex pharetra, semper diam ut, molestie velit. Cras lacinia urna neque, sit amet laoreet ex venenatis nec. Mauris at leo massa.

[suspendisse] mollis [amet] mi, sit amet venenatis enim rhoncus nec. Integer sit amet [y] tempor, finibus nisl quis, sodales ante. Curabitur suscipit dolor a dignissim consequat. Nulla eget vestibulum massa. Nam fermentum congue velit a placerat. Vivamus bibendum ex velit, id auctor ipsum bibendum eu. Praesent id gravida dui. Curabitur sollicitudin lobortis purus ac tempor. Sed felis enim, ornare et est egestas, blandit tincidunt lacus. Ut commodo dignissim augue, eget bibendum augue facilisis non.

Ut tortor neque, dignissim sit amet [lorem] ut, [ipsum] sit amet quam. [x] et leo ut est congue [new] et accumsan [ligula]. Aliquam erat dolor, eleifend a ipsum et, maximus suscipit ipsum. Nunc nec diam ex. Praesent suscipit aliquet condimentum. Nulla sodales lobortis fermentum. Maecenas ut laoreet sem. Ut id pulvinar urna, vel gravida lacus. Integer nunc urna, euismod eget vulputate sit amet, pharetra nec velit. Donec vel elit ac dolor varius faucibus tempus sed tortor. Donec metus diam, condimentum sit amet odio at, cursus cursus risus. Interdum et malesuada fames ac ante ipsum primis in faucibus. Nullam maximus tellus id quam consequat vestibulum. Curabitur rutrum eros tellus, eget commodo mauris sollicitudin a. In dignissim non est at pretium. Nunc bibendum pharetra dui ac ullamcorper.

Sed rutrum vehicula [accumsan]. Morbi eu [suscipit] [sit]. Aliquam vel [old] at felis [yada] dictum ac a justo. Suspendisse ultricies nisi turpis, non sagittis magna porttitor venenatis. Aliquam ac risus quis leo semper viverra non ac nunc. Phasellus lacinia augue sed libero elementum, at interdum nunc posuere. Duis lacinia rhoncus urna eget scelerisque. Morbi ullamcorper tempus bibendum. Proin at est eget nibh dignissim bibendum. Fusce imperdiet ut urna nec mattis. Aliquam massa mauris, consequat tristique magna non, sodales tempus massa. Ut lobortis risus rhoncus, molestie mi vitae, accumsan enim. Quisque dapibus libero elementum lectus dignissim, non finibus lacus lacinia.
        </p><p>
        </body>
        </html>";
    }

    private string GetText(int numberOfReplace)
    {
        if (numberOfReplace == 3)
            return GetText_With_3_Tags();
        if (numberOfReplace == 10)
            return GetText_With_10_Tags();
        if (numberOfReplace == 20)
            return GetText_With_20_Tags();

        return "";
    }

    public String_Replace()
    {
        _regexCompiled = new Regex(@"\[([^]]*)\]",RegexOptions.Compiled);
    }

    [Params(3,10,20)]
    public int ItemsToReplace { get; set; }

    [Benchmark]
    public void StringReplace()
    {
        var str = GetText(ItemsToReplace);
        foreach (var rep  in _valuesToReplace.Take(ItemsToReplace))
        {
            str = str.Replace("[" + rep.Key + "]", rep.Value);
        }
    }

    [Benchmark]
    public void StringBuilderReplace()
    {
        var sb = new StringBuilder(GetText(ItemsToReplace));
        foreach (var rep  in _valuesToReplace.Take(ItemsToReplace))
        {
            sb.Replace("[" + rep.Key + "]", rep.Value);
        }
        var res = sb.ToString();
    }

    [Benchmark]
    public void RegExReplace()
    {
        var str = GetText(ItemsToReplace);
        Regex regex = new Regex(@"\[([^]]*)\]");
        
        str = regex.Replace(str, Replace);
        var res = str;
    }

    

    [Benchmark]
    public void RegExReplace_Compiled()
    {
        var str = GetText(ItemsToReplace);

        str = _regexCompiled.Replace(str, Replace);
        var res = str;
    }

    private string Replace(Match match)
    {
        if(match.Groups.Count > 0)
        { 
            string collectionKey = match.Groups[1].Value;

            return _valuesToReplace[collectionKey];
        }
        return string.Empty;
    }

}

If you want a built-in class in .NET, I think StringBuilder is the best. To do it manually, you can use unsafe code with char*, iterate through your string, and replace based on your criteria.

Since you have multiple replaces on one string, I would recommend using RegEx over StringBuilder.
