Fastest way to escape characters in a string in C#

Question

I'd like to find the fastest way to replace all reserved characters in a string with their escaped versions.

Two naive approaches come to mind (the set of reserved characters here is just an example):

A: Using a lookup dictionary and String.Replace

private Dictionary<string, string> _someEscapeTokens = new Dictionary<string, string>()
{
    {"\t", @"\t"},
    {"\n", @"\n"},
    {"\r", @"\r"}
};

public string GetEscapedStringByNaiveLookUp(string s)
{
    foreach (KeyValuePair<string, string> escapeToken in _someEscapeTokens.Where(kvp => s.Contains(kvp.Key)))
    {
        s = s.Replace(escapeToken.Key, escapeToken.Value);
    }
    return s;
}

B: Traversing each character in the string

public string GetEscapedStringByTraversingCharArray(string s)
{
    char[] chars = s.ToCharArray();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < chars.Length; i++)
    {
        switch (chars[i])
        {
            case '\t':
                sb.Append(@"\t"); break;
            case '\n':
                sb.Append(@"\n"); break;
            case '\r':
                sb.Append(@"\r"); break;
            default:
                sb.Append(chars[i]); break;
        }
    }
    return sb.ToString();
}

In my tests, version B easily outperforms version A.

Note: I already considered Regex.Escape, but its character set doesn't match mine, so it doesn't fit.

Are there other ways you would approach this problem, with performance in mind?

Update: Testing

I've done some more testing and would like to share the results. See below for the code.

Testing was done on two different systems targeting the .NET Framework 4.0. The results were pretty much the same on both:

Char Array (short string) average time: 38600 ns
Foreach (short string) average time: 26680 ns
Char Array (long string) average time: 48,1 ms
Foreach (long string) average time: 64,2 ms
Char Array (escaping only) average time: 13,6 ms
Foreach (escaping only) average time: 17,3 ms

This leads me to conclude that the foreach version is slightly faster for short strings but somehow falls off for longer strings. However, we're talking about really small differences here.

Testing code:

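// Requires: using System; using System.Collections.Generic; using System.Diagnostics;
//           using System.IO; using System.Linq; using System.Text;
private static void Main(string[] args)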
{
    //around 700 characters
    string shortString = new StackTrace().ToString();
    string longString;
    string pureEscape;
    //loading from a file with 1000000 words http://loremipsum.de/
    using (StreamReader sr = new StreamReader(@"C:\users\ekrueger\desktop\LoremIpsum.txt"))
    {
        longString = sr.ReadToEnd();
    }
    //text file containing only escapable characters (length ~1000000)
    using (StreamReader sr = new StreamReader(@"C:\users\ekrueger\desktop\PureEscape.txt"))
    {
        pureEscape = sr.ReadToEnd();
    }
    List<double> timesCharArrayShortString = new List<double>();
    List<double> timesForeachShortString = new List<double>();
    List<long> timesCharArrayLongString = new List<long>();
    List<long> timesForeachLongString = new List<long>();
    List<long> timesCharArrayPureEscape = new List<long>();
    List<long> timesForeachPureEscape = new List<long>();
    Stopwatch sw = new Stopwatch();

    for (int i = 0; i < 10; i++)
    {
        sw.Restart();
        GetEscapedStringByTraversingCharArray(shortString);
        sw.Stop();
        timesCharArrayShortString.Add(sw.Elapsed.TotalMilliseconds * 1000000);
    }

    for (int i = 0; i < 10; i++)
    {
        sw.Restart();
        GetEscapedStringForeach(shortString);
        sw.Stop();
        timesForeachShortString.Add(sw.Elapsed.TotalMilliseconds * 1000000);
    }

    for (int i = 0; i < 10; i++)
    {
        sw.Restart();
        GetEscapedStringByTraversingCharArray(longString);
        sw.Stop();
        timesCharArrayLongString.Add(sw.ElapsedMilliseconds);
    }

    for (int i = 0; i < 10; i++)
    {
        sw.Restart();
        GetEscapedStringForeach(longString);
        sw.Stop();
        timesForeachLongString.Add(sw.ElapsedMilliseconds);
    }

    for (int i = 0; i < 10; i++)
    {
        sw.Restart();
        GetEscapedStringByTraversingCharArray(pureEscape);
        sw.Stop();
        timesCharArrayPureEscape.Add(sw.ElapsedMilliseconds);
    }

    for (int i = 0; i < 10; i++)
    {
        sw.Restart();
        GetEscapedStringForeach(pureEscape);
        sw.Stop();
        timesForeachPureEscape.Add(sw.ElapsedMilliseconds);
    }

    Console.WriteLine("Char Array (short string) average time: {0} ns", timesCharArrayShortString.Average());
    Console.WriteLine("Foreach (short string) average time: {0} ns", timesForeachShortString.Average());
    Console.WriteLine("Char Array (long string) average time: {0} ms", timesCharArrayLongString.Average());
    Console.WriteLine("Foreach (long string) average time: {0} ms", timesForeachLongString.Average());
    Console.WriteLine("Char Array (escaping only) average time: {0} ms", timesCharArrayPureEscape.Average());
    Console.WriteLine("Foreach (escaping only) average time: {0} ms", timesForeachPureEscape.Average());

    Console.Read();
}

private static string GetEscapedStringByTraversingCharArray(string s)
{
    if (String.IsNullOrEmpty(s))
        return s;

    char[] chars = s.ToCharArray();
    StringBuilder sb = new StringBuilder(s.Length);
    for (int i = 0; i < chars.Length; i++)
    {
        switch (chars[i])
        {
            case '\t':
                sb.Append(@"\t"); break;
            case '\n':
                sb.Append(@"\n"); break;
            case '\r':
                sb.Append(@"\r"); break;
            case '\f': // note: GetEscapedStringForeach below has no '\f' case
                sb.Append(@"\f"); break;
            default:
                sb.Append(chars[i]); break;
        }
    }
    return sb.ToString();
}

public static string GetEscapedStringForeach(string s)
{
    if (String.IsNullOrEmpty(s))
        return s;

    StringBuilder sb = new StringBuilder(s.Length);
    foreach (Char ch in s)
    {
        switch (ch)
        {
            case '\t':
                sb.Append(@"\t"); break;
            case '\n':
                sb.Append(@"\n"); break;
            case '\r':
                sb.Append(@"\r"); break;
            default:
                sb.Append(ch); break;
        }
    }
    return sb.ToString();
}
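
One caveat about the numbers above: each variant is timed over only 10 calls, and the char-array version runs first, so its short-string average may partly reflect one-time costs such as JIT compilation rather than the loop itself. A minimal sketch of a helper that runs untimed warm-up calls before measuring might look like this. The class name EscapeBenchmark, the method AverageMilliseconds, and its parameters are hypothetical and not part of the original test code.

```csharp
using System;
using System.Diagnostics;

public static class EscapeBenchmark
{
    // Hypothetical helper: returns the average time per call in milliseconds.
    public static double AverageMilliseconds(
        Func<string, string> escape, string input, int warmups, int iterations)
    {
        // Untimed warm-up calls, so the method is already JIT-compiled
        // when the measurement starts.
        for (int i = 0; i < warmups; i++)
            escape(input);

        // Time all iterations with a single Stopwatch run, then divide.
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            escape(input);
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}
```

It could be called as EscapeBenchmark.AverageMilliseconds(GetEscapedStringForeach, shortString, 5, 1000). Timing many iterations in one Stopwatch run also reduces the effect of timer resolution on very short calls.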

Answer 1

It makes sense that the first option is slower, since every call to Replace creates a new string object:

s = s.Replace(escapeToken.Key, escapeToken.Value);

In the second method, there's no need to create a char[], because string has an indexer too. Perhaps the only other thing you can do to improve performance is to initialize the StringBuilder with a capacity, so it doesn't need to resize. You can still use the Dictionary in the second method.
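
The three suggestions in this answer can be combined roughly as follows. This is only a sketch: the class name LookupEscaper and the choice of char keys (instead of the string keys in the question) are assumptions made so the dictionary can be queried once per character.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LookupEscaper
{
    // Example character set only. Keys are chars (an assumption; the
    // question used string keys) so each character can be looked up directly.
    private static readonly Dictionary<char, string> EscapeTokens =
        new Dictionary<char, string>
        {
            { '\t', @"\t" },
            { '\n', @"\n" },
            { '\r', @"\r" }
        };

    public static string Escape(string s)
    {
        if (string.IsNullOrEmpty(s))
            return s;

        // Pre-size the builder so it does not have to grow right away.
        StringBuilder sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char ch = s[i]; // string indexer, no ToCharArray() copy
            string escaped;
            if (EscapeTokens.TryGetValue(ch, out escaped))
                sb.Append(escaped);
            else
                sb.Append(ch);
        }
        return sb.ToString();
    }
}
```

Whether the dictionary lookup is faster or slower than a switch for a small character set is something to measure; its main benefit is that the reserved characters stay configurable in one place.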

Answer 2

There is no need to convert the string into a Char[], so the solution can be slightly improved:

public string GetEscapedStringByTraversingCharArray(string s) {
  // do not forget about null...
  if (String.IsNullOrEmpty(s))
    return s;

  // The result needs at least s.Length characters, so allocate that up front.
  // To trade memory for speed, you could even use
  //   StringBuilder sb = new StringBuilder(2 * s.Length);
  // which covers the worst case where every character must be escaped.
  StringBuilder sb = new StringBuilder(s.Length);

  foreach(Char ch in s) {
    switch (ch) {
      case '\t':
        sb.Append(@"\t");
        break;
      case '\n':
        sb.Append(@"\n");
        break;
      case '\r':
        sb.Append(@"\r");
        break;
      default:
        sb.Append(ch);
        break;
    }
  }

  return sb.ToString();
}
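
A further variation, not taken from either answer, is to skip the StringBuilder entirely when nothing needs escaping. The sketch below assumes that many inputs contain no reserved characters. The class name FastPathEscaper and the extra capacity of 16 are arbitrary illustrative choices.

```csharp
using System;
using System.Text;

public static class FastPathEscaper
{
    // Example character set only, matching the switch below.
    private static readonly char[] Reserved = { '\t', '\n', '\r' };

    public static string Escape(string s)
    {
        if (string.IsNullOrEmpty(s))
            return s;

        // Fast path: no reserved character, so return the input unchanged
        // without allocating anything.
        int first = s.IndexOfAny(Reserved);
        if (first < 0)
            return s;

        // Leave some headroom for escapes (16 is an arbitrary guess).
        StringBuilder sb = new StringBuilder(s.Length + 16);
        sb.Append(s, 0, first); // copy the clean prefix in one call
        for (int i = first; i < s.Length; i++)
        {
            char ch = s[i];
            switch (ch)
            {
                case '\t': sb.Append(@"\t"); break;
                case '\n': sb.Append(@"\n"); break;
                case '\r': sb.Append(@"\r"); break;
                default: sb.Append(ch); break;
            }
        }
        return sb.ToString();
    }
}
```

For inputs where almost every character is escaped, the extra IndexOfAny scan adds little and saves nothing, so this would need to be benchmarked against the plain loop on representative data.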

Answer 1
2 2014-05-13 12:52:18

Answer 2
2 accepted 2014-05-13 13:08:11