
Fastest way to escape characters in a string in C#

I'd like to find the fastest way to replace all reserved characters in a string with their escaped versions.

Two naive approaches come to mind (note that the set of reserved characters is just an example):

A: Using a lookup dictionary and String.Replace

private Dictionary<string, string> _someEscapeTokens = new Dictionary<string, string>()
{
    {"\t", @"\t"},
    {"\n", @"\n"},
    {"\r", @"\r"}
};

public string GetEscapedStringByNaiveLookUp(string s)
{
    foreach (KeyValuePair<string, string> escapeToken in _someEscapeTokens.Where(kvp => s.Contains(kvp.Key)))
    {
        s = s.Replace(escapeToken.Key, escapeToken.Value);
    }
    return s;
}

B: Traversing each character in the string

public string GetEscapedStringByTraversingCharArray(string s)
{
    char[] chars = s.ToCharArray();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < chars.Length; i++)
    {
        switch (chars[i])
        {
            case '\t':
                sb.Append(@"\t"); break;
            case '\n':
                sb.Append(@"\n"); break;
            case '\r':
                sb.Append(@"\r"); break;
            default:
                sb.Append(chars[i]); break;
        }
    }
    return sb.ToString();
}

In my tests, version B easily outperforms version A.

Note: I already considered Regex.Escape, but its character set doesn't match mine, so it doesn't fit.

However, are there other ways you would approach this problem (with performance in mind)?

Update: Testing

I've done some more testing and would like to share the results. See below for the code.

Testing was done on two different systems targeting the .NET Framework 4.0. The results were much the same on both:

Char Array (short string) average time: 38600 ns
Foreach (short string) average time: 26680 ns
Char Array (long string) average time: 48,1 ms
Foreach (long string) average time: 64,2 ms
Char Array (escaping only) average time: 13,6 ms
Foreach (escaping only) average time: 17,3 ms

This leads me to the conclusion that the foreach version is somewhat faster for short strings (about 26.7 µs versus 38.6 µs) but somehow "falls off" for longer strings (64.2 ms versus 48.1 ms). However, we're talking about really small differences here.

Testing code:

private static void Main(string[] args)
{
    //around 700 characters
    string shortString = new StackTrace().ToString();
    string longString;
    string pureEscape;
    //loading from a file with 1000000 words http://loremipsum.de/
    using (StreamReader sr = new StreamReader(@"C:\users\ekrueger\desktop\LoremIpsum.txt"))
    {
        longString = sr.ReadToEnd();
    }
    //text file containing only escapable characters (length ~1000000)
    using (StreamReader sr = new StreamReader(@"C:\users\ekrueger\desktop\PureEscape.txt"))
    {
        pureEscape = sr.ReadToEnd();
    }
    List<double> timesCharArrayShortString = new List<double>();
    List<double> timesForeachShortString = new List<double>();
    List<long> timesCharArrayLongString = new List<long>();
    List<long> timesForeachLongString = new List<long>();
    List<long> timesCharArrayPureEscape = new List<long>();
    List<long> timesForeachPureEscape = new List<long>();
    Stopwatch sw = new Stopwatch();

    for (int i = 0; i < 10; i++)
    {
        sw.Restart();
        GetEscapedStringByTraversingCharArray(shortString);
        sw.Stop();
        timesCharArrayShortString.Add(sw.Elapsed.TotalMilliseconds * 1000000);
    }

    for (int i = 0; i < 10; i++)
    {
        sw.Restart();
        GetEscapedStringForeach(shortString);
        sw.Stop();
        timesForeachShortString.Add(sw.Elapsed.TotalMilliseconds * 1000000);
    }

    for (int i = 0; i < 10; i++)
    {
        sw.Restart();
        GetEscapedStringByTraversingCharArray(longString);
        sw.Stop();
        timesCharArrayLongString.Add(sw.ElapsedMilliseconds);
    }

    for (int i = 0; i < 10; i++)
    {
        sw.Restart();
        GetEscapedStringForeach(longString);
        sw.Stop();
        timesForeachLongString.Add(sw.ElapsedMilliseconds);
    }

    for (int i = 0; i < 10; i++)
    {
        sw.Restart();
        GetEscapedStringByTraversingCharArray(pureEscape);
        sw.Stop();
        timesCharArrayPureEscape.Add(sw.ElapsedMilliseconds);
    }

    for (int i = 0; i < 10; i++)
    {
        sw.Restart();
        GetEscapedStringForeach(pureEscape);
        sw.Stop();
        timesForeachPureEscape.Add(sw.ElapsedMilliseconds);
    }

    Console.WriteLine("Char Array (short string) average time: {0} ns", timesCharArrayShortString.Average());
    Console.WriteLine("Foreach (short string) average time: {0} ns", timesForeachShortString.Average());
    Console.WriteLine("Char Array (long string) average time: {0} ms", timesCharArrayLongString.Average());
    Console.WriteLine("Foreach (long string) average time: {0} ms", timesForeachLongString.Average());
    Console.WriteLine("Char Array (escaping only) average time: {0} ms", timesCharArrayPureEscape.Average());
    Console.WriteLine("Foreach (escaping only) average time: {0} ms", timesForeachPureEscape.Average());

    Console.Read();
}

private static string GetEscapedStringByTraversingCharArray(string s)
{
    if (String.IsNullOrEmpty(s))
        return s;

    char[] chars = s.ToCharArray();
    StringBuilder sb = new StringBuilder(s.Length);
    for (int i = 0; i < chars.Length; i++)
    {
        switch (chars[i])
        {
            case '\t':
                sb.Append(@"\t"); break;
            case '\n':
                sb.Append(@"\n"); break;
            case '\r':
                sb.Append(@"\r"); break;
            case '\f':
                sb.Append(@"\f"); break;
            default:
                sb.Append(chars[i]); break;
        }
    }
    return sb.ToString();
}

public static string GetEscapedStringForeach(string s)
{
    if (String.IsNullOrEmpty(s))
        return s;

    StringBuilder sb = new StringBuilder(s.Length);
    foreach (Char ch in s)
    {
        switch (ch)
        {
            case '\t':
                sb.Append(@"\t"); break;
            case '\n':
                sb.Append(@"\n"); break;
            case '\r':
                sb.Append(@"\r"); break;
            default:
                sb.Append(ch); break;
        }
    }
    return sb.ToString();
}
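
The six timing loops above all follow the same pattern, so they could be folded into one helper. The following is only a sketch under stated assumptions: `TimingSketch` and `AverageMilliseconds` are hypothetical names that are not part of the original test code, and the untimed warm-up call is an addition, made because the first call to a method in .NET includes JIT compilation time, which can dominate a measurement in the microsecond range.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper, not part of the original test code.
    // Returns the average time per call in milliseconds.
    public static double AverageMilliseconds(Func<string, string> escape, string input, int runs)
    {
        // Warm-up call (an assumption of this sketch) so that
        // JIT compilation of the method is not included in the timing.
        escape(input);

        // Time all runs together instead of restarting the stopwatch per call.
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            escape(input);
        }
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / runs;
    }
}
```

It could be called as `TimingSketch.AverageMilliseconds(GetEscapedStringForeach, shortString, 1000)`; the run count of 1000 is an arbitrary choice, not a value taken from the measurements above.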

It makes sense that the first option is slower, since every call to Replace that finds a match creates a new string object:

s = s.Replace(escapeToken.Key, escapeToken.Value);

In the second method, there's no need to create a char[] because string has an indexer too. Perhaps the only other thing you can do to improve performance is to initialize the StringBuilder with a capacity, so it doesn't need to resize. You can still use the Dictionary in the second method.
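
A minimal sketch of how these three suggestions might be combined: the string indexer instead of a char[] copy, a pre-sized StringBuilder, and a dictionary lookup per character. The names `EscapeWithLookup`, `EscapeTokens` and `Escape` are made up for this illustration, the reserved set is the same example set as in the question, and no claim is made that this is faster than the switch version.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EscapeWithLookup
{
    // Hypothetical lookup table keyed by char rather than string,
    // so that each character can be looked up directly.
    private static readonly Dictionary<char, string> EscapeTokens = new Dictionary<char, string>
    {
        { '\t', @"\t" },
        { '\n', @"\n" },
        { '\r', @"\r" }
    };

    public static string Escape(string s)
    {
        if (string.IsNullOrEmpty(s))
            return s;

        // Pre-size the builder: the result needs at least s.Length characters.
        StringBuilder sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char ch = s[i]; // string indexer, no char[] copy
            string escaped;
            if (EscapeTokens.TryGetValue(ch, out escaped))
                sb.Append(escaped); // reserved character: append its escaped form
            else
                sb.Append(ch);      // ordinary character: copy as is
        }
        return sb.ToString();
    }
}
```

The dictionary keeps the reserved set in one place and easy to extend, at the cost of a hash lookup per character where the switch uses a few comparisons.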

You don't need to convert the string into a Char[], so the solution can be slightly improved:

public string GetEscapedStringByTraversingCharArray(string s) {
  // do not forget about null...
  if (String.IsNullOrEmpty(s))
    return s;

  // we can be sure, that it requires at least s.Length symbols, let's allocate then
  // in case we want to trade memory for speed we can even put
  //   StringBuilder sb = new StringBuilder(2 * s.Length);
  // for the worst case when all symbols should be escaped
  StringBuilder sb = new StringBuilder(s.Length);

  foreach (Char ch in s) {
    switch (ch) {
      case '\t':
        sb.Append(@"\t");
        break;
      case '\n':
        sb.Append(@"\n");
        break;
      case '\r':
        sb.Append(@"\r");
        break;
      default:
        sb.Append(ch);
        break;
    }
  }

  return sb.ToString();
}
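
One further variation, offered as a sketch rather than a measured improvement: if many inputs contain nothing to escape, the method could check for that first and return the original string without allocating a StringBuilder at all. `EscapeWithFastPath` and `Reserved` are hypothetical names, and the extra capacity of 16 is an arbitrary guess.

```csharp
using System;
using System.Text;

public static class EscapeWithFastPath
{
    // Example set of reserved characters, as in the question.
    private static readonly char[] Reserved = { '\t', '\n', '\r' };

    public static string Escape(string s)
    {
        if (string.IsNullOrEmpty(s))
            return s;

        // Fast path: no reserved character found, so return the input unchanged.
        int first = s.IndexOfAny(Reserved);
        if (first < 0)
            return s;

        // Some headroom for the escapes; 16 is an arbitrary guess.
        StringBuilder sb = new StringBuilder(s.Length + 16);

        // Copy the clean prefix in a single call.
        sb.Append(s, 0, first);

        // Escape the rest character by character.
        for (int i = first; i < s.Length; i++)
        {
            char ch = s[i];
            switch (ch)
            {
                case '\t': sb.Append(@"\t"); break;
                case '\n': sb.Append(@"\n"); break;
                case '\r': sb.Append(@"\r"); break;
                default: sb.Append(ch); break;
            }
        }
        return sb.ToString();
    }
}
```

For input that consists mostly of reserved characters, such as the "escaping only" test file in the question, the initial scan finds a match almost immediately and is unlikely to make much difference either way.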
