Inverse String.Replace - Faster way of doing it?

I have a method that replaces every character except those I specify. For example,

ReplaceNot("test. stop; or, not", ".;/\\".ToCharArray(), '*'); 

would return

"****.*****;********".

Now, this is not an instance of premature optimization. I call this method quite a few times during a network operation. I found that on longer strings it causes some latency, and removing it helped a bit. Any help speeding this up would be appreciated.

    public static string ReplaceNot(this string original, char[] pattern, char replacement)
    {           
        int index = 0;
        int old = -1;

        StringBuilder sb = new StringBuilder(original.Length);

        while ((index = original.IndexOfAny(pattern, index)) > -1)
        {
            sb.Append(new string(replacement, index - old - 1));
            sb.Append(original[index]);
            old = index++;
        }

        if (original.Length - old > 1)
        {
            sb.Append(new string(replacement, original.Length - (old + 1)));
        }

        return sb.ToString();
    }

Final numbers. I also added a test case for a 3K-character string, run 100K times instead of 1M, to see how well each of these scales. The only surprise was that the regular expression 'scaled better' than the others, but that is no help since it is very slow to begin with:

User            Short * 1M      Long * 100K     Scale
John            319             2125            6.66
Luke            360             2659            7.39
Guffa           409             2827            6.91
Mine            447             3372            7.54
DirkGently      1094            9134            8.35
Michael         1591            12785           8.04
Peter           21106           94386           4.47

Update: To be fair, I moved the creation of the regular expression in Peter's version into a static variable and set RegexOptions.Compiled:

User            Short * 1M      Long * 100K     Scale
Peter           8997            74715           8.30

Pastebin link to my testing code; please correct me if it is wrong: http://pastebin.com/f64f260ee
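For readers who cannot open the link, here is a rough sketch of how timings like the ones above could be collected. It is not the Pastebin code; the class name `ReplaceNotBenchmark` and the method `Time` are made up for illustration, and it assumes the numbers are wall-clock milliseconds measured with `Stopwatch`.

```csharp
using System;
using System.Diagnostics;

public static class ReplaceNotBenchmark
{
    // Hypothetical helper: runs `candidate` `iterations` times and
    // returns the elapsed wall-clock time in milliseconds.
    public static long Time(Func<string, char[], char, string> candidate,
                            string input, char[] pattern, int iterations)
    {
        // One warm-up call so JIT compilation is not part of the measurement.
        candidate(input, pattern, '*');

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            candidate(input, pattern, '*');
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Each version can then be passed in as a lambda, for example `(s, p, r) => s.ReplaceNot(p, r)`, once with the short string and 1M iterations and once with the 3K string and 100K iterations.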

Can't you use Regex.Replace like so:

Regex regex = new Regex(@"[^.;/\\]");
string s = regex.Replace("test. stop; or, not", "*");
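The update in the question mentions caching the regex in a static variable with RegexOptions.Compiled. A minimal sketch of what that variant might look like follows; the class name `RegexReplaceNot` and the method name `Mask` are hypothetical, and the character class is hard-coded to the pattern from the question.

```csharp
using System.Text.RegularExpressions;

public static class RegexReplaceNot
{
    // Built once and reused across calls. The negated character class
    // matches every character except '.', ';', '/' and '\'.
    private static readonly Regex NotAllowed =
        new Regex(@"[^.;/\\]", RegexOptions.Compiled);

    // Hypothetical wrapper: replaces each matching character with '*'.
    public static string Mask(string original)
    {
        return NotAllowed.Replace(original, "*");
    }
}
```

This avoids constructing the Regex on every call, which is presumably where much of the difference between the two Peter rows in the question comes from.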

Alright, on a ~60KB string, this will perform about 40% faster than your version:

public static string ReplaceNot(this string original, char[] pattern, char replacement)
{
    int index = 0;

    StringBuilder sb = new StringBuilder(new string(replacement, original.Length));

    while ((index = original.IndexOfAny(pattern, index)) > -1)
    {
        sb[index] = original[index];
        index++;
    }

    return sb.ToString();
}

The trick is to initialize the StringBuilder with a string made up entirely of replacement characters, since most characters will be replaced anyway; only the characters found in pattern need to be copied back.

I don't know if this will be any faster, but it avoids newing up strings just so they can be appended to the StringBuilder, which may help:

    public static string ReplaceNot(this string original, char[] pattern, char replacement)
    {
        StringBuilder sb = new StringBuilder(original.Length);

        foreach (char ch in original) {
            if (Array.IndexOf( pattern, ch) >= 0) {
                sb.Append( ch);
            }
            else {
                sb.Append( replacement);
            }
        }

        return sb.ToString();
    }

If pattern may contain more than a handful of characters (which I'm guessing it generally won't), it might pay to sort it and perform an Array.BinarySearch() instead of the Array.IndexOf().
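A sketch of that sorted-lookup idea, untested for speed. The class name `BinarySearchReplaceNot` is just a placeholder, and cloning the array is my own choice so that the caller's pattern is not reordered.

```csharp
using System;
using System.Text;

public static class BinarySearchReplaceNot
{
    public static string ReplaceNot(this string original, char[] pattern, char replacement)
    {
        // Sort a copy so the caller's array is left untouched.
        char[] sorted = (char[])pattern.Clone();
        Array.Sort(sorted);

        StringBuilder sb = new StringBuilder(original.Length);
        foreach (char ch in original)
        {
            // BinarySearch returns a non-negative index when ch is present.
            sb.Append(Array.BinarySearch(sorted, ch) >= 0 ? ch : replacement);
        }

        return sb.ToString();
    }
}
```

For a pattern of only four characters the sort and clone may well cost more than they save, so this is only worth measuring when pattern is large.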

For such a simple transformation, I'd bet that it will easily be faster than a regex, too.

Also, since the set of characters in pattern is likely to come from a string anyway (at least that's been my general experience with this type of API), why not make the method signature:

public static string ReplaceNot(this string original, string pattern, char replacement)

or better yet, provide overloads so that pattern can be either a char[] or a string?
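One way the pair of overloads might look, as a sketch: the string version simply converts once and delegates. The class name `ReplaceNotOverloads` is a placeholder, and the char[] body is a compact form of the foreach loop above.

```csharp
using System;
using System.Text;

public static class ReplaceNotOverloads
{
    // char[] version: same approach as the foreach loop above.
    public static string ReplaceNot(this string original, char[] pattern, char replacement)
    {
        StringBuilder sb = new StringBuilder(original.Length);
        foreach (char ch in original)
        {
            sb.Append(Array.IndexOf(pattern, ch) >= 0 ? ch : replacement);
        }
        return sb.ToString();
    }

    // string version: convert once, then delegate to the char[] overload.
    public static string ReplaceNot(this string original, string pattern, char replacement)
    {
        return original.ReplaceNot(pattern.ToCharArray(), replacement);
    }
}
```

With this in place the call from the question can drop the conversion: `"test. stop; or, not".ReplaceNot(".;/\\", '*')`.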

Here's another version for you. My tests suggest that its performance is pretty good.

public static string ReplaceNot(
    this string original, char[] pattern, char replacement)
{
    char[] buffer = new char[original.Length];

    for (int i = 0; i < buffer.Length; i++)
    {
        bool replace = true;

        for (int j = 0; j < pattern.Length; j++)
        {
            if (original[i] == pattern[j])
            {
                replace = false;
                break;
            }
        }

        buffer[i] = replace ? replacement : original[i];
    }

    return new string(buffer);
}

StringBuilder.Append has an overload that takes a character and a repeat count, so you don't have to create intermediate strings to add to the StringBuilder. I get about a 20% improvement by replacing this:

sb.Append(new string(replacement, index - old - 1));

with:

sb.Append(replacement, index - old - 1);

and this:

sb.Append(new string(replacement, original.Length - (old + 1)));

with:

sb.Append(replacement, original.Length - (old + 1));
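Put together, the method from the question with both substitutions applied would look roughly like this. The wrapping class name `AppendCountReplaceNot` is a placeholder, and I dropped the final `if` as an assumption on my part, since appending with a count of zero does nothing.

```csharp
using System.Text;

public static class AppendCountReplaceNot
{
    public static string ReplaceNot(this string original, char[] pattern, char replacement)
    {
        int index = 0;
        int old = -1;

        StringBuilder sb = new StringBuilder(original.Length);

        while ((index = original.IndexOfAny(pattern, index)) > -1)
        {
            // Fill the gap since the last kept character, then keep this one.
            sb.Append(replacement, index - old - 1);
            sb.Append(original[index]);
            old = index++;
        }

        // Fill whatever remains after the last kept character.
        sb.Append(replacement, original.Length - (old + 1));

        return sb.ToString();
    }
}
```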

(I tested the code that you said was about four times faster, and I find it about 15 times slower...)

It's going to be O(n) either way. You seem to be replacing all letters and whitespace with *, so why not just test whether the current character is a letter or whitespace and replace it if so?
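A sketch of that idea, assuming "letter" and "whitespace" mean whatever `char.IsLetter` and `char.IsWhiteSpace` report; the class and method names are made up.

```csharp
public static class CategoryReplace
{
    // Replaces letters and whitespace; every other character is kept as-is.
    public static string MaskLettersAndWhitespace(string original, char replacement)
    {
        char[] buffer = new char[original.Length];

        for (int i = 0; i < buffer.Length; i++)
        {
            char ch = original[i];
            buffer[i] = (char.IsLetter(ch) || char.IsWhiteSpace(ch)) ? replacement : ch;
        }

        return new string(buffer);
    }
}
```

Note that this is not equivalent to ReplaceNot with the pattern from the question: digits and all punctuation survive, so `"test. stop; or, not"` comes back as `"****.*****;***,****"` with the comma intact.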
