
Is there an alternative to string.Replace that is case-insensitive?

I need to search a string and replace all occurrences of %FirstName% and %PolicyAmount% with a value pulled from a database. The problem is that the capitalization of FirstName varies, which prevents me from using the String.Replace() method. I've seen web pages on the subject that suggest

Regex.Replace(strInput, strToken, strReplaceWith, RegexOptions.IgnoreCase);

However, for some reason, when I try to replace %PolicyAmount% with $0, the replacement never takes place. I assume that it has something to do with the dollar sign being a reserved character in regex.

Is there another method I can use that doesn't involve sanitizing the input to deal with regex special characters?

Seems like string.Replace should have an overload that takes a StringComparison argument. Since it doesn't, you could try something like this:

public static string ReplaceString(string str, string oldValue, string newValue, StringComparison comparison)
{
    StringBuilder sb = new StringBuilder();

    int previousIndex = 0;
    int index = str.IndexOf(oldValue, comparison);
    while (index != -1)
    {
        sb.Append(str.Substring(previousIndex, index - previousIndex));
        sb.Append(newValue);
        index += oldValue.Length;

        previousIndex = index;
        index = str.IndexOf(oldValue, index, comparison);
    }
    sb.Append(str.Substring(previousIndex));

    return sb.ToString();
}

From MSDN:
$0 - "Substitutes the last substring matched by group number number (decimal)."

In .NET regular expressions, group 0 is always the entire match. For a literal $ you need to escape it as $$:

string value = Regex.Replace("%PolicyAmount%", "%PolicyAmount%", @"$$0", RegexOptions.IgnoreCase);
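To make the difference concrete, here is a minimal sketch that contrasts the two replacement patterns. The class and method names are made up for illustration; only Regex.Replace and RegexOptions.IgnoreCase come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class DollarSubstitutionDemo
{
    // "$0" in a replacement pattern means "the whole match",
    // so every token is replaced by itself and nothing appears to change.
    public static string WithGroupReference(string input) =>
        Regex.Replace(input, "%PolicyAmount%", "$0", RegexOptions.IgnoreCase);

    // "$$" is an escaped dollar sign, so "$$0" inserts the literal text "$0".
    public static string WithLiteralDollar(string input) =>
        Regex.Replace(input, "%PolicyAmount%", "$$0", RegexOptions.IgnoreCase);
}
```

Calling WithGroupReference("Amount: %policyamount%") returns the input unchanged, which is presumably what the question observed, while WithLiteralDollar on the same input returns "Amount: $0".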

Kind of a confusing group of answers, in part because the title of the question is actually much broader than the specific question being asked. After reading through, I'm not sure any answer is a few edits away from assimilating all the good stuff here, so I figured I'd try to sum up.

Here's an extension method that I think avoids the pitfalls mentioned here and provides the most broadly applicable solution.

public static string ReplaceCaseInsensitiveFind(this string str, string findMe,
    string newValue)
{
    return Regex.Replace(str,
        Regex.Escape(findMe),
        Regex.Replace(newValue, "\\$[0-9]+", @"$$$0"),
        RegexOptions.IgnoreCase);
}

So...

  • This is an extension method @MarkRobinson
  • This doesn't try to skip Regex @Helge (you really have to go byte-by-byte if you want to string sniff like this outside of Regex)
  • Passes @MichaelLiu's excellent test case, "œ".ReplaceCaseInsensitiveFind("oe", ""), though he may have had a slightly different behavior in mind.

Unfortunately, @HA's comment that you have to Escape all three isn't correct. The initial value and newValue don't need to be escaped.

Note: You do, however, have to escape $s in the new value that you're inserting if they're part of what would appear to be a "captured value" marker. Hence the three dollar signs in the inner Regex.Replace call. Without that, something like this breaks...

"This is HIS fork, hIs spoon, hissssssss knife.".ReplaceCaseInsensitiveFind("his", @"he$0r")

Here's the error:

An unhandled exception of type 'System.ArgumentException' occurred in System.dll

Additional information: parsing "The\hisr\ is\ he\HISr\ fork,\ he\hIsr\ spoon,\ he\hisrsssssss\ knife\." - Unrecognized escape sequence \h.

Tell you what, I know folks who are comfortable with Regex feel that using it avoids errors, but I'm often still partial to byte sniffing strings (but only after having read Spolsky on encodings) to be absolutely sure you're getting what you intended for important use cases. Reminds me a little of Crockford on "insecure regular expressions". Too often we write regexps that allow what we want (if we're lucky), but unintentionally allow more in (e.g., is $10 really a valid "capture value" string in my newValue regexp, above?) because we weren't thoughtful enough. Both methods have value, and both encourage different types of unintentional errors. It's often easy to underestimate complexity.

That weird $ escaping (and the fact that Regex.Escape didn't escape captured value patterns like $0, as I would have expected for replacement values) drove me mad for a while. Programming Is Hard (c) 1842
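One way to sidestep replacement-pattern escaping altogether is to pass a MatchEvaluator instead of a replacement string: whatever the delegate returns is inserted verbatim, so $0 and friends are never interpreted. This is a different technique from the extension method above, not a rewrite of it, and the class and method names here are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class LiteralRegexReplace
{
    public static string ReplaceIgnoreCaseLiteral(string input, string find, string newValue)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (string.IsNullOrEmpty(find))
            throw new ArgumentException("Search text must not be empty.", nameof(find));

        string literal = newValue ?? string.Empty;

        // Regex.Escape makes the search text a literal pattern; the lambda is
        // converted to a MatchEvaluator, whose return value is not parsed for
        // substitutions such as $0 or ${name}.
        return Regex.Replace(input, Regex.Escape(find), match => literal, RegexOptions.IgnoreCase);
    }
}
```

With this sketch, ReplaceIgnoreCaseLiteral("This is HIS fork, hIs spoon.", "his", "he$0r") should yield "The$0r is he$0r fork, he$0r spoon." with the dollar signs left intact.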

It seems the easiest method is simply to use the Replace method that ships with .NET and has been around since .NET 1.0:

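// 'res' is the input string, declared elsewhere; CompareMethod.Text makes the match case-insensitive.
string result = Microsoft.VisualBasic.Strings.Replace(res,
                                   "%PolicyAmount%",
                                   "$0",
                                   Compare: Microsoft.VisualBasic.CompareMethod.Text);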

In order to use this method, you have to add a reference to the Microsoft.VisualBasic assembly. This assembly is a standard part of the .NET runtime; it is not an extra download, nor is it marked as obsolete.

Here's an extension method. Not sure where I found it.

public static class StringExtensions
{
    public static string Replace(this string originalString, string oldValue, string newValue, StringComparison comparisonType)
    {
        int startIndex = 0;
        while (true)
        {
            startIndex = originalString.IndexOf(oldValue, startIndex, comparisonType);
            if (startIndex == -1)
                break;

            originalString = originalString.Substring(0, startIndex) + newValue + originalString.Substring(startIndex + oldValue.Length);

            startIndex += newValue.Length;
        }

        return originalString;
    }

}

    /// <summary>
    /// A case-insensitive replace function.
    /// </summary>
    /// <param name="originalString">The string to examine.(HayStack)</param>
    /// <param name="oldValue">The value to replace.(Needle)</param>
    /// <param name="newValue">The new value to be inserted</param>
    /// <returns>A string</returns>
    public static string CaseInsenstiveReplace(string originalString, string oldValue, string newValue)
    {
        Regex regEx = new Regex(oldValue,
           RegexOptions.IgnoreCase | RegexOptions.Multiline);
        return regEx.Replace(originalString, newValue);
    }

Inspired by cfeduke's answer, I made this function, which uses IndexOf to find the old value in the string and then replaces it with the new value. I used this in an SSIS script processing millions of rows, and the regex method was way slower than this.

public static string ReplaceCaseInsensitive(this string str, string oldValue, string newValue)
{
    int prevPos = 0;
    string retval = str;
    // find the first occurrence of oldValue
    int pos = retval.IndexOf(oldValue, StringComparison.InvariantCultureIgnoreCase);

    while (pos > -1)
    {
        // remove oldValue from the string
        retval = retval.Remove(pos, oldValue.Length);

        // insert newValue in its place
        retval = retval.Insert(pos, newValue);

        // check if oldValue is found further down
        prevPos = pos + newValue.Length;
        pos = retval.IndexOf(oldValue, prevPos, StringComparison.InvariantCultureIgnoreCase);
    }

    return retval;
}

Expanding on C. Dragon 76's popular answer by making his code into an extension that overloads the default Replace method.

public static class StringExtensions
{
    public static string Replace(this string str, string oldValue, string newValue, StringComparison comparison)
    {
        StringBuilder sb = new StringBuilder();

        int previousIndex = 0;
        int index = str.IndexOf(oldValue, comparison);
        while (index != -1)
        {
            sb.Append(str.Substring(previousIndex, index - previousIndex));
            sb.Append(newValue);
            index += oldValue.Length;

            previousIndex = index;
            index = str.IndexOf(oldValue, index, comparison);
        }
        sb.Append(str.Substring(previousIndex));
        return sb.ToString();
    }
}

Based on Jeff Reddy's answer, with some optimisations and validations:

public static string Replace(string str, string oldValue, string newValue, StringComparison comparison)
{
    if (oldValue == null)
        throw new ArgumentNullException("oldValue");
    if (oldValue.Length == 0)
        throw new ArgumentException("String cannot be of zero length.", "oldValue");

    StringBuilder sb = null;

    int startIndex = 0;
    int foundIndex = str.IndexOf(oldValue, comparison);
    while (foundIndex != -1)
    {
        if (sb == null)
            sb = new StringBuilder(str.Length + (newValue != null ? Math.Max(0, 5 * (newValue.Length - oldValue.Length)) : 0));
        sb.Append(str, startIndex, foundIndex - startIndex);
        sb.Append(newValue);

        startIndex = foundIndex + oldValue.Length;
        foundIndex = str.IndexOf(oldValue, startIndex, comparison);
    }

    if (startIndex == 0)
        return str;
    sb.Append(str, startIndex, str.Length - startIndex);
    return sb.ToString();
}

A version similar to C. Dragon's, for when you only need a single replacement:

int n = myText.IndexOf(oldValue, System.StringComparison.InvariantCultureIgnoreCase);
if (n >= 0)
{
    myText = myText.Substring(0, n)
        + newValue
        + myText.Substring(n + oldValue.Length);
}

Here is another option for executing Regex replacements, since not many people seem to notice that the matches contain their location within the string:

    public static string ReplaceCaseInsensative( this string s, string oldValue, string newValue ) {
        var sb = new StringBuilder(s);
        int offset = oldValue.Length - newValue.Length;
        int matchNo = 0;
        foreach (Match match in Regex.Matches(s, Regex.Escape(oldValue), RegexOptions.IgnoreCase))
        {
            sb.Remove(match.Index - (offset * matchNo), match.Length).Insert(match.Index - (offset * matchNo), newValue);
            matchNo++;
        }
        return sb.ToString();
    }

Since .NET Core 2.0 and .NET Standard 2.1, this is baked into the .NET runtime [1]:

"hello world".Replace("World", "csharp", StringComparison.CurrentCultureIgnoreCase); // "hello csharp"

[1] https://docs.microsoft.com/en-us/dotnet/api/system.string.replace#System_String_Replace_System_String_System_String_System_StringComparison_
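Applied to the original question, a minimal sketch might look like the following. It assumes a target framework that has the StringComparison overload (.NET Core 2.0 or later); the class name, method name, and template text are invented for illustration.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class BuiltInReplaceDemo
{
    public static string FillTemplate(string template, string firstName, string policyAmount)
    {
        // The built-in overload treats both arguments as plain text,
        // so a value such as "$0" needs no escaping.
        return template
            .Replace("%FirstName%", firstName, StringComparison.OrdinalIgnoreCase)
            .Replace("%PolicyAmount%", policyAmount, StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, FillTemplate("Dear %FIRSTNAME%, you owe %policyamount%.", "Ann", "$0") should return "Dear Ann, you owe $0.".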

Regex.Replace(strInput, strToken.Replace("$", "[$]"), strReplaceWith, RegexOptions.IgnoreCase);

The regular expression method should work. However, what you can also do is lower-case the string from the database, lower-case the %variables% you have, and then locate the positions and lengths of the variables in the lower-cased string. Remember, positions in a string don't change just because it's lower-cased.

Then, using a loop that goes in reverse (it's easier; if you don't, you will have to keep a running count of where later positions move to), remove the %variables% from the original, non-lower-cased string by their position and length, and insert the replacement values.
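The two steps described above can be sketched roughly as follows. This is one possible reading of the approach rather than the answerer's own code; the names are hypothetical, and it relies on ToLowerInvariant mapping character by character so that positions in the lower-cased copy line up with the original.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class LowerCaseIndexReplace
{
    public static string ReplaceViaLowerCase(string input, string token, string replacement)
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrEmpty(token)) return input;

        // Step 1: search in lower-cased copies and record where the token occurs.
        string lowerInput = input.ToLowerInvariant();
        string lowerToken = token.ToLowerInvariant();

        var positions = new List<int>();
        int index = lowerInput.IndexOf(lowerToken, StringComparison.Ordinal);
        while (index != -1)
        {
            positions.Add(index);
            index = lowerInput.IndexOf(lowerToken, index + lowerToken.Length, StringComparison.Ordinal);
        }

        // Step 2: edit the original text from the last match to the first,
        // so earlier positions stay valid while later ones are rewritten.
        var sb = new StringBuilder(input);
        for (int i = positions.Count - 1; i >= 0; i--)
        {
            sb.Remove(positions[i], token.Length).Insert(positions[i], replacement);
        }
        return sb.ToString();
    }
}
```

For instance, ReplaceViaLowerCase("Dear %FIRSTNAME%, %firstname%!", "%FirstName%", "Ann") should give "Dear Ann, Ann!".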

(Since everyone is taking a shot at this.) Here's my version (with null checks, and correct input and replacement escaping), inspired by other versions from around the internet:

using System;
using System.Text.RegularExpressions;

public static class MyExtensions {
    public static string ReplaceIgnoreCase(this string search, string find, string replace) {
        return Regex.Replace(search ?? "", Regex.Escape(find ?? ""), (replace ?? "").Replace("$", "$$"), RegexOptions.IgnoreCase);          
    }
}

Usage:

var result = "This is a test".ReplaceIgnoreCase("IS", "was");

Let me make my case and then you can tear me to shreds if you like.

Regex is not the answer for this problem - too slow and memory hungry, relatively speaking.

StringBuilder is much better than string mangling.

Since this will be an extension method to supplement string.Replace, I believe it is important to match how that works. Throwing exceptions for the same argument issues is therefore important, as is returning the original string if no replacement was made.

I believe that having a StringComparison parameter is not a good idea. I did try it, but the test case originally mentioned by michael-liu showed a problem:

[TestCase("œ", "oe", "", StringComparison.InvariantCultureIgnoreCase, Result = "")]

Whilst IndexOf will match, there is a mismatch between the length of the match in the source string (1) and oldValue.Length (2). This manifested itself by causing IndexOutOfRange in some other solutions when oldValue.Length was added to the current match position, and I could not find a way around this. Regex fails to match the case anyway, so I took the pragmatic approach of only using StringComparison.OrdinalIgnoreCase in my solution.
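A small sketch for probing this mismatch on your own runtime is shown below. The names are hypothetical. Only the ordinal result is predictable; what the culture-sensitive call returns for "œ" and "oe" may depend on the runtime and its globalization library, so no output is claimed for it here.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class LigatureCheck
{
    // Ordinal comparison works on UTF-16 code units, so any match
    // spans exactly value.Length characters of the source.
    public static int OrdinalIndex(string source, string value) =>
        source.IndexOf(value, StringComparison.OrdinalIgnoreCase);

    // Culture-sensitive comparison may treat one character as equal to
    // several, so a match can be shorter or longer than value.Length.
    public static int CultureIndex(string source, string value) =>
        source.IndexOf(value, StringComparison.InvariantCultureIgnoreCase);

    // True only when slicing value.Length characters at index stays inside source.
    public static bool IsSafeToSlice(string source, int index, string value) =>
        index >= 0 && index + value.Length <= source.Length;
}
```

OrdinalIndex("œ", "oe") returns -1, so the ordinal approach never reaches the slicing step. If CultureIndex("œ", "oe") reports a match at 0, IsSafeToSlice("œ", 0, "oe") is false, which is the out-of-range situation described above.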

My code is similar to other answers, but my twist is that I look for a match before going to the trouble of creating a StringBuilder. If none is found, then a potentially large allocation is avoided. The code then becomes a do{...}while rather than a while{...}.

I have done some extensive testing against other answers, and this came out fractionally faster and used slightly less memory.

    public static string ReplaceCaseInsensitive(this string str, string oldValue, string newValue)
    {
        if (str == null) throw new ArgumentNullException(nameof(str));
        if (oldValue == null) throw new ArgumentNullException(nameof(oldValue));
        if (oldValue.Length == 0) throw new ArgumentException("String cannot be of zero length.", nameof(oldValue));

        var position = str.IndexOf(oldValue, 0, StringComparison.OrdinalIgnoreCase);
        if (position == -1) return str;

        var sb = new StringBuilder(str.Length);

        var lastPosition = 0;

        do
        {
            sb.Append(str, lastPosition, position - lastPosition);

            sb.Append(newValue);

        } while ((position = str.IndexOf(oldValue, lastPosition = position + oldValue.Length, StringComparison.OrdinalIgnoreCase)) != -1);

        sb.Append(str, lastPosition, str.Length - lastPosition);

        return sb.ToString();
    }
