C# string comparison ignoring spaces, carriage return or line breaks

How can I compare two strings in C# while ignoring case, spaces, and any line breaks? I also need two null strings to be treated as the same.

Thanks!

You should normalize each string by removing the characters that you don't want to compare, and then perform a String.Equals with a StringComparison that ignores case.

Something like this:

string s1 = "HeLLo    wOrld!";
string s2 = "Hello\n    WORLd!";

string normalized1 = Regex.Replace(s1, @"\s", "");
string normalized2 = Regex.Replace(s2, @"\s", "");

bool stringEquals = String.Equals(
    normalized1, 
    normalized2, 
    StringComparison.OrdinalIgnoreCase);

Console.WriteLine(stringEquals);

Here Regex.Replace is used first to remove all whitespace characters. The special case of both strings being null is not handled here, but you can easily check for it before normalizing the strings (Regex.Replace throws an ArgumentNullException on a null input).

This may also work.
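
The null check mentioned above could look roughly like the following. This is only a sketch: the class name WhitespaceInsensitive and the method name AreEqual are made up for illustration, and it assumes a lone null should never match a non-null string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceInsensitive
{
    // Hypothetical helper name; not part of the original answer.
    public static bool AreEqual(string a, string b)
    {
        // Two nulls count as the same; a single null never matches a real string.
        if (a == null || b == null)
            return a == null && b == null;

        // Strip every whitespace character (spaces, tabs, CR, LF, ...).
        string na = Regex.Replace(a, @"\s", "");
        string nb = Regex.Replace(b, @"\s", "");

        return string.Equals(na, nb, StringComparison.OrdinalIgnoreCase);
    }
}
```

With this helper, AreEqual(null, null) and AreEqual("HeLLo    wOrld!", "Hello\n    WORLd!") both return true, while AreEqual(null, "") returns false.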

String.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols) == 0

Edit:

IgnoreSymbols: Indicates that the string comparison must ignore symbols, such as white-space characters, punctuation, currency symbols, the percent sign, mathematical symbols, the ampersand, and so on. Note that this ignores more than just whitespace.

Remove all the characters you don't want and then use the ToLower() method to ignore case.

Edit: While the above works, it's better to use StringComparison.OrdinalIgnoreCase. Just pass it as the second argument to the Equals method.

First remove all whitespace from both strings with a regular expression, and then use the String.Compare method with the parameter ignoreCase = true.
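
As a rough illustration of the two variants, here is a sketch. The names CaseDemo, StripWhitespace, EqualViaToLower and EqualViaOrdinalIgnoreCase are invented for this example, and it assumes both inputs are non-null.

```csharp
using System;
using System.Linq;

public static class CaseDemo
{
    // Hypothetical helper: drops every character for which char.IsWhiteSpace is true.
    public static string StripWhitespace(string s) =>
        string.Concat(s.Where(c => !char.IsWhiteSpace(c)));

    // ToLower() variant: allocates two extra lowercase strings and
    // lowercases according to the current culture.
    public static bool EqualViaToLower(string a, string b) =>
        StripWhitespace(a).ToLower() == StripWhitespace(b).ToLower();

    // OrdinalIgnoreCase variant: no extra lowercase copies, culture-independent.
    public static bool EqualViaOrdinalIgnoreCase(string a, string b) =>
        StripWhitespace(a).Equals(StripWhitespace(b), StringComparison.OrdinalIgnoreCase);
}
```

Both methods agree on ordinary input such as "Void Foo" and "voidfoo". The second one avoids the extra allocations and does not depend on the current culture.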

string a = System.Text.RegularExpressions.Regex.Replace("void foo", @"\s", "");
string b = System.Text.RegularExpressions.Regex.Replace("voidFoo", @"\s", "");
bool isTheSame = String.Compare(a, b, true) == 0;

If you need performance, the Regex solutions on this page may run too slowly for you. Maybe you have a large list of strings you want to sort. (A Regex solution is more readable, however.)

I have a class that looks at each individual char in both strings and compares them while ignoring case and whitespace. It doesn't allocate any new strings. It uses char.IsWhiteSpace(ch) to determine whitespace, and char.ToLowerInvariant(ch) for case-insensitivity (if required). In my testing, my solution runs about 5x - 8x faster than a Regex-based solution. My class also implements IEqualityComparer's GetHashCode(obj) method, using code from another SO answer. This GetHashCode(obj) also ignores whitespace and optionally ignores case.

Here's my class:

private class StringCompIgnoreWhiteSpace : IEqualityComparer<string>
{
    public bool Equals(string strx, string stry)
    {
        if (strx == null) //stry may contain only whitespace
            return string.IsNullOrWhiteSpace(stry);

        if (stry == null) //strx may contain only whitespace
            return string.IsNullOrWhiteSpace(strx);

        int ix = 0, iy = 0;
        while (true)
        {
            //ignore whitespace in strx (check the bound first so strx[ix] never overruns)
            while (ix < strx.Length && char.IsWhiteSpace(strx[ix]))
                ix++;

            //ignore whitespace in stry
            while (iy < stry.Length && char.IsWhiteSpace(stry[iy]))
                iy++;

            //if either string is exhausted, they are equal only if both are:
            //the other index would otherwise point at a non-whitespace char
            if (ix == strx.Length || iy == stry.Length)
                return ix == strx.Length && iy == stry.Length;

            //The current chars are not whitespace, so check that they're equal (case-insensitive)
            //Compare strx[ix] != stry[iy] directly to make the comparison case-sensitive.
            if (char.ToLowerInvariant(strx[ix]) != char.ToLowerInvariant(stry[iy]))
                return false;

            ix++;
            iy++;
        }
    }

    public int GetHashCode(string obj)
    {
        if (obj == null)
            return 17; //same hash as an empty or whitespace-only string, which Equals treats as equal to null

        int hash = 17;
        unchecked // Overflow is fine, just wrap
        {
            for (int i = 0; i < obj.Length; i++)
            {
                char ch = obj[i];
                if (!char.IsWhiteSpace(ch))
                {
                    //use this line for case-insensitivity
                    hash = hash * 23 + char.ToLowerInvariant(ch).GetHashCode();

                    //use this line for case-sensitivity
                    //hash = hash * 23 + ch.GetHashCode();
                }
            }
        }
        return hash;
    }
}

private static void TestComp()
{
    var comp = new StringCompIgnoreWhiteSpace();

    Console.WriteLine(comp.Equals("abcd", "abcd")); //true
    Console.WriteLine(comp.Equals("abCd", "Abcd")); //true
    Console.WriteLine(comp.Equals("ab Cd", "Ab\n\r\tcd   ")); //true
    Console.WriteLine(comp.Equals(" ab Cd", "  A b" + Environment.NewLine + "cd ")); //true
    Console.WriteLine(comp.Equals(null, "  \t\n\r ")); //true
    Console.WriteLine(comp.Equals("  \t\n\r ", null)); //true
    Console.WriteLine(comp.Equals("abcd", "abcd   h")); //false

    Console.WriteLine(comp.GetHashCode(" a b c d")); //-699568861


    //This is also -699568861 with the case-insensitive hash line enabled in GetHashCode.
    //  With the case-sensitive line instead, it's -1555613149
    Console.WriteLine(comp.GetHashCode("A B c      \t       d"));
}
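
The reason GetHashCode has to ignore the same characters as Equals is that hash-based collections call both. The sketch below shows a comparer plugged into a HashSet<string>. To keep the block short it uses a deliberately simple, allocation-heavy stand-in comparer (StripWhitespaceComparer, a hypothetical name) rather than the class above, and unlike that class it treats null as different from an empty string. ComparerDemo and CountDistinct are likewise made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in comparer: builds a whitespace-free key, then
// compares and hashes that key case-insensitively.
public sealed class StripWhitespaceComparer : IEqualityComparer<string>
{
    private static string Key(string s) =>
        s == null ? null : string.Concat(s.Where(c => !char.IsWhiteSpace(c)));

    public bool Equals(string x, string y) =>
        string.Equals(Key(x), Key(y), StringComparison.OrdinalIgnoreCase);

    // Must agree with Equals: equal strings have to produce equal hashes.
    public int GetHashCode(string s) =>
        s == null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(Key(s));
}

public static class ComparerDemo
{
    // Counts the entries that remain distinct once whitespace and case are ignored.
    public static int CountDistinct(IEnumerable<string> items) =>
        new HashSet<string>(items, new StripWhitespaceComparer()).Count;
}
```

For example, CountDistinct(new[] { "ab Cd", "Ab\n\r\tcd   ", "abcd", "xyz" }) returns 2, because the first three entries collapse into one. The same comparer instance can be passed to a Dictionary<string, TValue> constructor.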

Here's my testing code (with a Regex example):

private static void SpeedTest()
{
    const int loop = 100000;
    string first = "a bc d";
    string second = "ABC D";

    var compChar = new StringCompIgnoreWhiteSpace();
    Stopwatch sw1 = Stopwatch.StartNew();
    for (int i = 0; i < loop; i++)
    {
        bool equals = compChar.Equals(first, second);
    }
    sw1.Stop();
    Console.WriteLine(string.Format("char time =  {0}", sw1.Elapsed)); //char time =  00:00:00.0361159

    var compRegex = new StringCompIgnoreWhiteSpaceRegex();
    Stopwatch sw2 = Stopwatch.StartNew();
    for (int i = 0; i < loop; i++)
    {
        bool equals = compRegex.Equals(first, second);
    }
    sw2.Stop();
    Console.WriteLine(string.Format("regex time = {0}", sw2.Elapsed)); //regex time = 00:00:00.2773072
}

private class StringCompIgnoreWhiteSpaceRegex : IEqualityComparer<string>
{
    public bool Equals(string strx, string stry)
    {
        if (strx == null)
            return string.IsNullOrWhiteSpace(stry);
        else if (stry == null)
            return string.IsNullOrWhiteSpace(strx);

        string a = System.Text.RegularExpressions.Regex.Replace(strx, @"\s", "");
        string b = System.Text.RegularExpressions.Regex.Replace(stry, @"\s", "");
        return String.Compare(a, b, true) == 0;
    }

    public int GetHashCode(string obj)
    {
        if (obj == null)
            return 0;

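        string a = System.Text.RegularExpressions.Regex.Replace(obj, @"\s", "");
        //hash case-insensitively so that strings Equals treats as equal get the same hash
        return StringComparer.OrdinalIgnoreCase.GetHashCode(a);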
    }
}

I would probably start by removing the characters you don't want to compare from the string before comparing. If performance is a concern, you might look at storing a version of each string with the characters already removed.

Alternatively, you could write a compare routine that skips over the characters you want to ignore. But that just seems like more work to me.

You can also use the following custom function:
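
The idea of storing a pre-stripped version of each string could be sketched as follows. The NormalizedText class and its members are hypothetical names chosen for illustration. The point is only that the whitespace removal happens once per string instead of once per comparison.

```csharp
using System;
using System.Text;

// Hypothetical wrapper: pays the normalization cost once, at construction.
public sealed class NormalizedText
{
    public string Original { get; }

    // Original with all whitespace removed; case is handled at compare time.
    public string Key { get; }

    public NormalizedText(string original)
    {
        Original = original ?? throw new ArgumentNullException(nameof(original));

        var sb = new StringBuilder(original.Length);
        foreach (char c in original)
        {
            if (!char.IsWhiteSpace(c))
                sb.Append(c);
        }
        Key = sb.ToString();
    }

    // Cheap comparison: no further stripping or allocation is needed.
    public bool Matches(NormalizedText other) =>
        other != null && string.Equals(Key, other.Key, StringComparison.OrdinalIgnoreCase);
}
```

For instance, new NormalizedText("HeLLo    wOrld!").Matches(new NormalizedText("Hello\n    WORLd!")) returns true, and each of the two keys is "HeLLowOrld!" and "HelloWORLd!" respectively, computed only once.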

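// Both methods are extension methods, so they must be placed in a static class.
// Requires: using System; using System.Collections.Generic; using System.Linq; using System.Text;
public static string ExceptChars(this string str, IEnumerable<char> toExclude)
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < str.Length; i++)
    {
        char c = str[i];
        if (!toExclude.Contains(c))
            sb.Append(c);
    }
    return sb.ToString();
}

public static bool SpaceCaseInsenstiveComparision(this string stringa, string stringb)
{
    // Two nulls are equal; a single null never equals a non-null string.
    if (stringa == null || stringb == null)
        return stringa == null && stringb == null;

    char[] ignored = { ' ', '\t', '\n', '\r' };
    return string.Equals(
        stringa.ExceptChars(ignored),
        stringb.ExceptChars(ignored),
        StringComparison.OrdinalIgnoreCase);
}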

And then use it the following way:

"Te  st".SpaceCaseInsenstiveComparision("Te st");

Another option is the LINQ SequenceEqual method, which according to my tests is more than twice as fast as the Regex approach used in other answers, and is very easy to read and maintain.

public static bool Equals_Linq(string s1, string s2)
{
    return Enumerable.SequenceEqual(
        s1.Where(c => !char.IsWhiteSpace(c)).Select(char.ToUpperInvariant),
        s2.Where(c => !char.IsWhiteSpace(c)).Select(char.ToUpperInvariant));
}

public static bool Equals_Regex(string s1, string s2)
{
    return string.Equals(
        Regex.Replace(s1, @"\s", ""),
        Regex.Replace(s2, @"\s", ""),
        StringComparison.OrdinalIgnoreCase);
}

Here is the simple performance test code I used:

var s1 = "HeLLo    wOrld!";
var s2 = "Hello\n    WORLd!";
var watch = Stopwatch.StartNew();
for (var i = 0; i < 1000000; i++)
{
    Equals_Linq(s1, s2);
}
Console.WriteLine(watch.Elapsed); // ~1.7 seconds
watch = Stopwatch.StartNew();
for (var i = 0; i < 1000000; i++)
{
    Equals_Regex(s1, s2);
}
Console.WriteLine(watch.Elapsed); // ~4.6 seconds

An approach optimized not for performance, but for completeness.

  • normalizes null
  • normalizes Unicode, combining characters, diacritics
  • normalizes new lines
  • normalizes white space
  • normalizes casing

Code snippet:

public static class StringHelper
{
    public static bool AreEquivalent(string source, string target)
    {
        if (source == null) return target == null;
        if (target == null) return false;
        var normForm1 = Normalize(source);
        var normForm2 = Normalize(target);
        return string.Equals(normForm1, normForm2);
    }

    private static string Normalize(string value)
    {
        Debug.Assert(value != null);
        // normalize unicode, combining characters, diacritics
        value = value.Normalize(NormalizationForm.FormC);
        // normalize new lines to \n (the white-space step below then removes them)
        value = value.Replace("\r\n", "\n").Replace("\r", "\n");
        // normalize white space
        value = Regex.Replace(value, @"\s", string.Empty);
        // normalize casing
        return value.ToLowerInvariant();
    }
}

  1. I would trim the string using Trim() to remove leading and trailing whitespace (Trim() does not touch whitespace inside the string).
  2. Use StringComparison.OrdinalIgnoreCase to ignore case, e.g. stringA.Equals(stringB, StringComparison.OrdinalIgnoreCase)
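
A minimal sketch of these two steps combined is shown below. The names TrimDemo and TrimmedEquals are invented for this example. Because Trim() only strips whitespace at the two ends of a string, this covers stray leading or trailing spaces and line breaks, but not whitespace in the middle.

```csharp
using System;

public static class TrimDemo
{
    // Hypothetical helper: trims both ends, then compares case-insensitively.
    // The ?. operator keeps null inputs from throwing; the static
    // string.Equals treats two nulls as equal.
    public static bool TrimmedEquals(string a, string b) =>
        string.Equals(a?.Trim(), b?.Trim(), StringComparison.OrdinalIgnoreCase);
}
```

Here TrimmedEquals("  Hello ", "hello\r\n") returns true, while TrimmedEquals("He llo", "hello") returns false because the inner space is kept.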
