C# "anyString".Contains('\0', StringComparison.InvariantCulture) returns true in .NET 5 but false in older versions

Question

I ran into a compatibility problem while trying to upgrade my projects from .NET Core 3.1 to the latest .NET 5.

My original code has validation logic that checks for invalid file name characters by testing each character returned from the Path.GetInvalidFileNameChars() API.


var invalidFilenameChars = Path.GetInvalidFileNameChars();
bool validFileName = !invalidFilenameChars.Any(ch => fileName.Contains(ch, StringComparison.InvariantCulture));

Suppose you give fileName a regular value such as "test.txt", which should be valid. Surprisingly, however, the above code reports the file name as invalid if you run it with the 'net5' target framework.

After spending some time debugging, I found that the returned invalid character set contains '\0' (the ASCII null character), and that "test.txt".Contains("\0", StringComparison.InvariantCulture) returns true.

    using System;

    class Program
    {
        static void Main(string[] args)
        {
            // Culture-sensitive search for the null character in an ordinary string
            var containsNullChar = "test".Contains("\0", StringComparison.InvariantCulture);

            Console.WriteLine($"Contains null char {containsNullChar}");
        }
    }

If you run this on .NET Core 3.1, it never reports that a regular string contains the null character. Also, if I omit the second parameter (StringComparison.InvariantCulture) or use StringComparison.Ordinal, the strange result is never returned.

Why was this behavior changed in .NET 5?

EDIT: As Karl-Johan Sjögren commented earlier, there is indeed a behavior change in .NET 5 regarding string comparison:

Behavior changes when comparing strings on .NET 5+

Also see the related question:

string.IndexOf get different result in .Net 5

Though this issue should be related to the above, the current result for '\0' still looks strange to me and might still be considered a bug, as answered by @xanatos.

EDIT2:

Now I realize that the actual cause of this problem was my confusion between InvariantCulture and Ordinal string comparison. They are actually quite different things. See the question below:

Difference between InvariantCulture and Ordinal string comparison

Also note that this appears to be a problem unique to .NET, as other major programming languages such as Java, C++ and Python use ordinal comparison by default.
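For reference, here is a minimal sketch of the same validation done with ordinal semantics. The class and method names (FileNameValidator, IsValidFileName) are made up for illustration and are not from the original code. It relies on string.IndexOfAny, which compares char values directly and involves no culture rules.

```csharp
using System;
using System.IO;

public static class FileNameValidator
{
    // Hypothetical helper (not from the original post).
    // IndexOfAny compares char values ordinally, so '\0' is only
    // reported when it is really present in the string.
    public static bool IsValidFileName(string fileName)
    {
        return fileName.IndexOfAny(Path.GetInvalidFileNameChars()) < 0;
    }

    public static void Main()
    {
        Console.WriteLine(IsValidFileName("test.txt"));   // True
        Console.WriteLine(IsValidFileName("te\0st.txt")); // False
    }
}
```

Passing StringComparison.Ordinal to the original Contains call should have the same effect; IndexOfAny just avoids the per-character loop.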

Answer 1

not a bug, a feature

The issue that I've opened has been closed, but they gave a very good explanation. Now... In .NET 5.0 they began using on Windows (on Linux it was already present) a new library for comparing strings, the ICU library. It is the official library of the Unicode Consortium, so it is "the verb". That library is used for CurrentCulture, InvariantCulture (plus the respective IgnoreCase variants) and any other culture. The only exception is Ordinal / OrdinalIgnoreCase. The library is targeted at text and it has some "particular" ideas about non-text. In this particular case, there are some characters that are simply ignored. In the block 0000-00FF I would say the ignored characters are all control codes (please ignore the fact that they are shown as €‚ƒ„†‡ˆ‰Š‹ŒŽ''“”•–—™š›œžŸ; at a certain point these characters were remapped somewhere else in Unicode, and the glyphs shown don't reflect it, but if you look at their code, for example with char ch = '€'; int val = (int)ch;, you'll see it), and '\0' is a control code.
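If you want to see which characters in 0000-00FF are ignored on your own machine, something like the following sketch can list them. The names IgnorableScan and FindIgnorable are invented for this example. It assumes that a character counts as "ignored" when appending it to "a" still compares equal to "a" under InvariantCulture. The output depends on the runtime version and on whether ICU or NLS is in use, so no expected output is given.

```csharp
using System;
using System.Collections.Generic;

public static class IgnorableScan
{
    // Hypothetical helper: collects the code points in [0x00, 0xFF] that
    // make no difference to a culture-sensitive (InvariantCulture) comparison.
    public static List<int> FindIgnorable()
    {
        var result = new List<int>();
        for (int code = 0; code <= 0xFF; code++)
        {
            string candidate = "a" + (char)code;
            // Equal to plain "a" means the appended character was ignored.
            if (string.Compare(candidate, "a", StringComparison.InvariantCulture) == 0)
            {
                result.Add(code);
            }
        }
        return result;
    }

    public static void Main()
    {
        foreach (int code in FindIgnorable())
        {
            Console.WriteLine($"U+{code:X4}");
        }
    }
}
```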

Now... My personal thinking is that from today on, to compare strings you'll need a master's degree in Unicode Technologies, and I do hope that they'll do some shenanigans in .NET 6.0 to make the default comparison Ordinal (it is one of the proposals for .NET 6.0, the Option B). Note that if you want to make programs that can run in Turkey you already needed a master's degree in Unicode Technologies (see the Turkish i problem).

In general I would say that to look for words that aren't keywords/fixed words (for example column names), you should use culture-aware comparisons, while to look for keywords/fixed words (for example column names) and symbols/control codes you should use Ordinal comparisons. The problem is when you want to look for both at the same time. Normally in this case you are looking for exact words, so you can use Ordinal. Otherwise it becomes hellish. And I don't even want to think about how Regex works internally in a culture-aware environment. That I don't want to think about. Because in that direction there can only be folly and nightmares.
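As a small illustration of the "fixed words" case, here is a sketch of a column-name lookup using OrdinalIgnoreCase. The ColumnLookup class and its column list are hypothetical and exist only for this example.

```csharp
using System;

public static class ColumnLookup
{
    // Hypothetical column list, for illustration only.
    private static readonly string[] Columns = { "Id", "Name", "CreatedAt" };

    // Returns the index of the column, or -1 if there is no such column.
    // OrdinalIgnoreCase compares char values (ignoring case), so no
    // character is silently skipped and no culture rules apply.
    public static int FindColumn(string name)
    {
        return Array.FindIndex(
            Columns,
            c => string.Equals(c, name, StringComparison.OrdinalIgnoreCase));
    }

    public static void Main()
    {
        Console.WriteLine(FindColumn("ID"));     // 0
        Console.WriteLine(FindColumn("name\0")); // -1, the null char is not ignored
    }
}
```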

As a side note, even before this change the "default" culture-aware comparisons had some secret shenanigans... for example:

int ix = "ʹ$ʹ".IndexOf("$"); // -1 on .NET Framework or .NET Core <= 3.1

what I had written before

I'll say that it is a bug. There is a similar bug with IndexOf. I've opened an issue on GitHub to track it.

As you have written, Ordinal and OrdinalIgnoreCase work as expected (probably because they don't need to use the new ICU library for handling Unicode).

Some sample code:

Console.WriteLine($"Ordinal Contains null char {"test".Contains("\0", StringComparison.Ordinal)}");
Console.WriteLine($"OrdinalIgnoreCase Contains null char {"test".Contains("\0", StringComparison.OrdinalIgnoreCase)}");

Console.WriteLine($"CurrentCulture Contains null char {"test".Contains("\0", StringComparison.CurrentCulture)}");
Console.WriteLine($"CurrentCultureIgnoreCase Contains null char {"test".Contains("\0", StringComparison.CurrentCultureIgnoreCase)}");

Console.WriteLine($"InvariantCulture Contains null char {"test".Contains("\0", StringComparison.InvariantCulture)}");
Console.WriteLine($"InvariantCultureIgnoreCase Contains null char {"test".Contains("\0", StringComparison.InvariantCultureIgnoreCase)}");

Console.WriteLine($"Ordinal IndexOf null char {"test".IndexOf("\0", StringComparison.Ordinal)}");
Console.WriteLine($"OrdinalIgnoreCase IndexOf null char {"test".IndexOf("\0", StringComparison.OrdinalIgnoreCase)}");

Console.WriteLine($"CurrentCulture IndexOf null char {"test".IndexOf("\0", StringComparison.CurrentCulture)}");
Console.WriteLine($"CurrentCultureIgnoreCase IndexOf null char {"test".IndexOf("\0", StringComparison.CurrentCultureIgnoreCase)}");

Console.WriteLine($"InvariantCulture IndexOf null char {"test".IndexOf("\0", StringComparison.InvariantCulture)}");
Console.WriteLine($"InvariantCultureIgnoreCase IndexOf null char {"test".IndexOf("\0", StringComparison.InvariantCultureIgnoreCase)}");

and

Console.WriteLine($"Ordinal Contains null char {"test".Contains("\0test", StringComparison.Ordinal)}");
Console.WriteLine($"OrdinalIgnoreCase Contains null char {"test".Contains("\0test", StringComparison.OrdinalIgnoreCase)}");

Console.WriteLine($"CurrentCulture Contains null char {"test".Contains("\0test", StringComparison.CurrentCulture)}");
Console.WriteLine($"CurrentCultureIgnoreCase Contains null char {"test".Contains("\0test", StringComparison.CurrentCultureIgnoreCase)}");

Console.WriteLine($"InvariantCulture Contains null char {"test".Contains("\0test", StringComparison.InvariantCulture)}");
Console.WriteLine($"InvariantCultureIgnoreCase Contains null char {"test".Contains("\0test", StringComparison.InvariantCultureIgnoreCase)}");

Console.WriteLine($"Ordinal IndexOf null char {"test".IndexOf("\0test", StringComparison.Ordinal)}");
Console.WriteLine($"OrdinalIgnoreCase IndexOf null char {"test".IndexOf("\0test", StringComparison.OrdinalIgnoreCase)}");

Console.WriteLine($"CurrentCulture IndexOf null char {"test".IndexOf("\0test", StringComparison.CurrentCulture)}");
Console.WriteLine($"CurrentCultureIgnoreCase IndexOf null char {"test".IndexOf("\0test", StringComparison.CurrentCultureIgnoreCase)}");

Console.WriteLine($"InvariantCulture IndexOf null char {"test".IndexOf("\0test", StringComparison.InvariantCulture)}");
Console.WriteLine($"InvariantCultureIgnoreCase IndexOf null char {"test".IndexOf("\0test", StringComparison.InvariantCultureIgnoreCase)}");
