String Comparison, .NET and non-breaking space

I have an app written in C# that does a lot of string comparison. The strings are pulled in from a variety of sources (including user input) and are then compared. However, I'm running into problems when comparing a space (character code 32) to a non-breaking space (character code 160). To the user they look the same, so they expect a match. But when the app does the compare, there is no match.

What is the best way to go about this? Am I going to have to go to all parts of the code that do a string compare and manually normalize non-breaking spaces to spaces? Does .NET offer anything to help with that? (I've tried all the compare options but none seem to help.)

It has been suggested that I normalize the strings upon receipt and then let the string compare method simply compare the normalized strings. I'm not sure that would be straightforward, because what is a normalized string in the first place? What do I normalize it to? Sure, for now I can convert non-breaking spaces to breaking spaces. But what else can show up? Can there potentially be very many of these rules? Might they even conflict? (In one case I want to use a rule and in another I don't.)

I went through lots of pain to find this simple answer. The code below uses a regular expression to replace non-breaking spaces with normal spaces.

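using System.Text.RegularExpressions;

// \u00A0 escapes are used in the literal so the non-breaking spaces are visible in source.
string cellText = "String\u00A0with\u00A0non\u00A0breaking\u00A0spaces.";
cellText = Regex.Replace(cellText, @"\u00A0", " ");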

Hope this helps, Dan

If it were me, I would 'normalize' the strings as I 'pulled them in', probably with a string.Replace(). Then you won't need to change your comparisons anywhere else.

Edit: Mark, that's a tough one. It's really up to you, or your clients, as to what a 'normalized' string is. I've been in a similar situation where the customer demanded that strings like:

I have 4 apples.
I have four apples.

were actually equal. You may need separate normalizers for different situations. Either way, I would still do the normalization upon retrieval of the original strings.
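As a rough illustration of normalizing at the point of retrieval, here is a minimal sketch. The class and method names (InputNormalizer, NormalizeSpaces) are made up for this example, and the rule it applies is only one possible definition of "normalized": every character in the Unicode space-separator category (Zs), which includes U+00A0 NO-BREAK SPACE as well as characters such as U+2003 EM SPACE and U+3000 IDEOGRAPHIC SPACE, is turned into an ordinary U+0020 space. Tabs and line breaks are in a different category and are left alone.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration; not part of .NET or the original answer.
public static class InputNormalizer
{
    // Replaces every Unicode space separator (category Zs), including
    // U+00A0 NO-BREAK SPACE, with an ordinary U+0020 space.
    public static string NormalizeSpaces(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            bool isSpaceSeparator =
                char.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator;
            sb.Append(isSpaceSeparator ? ' ' : c);
        }
        return sb.ToString();
    }
}
```

Calling InputNormalizer.NormalizeSpaces once on each string as it arrives would let the existing comparisons stay as they are. If different sources need different rules, each rule could live in its own method of this kind.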

It needs to be

text.Replace('\u00A0', ' ')

where \u00A0 is the non-breaking space.

This will replace the non-breaking space with a normal space.
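On the question of whether .NET offers anything built in: string.Normalize with NormalizationForm.FormKC (Unicode compatibility normalization) also maps U+00A0 to U+0020. The sketch below is only an illustration, and the wrapper name CompatibilityNormalization.Fold is invented here. Note that FormKC changes much more than spaces (for example, it expands ligatures and converts full-width letters), so it may be too aggressive for some inputs, and it assumes the runtime has Unicode normalization support available.

```csharp
using System;
using System.Text;

// Hypothetical wrapper for illustration; the only real API used is string.Normalize.
public static class CompatibilityNormalization
{
    // Applies Unicode compatibility composition (NFKC).
    // Under NFKC, U+00A0 NO-BREAK SPACE becomes U+0020 SPACE.
    public static string Fold(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        return input.Normalize(NormalizationForm.FormKC);
    }
}
```

A plain Replace is more predictable when the non-breaking space is the only character causing trouble; FormKC is worth considering only if the broader folding is actually wanted.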

I'd suggest creating your own string comparer that extends one of the original ones and doing the "normalization" there (replace non-breaking space with regular space). In addition to the instance Equals method, there's a static String.Equals, though its overloads take a StringComparison value rather than a comparer object, so a custom comparer is used through its own Equals(x, y) method.
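A comparer along those lines might look like the sketch below. The class name NbspInsensitiveComparer is hypothetical, and wrapping an existing StringComparer (rather than re-implementing comparison) is a design choice made for this example, not something the answer specifies. It derives from the abstract StringComparer class and replaces U+00A0 with U+0020 before delegating.

```csharp
using System;

// Hypothetical comparer for illustration; not a .NET library type.
public sealed class NbspInsensitiveComparer : StringComparer
{
    private readonly StringComparer _inner;

    // The inner comparer decides everything else (ordinal, culture, case).
    public NbspInsensitiveComparer(StringComparer inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Treats a non-breaking space as an ordinary space; null stays null.
    private static string Fold(string s) => s?.Replace('\u00A0', ' ');

    public override int Compare(string x, string y) =>
        _inner.Compare(Fold(x), Fold(y));

    public override bool Equals(string x, string y) =>
        _inner.Equals(Fold(x), Fold(y));

    // Hash the folded string so that equal strings get equal hash codes.
    public override int GetHashCode(string obj) =>
        _inner.GetHashCode(Fold(obj));
}
```

For example, new NbspInsensitiveComparer(StringComparer.Ordinal).Equals(a, b) compares two strings directly, and because StringComparer implements IEqualityComparer<string>, the same instance can be passed to a Dictionary<string, T> or HashSet<string>. The trade-off against normalizing on input is that every call site has to remember to use this comparer.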

The same thing without a regular expression, mainly for my own future reference:

text.Replace('\u00A0', ' ')


 