
String Comparison, .NET and non breaking space

I have an app written in C# that does a lot of string comparison. The strings are pulled in from a variety of sources (including user input) and are then compared. However, I'm running into problems when comparing an ordinary space (character code 32, U+0020) to a non-breaking space (character code 160, U+00A0). To the user they look the same, so they expect a match. But when the app does the compare, there is no match.

What is the best way to go about this? Am I going to have to go to all parts of the code that do a string compare and manually normalize non-breaking spaces to spaces? Does .NET offer anything to help with that? (I've tried all the compare options but none seem to help.)

It has been suggested that I normalize the strings upon receipt and then let the string compare method simply compare the normalized strings. I'm not sure that would be straightforward, because what is a normalized string in the first place? What do I normalize it to? Sure, for now I can convert non-breaking spaces to regular spaces. But what else can show up? Could there be very many of these rules? Might they even conflict? (In one case I want to apply a rule and in another I don't.)

I went through lots of pain to find this simple answer. The code below uses a regular expression to replace non breaking spaces with normal spaces.

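using System.Text.RegularExpressions;
// The \u00A0 escapes make the non-breaking spaces visible in the source.
string cellText = "String\u00A0with\u00A0non\u00A0breaking\u00A0spaces.";
cellText = Regex.Replace(cellText, @"\u00A0", " ");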

Hope this helps, Dan

If it were me, I would 'normalize' the strings as I 'pulled them in'; probably with a string.Replace(). Then you won't need to change your comparisons anywhere else.
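As a rough sketch of what "normalize as you pull them in" could look like, assuming the only rule needed so far is the non-breaking space: the class name `InputText` and the method `NormalizeSpaces` below are made up for illustration, and only `string.Replace(char, char)` comes from the framework.

```csharp
using System;

public static class InputText
{
    // Hypothetical helper: call it once at each point where a string
    // enters the application (user input, file, database, ...).
    public static string NormalizeSpaces(string text)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));

        // U+00A0 (non-breaking space) becomes U+0020 (ordinary space).
        return text.Replace('\u00A0', ' ');
    }
}
```

With this in place, `InputText.NormalizeSpaces("New\u00A0York") == "New York"` evaluates to true, and the existing comparisons elsewhere stay untouched. Further rules, if they turn up, would go into the same helper.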

Edit: Mark, that's a tough one. It's really up to you, or your clients, as to what a 'normalized' string is. I've been in a similar situation where the customer demanded that strings like:

I have 4 apples.
I have four apples.

were actually equal. You may need separate normalizers for different situations. Either way, I would still do the normalization upon retrieval of the original strings.

It needs to be

text.Replace('\u00A0',' ')

where \u00A0 is the non-breaking space

This will replace the non breaking space with normal space.

I'd suggest creating your own string comparer that builds on one of the existing ones, and doing the "normalization" there (replace non-breaking space with regular space). Note that the static String.Equals overloads take a StringComparison value rather than a comparer object, so you would call the custom comparer directly or pass it to collections such as Dictionary and HashSet.
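One possible shape for such a comparer, as a sketch only: it wraps an existing `StringComparer` instead of subclassing it, which is a simplification of what is suggested above. The class name `NbspInsensitiveComparer` is invented for this example, and it handles only the non-breaking space.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer that treats U+00A0 as an ordinary space and
// delegates the actual comparison to an existing StringComparer.
public sealed class NbspInsensitiveComparer : IEqualityComparer<string>
{
    private readonly StringComparer _inner;

    public NbspInsensitiveComparer(StringComparer inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Null stays null so the inner comparer can apply its own null rules.
    private static string Normalize(string s) => s?.Replace('\u00A0', ' ');

    public bool Equals(string x, string y) =>
        _inner.Equals(Normalize(x), Normalize(y));

    // Equal strings must hash equally, so hash the normalized form.
    public int GetHashCode(string obj) =>
        _inner.GetHashCode(Normalize(obj) ?? string.Empty);
}
```

For example, `new NbspInsensitiveComparer(StringComparer.Ordinal).Equals("New\u00A0York", "New York")` returns true. The same instance can be passed to `new HashSet<string>(comparer)` or `new Dictionary<string, T>(comparer)`, so lookups ignore the difference as well. The trade-off compared with normalizing on input is that every comparison site has to remember to use the comparer.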

The same thing without a regular expression, mostly for my own future reference:

text.Replace('\u00A0', ' ')
