
C#: Remove language specific symbols from string

For the sake of the example, let's assume I am parsing some text written in German. This means that it contains symbols like ü or Ö. The problem is that all German-specific symbols get rendered as an empty square. Please take a look at this image:

Image http://img8.imageshack.us/img8/7502/93341046.png

Since I do not know whether this symbol is ü or Ö, I want to replace it with "." (dot). So the string from the image above should become "Osnabr.ck". How do I do that? Any help would be greatly appreciated!

Best Regards, Kiril

You can use a regular expression to replace any characters that you don't want. Just put the characters that you want to keep in a negated set:

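using System.Text.RegularExpressions;
str = Regex.Replace(str, "[^0-9A-Za-z _]", "."); // anything that is not a digit, ASCII letter, space or _ becomes "."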
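A minimal self-contained sketch of the regex approach, in case it helps to see it run. The class name `SymbolCleaner` is made up for illustration, and using U+FFFD (the Unicode replacement character) as the "unknown square" is an assumption; the pattern itself is the one from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SymbolCleaner
{
    // Negated character class: matches anything that is NOT a digit,
    // an ASCII letter, a space or an underscore.
    private static readonly Regex Disallowed = new Regex("[^0-9A-Za-z _]");

    // Replaces every disallowed character with a dot.
    public static string Clean(string input) => Disallowed.Replace(input, ".");
}

public static class Program
{
    public static void Main()
    {
        // Assumption: the square is U+FFFD; a real "ü" (U+00FC) is handled the same way.
        Console.WriteLine(SymbolCleaner.Clean("Osnabr\uFFFDck")); // Osnabr.ck
        Console.WriteLine(SymbolCleaner.Clean("Osnabr\u00FCck")); // Osnabr.ck
    }
}
```

Because the pattern is a whitelist, every other character is replaced too: "Hans-Peter" becomes "Hans.Peter". Add any punctuation you want to keep inside the brackets.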

You should look into what encoding you are using to decode the text. It looks like you are not decoding with the same encoding that was used to encode the text, since the characters don't show up correctly.
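A small sketch of what an encoding mismatch looks like. This assumes, purely for illustration, that the German text was saved as ISO-8859-1 (Latin-1) but read back as UTF-8; the actual encoding of the text in the question is unknown. In Latin-1, "ü" is the single byte 0xFC, which is not valid UTF-8, so the UTF-8 decoder substitutes U+FFFD and the original character is lost.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Assumed source encoding for this example: ISO-8859-1 (Latin-1).
    public static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // The bytes as they would sit in a Latin-1 file: "ü" is the single byte 0xFC.
    public static readonly byte[] Latin1Bytes = Latin1.GetBytes("Osnabr\u00FCck");
}

public static class Program
{
    public static void Main()
    {
        // Wrong: decoding Latin-1 bytes as UTF-8 turns 0xFC into U+FFFD.
        string wrong = Encoding.UTF8.GetString(EncodingDemo.Latin1Bytes);

        // Right: decoding with the encoding that produced the bytes round-trips.
        string right = EncodingDemo.Latin1.GetString(EncodingDemo.Latin1Bytes);

        Console.WriteLine(wrong.IndexOf('\uFFFD') >= 0); // True
        Console.WriteLine(right == "Osnabr\u00FCck");     // True
    }
}
```

When reading from a file, the same idea applies by passing the correct `Encoding` to the reader, for example `new StreamReader(path, encoding)`, instead of relying on the default.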

If you want to see the actual characters (and I notice you are displaying the value in the Immediate window in Visual Studio), you need to use a font that can display them. The square means the font you are using has no glyphs for those characters. You can change the fonts used in various parts of Visual Studio in the Options dialog.
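If changing the font is not an option, one way to find out which character is behind the square is to print its code point instead of its glyph. This is a sketch; the helper name `CodePointDump.Describe` is hypothetical. A result of U+00FC means "ü", U+00D6 means "Ö", and U+FFFD means the decoder already replaced the character (an encoding problem, not a font problem).

```csharp
using System;
using System.Linq;

public static class CodePointDump
{
    // Formats each UTF-16 code unit as U+XXXX so it can be identified
    // without needing a font that contains the glyph.
    public static string Describe(string text) =>
        string.Join(" ", text.Select(c => $"U+{(int)c:X4}"));
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(CodePointDump.Describe("r\u00FCc")); // U+0072 U+00FC U+0063
    }
}
```

Note that this works on UTF-16 code units, so a character outside the Basic Multilingual Plane shows up as two entries (a surrogate pair). That is not an issue for German umlauts.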

There is some more detail in this question here.

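There is a Replace method on the string class. The easiest case is replacing a single character with something else. Strings are immutable, so Replace returns a new string and you have to keep the result:

string cleaned = InnerText.Replace("ü", ".");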
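A runnable sketch of the immutability point. Here a local `innerText` stands in for `InnerText` (assumed to be an ordinary string property, for example on an `XmlNode`), and `ReplaceDemo.DotForUmlaut` is a hypothetical helper name.

```csharp
using System;

public static class ReplaceDemo
{
    // string.Replace never modifies the string it is called on; it returns a new one.
    public static string DotForUmlaut(string text) => text.Replace("\u00FC", ".");
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for InnerText.
        string innerText = "Osnabr\u00FCck";

        innerText.Replace("\u00FC", ".");                     // result discarded: innerText is unchanged
        string cleaned = ReplaceDemo.DotForUmlaut(innerText); // keep the returned string

        Console.WriteLine(cleaned);              // Osnabr.ck
        Console.WriteLine(innerText == cleaned); // False
    }
}
```

For a one-character swap there is also the `Replace(char, char)` overload, e.g. `text.Replace('\u00FC', '.')`, which gives the same result here.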

You can replace several characters in one expression by chaining Replace calls:

string cleaned = InnerText.Replace("ü", "[ue]").Replace("Ö", "[Oe]");
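If the list of replacements grows, one option is to keep the pairs in a table and apply them in a loop. This is a sketch, not the answer's code: `MultiReplace`, `Table` and `ReplaceAll` are hypothetical names, and the table only contains the two pairs from the chained call (extend it as needed). It uses `StringBuilder.Replace` so the pairs are applied to one mutable buffer instead of allocating a new string per call.

```csharp
using System;
using System.Text;

public static class MultiReplace
{
    // Hypothetical replacement table mirroring the chained Replace calls.
    public static readonly (string From, string To)[] Table =
    {
        ("\u00FC", "[ue]"), // ü
        ("\u00D6", "[Oe]"), // Ö
    };

    // Applies each pair in order, exactly like chaining string.Replace.
    public static string ReplaceAll(string text, (string From, string To)[] table)
    {
        var sb = new StringBuilder(text);
        foreach (var (from, to) in table)
            sb.Replace(from, to);
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(MultiReplace.ReplaceAll("\u00D6l in Osnabr\u00FCck", MultiReplace.Table));
        // [Oe]l in Osnabr[ue]ck
    }
}
```

As with chaining, the pairs run in sequence, so a later pair can match text produced by an earlier one; order the table accordingly.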
