简体   繁体   中英

How to read and store string in UTF-8 format in C#?

I have a file with URLs, one of which is http://en.wikipedia.org/wiki/São_Paulo . Note that 'ã'. When I read the URLs (in C#) and try to print it, it appears as http://en.wikipedia.org/wiki/S?o_Paulo .

I tried reading the URLs as following:

List<string> urls = System.IO.File.ReadAllLines(wikiURL_FilePath, Encoding.UTF8).ToList();

Note that I have passed second argument to read it in UTF8 format, but still the problem is not rectified. How can I read and store the string in correct form?

The data you have shown is simply not UTF-8, despite having a UTF-8 BOM; the UTF-8 for São is 53-C3-A3-6F; you have 53-E3-6F, which is... the right unicode code-points for basic multi-lingual plane data, but incorrectly encoded to disk as UTF-8. You probably need to fix the code that wrote this file, or: agree on what the encoding is (it could be a single-byte code-page, but you need to agree which, else everything falls apart).

Likely looking encodings (if we take away the BOM):

  • utf-7
  • windows-1252
  • windows-1254
  • iso-8859-1
  • iso-8859-4
  • iso-8859-9
  • iso-8859-15

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM