
Can't read international characters from files

I am trying to read Portuguese characters from files, and keep running into problems.

I have the following C# code (for testing purposes):

var streamReader = new StreamReader("file.txt");

while (streamReader.Peek() >= 0)
{
  var buffer = new char[1];
  streamReader.Read(buffer, 0, buffer.Length);
  Console.Write(buffer[0]);
}

It reads each character in the file and then outputs it to the console. The file contains the following: "cãsa". The output in the console is: "c?sa".

What am I doing wrong?

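You need to read the file using the correct encoding. By default, StreamReader reads the file as UTF-8; if the file was saved in a different encoding, characters outside the ASCII range (such as "ã") cannot be decoded correctly, and you get output like this.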
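
To make the mismatch concrete, here is a minimal sketch that decodes the same four bytes twice. It assumes, purely for illustration, that the file was saved as ISO-8859-1, where "ã" is the single byte 0xE3; the actual encoding of file.txt is not known from the question. The class and helper names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text;

static class DecodeDemo
{
    // Formats each UTF-16 code unit as U+XXXX so the result does not
    // depend on what the console is able to display.
    public static string CodePoints(string s) =>
        string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));

    static void Main()
    {
        // "cãsa" as stored by an editor saving in ISO-8859-1 (an assumption).
        byte[] latin1Bytes = { 0x63, 0xE3, 0x73, 0x61 };

        // Wrong guess: 0xE3 followed by 's' is not a valid UTF-8 sequence,
        // so the decoder substitutes the replacement character U+FFFD.
        string asUtf8 = Encoding.UTF8.GetString(latin1Bytes);

        // Matching guess: every byte maps directly to one character.
        string asLatin1 = Encoding.GetEncoding("iso-8859-1").GetString(latin1Bytes);

        Console.WriteLine("UTF-8:      " + CodePoints(asUtf8));
        Console.WriteLine("ISO-8859-1: " + CodePoints(asLatin1)); // U+0063 U+00E3 U+0073 U+0061
    }
}
```

A console that cannot render U+FFFD typically shows it as "?", which would be consistent with the "c?sa" output in the question.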

In the following example, I am using a constructor overload that takes an encoding, in this case Encoding.Unicode, which is UTF-16 (little-endian):

using(var streamReader = new StreamReader("file.txt", Encoding.Unicode))
{
    while (streamReader.Peek() >= 0)
    {
      var buffer = new char[1];
      streamReader.Read(buffer, 0, buffer.Length);
      Console.Write(buffer[0]);
    }
}

In this example, I am using code page 860, the OEM (DOS) code page for Portuguese. It only helps if the file was actually saved in that code page:

using(var streamReader = new StreamReader("file.txt", Encoding.GetEncoding(860)))
{
    while (streamReader.Peek() >= 0)
    {
      var buffer = new char[1];
      streamReader.Read(buffer, 0, buffer.Length);
      Console.Write(buffer[0]);
    }
}
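
One caveat, offered as a sketch rather than a definitive recipe: on .NET Framework, code page 860 is available out of the box, but on .NET Core and .NET 5 or later, Encoding.GetEncoding(860) generally throws until the code pages provider has been registered (on older .NET Core versions this also requires the System.Text.Encoding.CodePages package). The class name and the sample bytes below are illustrative assumptions; in code page 860, "ã" is the byte 0x84.

```csharp
using System;
using System.Text;

static class CodePage860Demo
{
    public static string Decode(byte[] bytes)
    {
        // Makes legacy code pages such as 860 available on .NET Core / .NET 5+.
        // Registering the same provider more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        return Encoding.GetEncoding(860).GetString(bytes);
    }

    static void Main()
    {
        // "cãsa" as it would be stored in code page 860 (an assumption).
        byte[] cp860Bytes = { 0x63, 0x84, 0x73, 0x61 };

        string text = Decode(cp860Bytes);

        // Compare against the expected string instead of relying on the console.
        Console.WriteLine(text == "c\u00E3sa");
    }
}
```

When reading from a file, the same registration call would go once at startup, before the StreamReader is constructed.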

