
C# encoding problems (question marks) while reading a file with StreamReader

I have a problem reading a .txt file from my Windows Phone app.

I've made a simple app that reads a stream from a .txt file and prints it.

I'm from Italy, and Italian has many accented letters. That is where the problem shows up: every accented letter is printed as a question mark.

Here's the sample code:

var resourceStream = Application.GetResourceStream(new Uri("frasi.txt", UriKind.RelativeOrAbsolute));
if (resourceStream != null)
{
    //System.Text.Encoding.Default, true
    using (var reader = new StreamReader(resourceStream.Stream, System.Text.Encoding.UTF8))
    {
        string line;

        // ReadLine() returns null once the end of the stream is reached.
        while ((line = reader.ReadLine()) != null)
        {
            frasi.Add(line);
        }
    }
}

So, how can I avoid this?

All the best.

[EDIT:] Solution: I hadn't made sure the file was encoded in UTF-8. I saved it with the correct encoding and it worked like a charm. Thank you, Oscar.
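For anyone hitting the same symptom, here is a minimal sketch of why the mismatch produces question marks. It assumes a desktop .NET console program rather than the Windows Phone app, and it uses ISO-8859-1 as a stand-in for whatever ANSI code page the file was originally saved in. The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class EncodingMismatchDemo
{
    // Encodes the text as ISO-8859-1 (stand-in for an ANSI-saved file),
    // then decodes those bytes as UTF-8, as the reader in the question does.
    public static string DecodeLatin1BytesAsUtf8(string text)
    {
        byte[] ansiBytes = Encoding.GetEncoding("iso-8859-1").GetBytes(text);
        return Encoding.UTF8.GetString(ansiBytes);
    }

    // Encodes and decodes with the same encoding, so nothing is lost.
    public static string RoundTripUtf8(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    public static void Main()
    {
        string word = "perch\u00E9"; // "perché", escaped so the source file's own encoding does not matter

        // The lone 0xE9 byte is not valid UTF-8, so the decoder substitutes U+FFFD,
        // which many fonts and consoles render as a question mark.
        Console.WriteLine(DecodeLatin1BytesAsUtf8(word).Contains("\uFFFD"));
        Console.WriteLine(RoundTripUtf8(word) == word);
    }
}
```

The reader was not at fault: the bytes in the file simply were not UTF-8, which is why re-saving the file fixed it.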

You need to use Encoding.Default. Change:

using (var reader = new StreamReader(resourceStream.Stream, System.Text.Encoding.UTF8))

to

using (var reader = new StreamReader(resourceStream.Stream, System.Text.Encoding.Default))

The option you have commented out is what you should use if you do not know the exact encoding of your source data. System.Text.Encoding.Default uses the encoding for the operating system's current ANSI code page and provides the best chance of decoding the file correctly. It follows the machine's current regional settings; it does not inspect the file itself.

However, note this warning from MSDN:

Different computers can use different encodings as the default, and the default encoding can even change on a single computer. Therefore, data streamed from one computer to another or even retrieved at different times on the same computer might be translated incorrectly. In addition, the encoding returned by the Default property uses best-fit fallback to map unsupported characters to characters supported by the code page. For these two reasons, using the default encoding is generally not recommended. To ensure that encoded bytes are decoded properly, your application should use a Unicode encoding, such as UTF8Encoding or UnicodeEncoding, with a preamble. Another option is to use a higher-level protocol to ensure that the same format is used for encoding and decoding.
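As a rough sketch of what "a Unicode encoding with a preamble" can look like in practice, the example below writes text with a byte order mark and reads it back with BOM detection turned on. It uses in-memory streams on desktop .NET instead of the resource stream from the question, and the class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class PreambleDemo
{
    // Writes the text with the given encoding. StreamWriter emits the
    // encoding's preamble (BOM), if it has one, at the start of the stream.
    public static byte[] WriteWithPreamble(string text, Encoding encoding)
    {
        using (var buffer = new MemoryStream())
        {
            using (var writer = new StreamWriter(buffer, encoding))
            {
                writer.Write(text);
            }
            // ToArray() still works after the writer has closed the stream.
            return buffer.ToArray();
        }
    }

    // Reads the bytes as UTF-8 unless a BOM at the start says otherwise.
    public static string ReadDetectingPreamble(byte[] data)
    {
        using (var reader = new StreamReader(new MemoryStream(data), Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        string word = "citt\u00E0"; // "città"

        byte[] utf8WithBom = WriteWithPreamble(word, new UTF8Encoding(true));
        byte[] utf16WithBom = WriteWithPreamble(word, Encoding.Unicode);

        Console.WriteLine(ReadDetectingPreamble(utf8WithBom) == word);
        Console.WriteLine(ReadDetectingPreamble(utf16WithBom) == word);
    }
}
```

With the preamble present, the reader does not have to guess: the same call decodes both the UTF-8 and the UTF-16 data.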

Despite this, in my experience with data coming from a number of different sources and cultures, this is the option that gives the most consistent results out of the box, especially for diacritic marks, which turn into question marks when ANSI-encoded data is read as UTF-8.
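If the encoding really is unknown, one possible middle ground (not what the answer above recommends, just a sketch) is to try a strict UTF-8 decode first and fall back to a legacy encoding only when the bytes are not valid UTF-8. The helper name below is hypothetical, and ISO-8859-1 is only a stand-in for whichever code page your files are likely to use.

```csharp
using System;
using System.Text;

public static class FallbackDecoder
{
    // Tries strict UTF-8 first; if the bytes are not valid UTF-8,
    // decodes them with the supplied fallback encoding instead.
    public static string DecodeWithFallback(byte[] data, Encoding fallback)
    {
        // Second argument 'true' makes the decoder throw on invalid bytes
        // instead of silently substituting replacement characters.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            return strictUtf8.GetString(data);
        }
        catch (DecoderFallbackException)
        {
            return fallback.GetString(data);
        }
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string word = "pi\u00F9"; // "più"

        byte[] utf8Bytes = Encoding.UTF8.GetBytes(word);
        byte[] latin1Bytes = latin1.GetBytes(word);

        Console.WriteLine(DecodeWithFallback(utf8Bytes, latin1) == word);
        Console.WriteLine(DecodeWithFallback(latin1Bytes, latin1) == word);
    }
}
```

This is a heuristic: a short legacy-encoded file can happen to be valid UTF-8, so knowing and fixing the file's real encoding remains the more reliable route.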

I hope this helps.


 