.NET: StreamReader does not recognize ° characters

I am trying to run a regex to locate degree characters (° or º, and also ´ as an alternative form of '). I am reading latitude and longitude DMS coordinates like this one: 12º30'23.256547"S

The problem is with the way I am reading the file, because I can manually inject a string like the one below (format is latitude, longitude, description):

            const string myTestString = @"12º30'23.256547""S, 12º30'23.256547""W, Somewhere";

and my regex matches as expected. With that string I can also see the º values, whereas when I use the StreamReader I see a � for every unrecognized character (the º symbol being one of them).

I've tried:

            var sr = new StreamReader(dlg.File.OpenRead(), Encoding.UTF8);
            var sr = new StreamReader(dlg.File.OpenRead(), Encoding.Unicode);
            var sr = new StreamReader(dlg.File.OpenRead(), Encoding.BigEndianUnicode);

in addition to the default ASCII.

Whichever way I read the file, I end up with these special characters. Any advice would be greatly appreciated!

You've tried various encodings... but presumably not the right one. You shouldn't just be guessing at encodings: find out what encoding the file is really using, and use that. StreamReader itself is absolutely fine. It can deal with any encoding you give it, but that encoding does have to match the one used when the file was written out.
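To make the mismatch concrete, here is a minimal sketch. It assumes (purely for illustration) that the file was saved as ISO-8859-1; the class and method names are hypothetical, and a MemoryStream stands in for dlg.File.OpenRead().

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingMismatchDemo
{
    // Decodes the given bytes through a StreamReader using the supplied encoding.
    public static string ReadAll(byte[] fileBytes, Encoding encoding)
    {
        using (var reader = new StreamReader(new MemoryStream(fileBytes), encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Assumption: the writer used ISO-8859-1, so º (U+00BA) is the single byte 0xBA.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        byte[] fileBytes = latin1.GetBytes("12\u00BA30'23.256547\"S");

        string wrong = ReadAll(fileBytes, Encoding.UTF8); // 0xBA alone is not valid UTF-8
        string right = ReadAll(fileBytes, latin1);        // same bytes, matching encoding

        Console.WriteLine(wrong.IndexOf('\uFFFD') >= 0); // True: replacement character appears
        Console.WriteLine(right.IndexOf('\u00BA') >= 0); // True: º survives
    }
}
```

The bytes are identical in both calls; only the encoding passed to the reader changes the result.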

Where does the file come from? What wrote it out?

If it was written out with Notepad, it may well be using Encoding.Default, which is the system's default encoding (i.e. it will vary from machine to machine). If at all possible, change whatever is creating the file to use a single standard encoding. Personally I'm a big fan of UTF-8.

You need to identify what encoding the file was saved in, and use that when you read it with your StreamReader.
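If you cannot find out from whoever produces the file, one rough heuristic is to try strict UTF-8 first and fall back to a single-byte encoding when the bytes are not valid UTF-8. The sketch below assumes ISO-8859-1 as the fallback, which is only a guess; EncodingProbe and DecodeWithFallback are hypothetical names.

```csharp
using System;
using System.Text;

public static class EncodingProbe
{
    // Tries strict UTF-8 first; if the bytes are not valid UTF-8, falls back to
    // ISO-8859-1 (an assumption: substitute the encoding your files really use).
    public static string DecodeWithFallback(byte[] fileBytes)
    {
        // Second argument true = throw on invalid bytes instead of emitting U+FFFD.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            return strictUtf8.GetString(fileBytes);
        }
        catch (DecoderFallbackException)
        {
            return Encoding.GetEncoding("ISO-8859-1").GetString(fileBytes);
        }
    }

    public static void Main()
    {
        byte[] latin1Bytes = { 0x31, 0x32, 0xBA };       // "12º" saved as ISO-8859-1
        byte[] utf8Bytes   = { 0x31, 0x32, 0xC2, 0xBA }; // "12º" saved as UTF-8

        Console.WriteLine(DecodeWithFallback(latin1Bytes) == "12\u00BA"); // True
        Console.WriteLine(DecodeWithFallback(utf8Bytes) == "12\u00BA");   // True
    }
}
```

This only tells you whether the bytes could be UTF-8. It cannot distinguish between different single-byte encodings, so knowing how the file was written is still the reliable route.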

If it was created using a regular text editor, I'm guessing the default encoding is either Windows-1252 or ISO-8859-1.

The º character in your sample is 0xBA in ISO-8859-1 (the actual degree sign ° is 0xB0), and both fall outside the 7-bit ASCII table. I don't know how Encoding.ASCII interprets it.
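For reference, a short sketch that prints the byte values involved and shows what Encoding.ASCII does with a byte above 0x7F. The class name is hypothetical.

```csharp
using System;
using System.Text;

public static class DegreeBytesDemo
{
    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        // º (U+00BA, masculine ordinal indicator) and ° (U+00B0, degree sign)
        Console.WriteLine(BitConverter.ToString(latin1.GetBytes("\u00BA")));        // BA
        Console.WriteLine(BitConverter.ToString(latin1.GetBytes("\u00B0")));        // B0

        // The same º character takes two bytes in UTF-8.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes("\u00BA"))); // C2-BA

        // Encoding.ASCII cannot represent 0xBA and substitutes a question mark.
        Console.WriteLine(Encoding.ASCII.GetString(new byte[] { 0xBA }));           // ?
    }
}
```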

Otherwise, it might be easier to just make sure the file is saved as UTF-8, if you have that possibility.

The reason it works when you define the string in code is that .NET always works with strings in its internal encoding (UTF-16). What StreamReader does is convert the bytes it reads from the file into that internal encoding, using the encoding you specify when you create the StreamReader.
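As an illustration of why the in-code string behaves, here is a sketch of a regex run against a string that is already in memory. The pattern is a hypothetical stand-in, since the question does not show the real one; it accepts ° or º after the degrees and ' or ´ after the minutes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DmsRegexDemo
{
    // Hypothetical pattern: degrees, ° (U+00B0) or º (U+00BA), minutes,
    // ' or ´ (U+00B4), seconds, a double quote, then a hemisphere letter.
    public static readonly Regex Dms = new Regex(
        @"(\d+)[\u00B0\u00BA](\d+)['\u00B4](\d+(?:\.\d+)?)""([NSEW])");

    public static void Main()
    {
        // Same content as myTestString in the question, with º written as an escape.
        const string myTestString =
            "12\u00BA30'23.256547\"S, 12\u00BA30'23.256547\"W, Somewhere";

        foreach (Match m in Dms.Matches(myTestString))
        {
            Console.WriteLine("{0} deg {1} min {2} sec {3}",
                m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value, m.Groups[4].Value);
        }

        // In memory, º is the single UTF-16 code unit U+00BA regardless of file encoding.
        Console.WriteLine((int)'\u00BA'); // 186
    }
}
```

Once the StreamReader is given the right encoding, the text it returns contains the same U+00BA code unit, so the same pattern applies to it.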

You can open the file in an editor like Notepad++ to see its encoding and convert it to UTF-8. Then reading it as you already do, with 'var sr = new StreamReader(dlg.File.OpenRead(), Encoding.UTF8);', will work. I was able to read the degree symbol this way.
