How can I make C# throw decoding exceptions?

I want my C# application (which has a GUI) to help the user choose between "unicode (utf-8)" and "legacy (cp1252)". I would like to give the user two independent true/false readings on whether the file can be read 'successfully' (though not necessarily correctly) in each of those two encodings with no loss of detail.

When I tried the following in C#, it didn't work. That is, it seems to always return true, even when I call it on a utf-8 text file that I know contains non-Roman characters.

[EDIT: Actually, I shouldn't have expected this to fail. It is probably one of those plausible successes that happens to be incorrect, since most (all?) byte streams are also valid cp1252. Testing in the other direction does detect invalid utf-8, as the Python code below does.]

E.g. CanBeReadAs("nepali.txt", Encoding.GetEncoding(1252)) ought to return false, but it returns true.

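    // Requires: using System; using System.IO; using System.Text;
    public static bool CanBeReadAs(string filePath, Encoding encoding)
    {
        // Make it strict: throw on undecodable bytes instead of substituting a replacement.
        encoding = Encoding.GetEncoding(encoding.CodePage, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
        // false: do not auto-detect the encoding from a byte order mark.
        using (var r = new StreamReader(filePath, encoding, false))
        {
            try
            {
                r.ReadToEnd();
            }
            catch (Exception)
            {
                // Swallow the exception and report failure.
                return false;
            }
        }
        return true;
    }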

I've also tried it with "string s = r.ReadToEnd();" just to make sure that the data really is being decoded, but that doesn't seem to change anything.

What am I doing wrong?

Note: If I need to do anything special to deal with BOMs, please let me know that too. I'm inclined to ignore them if that's simple. (Some of these files have mixed encodings, BTW, though I would like to think that anything actually beginning with a BOM is pure Unicode.)

Here is a Python script I wrote, which uses the same strategy and works fine:

    def bad_encoding(filename, enc='utf-8', max=9):
        '''Return a list of up to max error strings for lines in the file not encoded in the specified encoding.

        Otherwise, return an empty list.'''

        errors = []
        line = None
        with open(filename, encoding=enc) as f:
            i = 0
            while True:
                try:
                    i += 1
                    line = f.readline()
                except UnicodeDecodeError:
                    errors.append('UnicodeDecodeError: Could not read line {} as {}.'.format(i, enc))
                if not line or len(errors) > max:
                    break

        return errors
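
The edit note above can be illustrated on in-memory bytes. This is a minimal sketch in Python, not the asker's code: `can_decode` and the sample strings are made up for illustration. It suggests why a strict cp1252 check is a weak test. Python's cp1252 codec leaves only five byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so UTF-8 text is rejected only when it happens to contain one of them, while strict UTF-8 rejects most legacy text that has non-ASCII characters. The .NET code page 1252 decoder is, as far as I know, more lenient still and maps those five bytes to control characters, which would explain why the C# version always returns true; treat that as an assumption to verify.

```python
def can_decode(data, encoding):
    """Return True if data decodes under encoding with strict error handling."""
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical samples. The Devanagari words are written as escapes:
# the first is the word "Nepali", the second is "namaste" (contains U+094D).
nepali_utf8 = "\u0928\u0947\u092a\u093e\u0932\u0940".encode("utf-8")
namaste_utf8 = "\u0928\u092e\u0938\u094d\u0924\u0947".encode("utf-8")
cafe_cp1252 = "caf\u00e9".encode("cp1252")

print(can_decode(nepali_utf8, "utf-8"))    # True
print(can_decode(nepali_utf8, "cp1252"))   # True, but the result is mojibake
print(can_decode(namaste_utf8, "cp1252"))  # False: U+094D encodes to a 0x8D byte
print(can_decode(cafe_cp1252, "utf-8"))    # False: a lone 0xE9 is not valid UTF-8
```

So a "true" reading for cp1252 says little on its own; the UTF-8 reading is the more informative of the two.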

The static Encoding instances available through the Encoding class (ASCII, UTF8, Unicode, etc.) all make a best effort to decode the input bytes and do not throw if they fail.

To create an Encoding with specific encode/decode behavior, use the overload of Encoding.GetEncoding that takes EncoderFallback/DecoderFallback parameters. I tried creating instances of various encodings directly (ASCIIEncoding, UTF8Encoding), but they are read-only, so setting the fallback options always threw an InvalidOperationException. In your case, to create an instance that throws when decoding fails, try:

    encoding = Encoding.GetEncoding(encoding.CodePage, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
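
On the BOM note in the question: a minimal sketch, in Python like the asker's script and with a hypothetical helper name (`split_utf8_bom`), of one way to ignore a UTF-8 BOM before running strict checks. It relies only on `codecs.BOM_UTF8` and the standard `utf-8-sig` codec; how the .NET StreamReader treats a BOM is not covered here.

```python
import codecs


def split_utf8_bom(data):
    """Return (had_bom, payload), with a leading UTF-8 BOM removed if present."""
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


# Hypothetical sample: the three BOM bytes EF BB BF followed by ASCII text.
sample = codecs.BOM_UTF8 + "abc".encode("utf-8")
had_bom, payload = split_utf8_bom(sample)

print(had_bom, payload)                  # True b'abc'
print(repr(sample.decode("utf-8")))      # '\ufeffabc': plain utf-8 keeps the BOM
print(repr(sample.decode("utf-8-sig")))  # 'abc': utf-8-sig drops it
print(repr(sample.decode("cp1252")))     # the BOM bytes decode as three Latin-1 range characters
```

Because the BOM bytes are also decodable as cp1252, a file that starts with a BOM still passes a strict cp1252 check; the BOM is better treated as a separate hint in favour of UTF-8.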
