
Failed to decode bytes - IronPython

I have some files containing Unicode data. The following code reads them without problems under CPython, but under IronPython it crashes with the message "failed to decode bytes at index 67".

        # Method fragment: assumes `import codecs`, `from unicodedata import normalize`,
        # and a dict `file_listing` created earlier in the method.
        for f in self.list_of_files:
            all_words_in_file = []

            with codecs.open(f, encoding="utf-8-sig") as file_obj:
                for line in file_obj:
                    all_words_in_file.extend(line.split(" "))

            #print "Normalising unicode strings"

            # Normalise every word to NFKC.
            # Note: this loop does not remove duplicate words.
            normal_list = []
            for l in all_words_in_file:
                normal_list.append(normalize('NFKC', l))

            file_listing.update({f: normal_list})
        return file_listing

I cannot work out the reason. Is there another way to read Unicode data in IronPython?

How about this one:

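import codecs

def lines(filename):
    # Read raw bytes and decode one whole line at a time.
    with open(filename, "rb") as f:
        first = f.readline()
        # Drop the UTF-8 BOM only if it is actually present.
        if first.startswith(codecs.BOM_UTF8):
            first = first[len(codecs.BOM_UTF8):]
        yield first.strip().decode("utf-8")
        for line in f:
            yield line.strip().decode("utf-8")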

for line in lines("text-utf8-with-bom.txt"):
    all_words_in_file.extend(line.split(" "))
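
If the files are small enough to hold in memory, another option is to skip line iteration and decode the whole file in a single call. The following is only a sketch: `read_words` and the demo file contents are made up for illustration, it splits on any whitespace rather than on a single space, and it has been written against CPython, not tested on IronPython.

```python
import codecs
import os
import tempfile
from unicodedata import normalize

def read_words(path):
    """Read the file as bytes, decode it in one call, return NFKC-normalised words."""
    with open(path, "rb") as f:
        raw = f.read()
    # Remove the UTF-8 BOM by hand instead of relying on the "utf-8-sig" codec.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    text = raw.decode("utf-8")
    # split() with no argument splits on any run of whitespace, including newlines.
    return [normalize("NFKC", w) for w in text.split()]

# Demo with a hypothetical temporary file that starts with a BOM.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    sample = u"\uff21\uff22\uff23 caf\u00e9\n\ufb01sh\n"  # fullwidth ABC, cafe with e-acute, fi ligature
    out.write(codecs.BOM_UTF8 + sample.encode("utf-8"))

words = read_words(demo_path)
os.remove(demo_path)
```

Because the complete byte string is handed to `decode` at once, no multi-byte character can be cut in half by a read boundary.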

I have also filed an IronPython bug: https://ironpython.codeplex.com/workitem/34951

As long as you feed entire lines to decode, things will be OK: the newline byte never occurs inside a multi-byte UTF-8 sequence, so a complete line never ends in the middle of a character.
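
The difference between decoding whole lines and decoding an arbitrary slice of bytes can be shown with a small sketch. The sample data, the helper names `decode_whole_lines` and `decode_chunk`, and the cut position are all made up for illustration; this runs on CPython and is not a reproduction of the IronPython failure itself.

```python
import codecs

# Hypothetical sample: BOM followed by two lines containing multi-byte characters.
data = codecs.BOM_UTF8 + u"na\u00efve caf\u00e9\n\u65e5\u672c\u8a9e text\n".encode("utf-8")

def decode_whole_lines(raw):
    """Strip a leading BOM if present, then decode each complete line."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return [line.decode("utf-8") for line in raw.splitlines()]

def decode_chunk(raw, size):
    """Decode only the first `size` bytes; fails if the cut lands inside a character."""
    return raw[:size].decode("utf-8")

whole = decode_whole_lines(data)

# Bytes 0-2 are the BOM, 3-4 are "na", and 5-6 encode U+00EF.
# Cutting after byte 5 leaves half of that two-byte sequence behind.
try:
    decode_chunk(data, 6)
    chunk_failed = False
except UnicodeDecodeError:
    chunk_failed = True
```

Here `whole` holds both lines decoded correctly, while the truncated chunk raises `UnicodeDecodeError`, which is the kind of failure a decoder hits when it is handed a buffer that stops mid-character.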


 