How to implement a file-like class that always returns UTF-8, regardless of the file's encoding?

Question

I have made a module that detects the encoding of a file. I want to be able to give a file path and an encoding as inputs to a class and always get back 'utf-8' when I process the contents of the file.

For example, something like this:

handler = UnicodeWrapper(file_path, encoding='ISO-8859-2')

for line in handler:
   # need the line to be encoded in utf-8
   process(line)

I cannot understand why there are still a million different encodings, but I want to write an interface that always returns unicode.

Is there a library that does this already?

Answer 1

Based on this answer, I think the following might suit your needs:

import io

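class UnicodeWrapper(object):
    def __init__(self, filename, encoding='utf8'):
        self._filename = filename
        self._encoding = encoding  # encoding the file is stored in

    def __iter__(self):
        # io.open decodes each line to unicode text using the given encoding
        with io.open(self._filename, 'r', encoding=self._encoding) as f:
            return iter(f.readlines())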

if __name__ == '__main__':
    filename = r'...'

    handler = UnicodeWrapper(filename)

    for line in handler:
       print(line)

Edit

In Python 2, you can assert that each line is encoded in UTF-8 using something like this:

if __name__ == '__main__':
    filename = r'...'

    handler = UnicodeWrapper(filename)

    for line in handler:
        try:
            line.decode('utf-8')
            # process(line)
        except UnicodeDecodeError:
            print('Not encoded in UTF-8')
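
If what you need is UTF-8 bytes for every line (rather than unicode text), one option is to decode from the detected encoding and re-encode on the way out. The following is only a minimal sketch under that assumption; the class name `Utf8LineReader` and its parameters are made up for illustration, and it assumes the encoding you pass in is the one the file is actually stored in.

```python
import io


class Utf8LineReader(object):
    """Hypothetical helper: iterate over a text file, yielding UTF-8 bytes."""

    def __init__(self, path, encoding):
        self._path = path
        self._encoding = encoding  # encoding the file is stored in

    def __iter__(self):
        # Decode each line from the source encoding, then re-encode as UTF-8.
        with io.open(self._path, 'r', encoding=self._encoding) as f:
            for line in f:
                yield line.encode('utf-8')
```

Usage would look like `for line in Utf8LineReader(file_path, 'ISO-8859-2'): process(line)`. Because `__iter__` is a generator, the file is read lazily, one line at a time, instead of being loaded fully with `readlines()`.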
