[英]How can I implement a file like class that always returns in UTF-8 encoding irrespective of file encoding?
I have made a module that detects the encoding of a file. 我已经制作了一个模块来检测文件的编码。 I want to be able to able to give file path and encoding as inputs to the class and always be able to get back 'utf-8' when I process the contents of the file.
我希望能够将文件路径和编码作为类的输入,并且在处理文件内容时始终能够返回'utf-8'。
For example something like this 例如像这样的东西
handler = UnicodeWrapper(file_path, encoding='ISO-8859-2')
for line in handler:
# need the line to be encoded in utf-8
process(line)
I can not understand why there are a million types of encodings yet. 我无法理解为什么还有一百万种类型的编码。 But I want to write an interface that always returns unicode.
但我想写一个总是返回unicode的接口。
Is there a library to do this already? 有没有图书馆可以做到这一点?
Based on this answer , I think the following might suit your needs: 根据这个答案 ,我认为以下内容可能适合您的需求:
import io
class UnicodeWrapper(object):
def __init__(self, filename):
self._filename = filename
def __iter__(self):
with io.open(self._filename,'r', encoding='utf8') as f:
return iter(f.readlines())
if __name__ == '__main__':
filename = r'...'
handler = UnicodeWrapper(filename)
for line in handler:
print(line)
Edit 编辑
In Python 2, you can assert that each line is encoded in UTF-8 using something like this: 在Python 2中,您可以断言每行使用以下内容以UTF-8编码:
if __name__ == '__main__':
filename = r'...'
handler = UnicodeWrapper(filename)
for line in handler:
try:
line.decode('utf-8')
# process(line)
except UnicodeDecodeError:
print('Not encoded in UTF-8')
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.