
Python encoding problem when reading but not when typing

I'm reading some strings from a text file. Some of these strings contain "strange" characters, e.g. "\xc3\xa9comiam". If I copy that string and paste it into a variable, I can convert it to readable characters:

string = "\xc3\xa9comiam"
print(string.encode("raw_unicode_escape").decode('utf-8'))
écomiam

But if I read it from the file, it doesn't work:

with open(fn) as f:
    for string in f.readlines():
        print(string.encode("raw_unicode_escape").decode('utf-8'))
\xc3\xa9comiam
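The difference lies in what the two strings actually contain. In source code, the literal "\xc3\xa9comiam" goes through Python's escape processing, so it holds the two characters U+00C3 and U+00A9 followed by "comiam"; a line read from the file keeps the backslashes as ordinary characters. A minimal sketch, assuming (from the printed output above) that the file holds the literal text, with a raw string r"..." standing in for a line read from it:

```python
typed = "\xc3\xa9comiam"        # the parser turns \xc3 and \xa9 into single characters
from_file = r"\xc3\xa9comiam"   # raw string: backslashes kept, like a line read from the file

print(len(typed), [hex(ord(c)) for c in typed[:2]])   # 8 ['0xc3', '0xa9']
print(len(from_file), from_file[:4])                  # 14 \xc3

# raw_unicode_escape passes literal backslashes through untouched,
# so the round trip returns the same escaped text.
print(from_file.encode("raw_unicode_escape").decode("utf-8"))  # \xc3\xa9comiam
```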

It seems the solution must be pretty easy, but I can't find it. What can I do?

Thanks!

Those are not Unicode escapes: as the name suggests, raw_unicode_escape handles Unicode escape sequences of the form \uXXXX, but not byte escapes like \xe9.

What you have is a UTF-8 encoded byte sequence. The way to decode it is to get it into a bytes object, which can then be decoded to a Unicode string.

# Let's not shadow the string library
s = "\xc3\xa9comiam"
print(bytes(s, 'latin-1').decode('utf-8'))

The 'latin-1' trick is a dirty secret: it simply maps every byte to the character with the same code point, and back.
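Latin-1 (ISO-8859-1) assigns each of the 256 byte values to the code point with the same number, so encoding and decoding with it is a lossless round trip for any string whose characters all lie below U+0100. A short check of that property, followed by the conversion from the answer written with str.encode (equivalent to bytes(s, 'latin-1')):

```python
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")

# Every byte b becomes the character chr(b)...
assert all(ord(ch) == b for ch, b in zip(text, all_bytes))
# ...and encoding back yields the identical bytes.
assert text.encode("latin-1") == all_bytes

s = "\xc3\xa9comiam"
print(s.encode("latin-1"))                  # b'\xc3\xa9comiam'
print(s.encode("latin-1").decode("utf-8"))  # écomiam

# A character above U+00FF has no Latin-1 byte, so this would raise UnicodeEncodeError:
# "\u20ac".encode("latin-1")
```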

For your file, you could open it in binary mode so you don't have to explicitly convert the lines to bytes, or you could simply apply the same conversion to the strings you read.
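Both options can be sketched as follows, assuming the file holds the raw UTF-8 bytes of "écomiam" (a hypothetical temporary file is created for the demo). Option 2 assumes the text was read with Latin-1 so that each byte became one character. If the file instead contains the literal characters \, x, c, 3 (as the question's printed output suggests), neither option decodes them: binary mode just yields those ASCII bytes, and the escapes have to be interpreted first.

```python
import os
import tempfile

# Hypothetical sample file holding the *raw* UTF-8 bytes of "écomiam".
fd, fn = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("écomiam\n".encode("utf-8"))

# Option 1: open in binary mode; each line is already a bytes object.
with open(fn, "rb") as f:
    binary_lines = [line.rstrip(b"\n").decode("utf-8") for line in f]

# Option 2: read as text with Latin-1 (one character per byte),
# then apply the same bytes(..., "latin-1") conversion to each string.
with open(fn, encoding="latin-1") as f:
    text_lines = [bytes(line.rstrip("\n"), "latin-1").decode("utf-8") for line in f]

os.remove(fn)
print(binary_lines, text_lines)  # ['écomiam'] ['écomiam']
```

For a file that really is UTF-8, open(fn, encoding="utf-8") decodes it directly and is usually the simplest choice.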

Thanks everyone for your help,

I think I've found a solution (not very elegant, but it does the trick):

print(bytes(tm.strip(), "utf-8").decode("unicode_escape").encode("raw_unicode_escape").decode('utf-8'))
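The one-liner chains four conversions. Broken into steps, as a sketch (the function name decode_escaped_utf8 is hypothetical; tm in the original is a line read from the file):

```python
def decode_escaped_utf8(tm: str) -> str:
    # 1. str -> bytes so the escape codec can read it (ASCII stays unchanged).
    raw = bytes(tm.strip(), "utf-8")
    # 2. Interpret backslash escapes: the four characters \xc3 become U+00C3.
    unescaped = raw.decode("unicode_escape")
    # 3. Map each character below U+0100 back to the byte with the same value.
    utf8_bytes = unescaped.encode("raw_unicode_escape")
    # 4. Those bytes are the real UTF-8 encoding, so decode them.
    return utf8_bytes.decode("utf-8")


line = r"\xc3\xa9comiam" + "\n"   # what the file loop yields
print(decode_escaped_utf8(line))   # écomiam
```

When every character after step 2 is below U+0100, encode("latin-1") produces the same bytes as step 3; the difference is that raw_unicode_escape turns any higher character into a \uXXXX escape instead of raising UnicodeEncodeError.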

Thanks!
