
Python: Bytes to string with accented characters

Git reads the file name "ùàèòùèòùùè.txt" as a plain string of bytes, so when I ask git for a list of committed files, I'm given the following string:

r"\303\271\303\240\303\250\303\262\303\271\303\250\303\262\303\271\303\271\303\250.txt"

How can I use Python 2 to turn it back into "ùàèòùèòùùè.txt"?

If the git output contains literal \ddd octal escape sequences (so up to 4 characters per filename byte), you can use the string_escape (Python 2) or unicode_escape (Python 3) codec to have Python interpret the escape sequences.

The result is UTF-8 encoded bytes; my terminal is set to interpret UTF-8 directly, so printing them shows the file name:

>>> git_data = r"\303\271\303\240\303\250\303\262\303\271\303\250\303\262\303\271\303\271\303\250.txt"
>>> git_data.decode('string_escape')
'\xc3\xb9\xc3\xa0\xc3\xa8\xc3\xb2\xc3\xb9\xc3\xa8\xc3\xb2\xc3\xb9\xc3\xb9\xc3\xa8.txt'
>>> print git_data.decode('string_escape')
ùàèòùèòùùè.txt

You'd want to decode that as UTF-8 to get text:

>>> git_data.decode('string_escape').decode('utf8')
u'\xf9\xe0\xe8\xf2\xf9\xe8\xf2\xf9\xf9\xe8.txt'
>>> print git_data.decode('string_escape').decode('utf8')
ùàèòùèòùùè.txt
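
As a rough illustration of why two steps are involved: each \ddd escape is a single byte written in octal, and each accented letter here occupies two such bytes in UTF-8. The check below is only a sketch, written for Python 3 with built-ins alone; the variable names are made up for the example.

```python
# Each \ddd escape in git's output is one byte value written in octal.
first = int("303", 8)   # first escape of the file name
second = int("271", 8)  # second escape of the file name

# Put the two byte values together and read them as UTF-8.
pair = bytes([first, second])

print(hex(first), hex(second))  # the two raw byte values
print(pair.decode("utf8"))      # the character they encode together
```

The two escapes \303\271 therefore come out as the bytes 0xC3 0xB9, which is the UTF-8 encoding of "ù".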

In Python 3, the unicode_escape codec gives you (Unicode) text, with each escaped byte mapped to the code point of the same value, so an extra encode to Latin-1 is required to turn it back into bytes:

>>> git_data = rb"\303\271\303\240\303\250\303\262\303\271\303\250\303\262\303\271\303\271\303\250.txt"
>>> git_data.decode('unicode_escape').encode('latin1').decode('utf8')
'ùàèòùèòùùè.txt'

Note that git_data is a bytes object before decoding.
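
The Python 3 steps can be wrapped in a small helper. This is a minimal sketch rather than a complete parser for git's output: the name decode_git_path is hypothetical, and it assumes the file names are UTF-8 and that the input holds only the escaped path as bytes.

```python
def decode_git_path(quoted: bytes) -> str:
    """Turn octal-escaped path bytes into text (hypothetical helper).

    Assumes the underlying file name is encoded as UTF-8.
    """
    # unicode_escape turns each \ddd escape into the code point of that value.
    text = quoted.decode("unicode_escape")
    # Latin-1 maps code points 0-255 back to the identical byte values.
    raw = text.encode("latin1")
    # The recovered bytes are the real UTF-8 encoded file name.
    return raw.decode("utf8")


git_data = rb"\303\271\303\240\303\250\303\262\303\271\303\250\303\262\303\271\303\271\303\250.txt"
print(decode_git_path(git_data))
```

Plain ASCII names pass through unchanged, so the helper can be applied to every path in a listing.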
