Python: bytes with accented characters to string

Question

Git reads the file name "ùàèòùèòùùè.txt" as a plain string of bytes, so when I ask git for a list of committed files, I get the following string:

r"\303\271\303\240\303\250\303\262\303\271\303\250\303\262\303\271\303\271\303\250.txt"

How can I use Python 2 to turn it back into "ùàèòùèòùùè.txt"?

Answer 1

If the git format contains literal \ddd sequences (a backslash followed by three octal digits, so up to 4 characters per filename byte), you can use the string_escape (Python 2) or unicode_escape (Python 3) codec to have Python interpret the escape sequences.

You'll get UTF-8 encoded bytes back; my terminal is set to interpret UTF-8 directly, so printing them shows the accented characters:

>>> git_data = r"\303\271\303\240\303\250\303\262\303\271\303\250\303\262\303\271\303\271\303\250.txt"
>>> git_data.decode('string_escape')
'\xc3\xb9\xc3\xa0\xc3\xa8\xc3\xb2\xc3\xb9\xc3\xa8\xc3\xb2\xc3\xb9\xc3\xb9\xc3\xa8.txt'
>>> print git_data.decode('string_escape')
ùàèòùèòùùè.txt
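To see why \303 turns into \xc3: each escape is an octal byte value, so \303 is 3·64 + 0·8 + 3 = 195 = 0xc3, and the pair \303\271 is the two-byte UTF-8 encoding of ù (U+00F9). The snippet below checks this arithmetic in Python 3. It splits on backslashes only because this tiny sample contains nothing but escapes, so treat it as an illustration, not a general parser.

```python
# A single escaped character as git prints it: backslash + three octal digits.
escape = r"\303\271"

# Split on the backslashes (this sample contains only escapes) and read
# each three-digit group as a base-8 integer.
octets = [int(group, 8) for group in escape.split("\\")[1:]]
print(octets)                     # [195, 185]
print([hex(b) for b in octets])   # ['0xc3', '0xb9']

# Those two bytes are the UTF-8 encoding of "ù" (U+00F9).
raw = bytes(octets)
print(raw.decode("utf-8"))        # ù
```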

To get text (a unicode object) rather than bytes, decode the string_escape result as UTF-8:

>>> git_data.decode('string_escape').decode('utf8')
u'\xf9\xe0\xe8\xf2\xf9\xe8\xf2\xf9\xf9\xe8.txt'
>>> print git_data.decode('string_escape').decode('utf8')
ùàèòùèòùùè.txt

In Python 3, the unicode_escape codec gives you (Unicode) text, so an extra encode to Latin-1 is required to turn it back into bytes before decoding them as UTF-8:

>>> git_data = rb"\303\271\303\240\303\250\303\262\303\271\303\250\303\262\303\271\303\271\303\250.txt"
>>> git_data.decode('unicode_escape').encode('latin1').decode('utf8')
'ùàèòùèòùùè.txt'

Note that git_data is a bytes object before decoding (hence the rb prefix).
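Wrapped as a reusable Python 3 function, the same three steps might look like the sketch below. The name decode_git_path is hypothetical, and the sketch assumes the escaped data arrives as bytes and that the underlying file-name bytes are UTF-8, as in the question.

```python
def decode_git_path(escaped: bytes) -> str:
    # Step 1: unicode_escape turns each octal escape into the code point
    # with that value (e.g. \303 -> U+00C3); plain ASCII passes through.
    latin1_text = escaped.decode("unicode_escape")
    # Step 2: Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0..255,
    # so this recovers the raw file-name bytes.
    raw = latin1_text.encode("latin-1")
    # Step 3: assume the name was stored as UTF-8 and decode it as such.
    return raw.decode("utf-8")


git_data = rb"\303\271\303\240\303\250\303\262\303\271\303\250\303\262\303\271\303\271\303\250.txt"
print(decode_git_path(git_data))  # ùàèòùèòùùè.txt
```

Latin-1 appears in the middle step only because it maps code points U+0000 to U+00FF one-to-one onto byte values 0 to 255; it says nothing about the file name's real encoding, which is UTF-8 here.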

Answer 1: score 2, accepted, 2015-06-17 11:28:09
