
Python problem with accents when decoding from base64

I'm getting data from a website and this is an example of a sentence I retrieved: PHA+Q29ycmlnJmVhY3V0ZTtzIGV4ZXJjaWNlcyBlbnRyYWluZW1lbnQgY2hhcGl0cmUgbW91dmVtZW50IGV0IGZvcmNlczwvcD4K

The sentence is encoded with base64, so I decoded it and then decoded the resulting bytes as UTF-8 with Python:

import base64

sentence = "PHA+Q29ycmlnJmVhY3V0ZTtzIGV4ZXJjaWNlcyBlbnRyYWluZW1lbnQgY2hhcGl0cmUgbW91dmVtZW50IGV0IGZvcmNlczwvcD4K"
base64.b64decode(sentence).decode("utf-8")

The problem is that instead of looking like this: "Corrigés exercices entrainement chapitre mouvement et forces", it looks like this: "Corrig&eacute;s exercices entrainement chapitre mouvement et forces".

As you can see, the accents are completely messed up.

I'm using Python 3.

I do not have access to the decoded sentence using the API (I only have the base64-encoded one).

Thanks for your help.

In case someone doesn't know about HTML entities (just like me) and needs the answer, here it is.

Thanks to Amadan's comment, I just learned that the strange thing I got instead of my accent is called an HTML entity.

In order to get my accent back, I needed to unescape it:

import html

print(html.unescape("Corrig&eacute;s exercices entrainement chapitre mouvement et forces"))

>> Corrigés exercices entrainement chapitre mouvement et forces
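
Putting the two steps together, the whole round trip from the base64 string to readable text might look like the sketch below. It uses only the standard-library `base64` and `html` modules. The variable names (`encoded`, `decoded`, `unescaped`) are illustrative, and it assumes the payload is UTF-8 text, as in the question.

```python
import base64
import html

# The base64 string from the question.
encoded = "PHA+Q29ycmlnJmVhY3V0ZTtzIGV4ZXJjaWNlcyBlbnRyYWluZW1lbnQgY2hhcGl0cmUgbW91dmVtZW50IGV0IGZvcmNlczwvcD4K"

# Step 1: base64 -> bytes -> str (assumes the bytes are UTF-8 text).
decoded = base64.b64decode(encoded).decode("utf-8")

# Step 2: turn HTML entities such as &eacute; into the characters they stand for.
unescaped = html.unescape(decoded)

print(unescaped)
```

Note that the decoded payload is an HTML fragment, so the surrounding `<p>` tags and a trailing newline are still present after unescaping; removing them is a separate step.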
