How to convert u'\x96' to u'–' in Python
I'm porting content from an old WordPress blog to Mezzanine. I was given a JSON dump of the database, and the posts are littered with special characters that look like this: \\x96 among otherwise unescaped HTML.
If I manually replace the slash with &# and append a semicolon, the character renders correctly, so \\x96 becomes &#x96;
In other words: escaped UTF-8 (hex) to HTML entity (hex).
How can I do this in Python?
If &#150; is also acceptable, you can use:
>>> u'\x96'.encode('ascii', 'xmlcharrefreplace')
'&#150;'
which is even called out in the documentation (although not very clearly).
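For the case described in the question, where the dump contains the literal text \x96 rather than the character itself, a regular expression may be a better fit. The following is only a sketch: the function names are made up for illustration, and the second function assumes the original bytes were Windows-1252, which is the encoding in which byte 0x96 is an en dash. If the dump came from a different encoding, that assumption would need to change.

```python
import re

# Matches a literal backslash-x escape (backslash, 'x', two hex digits) left in the text.
_HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def escapes_to_entities(text):
    # Hypothetical helper: rewrite each literal escape as a hex character reference.
    return _HEX_ESCAPE.sub(lambda m: '&#x%s;' % m.group(1), text)

def escapes_to_unicode(text):
    # Hypothetical helper: decode each escaped byte as Windows-1252 (an assumption).
    def decode(m):
        byte = bytes(bytearray([int(m.group(1), 16)]))
        # 'replace' yields U+FFFD for the few bytes cp1252 leaves undefined.
        return byte.decode('cp1252', 'replace')
    return _HEX_ESCAPE.sub(decode, text)

sample = r'pages 10\x9620'          # raw string: the backslash is literal
print(escapes_to_entities(sample))  # pages 10&#x96;20
print(escapes_to_unicode(sample))   # pages 10–20

# If the string already holds the control character U+0096 itself,
# round-tripping through latin-1 and cp1252 gives the en dash from the title.
fixed = u'\x96'.encode('latin-1').decode('cp1252')
```

Decoding to a real en dash avoids relying on the browser to reinterpret a control-character reference. Note also that on Python 3 the `encode` call shown above returns bytes (`b'&#150;'`) rather than a str.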