Decoding HTML-encoded strings in Python

I have the following string...

"Scam, hoax, or the real deal, he&rsquo;s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process."

I need to turn it into this string...

Scam, hoax, or the real deal, he's gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process.

This is pretty standard HTML encoding and I can't for the life of me figure out how to convert it in Python.

I found this: GitHub

And it's very close to working; however, it does not output an apostrophe but instead some odd Unicode character.

Here is an example of the output from the GitHub script...

Scam, hoax, or the real deal, heâs gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process.
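A likely explanation for the stray "â" (an assumption, since the GitHub script itself is not shown here): the entity decodes to the typographic apostrophe U+2019, and its UTF-8 bytes are then displayed as if they were Latin-1. The sketch below, written for Python 3, reproduces that effect; the variable names are illustrative only.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the typographic apostrophe.
curly = "\u2019"

# Its UTF-8 encoding is three bytes: b'\xe2\x80\x99'.
utf8_bytes = curly.encode("utf-8")

# Misreading those bytes as Latin-1 gives "â" (0xE2) followed by two
# invisible control characters (0x80 and 0x99).
misread = utf8_bytes.decode("latin-1")
print(repr(misread))

# If a plain ASCII apostrophe is wanted instead, replace it explicitly.
plain = "he\u2019s".replace(curly, "'")
print(plain)
```

If this is what is happening, the decoding step is working and the problem lies in how the result is encoded or displayed afterwards.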

What you're trying to do is called "HTML entity decoding", and it's covered in a number of past Stack Overflow questions.

Here's a code snippet using the Beautiful Soup HTML parsing library to decode your example:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Written for Python 2 and BeautifulSoup 3 (the old "BeautifulSoup" module).
from BeautifulSoup import BeautifulSoup

string = "Scam, hoax, or the real deal, he&rsquo;s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process."
# convertEntities tells BeautifulSoup 3 to decode HTML entities while parsing.
s = BeautifulSoup(string, convertEntities=BeautifulSoup.HTML_ENTITIES).contents[0]
print s

Here's the output:

Scam, hoax, or the real deal, he's gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process.
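On Python 3.4 or later, the standard library can do the same decoding without Beautiful Soup. The following is a minimal sketch rather than part of the original answer: it assumes Python 3, uses a shortened version of the example string, and assumes the entity involved is `&rsquo;`. Because `html.unescape` yields the typographic apostrophe U+2019, the final `replace` call is an optional extra step for when a plain ASCII apostrophe is wanted, as in the target string in the question.

```python
import html

# Shortened example string; the &rsquo; entity is an assumption.
encoded = "he&rsquo;s gonna work his way to the bottom of the sordid tale"

# html.unescape converts named and numeric character references to Unicode.
decoded = html.unescape(encoded)

# Optional: swap the typographic apostrophe (U+2019) for an ASCII one.
plain = decoded.replace("\u2019", "'")

print(decoded)
print(plain)
```

The numeric form `&#8217;` refers to the same character, so `html.unescape` handles it in the same way.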
