How to get certain text from html tag on python?
I'm building a Python MD5 decrypter on top of an API, but the API sends its result back as HTML. How can I get the text inside the <font color=green> tag?
{"error":0,"msg":"<font color=blue><b>Live</b></font><font color=green>Jumpman#23</font> | [MD5 Decrypt] .S/C0D3"}
I'd suggest using an HTML parser such as Beautiful Soup:
>>> from bs4 import BeautifulSoup
>>> d = {"error":0,"msg":"<font color=blue><b>Live</b></font><font color=green>Jumpman#23</font> | [MD5 Decrypt] .S/C0D3"}
>>> soup = BeautifulSoup(d['msg'], 'html.parser')
>>> soup.font.attrs
{'color': 'blue'}
attrs gives you a dictionary that maps each attribute name of the tag to its value.
To get the text "Jumpman#23":
>>> soup.find_all("font", {"color": "green"})[0].contents[0]
'Jumpman#23'
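If installing Beautiful Soup is not an option, the same idea can be sketched with the standard library's html.parser instead. This swaps in a different parser from the one used above, so treat it as an illustration only. The class name GreenFontText and the variable raw are made up for this example, and it assumes the API response arrives as a JSON string shaped like the one in the question.

```python
import json
from html.parser import HTMLParser


class GreenFontText(HTMLParser):
    """Collect the text found inside <font color=green> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside a green font element
        self.chunks = []  # text pieces seen inside that element

    def handle_starttag(self, tag, attrs):
        # Enter on a green font tag; also count nested font tags so the
        # matching end tag is the one that leaves the element.
        if tag == "font" and (self.depth or dict(attrs).get("color") == "green"):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "font" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


# Assumption: the raw API response is a JSON string like the one in the question.
raw = '{"error":0,"msg":"<font color=blue><b>Live</b></font><font color=green>Jumpman#23</font> | [MD5 Decrypt] .S/C0D3"}'

parser = GreenFontText()
parser.feed(json.loads(raw)["msg"])
parser.close()

green_text = "".join(parser.chunks)
print(green_text)  # Jumpman#23
```

Matching on the color attribute rather than on the tag's position means it should keep working if the API adds or reorders other font tags.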
If you know the target text comes immediately after <font color=green>, you can use simple string operations:
msg = "<font color=blue><b>Live</b></font><font color=green>Jumpman#23</font> | [MD5 Decrypt] .S/C0D3"
start_pattern = "<font color=green>"
stop_pattern = "<"
start_index = msg.find(start_pattern) + len(start_pattern)
stop_index = start_index + msg[start_index:].find(stop_pattern)
print(msg[start_index:stop_index])
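One thing to watch with the string approach: str.find returns -1 when the pattern is missing, so the slice quietly returns the wrong text instead of failing. Here is a minimal sketch of the same technique wrapped in a function that checks for that. The name text_after is hypothetical, and returning None for a missing tag is just one possible choice.

```python
def text_after(msg, start_pattern, stop_pattern="<"):
    """Return the text between start_pattern and the next stop_pattern, or None."""
    start = msg.find(start_pattern)
    if start == -1:
        return None  # the opening tag is not in the message
    start += len(start_pattern)
    stop = msg.find(stop_pattern, start)  # search only after the opening tag
    if stop == -1:
        return msg[start:]  # no closing marker: take the rest of the string
    return msg[start:stop]


msg = "<font color=blue><b>Live</b></font><font color=green>Jumpman#23</font> | [MD5 Decrypt] .S/C0D3"
print(text_after(msg, "<font color=green>"))  # Jumpman#23
```

Passing the start offset to find avoids building the intermediate msg[start_index:] slice.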
You can use bs4 with the adjacent sibling combinator on the font tags:
from bs4 import BeautifulSoup as bs
s = {"error":0,"msg":"<font color=blue><b>Live</b></font><font color=green>Jumpman#23</font> | [MD5 Decrypt] .S/C0D3"}
soup = bs(s['msg'], 'lxml')
data = soup.select_one('font + font').text
print(data)