How to convert a String into a BeautifulSoup object?
I am trying to scrape a news website and need to change one parameter. I replaced it with the following code:
while i < len(links):
    conn = urllib.urlopen(links[i])
    html = conn.read()
    soup = BeautifulSoup(html)
    t = html.replace('class="row bigbox container mi-df-local locked-single"', 'class="row bigbox container mi-df-local single-local"')
    n = str(t.find("div", attrs={'class':'entry cuerpo-noticias'}))
    print(p)
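The problem is that "t" is a string, while find with attributes only works on objects of type <class 'BeautifulSoup.BeautifulSoup'>. Do you know how to convert "t" into that type?
Just do the replacement before parsing: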
html = html.replace('class="row bigbox container mi-df-local locked-single"', 'class="row bigbox container mi-df-local single-local"')
soup = BeautifulSoup(html, "html.parser")
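To address the title question directly: the BeautifulSoup constructor accepts any string of markup, so a modified string can simply be parsed again. The following is a minimal sketch, assuming bs4 is installed; the sample markup and the variable name `fixed` are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
html = (
    '<div class="row bigbox container mi-df-local locked-single">'
    '<div class="entry cuerpo-noticias">Article text</div>'
    '</div>'
)

# str.replace returns a plain string, which has no BeautifulSoup methods.
fixed = html.replace("locked-single", "single-local")

# Passing the string to the constructor yields a BeautifulSoup object again.
soup = BeautifulSoup(fixed, "html.parser")
entry = soup.find("div", attrs={"class": "entry cuerpo-noticias"})
print(entry.get_text())
```

Calling find on `fixed` itself would invoke str.find, which does not accept an attrs argument; that is why the code in the question fails.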
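Note that it is also possible (I would even say preferred) to parse the HTML, locate the element, and modify the attributes of the Tag instance, for example: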
soup = BeautifulSoup(html, "html.parser")
for elm in soup.select(".row.bigbox.container.mi-df-local.locked-single"):
    elm["class"] = ["row", "bigbox", "container", "mi-df-local", "single-local"]
Note that class is a special multi-valued attribute, which is why we set the value to a list of individual classes.
Demo:
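Because the attribute is a list, it is also possible to swap only the one class that changes instead of retyping all of them. This is a small sketch of that variation, not part of the original answer; it assumes the html.parser builder, which exposes class as a list of names.

```python
from bs4 import BeautifulSoup

html = '<div class="row bigbox container mi-df-local locked-single">test</div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

# class is exposed as a list of individual class names.
print(div["class"])

# Replace only the class that changes and keep the others untouched.
div["class"] = ["single-local" if c == "locked-single" else c for c in div["class"]]
print(div)
```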
from bs4 import BeautifulSoup
html = """
<div class="row bigbox container mi-df-local locked-single">test</div>
"""
soup = BeautifulSoup(html, "html.parser")
for elm in soup.select(".row.bigbox.container.mi-df-local.locked-single"):
    elm["class"] = ["row", "bigbox", "container", "mi-df-local", "single-local"]
print(soup.prettify())
Now see how the class of the div element has been updated:
<div class="row bigbox container mi-df-local single-local">
 test
</div>
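Putting the pieces together for the loop in the question, one possible shape is a helper that takes the page source as a string and returns the article element. This is only a sketch: `extract_entry` and the `pages` list are hypothetical names, and the inline strings stand in for pages that would be downloaded from each link.

```python
from bs4 import BeautifulSoup


def extract_entry(html):
    """Hypothetical helper: swap locked-single for single-local, then return the article div (or None)."""
    soup = BeautifulSoup(html, "html.parser")
    for elm in soup.select(".locked-single"):
        elm["class"] = ["single-local" if c == "locked-single" else c for c in elm["class"]]
    return soup.find("div", class_="cuerpo-noticias")


# Stand-ins for downloaded pages; no network access is needed for this sketch.
pages = [
    '<div class="row bigbox locked-single"><div class="entry cuerpo-noticias">First</div></div>',
    '<div class="row bigbox"><p>No article here</p></div>',
]
for page in pages:
    entry = extract_entry(page)
    print(entry.get_text() if entry is not None else "no entry found")
```

find returns None when nothing matches, so checking the result before using it avoids an AttributeError on pages that lack the article div.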