
How to convert a String into a BeautifulSoup object?

I'm trying to scrape a news website and I need to change one parameter in the page's HTML. I replaced it with the following code:

while i < len(links):
    conn = urllib.urlopen(links[i])
    html = conn.read()
    soup = BeautifulSoup(html)
    t = html.replace('class="row bigbox container mi-df-local locked-single"', 'class="row bigbox container mi-df-local single-local"')
    n = str(t.find("div", attrs={'class':'entry cuerpo-noticias'}))
    print(p)

The problem is that "t" is a string, and find with attributes only works on objects of type <class 'BeautifulSoup.BeautifulSoup'>. Do you know how to convert "t" into that type?

Just do the replacement before parsing:

html = html.replace('class="row bigbox container mi-df-local locked-single"', 'class="row bigbox container mi-df-local single-local"')
soup = BeautifulSoup(html, "html.parser")
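
As a minimal sketch of how this fits the original goal, the snippet below replaces the class string, parses the result, and then runs the same find call the question attempted on the string. The sample markup is an assumption standing in for the downloaded page, and the names old, new, body and wrapper are only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page returned by conn.read().
html = """
<div class="row bigbox container mi-df-local locked-single">
  <div class="entry cuerpo-noticias">Article body</div>
</div>
"""

old = 'class="row bigbox container mi-df-local locked-single"'
new = 'class="row bigbox container mi-df-local single-local"'

# str.replace returns a plain str, so it has to be parsed
# before find() with attrs becomes available.
soup = BeautifulSoup(html.replace(old, new), "html.parser")

# The lookup from the question, now called on the soup instead of the string.
body = soup.find("div", attrs={"class": "entry cuerpo-noticias"})
wrapper = soup.find("div", class_="single-local")

print(body.get_text(strip=True))
print(wrapper["class"])
```

The same idea answers the question as asked: any HTML string becomes a BeautifulSoup object by passing it to the BeautifulSoup constructor.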

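Note that it is also possible (I would even say preferred) to parse the HTML, locate the element(s), and modify the attributes of the Tag instance, e.g.: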

soup = BeautifulSoup(html, "html.parser")
for elm in soup.select(".row.bigbox.container.mi-df-local.locked-single"):
    elm["class"] = ["row", "bigbox", "container", "mi-df-local", "single-local"]

Note that class is a special multi-valued attribute - that's why we are setting the value to a list of individual classes.
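
To illustrate the multi-valued behaviour, here is a small sketch on a shortened, made-up element: reading class gives a list, so one class can be swapped without retyping the others. The variable names div and classes are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical markup with only two classes.
soup = BeautifulSoup('<div class="row locked-single">test</div>', "html.parser")
div = soup.div

# Reading a multi-valued attribute yields a list of individual classes.
print(div["class"])

# Swap only the class that needs to change and keep the rest as they are.
classes = ["single-local" if c == "locked-single" else c for c in div["class"]]
div["class"] = classes

# On output the list is joined back into a space-separated attribute value.
print(div)
```

Swapping one entry in the existing list avoids hard-coding the full set of classes, which may help if the site adds or reorders them.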

Demo:

from bs4 import BeautifulSoup

html = """
<div class="row bigbox container mi-df-local locked-single">test</div>
"""

soup = BeautifulSoup(html, "html.parser")
for elm in soup.select(".row.bigbox.container.mi-df-local.locked-single"):
    elm["class"] = ["row", "bigbox", "container", "mi-df-local", "single-local"]

print(soup.prettify())

Now see how the div element's classes have been updated:

<div class="row bigbox container mi-df-local single-local">
 test
</div>
