
Searching for HTML Elements based on a specific word in the element string

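I'm trying to create a program that uses the Beautiful Soup module to find and replace tags in certain specified elements. However, I'm having trouble figuring out how to "find" these elements by "searching" for a specific word in the element's string. Assuming I can get my code to "find" these elements by the specified word, I would then "unwrap" the element's "p" tag and "wrap" it in a new "h1" tag.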

Here is some example HTML as input:

<p> ExampleStringWord#1 needs to “find” this entire element based on the "finding" of the first word </p>
<p> Example#2  this element ignored </p>
<p> ExampleStringWord#1 needs to find this entire element as well because the first word of this string is what I’m “searching” for, even though the wording after the first word in the string is different </p>

Here is my code so far (searching by "ExampleStringWord#1"):

for h1_tag in soup.find_all(string="ExampleStringWord#1"):
            soup.p.wrap(soup.h1_tag("h1"))

Given the example HTML input above, I would like the output to look like this:

<h1> ExampleStringWord#1 needs to “find” this entire element based on the "finding" of the first word </h1>
<p> Example#2  this element ignored </p>
<h1> ExampleStringWord#1 needs to find this entire element as well because the first word of this string is what I’m “searching” for, even though the wording after the first word in the string is different </h1>

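However, my code only finds elements whose string is exactly "ExampleStringWord#1" and excludes any element whose string contains anything beyond that. I'm fairly sure I will need a regular expression to find elements that contain my specified word (plus whatever wording follows it). I'm not very familiar with regular expressions, though, so I'm not sure how to combine them with the BeautifulSoup module.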

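Also, I have reviewed the Beautiful Soup documentation on passing a regular expression as a filter ( https://www.crummy.com/software/BeautifulSoup/bs4/doc/#a-regular-expression ), but I could not get it to work in my case. I have also looked at other posts about passing regular expressions to Beautiful Soup, but found nothing that solves my problem. Any help is appreciated!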

How about finding the p elements that contain the specified substring (note the re.compile() part) and then changing the element name to h1:

import re

from bs4 import BeautifulSoup

data = """
<body>
    <p> ExampleStringWord#1 needs to “find” this entire element based on the "finding" of the first word </p>
    <p> Example#2  this element ignored </p>
    <p> ExampleStringWord#1 needs to find this entire element as well because the first word of this string is what I’m “searching” for, even though the wording after the first word in the string is different </p>
</body>
"""

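soup = BeautifulSoup(data, "html.parser")
# A compiled regex passed as `string` matches any <p> whose text contains the word
for p in soup.find_all("p", string=re.compile("ExampleStringWord#1")):
    # Renaming the tag turns <p>...</p> into <h1>...</h1> in place
    p.name = 'h1'
print(soup)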

Prints:

<body>
    <h1> ExampleStringWord#1 needs to “find” this entire element based on the "finding" of the first word </h1>
    <p> Example#2  this element ignored </p>
    <h1> ExampleStringWord#1 needs to find this entire element as well because the first word of this string is what I’m “searching” for, even though the wording after the first word in the string is different </h1>
</body>
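
The pattern above matches the word anywhere in the paragraph, and the `string` argument compares against a tag's `.string`, which is `None` once a `p` contains nested tags. If the word must be the first word, or the paragraphs may contain inline markup, a function filter is one possible variation. The following is only a sketch under those assumptions: the sample HTML, the `starts_with_word` helper, and the `WORD` constant are made up for illustration and are not part of the original answer.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample input: the third <p> has nested markup, the fourth
# mentions the word but does not start with it.
html = """
<body>
<p> ExampleStringWord#1 plain text only </p>
<p> Example#2 this element ignored </p>
<p> ExampleStringWord#1 with <b>nested</b> markup </p>
<p> Mentions ExampleStringWord#1 later in the text </p>
</body>
"""

WORD = "ExampleStringWord#1"
# Optional leading whitespace, then the literal word, then whitespace or end of text.
first_word = re.compile(r"^\s*" + re.escape(WORD) + r"(?!\S)")


def starts_with_word(tag):
    """Return True for <p> tags whose combined text begins with WORD."""
    return tag.name == "p" and first_word.search(tag.get_text()) is not None


soup = BeautifulSoup(html, "html.parser")
for p in soup.find_all(starts_with_word):
    p.name = "h1"  # rename in place, children are kept as they are

print(soup)
```

With this input, the first and third paragraphs should become `h1` elements, while the second and fourth stay as `p`. `re.escape()` keeps characters such as `#` literal, so the search word can be changed without rewriting the pattern.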
