
Python BeautifulSoup: how to find a tag by attribute value without knowing the corresponding attribute name?

Suppose we have an attribute value "xyz" but do not know the attribute name. That means we want to match

    <a href="xyz">

but also

    <div class="xyz">

Is it possible to search for such tags?

One solution is to pass a lambda to find_all.

Example:

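from bs4 import BeautifulSoup

data = '''<a href="xyz">a</a>
<div class="somethingelse">b</div>
<div class="xyz">c</div>'''

soup = BeautifulSoup(data, 'html.parser')

# Keep every tag that has at least one attribute whose value contains 'xyz'.
# For string values this is a substring test; for multi-valued attributes
# such as class (stored as a list) it is a membership test.
for tag in soup.find_all(lambda tag: any('xyz' in tag[a] for a in tag.attrs)):
    print(tag)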

Prints:

<a href="xyz">a</a>
<div class="xyz">c</div>
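
One point worth checking about the lambda above: because it uses `in`, a string value such as href="/path/xyz.html" also matches. The following is a minimal sketch, not part of the original answer, that contrasts the substring test with an exact comparison. The extra `<a>` tag and the names `substring_hits` and `exact_hits` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup; the second <a> is an illustrative addition whose href
# merely contains 'xyz' as a substring.
data = '''<a href="xyz">a</a>
<a href="/path/xyz.html">b</a>
<div class="xyz">c</div>'''

soup = BeautifulSoup(data, 'html.parser')

# Substring / membership test, as in the lambda from the answer.
substring_hits = soup.find_all(
    lambda tag: any('xyz' in tag[a] for a in tag.attrs)
)

# Exact comparison: the value is 'xyz' itself, or a one-element list ['xyz']
# (multi-valued attributes such as class are stored as lists).
exact_hits = soup.find_all(
    lambda tag: any(v == 'xyz' or v == ['xyz'] for v in tag.attrs.values())
)

print(len(substring_hits), len(exact_hits))
```

Which behavior is wanted depends on the use case; the exact form is the safer choice when "xyz" could appear inside longer URLs or identifiers.
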

Alternatively, use a list comprehension that compares attribute values exactly:

[tag for tag in soup.find_all(True)
    if "xyz" in tag.attrs.values() or ["xyz"] in tag.attrs.values()]

Explanation:

  • soup.find_all(True) finds all tags (passing True as the filter matches every tag).

  • tag.attrs is the dictionary of all attributes of the tag.

  • We are not interested in the attribute names (such as href, class, id), only in their values, so we use tag.attrs.values().

  • Some attributes are multi-valued (e.g. class="x y"), so their value in the attrs dictionary is a list (e.g. ["x", "y"]). So we test both the "xyz" and ["xyz"] possibilities.
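
Testing only "xyz" and ["xyz"] misses a tag such as `<div class="xyz other">`, whose class value is the list ["xyz", "other"]. A rough sketch of one way to cover that case follows. The helper name `has_attr_value` and the modified sample markup are assumptions for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup


def has_attr_value(tag, value):
    """Return True if any attribute of `tag` equals `value`, or, for a
    multi-valued attribute stored as a list, contains `value`."""
    for attr_value in tag.attrs.values():
        if isinstance(attr_value, list):
            # Multi-valued attribute (e.g. class): check list membership.
            if value in attr_value:
                return True
        elif attr_value == value:
            # Single-valued attribute: require an exact match.
            return True
    return False


# Illustrative markup: the last div carries two classes.
data = '''<a href="xyz">a</a>
<div class="somethingelse">b</div>
<div class="xyz other">c</div>'''

soup = BeautifulSoup(data, 'html.parser')
matches = soup.find_all(lambda tag: has_attr_value(tag, 'xyz'))
names = [tag.name for tag in matches]
print(names)
```

Here the helper is wrapped in a lambda to mirror the first answer; find_all also accepts any callable that takes a tag and returns a boolean.
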
