Test whether an attribute is present in a tag in BeautifulSoup

Question

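I want to get all the <script> tags in a document and then process each one based on the presence (or absence) of certain attributes.

For example, for each <script> tag, if the attribute for is present, do something; otherwise, if the attribute bar is present, do something else.

Here is what I am doing currently: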

outputDoc = BeautifulSoup(''.join(output))
scriptTags = outputDoc.findAll('script', attrs = {'for' : True})

But this way I keep only the <script> tags that have the for attribute... and I lose the others (those without the for attribute).

Answer 1

If I understand correctly, you just want all the script tags, and then check for certain attributes on each of them?

scriptTags = outputDoc.findAll('script')
for script in scriptTags:
    if script.has_attr('some_attribute'):
        do_something()
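
As a self-contained sketch of this loop applied to the for/bar case from the question: the sample HTML and the classify helper below are made up for illustration, and html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, not taken from the question.
html = """
<script for="window" event="onload">a()</script>
<script bar="1">b()</script>
<script>c()</script>
"""

soup = BeautifulSoup(html, "html.parser")


def classify(script):
    """Return a label describing which attribute the tag carries."""
    if script.has_attr("for"):
        return "has for"
    elif script.has_attr("bar"):
        return "has bar"
    return "neither"


# Fetch every <script> tag once, then branch per tag.
labels = [classify(script) for script in soup.find_all("script")]
print(labels)
```

Because every <script> tag is fetched first, tags without the for attribute are still available for the other branches.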

Answer 2

You don't need any lambdas to filter by attribute; you can simply pass some_attribute=True to find or find_all.

script_tags = soup.find_all('script', some_attribute=True)

# or

script_tags = soup.find_all('script', {"some-data-attribute": True})
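
One caveat for the attribute in the question: for is a reserved word in Python, so it cannot be written directly as a keyword argument (for=True is a syntax error). A short sketch of two ways around that, using made-up HTML and assuming html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration.
html = '<script for="window">a()</script><script src="x.js"></script>'
soup = BeautifulSoup(html, "html.parser")

# Pass the attribute filter as a dictionary...
with_for = soup.find_all("script", attrs={"for": True})

# ...or unpack a dictionary into keyword arguments.
also_with_for = soup.find_all("script", **{"for": True})

print(len(with_for), len(also_with_for))
```

The dictionary form is also what you need for attribute names that are not valid Python identifiers, such as those containing hyphens.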

Here are more examples of other approaches:

import bs4

soup = bs4.BeautifulSoup(html, "html.parser")

# Find all with a specific attribute

tags = soup.find_all(src=True)
tags = soup.select("[src]")

# Find all meta with either name or http-equiv attribute.

soup.select("meta[name],meta[http-equiv]")

# Find all tags with either a name or a src attribute.

soup.select("[name], [src]")

# find first/any script with a src attribute.

tag = soup.find('script', src=True)
tag = soup.select_one("script[src]")

# Find all tags with a name attribute beginning with foo
# or a src beginning with /path (quoted, because / is not
# valid in an unquoted CSS attribute value).
soup.select('[name^=foo], [src^="/path"]')

# Find all tags with a name attribute that contains foo
# or a src that contains whatever.
soup.select("[name*=foo], [src*=whatever]")

# Find all tags with a name attribute that ends with foo
# or a src that ends with whatever.
soup.select("[name$=foo], [src$=whatever]")

You can also use regular expressions with find or find_all:

import re
# starting with
soup.find_all("script", src=re.compile("^whatever"))
# contains
soup.find_all("script", src=re.compile("whatever"))
# ends with 
soup.find_all("script", src=re.compile("whatever$"))
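
To make the regular-expression filters concrete, here is a small sketch on made-up HTML (the paths and the cdn.example.com URL are placeholders). It also shows that a tag lacking the attribute entirely is not matched by a pattern.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample document for illustration.
html = """
<script src="/static/app.js"></script>
<script src="https://cdn.example.com/lib.min.js"></script>
<script>inline()</script>
"""
soup = BeautifulSoup(html, "html.parser")

# src starts with /static/
starts = soup.find_all("script", src=re.compile(r"^/static/"))

# src ends with .min.js
ends = soup.find_all("script", src=re.compile(r"\.min\.js$"))

# src present with any value; the inline script is excluded
any_src = soup.find_all("script", src=True)

print(len(starts), len(ends), len(any_src))
```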

Answer 3

For future reference: has_key has been deprecated in BeautifulSoup 4. You now need to use has_attr.

scriptTags = outputDoc.find_all('script')
for script in scriptTags:
    if script.has_attr('some_attribute'):
        do_something()

Answer 4

If you only need to get the tags that have a given attribute, you can use a lambda:

soup = bs4.BeautifulSoup(YOUR_CONTENT)

Tags with the attribute

tags = soup.find_all(lambda tag: 'src' in tag.attrs)

or

tags = soup.find_all(lambda tag: tag.has_attr('src'))

A specific tag with the attribute

tag = soup.find(lambda tag: tag.name == 'script' and 'src' in tag.attrs)

and so on ...

Thought it might be useful.
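
The same lambda style can also express the absence of an attribute, which is the other half of the question. A minimal sketch on made-up HTML, assuming html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration.
html = (
    '<script for="window">a()</script>'
    '<script bar="1">b()</script>'
    "<p>text</p>"
)
soup = BeautifulSoup(html, "html.parser")

# <script> tags that do NOT carry a for attribute.
without_for = soup.find_all(
    lambda tag: tag.name == "script" and not tag.has_attr("for")
)

print([tag.get("bar") for tag in without_for])
```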

Answer 5

You can check whether some attribute is present:

scriptTags = outputDoc.findAll('script', some_attribute=True)
for script in scriptTags:
    do_something()

Answer 6

Using the pprint module, you can inspect the contents of an element.

from pprint import pprint

pprint(vars(element))

Using it on a bs4 element prints something like this:

{'attrs': {u'class': [u'pie-productname', u'size-3', u'name', u'global-name']},
 'can_be_empty_element': False,
 'contents': [u'\n\t\t\t\tNESNA\n\t'],
 'hidden': False,
 'name': u'span',
 'namespace': None,
 'next_element': u'\n\t\t\t\tNESNA\n\t',
 'next_sibling': u'\n',
 'parent': <h1 class="pie-compoundheader" itemprop="name">\n<span class="pie-description">Bedside table</span>\n<span class="pie-productname size-3 name global-name">\n\t\t\t\tNESNA\n\t</span>\n</h1>,
 'parser_class': <class 'bs4.BeautifulSoup'>,
 'prefix': None,
 'previous_element': u'\n',
 'previous_sibling': u'\n'}

To access an attribute, say the class list, use the following:

class_list = element.attrs.get('class', [])

You can filter elements with this approach:

for script in soup.find_all('script'):
    if script.attrs.get('for'):
        pass  # ... has a non-empty 'for' attribute
    elif "myClass" in script.attrs.get('class', []):
        pass  # ... has class "myClass"
    else:
        pass  # ... do something else
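
One thing to watch with script.attrs.get('for') as a condition: it tests the attribute's value for truthiness, not its presence, so an attribute that is present but empty falls through to the next branch. A small sketch of the difference, on made-up HTML and assuming html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one empty attribute, one valueless attribute.
html = '<script for="">a()</script><script async>b()</script>'
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("script")

# Truthiness of the value: False for the empty for attribute.
truthy_for = bool(first.attrs.get("for"))

# Presence checks: True even though the value is empty.
present_for = "for" in first.attrs
present_async = second.has_attr("async")

print(truthy_for, present_for, present_async)
```

If presence is what matters, `'for' in script.attrs` or `script.has_attr('for')` is the safer test.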

Answer 7

A simple way to select what you need:

outputDoc.select("script[for]")
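
The complementary set can be selected with the CSS :not() pseudo-class. This sketch assumes a Beautiful Soup version whose select() is backed by the soupsieve package (4.7 or later); the HTML is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration.
html = (
    '<script for="window">a()</script>'
    '<script src="x.js"></script>'
    "<script>c()</script>"
)
soup = BeautifulSoup(html, "html.parser")

# <script> tags that have a for attribute.
with_for = soup.select("script[for]")

# <script> tags that do not have one.
without_for = soup.select("script:not([for])")

print(len(with_for), len(without_for))
```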

7 answers

Answer 1  127  accepted  2011-02-16 14:15:11

Answer 2  35  2016-08-20 14:12:28

Answer 3  34  2013-08-01 02:32:11

Answer 4  18  2015-07-26 16:51:00

Answer 5  3  2018-01-03 00:04:21

Answer 6  1  2016-09-20 15:28:50

Answer 7  0  2021-08-07 19:09:23
