How can I get the second span using BeautifulSoup in Python?

I am trying to get the value of the second span in this div, and in other divs like it (shown below):

<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>VALUE 1</span>
    <i aria-hidden="true" class="Mx(4px)">•</i>
    <span>TRYING TO GET THIS</span>
</div>

I have looked at similar Stack Overflow posts, but I still can't figure out how to solve this. Here is my current code:

time = soup.find_all('div', {'class': 'C(#959595) Fz(11px) D(ib) Mb(6px)'})
for i in time:
    print(i.text)  # this prints VALUE 1 x amount of times (there are multiple divs)

I have already tried i.span, i.contents, i.children, and so on. Any help is much appreciated, thanks!

Try this:

from bs4 import BeautifulSoup as bs

data = """<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>VALUE 1</span>
    <i aria-hidden="true" class="Mx(4px)">•</i>
    <span>TRYING TO GET THIS</span>
</div>
<div class="another class">
    <span>VALUE 1</span>
    <i aria-hidden="true" class="Mx(4px)">•</i>
    <span>TRYING TO GET THIS</span>
</div>"""

# pass the markup string directly and name the parser explicitly
soup = bs(data, 'html.parser')
# direct <span> children of the div whose class attribute matches exactly
spans = soup.select('div[class="C(#959595) Fz(11px) D(ib) Mb(6px)"] > span')
print(spans[1].text)  # index 1 is the second span: TRYING TO GET THIS
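Since the question mentions several divs with the same class, the selector can also pick the second span of each one in a single pass. The following is only a sketch: the second div and its values (VALUE 2, SECOND TARGET) are made-up sample data, and it assumes a BeautifulSoup version (4.7 or later) whose select() supports the :nth-of-type() pseudo-class.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two divs sharing the class from the question
html = """<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>VALUE 1</span>
    <i aria-hidden="true" class="Mx(4px)">•</i>
    <span>TRYING TO GET THIS</span>
</div>
<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>VALUE 2</span>
    <i aria-hidden="true" class="Mx(4px)">•</i>
    <span>SECOND TARGET</span>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# span:nth-of-type(2) matches a span that is the second <span> among its siblings,
# so the <i> element in between does not affect the count
selector = 'div[class="C(#959595) Fz(11px) D(ib) Mb(6px)"] > span:nth-of-type(2)'
second_spans = [span.get_text(strip=True) for span in soup.select(selector)]
print(second_spans)
```

A div that has fewer than two spans simply contributes nothing to the list, so no index check is needed.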

You basically already have it; you just need to get the second span in each div (using find_next):

from bs4 import BeautifulSoup

# HTML is assumed to hold the page markup shown in the question
soup = BeautifulSoup(HTML, 'html.parser')
divs = soup.find_all('div', {'class': 'C(#959595) Fz(11px) D(ib) Mb(6px)'})
for div in divs:
    # want the second span in the div
    span = div.find_next('span').find_next('span')
    print(span.string)

# collect the text of every span in the first matching div
div = soup.find_all('div', class_='C(#959595) Fz(11px) D(ib) Mb(6px)')
[x.get_text() for x in div[0].find_all('span')]

Output:

['VALUE 1', 'TRYING TO GET THIS']
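The list approach above extends to every matching div. One caveat worth checking in real pages: find_next keeps searching past the end of the current div, so indexing into div.find_all('span') may be safer when some divs hold only one span. A minimal sketch, where the helper name second_span_text and the second, single-span div are hypothetical additions for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second div has only one span
html = """<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>VALUE 1</span>
    <i aria-hidden="true" class="Mx(4px)">•</i>
    <span>TRYING TO GET THIS</span>
</div>
<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>ONLY ONE</span>
</div>"""

soup = BeautifulSoup(html, "html.parser")
target_class = "C(#959595) Fz(11px) D(ib) Mb(6px)"


def second_span_text(div):
    """Return the text of the second span inside div, or None if there is none."""
    spans = div.find_all("span")  # searches only inside this div
    return spans[1].get_text(strip=True) if len(spans) > 1 else None


results = [second_span_text(div) for div in soup.find_all("div", class_=target_class)]
print(results)
```

Returning None for the short div keeps the results aligned one-to-one with the divs, which helps if the values are later paired with other per-div data.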

There are several ways to get the value you want. The examples below use the simplified_scrapy library:

from simplified_scrapy.simplified_doc import SimplifiedDoc
html='''
<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>VALUE 1</span>
    <i aria-hidden="true" class="Mx(4px)">•</i>
    <span>TRYING TO GET THIS</span>
</div>
'''
doc = SimplifiedDoc(html)
divs = doc.getElementsByClass('C(#959595) Fz(11px) D(ib) Mb(6px)')
for div in divs:
  value = div.getElementByTag('span',start='</span>') # Use start to skip the first
  print (value)
  value = div.getElementByTag('span',before='<span>',end=len(div.html)) # Locate the last
  print (value)
  value = div.i.next # Use <i> to locate
  print (value)
  value = div.spans[-1]
  print (value)
  print (value.text)

Result:

{'tag': 'span', 'html': 'TRYING TO GET THIS'}
{'tag': 'span', 'html': 'TRYING TO GET THIS'}
{'tag': 'span', 'html': 'TRYING TO GET THIS'}
{'tag': 'span', 'html': 'TRYING TO GET THIS'}
TRYING TO GET THIS
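For readers staying with BeautifulSoup, two of the ideas above (locating the span through the <i> element, and taking the last span) appear to have close counterparts. This is a rough sketch rather than an exact translation of the SimplifiedDoc calls:

```python
from bs4 import BeautifulSoup

html = """<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>VALUE 1</span>
    <i aria-hidden="true" class="Mx(4px)">•</i>
    <span>TRYING TO GET THIS</span>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="C(#959595) Fz(11px) D(ib) Mb(6px)")

# Use <i> to locate: the first <span> sibling that follows the <i> element
after_i = div.i.find_next_sibling("span")

# Locate the last span inside the div
last_span = div.find_all("span")[-1]

print(after_i.get_text())
print(last_span.get_text())
```

Both lookups stay inside the div, so they should not drift into a neighbouring div the way a chained find_next can.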
