Python - BeautifulSoup - Unable to extract span value

Question

I have an XML document containing multiple div and span classes, and I am struggling to extract the text values.

   <div class="line">
     <span class="html-tag">
       "This is a Heading that I dont want"
     </span>
     <span>This is the text I want</span>

So far I have written this:

    html = driver.page_source
    soup = BeautifulSoup(html, "lxml")
    spans = soup.find_all('span', attrs={'class': 'html-tag'})[29]
    print(spans.text)

Unfortunately, this only prints the "This is a Heading that I dont want" value, e.g.

This is the heading I dont want

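The number [29] in the code is the position at which the text I need will always appear.

I am not sure how to retrieve the span value I need.

Could you please help? Thanks.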

Answer 1

You can search for <div class="line"> and then select the second <span> inside it.

For example:

from bs4 import BeautifulSoup

txt = '''
   # line 1

   <div class="line">
     <span class="html-tag">
       "This is a Heading that I dont want"
     </span>
     <span>This is the text I dont want</span>
   </div>

   # line 2

   <div class="line">
     <span class="html-tag">
       "This is a Heading that I dont want"
     </span>
     <span>This is the text I dont want</span>
   </div>

   # line 3

   <div class="line">
     <span class="html-tag">
       "This is a Heading that I dont want"
     </span>
     <span>This is the text I want</span>   <!-- this is the one I want -->
   </div>'''


soup = BeautifulSoup(txt, 'html.parser')
s = soup.select('div.line')[2].select('span')[1]    # select 3rd line 2nd span

print(s.text)

Prints:

This is the text I want
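
If you would rather keep the index-based lookup from the question, a possible variation is to find the `html-tag` span first and then step to the span that follows it. The sketch below assumes the value span is always a sibling of the heading span inside the same div; the sample markup and the helper name `value_after_heading` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with the same shape as in the question.
html = """
<div class="line">
  <span class="html-tag">"Heading A"</span>
  <span>Text A</span>
</div>
<div class="line">
  <span class="html-tag">"Heading B"</span>
  <span>Text B</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def value_after_heading(soup, index):
    """Return the text of the span that follows the index-th html-tag span."""
    heading = soup.find_all("span", attrs={"class": "html-tag"})[index]
    # find_next_sibling skips whitespace text nodes and returns the next <span> tag.
    value = heading.find_next_sibling("span")
    return value.get_text(strip=True) if value is not None else None


print(value_after_heading(soup, 1))
```

With the real page you would pass the parsed `driver.page_source` and the index 29 instead of 1.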

Answer 1
Accepted 2020-06-17 12:29:49
