How to extract titles from <a> tags inside DIVs using Python?

Question

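I'm new to Python, and I want to extract all the titles from the <a> tags placed inside DIVs. There may be 0 titles or as many as 100.

Each one is a child DIV, <div class="Shl zI7 iyn Hsu">, which contains the <a> tag with the title.

Here is the code for the first main DIV, which contains all the child DIVs: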

<div class="Eqh F6l Jea k1A zI7 iyn Hsu"><div class="Shl zI7 iyn Hsu"><a data-test-id="search-guide" 
href="" title="Search for &quot;living room colors&quot;"><div class="Jea Lfz XiG fZz gjz qDf zI7 iyn 
Hsu" style="white-space: nowrap; background-color: rgb(162, 152, 139);"><div class="tBJ dyH iFc MF7 
erh tg7 IZT mWe">Living</div></div></a>

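In the example above, I want to get "living room colors" rather than the full value of title=. I assume I can apply a regex afterwards, but I'm stuck on getting the title out of the parsed HTML in the first place.

I tried the following Python: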

import requests
from bs4 import BeautifulSoup

url = "https://www.pinterest.com/search/pins/?q=room%20color"
get_url = requests.get(url)
get_text = get_url.text
soup = BeautifulSoup(get_text, "html.parser")
DivTitle = soup.select('a.Shl.zI7.iyn.Hsu')[0].text.strip()
print(DivTitle)

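I get: IndexError: list index out of range

When I search for the keyword above, multiple titles (suggested keywords) appear in the search results.

Thanks for your help.

Edit: OK, I got this working, but I'm trying to have it parse the page from a URL instead of pasting the HTML into my code:

This is the part I used: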

import requests
from bs4 import BeautifulSoup
vgm_url = 'https://www.pinterest.com/search/pins/?q=skin%20care'
html_text = requests.get(vgm_url).text
soup = BeautifulSoup(html_text, 'html.parser')

But I get nothing back, and no error either.

Answer 1

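Your selector is wrong: the DIV is the element that has the classes you want, and the A is a child of that DIV. title is an attribute of the A element, not its text.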

from bs4 import BeautifulSoup

data = '''\
<html>
  <head>
    <meta name="generator"
    content="HTML Tidy for HTML5 (experimental) for Windows https://github.com/w3c/tidy-html5/tree/c63cc39" />
    <title></title>
  </head>
  <body>
    <div class="Eqh F6l Jea k1A zI7 iyn Hsu">
      <div class="Shl zI7 iyn Hsu">
        <a data-test-id="search-guide" href="" title="Search for &quot;living room colors&quot;">
          <div class="Jea Lfz XiG fZz gjz qDf zI7 iyn Hsu" style="white-space: nowrap; background-color: rgb(162, 152, 139);">
            <div class="tBJ dyH iFc MF7 erh tg7 IZT mWe">Living</div>
          </div>
        </a>
      </div>
    </div>
  </body>
</html>
'''

soup = BeautifulSoup(data, 'html.parser')

a = soup.select('div.Shl.zI7.iyn.Hsu a')[0]

print(a['title'])
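
Since the question mentions anywhere from 0 to 100 titles and wanting only the quoted phrase, here is a minimal sketch that extends the selector above to every match. The function name extract_guide_titles and the second "bedroom colors" guide are made up for illustration, and the regex assumes each title looks like Search for "...", as in the sample.

```python
import re
from bs4 import BeautifulSoup

# Sample markup modeled on the question; the second guide is invented for illustration.
html = '''
<div class="Eqh F6l Jea k1A zI7 iyn Hsu">
  <div class="Shl zI7 iyn Hsu">
    <a data-test-id="search-guide" href="" title="Search for &quot;living room colors&quot;">Living</a>
  </div>
  <div class="Shl zI7 iyn Hsu">
    <a data-test-id="search-guide" href="" title="Search for &quot;bedroom colors&quot;">Bedroom</a>
  </div>
</div>
'''

def extract_guide_titles(markup):
    """Return the quoted phrase from every matching <a title="...">, or an empty list."""
    soup = BeautifulSoup(markup, 'html.parser')
    phrases = []
    # a[title] skips anchors that have no title attribute at all.
    for a in soup.select('div.Shl.zI7.iyn.Hsu a[title]'):
        title = a['title']  # e.g. Search for "living room colors"
        match = re.search(r'"([^"]*)"', title)
        # Fall back to the whole title if it contains no quoted phrase.
        phrases.append(match.group(1) if match else title)
    return phrases

print(extract_guide_titles(html))                     # ['living room colors', 'bedroom colors']
print(extract_guide_titles('<p>no guides here</p>'))  # []
```

Looping over the full result of select() instead of indexing [0] means that a page with no matching elements yields an empty list rather than an IndexError.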

Solution 1
0 Accepted 2020-11-25 20:07:39