Getting a value from each web page using Selenium in Python

Question

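I have a list of websites that I want to loop through and extract each movie's genres. They all come from Box Office Mojo.

An example link: https://www.boxofficemojo.com/release/rl3829564929/

In the browser's Inspect view, the part of the page I want to extract has this structure: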

```html
<div class="a-section a-spacing-none">
  <span>Genres</span>
  <span>
    Action Adventure Thriller
  </span>
</div>
```

When I run the following code:

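```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# r"..." keeps the Windows backslashes literal instead of treating them as escapes
driver = webdriver.Chrome(service=Service(r"C:\SeleniumDrivers\chromedriver.exe"))
driver.get("https://www.boxofficemojo.com/release/rl3829564929/")
driver.implicitly_wait(3)
my_element = driver.find_element(By.XPATH, "/html/body/div[1]/main/div/div[3]/div[4]/div[7]/span[2]")
my_element.text  # the interactive interpreter echoes this value
```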

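I get the following result:

```
'Action Adventure Thriller'
```

This is the desired result for this particular movie. However, the XPath is different on other movie pages, so I cannot access the value automatically.

Ideally, the solution would loop through the sites and extract each movie's genres regardless of the XPath the genres have on each individual movie page.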

Answer 1

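```python
from selenium.webdriver.common.by import By

my_element = driver.find_element(By.XPATH, "//div[@class='a-section a-spacing-none' and contains(.,'Genres')]/span[2]")
print(my_element.text)
```

Look for a more distinctive relative XPath that suits your scope, for example one anchored on the Genres label, rather than an absolute path that depends on the exact page layout.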
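Answer 1's path anchors on the Genres label instead of the page's absolute layout. As a quick offline check of that idea (a sketch, not part of the original answer), a similar relative path can be run with Python's standard-library xml.etree.ElementTree, which understands only a subset of XPath. Because that subset has no contains(), the sketch swaps in the exact-text predicate [span='Genres']. The markup, GENRES_PATH and genres_from_markup are hypothetical and well-formed, whereas real pages generally are not, so this only illustrates the locator logic.

```python
import xml.etree.ElementTree as ET

# ElementTree understands only a subset of XPath: [@attr='...'], [tag='text']
# and [n] work, but contains() and the following:: axis are not supported, so the
# contains(., 'Genres') test is replaced by the exact-text predicate [span='Genres'].
GENRES_PATH = ".//div[@class='a-section a-spacing-none'][span='Genres']/span[2]"


def genres_from_markup(markup):
    """Return the genre text from well-formed markup, or None if there is none."""
    span = ET.fromstring(markup).find(GENRES_PATH)
    if span is None or span.text is None:
        return None
    # Collapse the surrounding whitespace and newlines into single spaces
    return " ".join(span.text.split())


# Hypothetical layouts: the Genres row sits at a different depth and position
# in each, which is what breaks an absolute /html/body/... path.
PAGE_A = """<main><div><div class="a-section a-spacing-none">
  <span>Genres</span>
  <span>
    Action Adventure Thriller
  </span>
</div></div></main>"""

PAGE_B = """<main>
  <div class="a-section a-spacing-none"><span>Label</span><span>value</span></div>
  <div class="a-section a-spacing-none"><span>Genres</span><span>Comedy Drama</span></div>
</main>"""

print(genres_from_markup(PAGE_A))  # Action Adventure Thriller
print(genres_from_markup(PAGE_B))  # Comedy Drama
```

The same relative path finds the value in both layouts, while an absolute path would have to be rewritten for each one.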

Answer 2

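For each relevant site you visit, if the Genres label is followed by the genre names in a <span> tag, you can use the following locator strategy to extract the text:

Using xpath: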

```python
driver.get("https://www.boxofficemojo.com/release/rl3829564929/")
print(driver.find_element(By.XPATH, "//span[text()='Genres']//following::span[1]").text)
```
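Answer 2's locator relies on the XPath following axis: take the first <span> that comes anywhere after the Genres label in document order, even outside the label's parent. To make those semantics concrete without a browser, here is a rough sketch using only the standard-library html.parser. GenresParser and genres_from_html are hypothetical names, the parser assumes the spans involved are not nested, and it is an alternative to, not a copy of, Answer 2's Selenium call. It can be fed driver.page_source or saved HTML.

```python
from html.parser import HTMLParser


class GenresParser(HTMLParser):
    """Rough stand-in for //span[text()='Genres']/following::span[1].

    Scans tags in document order; assumes the spans involved are not nested.
    """

    def __init__(self):
        super().__init__()
        self.in_span = False      # currently inside a <span> we are reading
        self.chunks = []          # text collected from that span
        self.label_seen = False   # True once <span>Genres</span> has closed
        self.genres = None        # text of the first span after the label

    def handle_starttag(self, tag, attrs):
        if tag == "span" and self.genres is None:
            self.in_span = True
            self.chunks = []

    def handle_data(self, data):
        if self.in_span:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag != "span" or not self.in_span:
            return
        self.in_span = False
        # Collapse runs of whitespace so the result is a single clean line
        text = " ".join("".join(self.chunks).split())
        if self.label_seen:
            self.genres = text
        elif text == "Genres":
            self.label_seen = True


def genres_from_html(html):
    """Return the genre string, or None if no Genres label/span pair is found."""
    parser = GenresParser()
    parser.feed(html)
    parser.close()
    return parser.genres


# With Selenium, the function could be applied inside a loop over the URL list:
#     for url in urls:
#         driver.get(url)
#         print(url, genres_from_html(driver.page_source))
snippet = """<div class="a-section a-spacing-none">
  <span>Genres</span>
  <span>
    Action Adventure Thriller
  </span>
</div>"""
print(genres_from_html(snippet))  # Action Adventure Thriller
```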

Ideally, you should use WebDriverWait with visibility_of_element_located(), as in the following locator strategy:

Using XPATH:

```python
driver.get("https://www.boxofficemojo.com/release/rl3829564929/")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[text()='Genres']//following::span[1]"))).get_attribute("innerHTML"))
```

Console output:
```
 Action Adventure Thriller
```

Note: you have to add the following imports:

```python
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
```
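The advantage of WebDriverWait(driver, 20).until(...) over a single find_element call is that it re-checks the condition repeatedly (by default every 0.5 s) until the condition returns something truthy or the timeout expires, in which case Selenium raises TimeoutException. The loop below is a simplified, hypothetical wait_until helper that mimics that polling idea in plain Python; it is not Selenium's implementation and, unlike WebDriverWait, it does not ignore NoSuchElementException.

```python
import time


def wait_until(condition, timeout=20.0, poll_frequency=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass.

    Hypothetical helper illustrating the polling idea behind WebDriverWait.until().
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # like until(): hand back whatever the condition returned
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Toy condition: the "element" only becomes available on the third check,
# much like content that a page renders after the initial load.
attempts = {"count": 0}


def genres_when_ready():
    attempts["count"] += 1
    return "Action Adventure Thriller" if attempts["count"] >= 3 else None


print(wait_until(genres_when_ready, timeout=2, poll_frequency=0.01))
```

A lookup that runs only once would have failed on the first two checks here; polling with a timeout is what makes the explicit wait robust on pages that fill in content late.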


Answer 1: score 0, 2021-11-12 19:30:56

Answer 2: score 0, 2021-11-12 20:23:38
