Extracting text from a website using Selenium

Question

I'm trying to find a way to extract the book summary from a Goodreads page. I tried Beautiful Soup and Selenium, but neither worked.

Link: https://www.goodreads.com/book/show/67896.Tao_Te_Ching?from_search=true&from_srp=true&qid=D19iQu7KWI&rank=1

Code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import requests
link='https://www.goodreads.com/book/show/67896.Tao_Te_Ching?from_search=true&from_srp=true&qid=D19iQu7KWI&rank=1'
driver = webdriver.Chrome()  # driver setup was missing from the snippet; Chrome per the error below
driver.get(link)
Description=driver.find_element_by_xpath("//div[contains(text(),'TextContainer')]")
#first TextContainer contains the summary of the book
book_page = requests.get(link)
soup = BeautifulSoup(book_page.text, "html.parser")
print(soup)
Container = soup.find('class', class_='leftContainer')
print(Container)

Errors:

Container is empty, plus:

NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"//div[contains(text(),'TextContainer')]"} (Session info: chrome=83.0.4103.116)

Answer 1

You can get the description like this:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
...
driver.get("https://www.goodreads.com/book/show/67896.Tao_Te_Ching?from_search=true&from_srp=true&qid=D19iQu7KWI&rank=1")
description = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'div#description span[style="display:none"]'))
)
print(description.get_attribute('textContent'))

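I used a CSS selector to get the specific hidden span that contains the full description. I also used an explicit wait to give the element time to load.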
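Two details may be worth spelling out. The XPath in the question, contains(text(),'TextContainer'), matches a div whose text contains that string, not one whose id or class does, which would explain the NoSuchElementException. And because the target span is hidden, Selenium's .text would return an empty string for it; that is why the code reads the textContent attribute instead. To show what the selector targets without needing a browser, here is a minimal standard-library sketch. The HTML fragment and the names SAMPLE_HTML, HiddenSpanText and hidden_description are made up for illustration; the fragment is modeled on the selector above and is not copied from Goodreads, whose real markup may differ.

```python
from html.parser import HTMLParser

# Hypothetical fragment modeled on the selector in the answer;
# the real Goodreads markup may differ.
SAMPLE_HTML = """
<div id="description">
  <span>Short preview...</span>
  <span style="display:none">Full description of the book.</span>
  <a href="#">...more</a>
</div>
"""


class HiddenSpanText(HTMLParser):
    """Collect text inside div#description span[style="display:none"]."""

    def __init__(self):
        super().__init__()
        self.in_description = False  # currently inside <div id="description">
        self.in_hidden_span = False  # currently inside the hidden <span>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("id") == "description":
            self.in_description = True
        elif (tag == "span" and self.in_description
              and attrs.get("style") == "display:none"):
            self.in_hidden_span = True

    def handle_endtag(self, tag):
        # Simplification: assumes no nested <div> or <span> inside the target.
        if tag == "span":
            self.in_hidden_span = False
        elif tag == "div":
            self.in_description = False

    def handle_data(self, data):
        if self.in_hidden_span:
            self.chunks.append(data)


def hidden_description(html):
    """Return the text of the hidden description span, or '' if absent."""
    parser = HiddenSpanText()
    parser.feed(html)
    return "".join(parser.chunks).strip()


print(hidden_description(SAMPLE_HTML))
```

This only works on HTML that is already in hand. If the page builds the description with JavaScript, the explicit wait in the Selenium code above is still needed to let the element load first.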

Solution 1
0 2020-07-10 20:50:42