How to extract text from webdriver elements found through xpath using Selenium and Python
How to extract the text elements using Selenium in Python?
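I am using Selenium to scrape content from the App Store: https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830
I am trying to extract the review text "As subject matter experts, our team is very engaging..."
I tried to find the elements by class name: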
review_ratings = driver.find_elements_by_class_name('we-truncate we-truncate--multi-line we-truncate--interactive ember-view we-customer-review__body')
review_ratingsList = []
for e in review_ratings:
    review_ratingsList.append(e.get_attribute('innerHTML'))
review_ratings
But it returns an empty list: []
Is there something wrong with the code, or is there a better solution? Thanks for your help.
Use requests and BeautifulSoup:
import requests
from bs4 import BeautifulSoup
url = 'https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830'
res = requests.get(url)
soup = BeautifulSoup(res.text,'lxml')
item = soup.select_one("blockquote > p").text
print(item)
Output:
As subject matter experts, our team is very engaging and focused on our near and long term financial health!
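Note that `select_one` returns only the first match, which is why the output above contains a single review. A minimal sketch of collecting every match with `select` follows. The `SAMPLE_HTML` string and the `extract_reviews` helper are hypothetical and only mimic the `blockquote > p` structure targeted above; the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the "blockquote > p" structure used above.
SAMPLE_HTML = """
<div class="we-customer-review">
  <blockquote><p>First review text.</p></blockquote>
</div>
<div class="we-customer-review">
  <blockquote><p>Second review text.</p></blockquote>
</div>
"""


def extract_reviews(html):
    """Return the text of every <p> that is a direct child of a <blockquote>."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.select("blockquote > p")]


print(extract_reviews(SAMPLE_HTML))
```

With the live page, the same helper could be called as `extract_reviews(res.text)`, assuming the reviews are present in the HTML that `requests` receives.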
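You can use WebDriverWait to wait until the elements are visible and then get their text. It is also worth checking how to choose good Selenium locators.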
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
#...
wait = WebDriverWait(driver, 5)
review_ratings = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".we-customer-review")))
for review_rating in review_ratings:
    stars = review_rating.find_element_by_css_selector(".we-star-rating").get_attribute("aria-label")
    title = review_rating.find_element_by_css_selector("h3").text
    review = review_rating.find_element_by_css_selector("p").text
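On the locator itself: a likely reason the `find_elements_by_class_name` call in the question returned an empty list is that the class-name locator expects a single class name, so a space-separated list of classes does not match an element that carries all of them. A compound CSS selector (`.a.b.c`) does. The helper below is a hypothetical sketch of that conversion, not part of Selenium.

```python
def classes_to_css(class_attr):
    """Turn a space-separated class attribute into a compound CSS selector."""
    return "".join("." + name for name in class_attr.split())


selector = classes_to_css("we-truncate we-customer-review__body")
print(selector)  # .we-truncate.we-customer-review__body
```

The resulting string can be passed to `find_elements_by_css_selector(selector)`, or to `find_elements(By.CSS_SELECTOR, selector)` in Selenium 4. Using only the one or two most specific classes is usually more robust than listing every class on the element.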
May I suggest combining selenium with BeautifulSoup? Using the webdriver:
from bs4 import BeautifulSoup
from selenium import webdriver
browser = webdriver.Chrome()
url = "https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830"
browser.get(url)
innerHTML = browser.execute_script("return document.body.innerHTML")
bs = BeautifulSoup(innerHTML, 'html.parser')
bs.blockquote.p.text
Output:
Out[22]: 'As subject matter experts, our team is very engaging and focused on our near and long term financial health!'
Let me know if anything needs explaining!
Use WebDriverWait to wait for presence_of_all_elements_located, with the CSS selector below.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830")
review_ratings = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.we-customer-review__body p[dir="ltr"]')))
review_ratingsList = []
for e in review_ratings:
    review_ratingsList.append(e.get_attribute('innerHTML'))
print(review_ratingsList)
['As subject matter experts, our team is very engaging and focused on our near and long term financial health!', 'Very much seems to be an unfinished app. Can’t find secure message alert. Or any alerts for that matter. Most of my client team is missing from the “send to” list. I have other functions very useful, when away from my computer.']
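One caveat when choosing between the approaches: `get_attribute('innerHTML')` returns raw markup, while `.text` returns only the text Selenium considers visible and can be empty for hidden elements. The sketch below shows one possible fallback to `textContent`. Both `element_text` and `FakeElement` are hypothetical names; `FakeElement` merely stands in for a Selenium WebElement so the example runs without a browser.

```python
def element_text(element):
    """Return the visible text, falling back to textContent when .text is empty."""
    visible = element.text.strip()
    if visible:
        return visible
    raw = element.get_attribute("textContent") or ""
    # Collapse runs of whitespace left over from the HTML source.
    return " ".join(raw.split())


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement, used only in this demo."""

    def __init__(self, text, text_content):
        self.text = text
        self._text_content = text_content

    def get_attribute(self, name):
        return self._text_content if name == "textContent" else None


print(element_text(FakeElement("Shown review", "Shown review")))
print(element_text(FakeElement("", "  Hidden\n   review ")))
```

In the loop above, `review_ratingsList.append(element_text(e))` could replace the `innerHTML` call if plain text is what is needed.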