簡體   English   中英

如何在Python中使用Selenium提取文本元素?

[英]How to extract the text elements using Selenium in Python?

在此處輸入圖片說明

我正在使用Selenium從應用商店中抓取內容: https : //apps.apple.com/us/app/bank-of-america-private-bank/id1096813830

我嘗試提取文本字段“作為主題專家,我們的團隊非常有魅力……”

我試圖按班級查找元素

review_ratings = driver.find_elements_by_class_name('we-truncate we-truncate--multi-line we-truncate--interactive ember-view we-customer-review__body')
review_ratingsList = []
for e in review_ratings:
review_ratingsList.append(e.get_attribute('innerHTML'))
review_ratings

但它返回一個空列表[]

代碼有什么問題嗎? 還是更好的解決方案? 謝謝你的幫助。

使用requestsBeautifulSoup

import requests
from bs4 import BeautifulSoup

url = 'https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830'

res = requests.get(url)
soup = BeautifulSoup(res.text,'lxml')
item = soup.select_one("blockquote > p").text
print(item)

輸出:

As subject matter experts, our team is very engaging and focused on our near and long term financial health!

您可以使用WebDriverWait等待元素的可見性並獲取文本。 請檢查良好的硒定位器

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#...

wait = WebDriverWait(driver, 5)
review_ratings = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".we-customer-review")))
for review_rating in review_ratings:
    starts = review_rating.find_element_by_css_selector(".we-star-rating").get_attribute("aria-label")
    title = review_rating.find_element_by_css_selector("h3").text
    review = review_rating.find_element_by_css_selector("p").text

我可以建議將seleniumBeautifulSoup混合使用嗎? 使用網絡驅動程序:

from bs4 import BeautifulSoup
from selenium import webdriver
browser=webdriver.Chrome()
url = "https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830"
browser.get(url)
innerHTML = browser.execute_script("return document.body.innerHTML")

bs = BeautifulSoup(innerHTML, 'html.parser')

bs.blockquote.p.text

輸出:

Out[22]: 'As subject matter experts, our team is very engaging and focused on our near and long term financial health!'

如果有什么要解釋的,請告訴我!

使用WebDriverWait ,等待presence_of_all_elements_located和使用下面的CSS選擇器。

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830")
review_ratings =WebDriverWait(driver,20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,'.we-customer-review__body p[dir="ltr"]')))
review_ratingsList = []
for e in review_ratings:
 review_ratingsList.append(e.get_attribute('innerHTML'))
print(review_ratingsList)

輸出:

['As subject matter experts, our team is very engaging and focused on our near and long term financial health!', 'Very much seems to be an unfinished app. Can’t find secure message alert. Or any alerts for that matter. Most of my client team is missing from the “send to” list. I have other functions very useful, when away from my computer.']

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM