How to extract the text elements using Selenium in Python?
I am using Selenium to scrape the contents of an App Store page: https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830
I tried to extract the review text "As subject matter experts, our team is very engaging..."
I tried to find the elements by class name:
review_ratings = driver.find_elements_by_class_name('we-truncate we-truncate--multi-line we-truncate--interactive ember-view we-customer-review__body')
review_ratingsList = []
for e in review_ratings:
    review_ratingsList.append(e.get_attribute('innerHTML'))
review_ratings
But it returns an empty list [].
Is anything wrong with the code? Or are there any better solutions? Thanks for your help.
Using requests and BeautifulSoup:
import requests
from bs4 import BeautifulSoup
url = 'https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830'
res = requests.get(url)
soup = BeautifulSoup(res.text,'lxml')
item = soup.select_one("blockquote > p").text
print(item)
Output:
As subject matter experts, our team is very engaging and focused on our near and long term financial health!
You can use WebDriverWait to wait until the elements are visible and then get their text. Please also check how to write a good Selenium locator.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
#...
wait = WebDriverWait(driver, 5)
review_ratings = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".we-customer-review")))
for review_rating in review_ratings:
    # find_element(By.CSS_SELECTOR, ...) works in both Selenium 3 and 4
    stars = review_rating.find_element(By.CSS_SELECTOR, ".we-star-rating").get_attribute("aria-label")
    title = review_rating.find_element(By.CSS_SELECTOR, "h3").text
    review = review_rating.find_element(By.CSS_SELECTOR, "p").text
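The loop above reads three values per review but does not keep them. One way to collect them is sketched below, assuming each element behaves like a Selenium WebElement. The function name review_to_dict and the dictionary keys are made up for illustration, and the locator strategy is passed as the plain string "css selector" (the value of By.CSS_SELECTOR) so the snippet has no imports.

```python
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def review_to_dict(review_element):
    """Read rating, title and body text from one .we-customer-review element.

    Hypothetical helper; the selectors are the ones used in the answer above.
    """
    rating = review_element.find_element(CSS, ".we-star-rating")
    return {
        "stars": rating.get_attribute("aria-label"),
        "title": review_element.find_element(CSS, "h3").text,
        "review": review_element.find_element(CSS, "p").text,
    }
```

With the review_ratings list from the wait call, this could be used as reviews = [review_to_dict(r) for r in review_ratings].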
May I suggest mixing Selenium with BeautifulSoup? Using webdriver:
from bs4 import BeautifulSoup
from selenium import webdriver
browser=webdriver.Chrome()
url = "https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830"
browser.get(url)
innerHTML = browser.execute_script("return document.body.innerHTML")
bs = BeautifulSoup(innerHTML, 'html.parser')
bs.blockquote.p.text
Output:
Out[22]: 'As subject matter experts, our team is very engaging and focused on our near and long term financial health!'
If there's something to be explained, just tell me!
Use WebDriverWait to wait for presence_of_all_elements_located, with the following CSS selector:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830")
review_ratings = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.we-customer-review__body p[dir="ltr"]')))
review_ratingsList = []
for e in review_ratings:
    review_ratingsList.append(e.get_attribute('innerHTML'))
print(review_ratingsList)
['As subject matter experts, our team is very engaging and focused on our near and long term financial health!', 'Very much seems to be an unfinished app. Can’t find secure message alert. Or any alerts for that matter. Most of my client team is missing from the “send to” list. I have other functions very useful, when away from my computer.']
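None of the answers spells out why the original call came back empty. The likely cause is that find_elements_by_class_name takes a single class name, while the question passes five names separated by spaces. A compound class has to be written as a CSS selector instead, with each name prefixed by a dot. The helper below is a minimal sketch of that conversion; class_names_to_css is a hypothetical name, not part of Selenium.

```python
def class_names_to_css(class_attr):
    """Turn a space-separated class attribute into a CSS selector.

    Hypothetical helper for illustration; not a Selenium API.
    """
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # ".a.b" matches elements that carry both class a and class b
    return "".join("." + name for name in names)


selector = class_names_to_css("we-truncate we-customer-review__body")
print(selector)  # .we-truncate.we-customer-review__body
```

The resulting string can be passed to driver.find_elements(By.CSS_SELECTOR, selector). In practice a single stable class such as .we-customer-review__body, as used in the answer above, is usually enough, and framework-added classes such as ember-view are probably better left out of the selector.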