Selenium: find all elements by XPath
I used Selenium to scrape a scrolling website with the code below:
import requests
from bs4 import BeautifulSoup
import csv
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
import unittest
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
import re
output_file = open("Kijubi.csv", "w", newline='')

class Crawling(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Firefox()
        self.driver.set_window_size(1024, 768)
        self.base_url = "http://www.viatorcom.de/"
        self.accept_next_alert = True

    def test_sel(self):
        driver = self.driver
        delay = 3
        driver.get(self.base_url + "de/7132/Seoul/d973-allthingstodo")
        for i in range(1, 20):  # note: the original range(1, 1) is empty, so the page never scrolled
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(2)
        html_source = driver.page_source
        data = html_source.encode("utf-8")
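As a side note, a more robust way to handle an infinite-scroll page is to keep scrolling until the document height stops growing, rather than looping a fixed number of times. A sketch of that idea (the helper name, `pause`, and `max_rounds` values are my own, not from the original code):

```python
import time

def scroll_to_bottom(driver, pause=2.0, max_rounds=20):
    """Scroll down until document.body.scrollHeight stops growing."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # the last scroll loaded no new content
        last_height = new_height
    return last_height
```

Called as `scroll_to_bottom(driver)` right after `driver.get(...)`, this stops as soon as two consecutive scrolls report the same page height.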
My next step was to crawl specific information from the website, such as the price. Hence, I added the following code:
all_spans = driver.find_elements_by_xpath("/html/body/div[5]/div/div[3]/div[2]/div[2]/div[1]/div[1]/div")
print(all_spans)
for price in all_spans:
    Header = driver.find_elements_by_xpath("/html/body/div[5]/div/div[3]/div[2]/div[2]/div[1]/div[1]/div/div[2]/div[2]/span[2]")
    for span in Header:
        print(span.text)
But I get just one price instead of all of them. Could you give me feedback on how I could improve my code? Thanks :)
EDIT
Thanks to you guys I managed to get it running. Here is the additional code:
elements = driver.find_elements_by_xpath("//div[@id='productList']/div/div")
innerElements = 15
outerElements = len(elements) / innerElements
print(innerElements, "\t", outerElements, "\t", len(elements))
for j in range(1, int(outerElements)):
    for i in range(1, int(innerElements)):
        headline = driver.find_element_by_xpath("//div[@id='productList']/div[" + str(j) + "]/div[" + str(i) + "]/div/div[2]/h2/a").text
        price = driver.find_element_by_xpath("//div[@id='productList']/div[" + str(j) + "]/div[" + str(i) + "]/div/div[2]/div[2]/span[2]").text
        deeplink = driver.find_element_by_xpath("//div[@id='productList']/div[" + str(j) + "]/div[" + str(i) + "]/div/div[2]/h2/a").get_attribute("href")
        print("Header: " + headline + " | " + "Price: " + price + " | " + "Deeplink: " + deeplink)
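Rather than rebuilding an indexed absolute XPath for every item, you can also iterate over the card elements themselves and search relative to each one. That avoids the hard-coded 15-items-per-block assumption and the off-by-one risk of `range(1, n)`, which excludes `n`. A sketch using the same sub-paths as above (Selenium 3 API, as in the rest of this post):

```python
def extract_products(driver):
    """Return (headline, price, deeplink) for every product card."""
    products = []
    for card in driver.find_elements_by_xpath("//div[@id='productList']/div/div"):
        # Search relative to each card instead of indexing from the document root.
        link = card.find_element_by_xpath("./div/div[2]/h2/a")
        price = card.find_element_by_xpath("./div/div[2]/div[2]/span[2]")
        products.append((link.text, price.text, link.get_attribute("href")))
    return products
```

This visits every element the outer `find_elements_by_xpath` call returns, however many there are.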
Now my last issue is that I still do not get back the last 20 prices, the ones with an English description. I only get back the prices that have a German description. The English ones are not fetched even though they share the same HTML structure.
E.g. the HTML structure for the English items:
headline = driver.find_element_by_xpath("//div[@id='productList']/div[6]/div[1]/div/div[2]/h2/a")
Do you guys know what I have to modify? Any feedback is appreciated :)
To grab all prices on that page you should use an XPath like this:
Header = driver.find_elements_by_xpath("//span[contains(concat(' ', normalize-space(@class), ' '), 'price-amount')]")
which means: find all span elements whose class attribute contains the class price-amount. For why the expression is so complex, see here.
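The point of the `concat`/`normalize-space` wrapper is that `@class` is a space-separated list of tokens: a plain substring test like `contains(@class, 'price-amount')` would also match an element with, say, `class="price-amounts"`. The same whole-token matching logic, expressed in plain Python for illustration (this helper is mine, not part of Selenium):

```python
def has_class_token(class_attr, token):
    """True if `token` appears as a whole class name in a class attribute."""
    # Collapse whitespace (the normalize-space step) and pad with spaces,
    # so each class name is surrounded by spaces and can be matched exactly.
    padded = " " + " ".join(class_attr.split()) + " "
    return " " + token + " " in padded
```

For example, `has_class_token("price price-amount", "price-amount")` is true, while `has_class_token("price-amounts", "price-amount")` is false.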
But it is simpler to find the same elements with a CSS locator:
.price-amount
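In the Selenium 3 API used throughout this post, the CSS locator would be applied like this (the wrapper function is my own, for illustration):

```python
def all_price_texts(driver):
    """Collect the text of every element matching the price-amount CSS class."""
    return [span.text for span in driver.find_elements_by_css_selector("span.price-amount")]
```

CSS class selectors like `.price-amount` already do whole-token matching on the class attribute, which is why no `contains`/`normalize-space` gymnastics are needed here.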