
Selenium find all elements by xpath

I used Selenium to scrape a scrolling website and ran the code below:

import requests
from bs4 import BeautifulSoup
import csv
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
import unittest
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
import re

output_file = open("Kijubi.csv", "w", newline='')  

class Crawling(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Firefox()
        self.driver.set_window_size(1024, 768)
        self.base_url = "http://www.viatorcom.de/"
        self.accept_next_alert = True

    def test_sel(self):
        driver = self.driver
        delay = 3
        driver.get(self.base_url + "de/7132/Seoul/d973-allthingstodo")
        for i in range(1,1):  # note: range(1, 1) is empty, so this loop body never runs
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(2)
        html_source = driver.page_source
        data = html_source.encode("utf-8")
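
As written, the scroll loop above uses range(1,1), which yields no iterations, so the page is never scrolled. A minimal sketch of one alternative is to keep scrolling until the page height stops growing. The helper name scroll_to_bottom and its pause and max_rounds parameters are hypothetical, and it only assumes a driver object with an execute_script method:

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=20):
    """Scroll down until the page height stops growing or max_rounds is hit.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give lazily loaded items time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, assume the bottom was reached
        last_height = new_height
    return rounds
```

In the test method this could be called as scroll_to_bottom(driver) right after driver.get(...), before reading driver.page_source. The fixed pause is a simplification; a slow page may need a longer one.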

My next step was to crawl specific information from the website, such as the price.

Hence, I added the following code:

    all_spans = driver.find_elements_by_xpath("/html/body/div[5]/div/div[3]/div[2]/div[2]/div[1]/div[1]/div")
    print(all_spans)
    for price in all_spans:
        Header = driver.find_elements_by_xpath("/html/body/div[5]/div/div[3]/div[2]/div[2]/div[1]/div[1]/div/div[2]/div[2]/span[2]")
        for span in Header:
            print(span.text)

But I get just one price instead of all of them. Could you give me feedback on how I could improve my code? Thanks :)

EDIT

Thanks to you guys I managed to get it running. Here is the additional code:

    elements = driver.find_elements_by_xpath("//div[@id='productList']/div/div")
    innerElements = 15
    outerElements = len(elements)/innerElements
    print(innerElements,  "\t", outerElements, "\t", len(elements))

    for j in range(1, int(outerElements)):
        for i in range(1, int(innerElements)):
            headline = driver.find_element_by_xpath("//div[@id='productList']/div["+str(j)+"]/div["+str(i)+"]/div/div[2]/h2/a").text
            price = driver.find_element_by_xpath("//div[@id='productList']/div["+str(j)+"]/div["+str(i)+"]/div/div[2]/div[2]/span[2]").text
            deeplink = driver.find_element_by_xpath("//div[@id='productList']/div["+str(j)+"]/div["+str(i)+"]/div/div[2]/h2/a").get_attribute("href")

            print("Header: " + headline + " | " + "Price: " + price + " | " + "Deeplink: " + deeplink)

Now my last issue is that I still do not get the last 20 prices back, which have an English description. I only get back the prices that have a German description. The English ones are not fetched, although they share the same HTML structure.

E.g., the HTML structure for the English items:

     headline =   driver.find_element_by_xpath("//div[@id='productList']/div[6]/div[1]/div/div[2]/h2/a")

Do you guys know what I have to modify? Any feedback is appreciated :)
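
One possible cause worth checking, offered as a guess rather than a confirmed diagnosis: XPath positions start at 1, while Python's range(1, n) stops at n - 1. If outerElements works out to 6, the loop over range(1, int(outerElements)) never reaches div[6], which is where the English example above sits. The sketch below builds the XPath strings with inclusive bounds. The helper name product_xpaths is hypothetical, and the counts 6 and 15 are assumptions for illustration only:

```python
def product_xpaths(outer_count, inner_count, suffix="/div/div[2]/h2/a"):
    """Build one XPath per product slot, covering positions 1..count inclusive."""
    paths = []
    for j in range(1, outer_count + 1):      # XPath positions start at 1
        for i in range(1, inner_count + 1):  # "+ 1" so the last slot is included
            paths.append(
                "//div[@id='productList']/div[%d]/div[%d]%s" % (j, i, suffix)
            )
    return paths


# Assumed counts: 6 outer blocks with 15 products each.
paths = product_xpaths(6, 15)
print(paths[-1])
```

If the blocks do not all hold the same number of products, fixed counts will still miss or overshoot some slots, so the numbers would need to come from the page itself.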

To grab all prices on that page you should use an XPath like this:

Header = driver.find_elements_by_xpath("//span[contains(concat(' ', normalize-space(@class), ' '), ' price-amount ')]")

This means: find all span elements whose class attribute includes price-amount. The expression looks complex because @class holds a space-separated list of class names, so it is normalized and padded with spaces before matching.
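
To illustrate why the padding matters, here is a plain-Python imitation of that XPath test. It is only a sketch of the idea: has_class is a hypothetical helper, not part of Selenium, and it assumes the usual form of the idiom in which the class name is also wrapped in spaces (' price-amount '):

```python
def has_class(class_attr, name):
    """Mimic contains(concat(' ', normalize-space(@class), ' '), ' name ')."""
    normalized = " ".join(class_attr.split())  # like normalize-space()
    return (" " + name + " ") in (" " + normalized + " ")


# A plain substring test also matches unrelated class names.
print("price-amount" in "old-price-amount")          # True
print(has_class("old-price-amount", "price-amount"))  # False
print(has_class("  big   price-amount ", "price-amount"))  # True
```

The padded comparison only succeeds when price-amount appears as a whole class name, regardless of its position in the attribute or of extra whitespace.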

But a simpler way to find the same elements is a CSS locator:

.price-amount
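
A minimal sketch of using that locator from Python follows. It assumes a Selenium release that provides driver.find_elements(by, value); the helper name price_texts is hypothetical. The string "css selector" is the value of By.CSS_SELECTOR, written out here so the sketch has no import-time dependency on Selenium:

```python
CSS_SELECTOR = "css selector"  # same string as selenium's By.CSS_SELECTOR


def price_texts(driver, selector=".price-amount"):
    """Return the visible text of every element matching the CSS selector."""
    elements = driver.find_elements(CSS_SELECTOR, selector)
    return [element.text for element in elements]
```

With a real driver, price_texts(driver) would return one string per matching element. In normal code, passing By.CSS_SELECTOR from selenium.webdriver.common.by is the more readable choice.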

