
Problems Scraping with Selenium (xpath) in Tripadvisor

I am new to Python and scraping. I am trying to extract information from Tripadvisor. I need Selenium for crawling, but when I run the program at different times the paths change.

Here is an example:

import urllib.request
import urllib.parse
from selenium import webdriver
import csv
from selenium.webdriver.common.action_chains import ActionChains
import time
from datetime import datetime
from selenium.webdriver.common.keys import Keys

# Browser options: visible window, notification prompts blocked
options = webdriver.ChromeOptions()
options.headless = False
prefs = {"profile.default_content_setting_values.notifications": 2}
options.add_experimental_option("prefs", prefs)
chromedriver = "C:/Users/rober/OneDrive/Escritorio/tfm/chromedriver.exe"
# Pass the options to the driver, otherwise the prefs above are never applied
driver = webdriver.Chrome(chromedriver, options=options)
driver.maximize_window()
time.sleep(5)

driver.get("https://www.tripadvisor.es/")
# XPath of the Restaurants link label:
# //*[@id="component_5"]/div/div/div/span[3]/div/div/div/a/span[2]

#Click Restaurants
driver.find_element_by_xpath('//*[@id="component_5"]/div/div/div/span[3]/div/div/div/a').click()

# Enter the location
driver.find_element_by_xpath('//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[14]/div/div/div[1]/div[1]/div/input').send_keys("madrid")

In the last part of the code, div[14] is sometimes div[13] or div[15]. Is it possible to use an absolute XPath, or some other way of locating the element?

Thank you

You should not use long XPaths. They make the test brittle.

Use shorter XPaths instead. An XPath like //input[@class="Smftgery"] should let you click on the same input field.


Also, to click on Restaurantes, you can use //*[text()='Restaurantes'].
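
As a rough sketch, the two short XPaths above could be dropped into the question's script like this. The helper name click_restaurants_and_search is made up for illustration, and the class value Smftgery is copied from this answer as-is, so it may not match the live page. The locator strategy is passed as the plain string "xpath", which is the value of Selenium's By.XPATH, so the helper works with any driver object that exposes find_element(by, value).

```python
# Short, attribute- and text-based XPaths taken from the answer above.
# "Smftgery" is the class quoted in the answer; treat it as an assumption.
RESTAURANTS_XPATH = "//*[text()='Restaurantes']"
SEARCH_INPUT_XPATH = '//input[@class="Smftgery"]'


def click_restaurants_and_search(driver, city):
    """Click the Restaurantes link, then type `city` into the search input.

    `driver` is expected to behave like a Selenium WebDriver, i.e. to offer
    find_element(by, value). "xpath" is the string behind By.XPATH.
    """
    driver.find_element("xpath", RESTAURANTS_XPATH).click()
    search_box = driver.find_element("xpath", SEARCH_INPUT_XPATH)
    search_box.send_keys(city)
    return search_box
```

With the driver from the question this would be called as click_restaurants_and_search(driver, "madrid"). If the input only appears a moment after the click, an explicit wait before the second lookup may still be needed.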

Your XPath is too specific. Look for something unique in the deeper levels of the DOM; this uniqueness can also be a combination of multiple levels. For example, if there is only one input field inside BODY_BLOCK_JQUERY_REFLOW, you can ignore all the levels in between: '//*[@id="BODY_BLOCK_JQUERY_REFLOW"]//input'

Or use some other attribute of the input. For example, if it has a data attribute: //input[@data="the-data-of-the-input-field"]
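
To see why the positional path breaks while the shorter ones keep working, here is a small offline sketch. It does not touch Tripadvisor: it builds a toy page with Python's xml.etree.ElementTree, which only supports a subset of XPath, and the page layout (a variable number of banner divs before the search box) is an assumption made up for the demonstration. The function names are hypothetical too.

```python
import xml.etree.ElementTree as ET


def build_page(extra_banners):
    """Build a toy page body with `extra_banners` empty divs before the search box."""
    body = ET.Element("body", {"id": "BODY_BLOCK_JQUERY_REFLOW"})
    for _ in range(extra_banners):
        ET.SubElement(body, "div")  # stands in for ads or banners that come and go
    wrapper = ET.SubElement(body, "div")
    inner = ET.SubElement(wrapper, "div")
    ET.SubElement(inner, "input", {"data": "the-data-of-the-input-field"})
    return body


def find_by_position(body, index):
    """Positional lookup, like .../div[14]/div/input in the question."""
    return body.find(f"./div[{index}]/div/input")


def find_by_descendant(body):
    """Skip the levels in between, like //*[@id=...]//input."""
    return body.find(".//input")


def find_by_attribute(body):
    """Match on an attribute of the input itself."""
    return body.find(".//input[@data='the-data-of-the-input-field']")


# The positional path only works when exactly 13 divs precede the wrapper.
for banners in (12, 13, 14):
    page = build_page(banners)
    print(
        banners,
        find_by_position(page, 14) is not None,
        find_by_descendant(page) is not None,
        find_by_attribute(page) is not None,
    )
```

Only the 13-banner page is found by the div[14] path, while the descendant and attribute lookups succeed for all three layouts. The same reasoning applies to the real page, although the actual DOM there is of course more complex.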
