
How to grab URL in "View Deal" and price for deal from kayak.com using BeautifulSoup
Trying to grab href URLs from Kayak website using BeautifulSoup
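I am trying to grab the URL from each card that appears on this Kayak page. When I run the code below, I get a BrokenPipeError: [Errno 32] Broken pipe error. Can someone help me with the correct code to get all the URLs from the flight results on this page?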
from random import randint
from time import sleep
from selenium import webdriver

url = 'https://www.kayak.com/flights/AMS-WMI,nearby/2023-02-15/WMI-SOF,nearby/2023-02-18/SOF-BEG,nearby/2023-02-20/BEG-MIL,nearby/2023-02-23/MIL-AMS,nearby/2023-02-25/?sort=bestflight_a&fs=stops=-2&attempt=1&lastms=1675195877028'
requests = 0  # request counter, used to rotate through the user agents
chrome_options = webdriver.ChromeOptions()
agents = ["Firefox/66.0.3","Chrome/73.0.3683.68","Edge/16.16299"]
print("User agent: " + agents[(requests%len(agents))])
# The original line appended a stray '"' to the user-agent value; it is dropped here.
chrome_options.add_argument('--user-agent=' + agents[(requests%len(agents))])
chrome_options.add_experimental_option('useAutomationExtension', False)
# Note: chrome_options is never passed to webdriver.Chrome(), so the options above are not applied.
driver = webdriver.Chrome('/Users/junerodriguez/Downloads/chromedriver_mac_arm64/chromedriver')
driver.implicitly_wait(10)
driver.get(url)
sleep(randint(8,10))
# This XPath ends in /href[...], which looks for an <href> element rather than the href attribute.
xp_hrefs = "//div[@class='above-button']//a[contains(@class,'booking-link')]/href[@class='col col-best']"
hrefs = driver.find_elements_by_xpath(xp_hrefs)
hrefs
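In Selenium, an XPath should locate web elements, not their attributes.
To extract the href attribute values, collect all the matching a web elements into a list, then iterate over that list and read the href attribute from each element, as follows: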
hrefs = [link.get_attribute('href') for link in driver.find_elements(By.XPATH,"//div[@class='above-button']//a[contains(@class,'booking-link')]")]
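The code above collects all matching web elements into a list, then applies link.get_attribute('href') to each link element in that list to extract its href attribute value.
The results are collected in the hrefs list.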
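The difference between locating an element and reading its attribute can be checked without a browser. The following is a minimal sketch that uses the standard library's xml.etree.ElementTree in place of Selenium; the markup and the example.com URLs are hypothetical, loosely modeled on the class names in the question. It only illustrates the idea and is not a substitute for the Selenium call above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the class names mimic those in the question.
SAMPLE = """
<div>
  <div class="above-button">
    <a class="booking-link" href="https://example.com/book/1">View Deal</a>
  </div>
  <div class="above-button">
    <a class="booking-link" href="https://example.com/book/2">View Deal</a>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE)

# A trailing "/href" step searches for a child element named href.
# href is an attribute, so nothing matches.
as_child = root.findall(".//a[@class='booking-link']/href")

# Locate the <a> elements first, then read the attribute from each one.
hrefs = [a.get("href") for a in root.findall(".//a[@class='booking-link']")]

print(as_child)
print(hrefs)
```

The same two-step pattern applies in Selenium: find the a elements, then call get_attribute('href') on each of them.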
To extract the links from all the href attributes on the page, use WebDriverWait with visibility_of_all_elements_located(). You can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get("https://www.kayak.com/flights/AMS-WMI,nearby/2023-02-15/WMI-SOF,nearby/2023-02-18/SOF-BEG,nearby/2023-02-20/BEG-MIL,nearby/2023-02-23/MIL-AMS,nearby/2023-02-25/?sort=bestflight_a&fs=stops=-2&attempt=1&lastms=1675195877028")
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[role='link'][href]")))])
Using XPATH:
driver.get("https://www.kayak.com/flights/AMS-WMI,nearby/2023-02-15/WMI-SOF,nearby/2023-02-18/SOF-BEG,nearby/2023-02-20/BEG-MIL,nearby/2023-02-23/MIL-AMS,nearby/2023-02-25/?sort=bestflight_a&fs=stops=-2&attempt=1&lastms=1675195877028")
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[@role='link' and @href]")))])
Console output:
['https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.81197.36c89f7717e84ac7a4ee2898627fa251&h=40d03211086c&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M0', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.47F3EeHCWiIEdn9PX-8xhQ.41000.a6f675f0a632a9d55b0fab7f1b09f9d8&h=8dce29003385&sub=E-10f42a14593&pageOrigin=F..RP.FE.M1', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.81397.58fb639ccf8938f61eec808f1e13c556&h=ba02be2bf0dc&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M2', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.81197.aca9104db06bae99e4f55a158dfd3ff2&h=61a4dc653dc3&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M4', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.81397.bcc92e8ae656b0e298dbe8a6555bd825&h=ece97a1b9509&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M5', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.80697.732461bd95055d2478850abf1741221f&h=c94d3b283c0a&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M6', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.80997.215cabd8ee10582a0d6b94c20dfb95ad&h=b493996d9e9d&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M7', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.80697.cabd6f6051c17b3cd7f9129454607d0e&h=917f7fb0f2f5&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M8', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.18wgSzIaLAlgpzrTH2pViLaYAeeTFjgE.80997.07496f1f93e916d757ec284da1ef4638&h=52abbdb7d13a&sub=E-191e8b4083a&pageOrigin=F..RP.FE.M11', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.eNCwACMVOeJpd4CyPwn0EI6M4XD8KcmF.71697.a1633218d7cbd5eb2fe950504a6207a9&h=c8fe9769a628&sub=E-15b10c5af5f&pageOrigin=F..RP.FE.M12', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.eNCwACMVOeJpd4CyPwn0EI6M4XD8KcmF.71997.ebd834f5c265ae428e1bdbb3637a606b&h=d80290f43a93&sub=E-15b10c5af5f&pageOrigin=F..RP.FE.M13', 
'https://www.kayak.com/book/flight?code=OiFir3l_8L.eNCwACMVOeJpd4CyPwn0EI6M4XD8KcmF.71697.c7a2ace471ba5c35334014e91956f849&h=2f3b292c166e&sub=E-15b10c5af5f&pageOrigin=F..RP.FE.M14', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.eNCwACMVOeJpd4CyPwn0EI6M4XD8KcmF.71997.a281f919b379469a223fb34ed5510409&h=913a810b8e80&sub=E-15b10c5af5f&pageOrigin=F..RP.FE.M15', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.eNCwACMVOeJpd4CyPwn0EI6M4XD8KcmF.71697.4fdb4fded43ccbf47dcdcad01bf919e6&h=ea07410d1dda&sub=E-15b10c5af5f&pageOrigin=F..RP.FE.M16', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.eNCwACMVOeJpd4CyPwn0EI6M4XD8KcmF.71997.23101dac562249519c55956ba4cc7abf&h=45c62f765f1f&sub=E-15b10c5af5f&pageOrigin=F..RP.FE.M17', 'https://www.kayak.com/book/flight?code=OiFir3l_8L.eNCwACMVOeJpd4CyPwn0EI6M4XD8KcmF.71697.1377d5650be1523cc39b1849b7d9bbdf&h=c04d73c3ac61&sub=E-15b10c5af5f&pageOrigin=F..RP.FE.M18']
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
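The a[role='link'][href] locator is broad, so on another page load it may also return links that are not flight results. As a hedged follow-up, the sketch below post-processes a list of href strings with the standard library only. The helper name booking_links and the sample URLs are hypothetical; the /book/flight path is taken from the console output above and is an assumption about how Kayak structures these links.

```python
from urllib.parse import urlsplit


def booking_links(hrefs, path="/book/flight"):
    """Keep unique links whose URL path equals `path`, preserving order."""
    seen = set()
    kept = []
    for href in hrefs:
        if not href:  # get_attribute("href") can return None
            continue
        if urlsplit(href).path != path or href in seen:
            continue
        seen.add(href)
        kept.append(href)
    return kept


# Hypothetical input standing in for the list printed by the Selenium code.
sample = [
    "https://www.kayak.com/book/flight?code=abc&h=1",
    "https://www.kayak.com/flights",
    None,
    "https://www.kayak.com/book/flight?code=abc&h=1",
]
print(booking_links(sample))
```

In practice the input would be the list built from get_attribute("href") in either locator example.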