[EN] Failed to establish a new connection: [Errno 11001] getaddrinfo failed
I am trying to scrape some data from booking.com, but I am running into this error:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.booking.com%0a', port=443): Max retries exceeded with url: /hotel/fr/elyseesunion.fr.html?label=gen173nr-1FCA0oTUIMZWx5c2Vlc3VuaW9uSA1YBGhNiAEBmAENuAEXyAEM2AEB6AEB-AECiAIBqAIDuAL_5ZqEBsACAdICJDcxYjgyZmI2LTFlYWQtNGZjOS04Y2U2LTkwNTQyZjI5OWY1YtgCBeACAQ&sid=c9f6a7c7371b88db9274005810b6f9e1&dest_id=-1456928&dest_type=city&group_adults=2&group_children=0&hapos=1&hpos=1&no_rooms=1&sr_order=popularity&srepoch=1621245602&srpvid=00bd465162de01a4&ucfs=1&from=searchresults%0A;highlight_room= (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000019BF5ECBBB0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
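One detail worth noting in the traceback: the host is reported as `www.booking.com%0a`, and `%0a` is a percent-encoded newline. That suggests the scraped href values carry stray newline characters, so the joined URL ends up with a newline in the hostname and DNS resolution (getaddrinfo) fails before the site can block anything. A minimal sketch of the cleanup, with made-up hrefs standing in for the scraped ones:

```python
# Hypothetical hrefs; the stray "\n" mirrors the %0a seen in the traceback.
links1 = ["/hotel/fr/elyseesunion.fr.html\n", "\n/hotel/fr/other.fr.html"]

# Strip whitespace before joining with the root URL so the hostname stays clean.
clean = ["https://www.booking.com" + href.strip() for href in links1]
print(clean[0])  # → https://www.booking.com/hotel/fr/elyseesunion.fr.html
```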
I take this to mean that the website is blocking me from scraping its data.
I tried several answers from this site, but none of them worked.
Here is my script:
import numpy as np
import time
from random import randint
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import re
import random
#headers= {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3',
    'Referer': 'https://www.espncricinfo.com/',
    'Upgrade-Insecure-Requests': '1',
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
}
url0 = 'https://www.booking.com/searchresults.fr.html?label=gen173nr-1DCA0oTUIMZWx5c2Vlc3VuaW9uSA1YBGhNiAEBmAENuAEXyAEM2AED6AEB-AECiAIBqAIDuAL_5ZqEBsACAdICJDcxYjgyZmI2LTFlYWQtNGZjOS04Y2U2LTkwNTQyZjI5OWY1YtgCBOACAQ;sid=303509179a2849df63e4d1e5bc1ab1e3;dest_id=-1456928;dest_type=city&'
links1 = []
results = requests.get(url0, headers = headers)
soup = BeautifulSoup(results.text, "html.parser")
links1 = [a['href'] for a in soup.find("div", {"class": "hotellist sr_double_search"}).find_all('a', href=True)]
root_url = 'https://www.booking.com'
urls1 = [ '{root}{i}'.format(root=root_url, i=i) for i in links1 ]
#print(urls1[0])
for url in urls1:
    results = requests.get(url)
    time.sleep(random.random()*10)
    soup = BeautifulSoup(results.text, "html.parser")
    div = soup.find("div", {"class": "hp_desc_important_facilities clearfix hp_desc_important_facilities--bui"})
    pointfort = [x['data-name-en'] for x in div.select('div[class*="important_facility"]')]
    print(pointfort)
As you can see, I tried time, I tried headers, and I also tried timeout, and so on.
Any idea how to fix this?
Maybe if I wait for a while?
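The sleep, headers, and timeout the question mentions can all be combined on a single requests.Session with automatic retries and exponential backoff. A sketch, not specific to booking.com (the retry settings here are illustrative, not tuned values):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0",
})

# Retry transient failures with exponential backoff (1s, 2s, 4s between attempts).
retries = Retry(total=3, backoff_factor=1,
                status_forcelist=[429, 500, 502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retries))

# Every request then shares the headers and retries; pass an explicit timeout too:
# response = session.get(url, timeout=10)
```

Note that a getaddrinfo error is a DNS-level failure: retries and headers only help once the URL itself is well-formed.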
I am using the requests_html library (see its documentation).
Try this code snippet; it works for me. It fetches the page and saves it to an HTML file.
from requests_html import HTMLSession
session = HTMLSession()
url = "https://www.booking.com/searchresults.en-gb.html?label=gen173nr-1FCA0oTUIMZWx5c2Vlc3VuaW9uSA1YBGhNiAEBmAENuAEXyAEM2AEB6AEB-AECiAIBqAIDuAL_5ZqEBsACAdICJDcxYjgyZmI2LTFlYWQtNGZjOS04Y2U2LTkwNTQyZjI5OWY1YtgCBeACAQ&sid=7685a7b3f07c84e4aadff993a229c309&tmpl=searchresults&class_interval=1&dest_id=-1456928&dest_type=city&dtdisc=0&inac=0&index_postcard=0&label_click=undef&lang=en-gb&offset=0&postcard=0&raw_dest_type=city&room1=A%2CA&sb_price_type=total&shw_aparth=1&slp_r_match=0&soz=1&srpvid=525248f6e64d0200&ss_all=0&ssb=empty&sshis=0&top_ufis=1&lang_click=other;cdl=fr;lang_changed=1"
r = session.get(url)
with open("test.html", "wb") as f:
    f.write(r.content)
print("completed")
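Once the page has been saved this way, the hotel links can be pulled back out of the HTML without re-fetching it. A minimal sketch using only the standard library's html.parser (the sample markup and href are made up; in practice you would feed it the contents of test.html):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href attribute values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Strip stray whitespace/newlines from the scraped href.
                    self.links.append(value.strip())

parser = LinkCollector()
parser.feed('<a href="/hotel/fr/example.fr.html\n">Hotel</a>')  # sample markup
print(parser.links)
```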
Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you need to republish, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.