[EN] Failed to establish a new connection: [Errno 11001] getaddrinfo failed
I am trying to scrape some data from booking.com, but I am running into this error:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.booking.com%0a', port=443): Max retries exceeded with url: /hotel/fr/elyseesunion.fr.html?label=gen173nr-1FCA0oTUIMZWx5c2Vlc3VuaW9uSA1YBGhNiAEBmAENuAEXyAEM2AEB6AEB-AECiAIBqAIDuAL_5ZqEBsACAdICJDcxYjgyZmI2LTFlYWQtNGZjOS04Y2U2LTkwNTQyZjI5OWY1YtgCBeACAQ&sid=c9f6a7c7371b88db9274005810b6f9e1&dest_id=-1456928&dest_type=city&group_adults=2&group_children=0&hapos=1&hpos=1&no_rooms=1&sr_order=popularity&srepoch=1621245602&srpvid=00bd465162de01a4&ucfs=1&from=searchresults%0A;highlight_room= (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000019BF5ECBBB0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
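One detail worth noting in the traceback: the host is reported as `www.booking.com%0a`, and `%0a` is a percent-encoded newline. That suggests the scraped href values carry stray newline characters, so the joined URL ends up with a newline in the hostname and DNS resolution (getaddrinfo) fails before the site can block anything. A minimal sketch of the cleanup, with made-up hrefs standing in for the scraped ones:

```python
# Hypothetical hrefs; the stray "\n" mirrors the %0a seen in the traceback.
links1 = ["/hotel/fr/elyseesunion.fr.html\n", "\n/hotel/fr/other.fr.html"]

# Strip whitespace before joining with the root URL so the hostname stays clean.
clean = ["https://www.booking.com" + href.strip() for href in links1]
print(clean[0])  # → https://www.booking.com/hotel/fr/elyseesunion.fr.html
```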
I take this to mean that the website is blocking me from scraping its data.
I tried several answers from this site, but none of them worked.
Here is my script:
import numpy as np
import time
from random import randint
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import re
import random
#headers= {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3',
    'Referer': 'https://www.espncricinfo.com/',
    'Upgrade-Insecure-Requests': '1',
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
}
url0 = 'https://www.booking.com/searchresults.fr.html?label=gen173nr-1DCA0oTUIMZWx5c2Vlc3VuaW9uSA1YBGhNiAEBmAENuAEXyAEM2AED6AEB-AECiAIBqAIDuAL_5ZqEBsACAdICJDcxYjgyZmI2LTFlYWQtNGZjOS04Y2U2LTkwNTQyZjI5OWY1YtgCBOACAQ;sid=303509179a2849df63e4d1e5bc1ab1e3;dest_id=-1456928;dest_type=city&'
links1 = []
results = requests.get(url0, headers = headers)
soup = BeautifulSoup(results.text, "html.parser")
links1 = [a['href'] for a in soup.find("div", {"class": "hotellist sr_double_search"}).find_all('a', href=True)]
root_url = 'https://www.booking.com'
urls1 = [ '{root}{i}'.format(root=root_url, i=i) for i in links1 ]
#print(urls1[0])
for url in urls1:
    results = requests.get(url)
    time.sleep(random.random()*10)
    soup = BeautifulSoup(results.text, "html.parser")
    div = soup.find("div", {"class": "hp_desc_important_facilities clearfix hp_desc_important_facilities--bui"})
    pointfort = [x['data-name-en'] for x in div.select('div[class*="important_facility"]')]
    print(pointfort)
As you can see, I tried time, I tried headers, and I also tried timeout, and so on.
Any idea how to fix this?
Maybe if I wait for a while?
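The sleep, headers, and timeout the question mentions can all be combined on a single requests.Session with automatic retries and exponential backoff. A sketch, not specific to booking.com (the retry settings here are illustrative, not tuned values):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0",
})

# Retry transient failures with exponential backoff (1s, 2s, 4s between attempts).
retries = Retry(total=3, backoff_factor=1,
                status_forcelist=[429, 500, 502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retries))

# Every request then shares the headers and retries; pass an explicit timeout too:
# response = session.get(url, timeout=10)
```

Note that a getaddrinfo error is a DNS-level failure: retries and headers only help once the URL itself is well-formed.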
I am using the requests_html library (see its documentation).
Try this code snippet; it works for me. It fetches the page and saves it to an HTML file.
from requests_html import HTMLSession
session = HTMLSession()
url = "https://www.booking.com/searchresults.en-gb.html?label=gen173nr-1FCA0oTUIMZWx5c2Vlc3VuaW9uSA1YBGhNiAEBmAENuAEXyAEM2AEB6AEB-AECiAIBqAIDuAL_5ZqEBsACAdICJDcxYjgyZmI2LTFlYWQtNGZjOS04Y2U2LTkwNTQyZjI5OWY1YtgCBeACAQ&sid=7685a7b3f07c84e4aadff993a229c309&tmpl=searchresults&class_interval=1&dest_id=-1456928&dest_type=city&dtdisc=0&inac=0&index_postcard=0&label_click=undef&lang=en-gb&offset=0&postcard=0&raw_dest_type=city&room1=A%2CA&sb_price_type=total&shw_aparth=1&slp_r_match=0&soz=1&srpvid=525248f6e64d0200&ss_all=0&ssb=empty&sshis=0&top_ufis=1&lang_click=other;cdl=fr;lang_changed=1"
r = session.get(url)
with open("test.html", "wb") as f:
    f.write(r.content)
print("completed")
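Once the page has been saved this way, the hotel links can be pulled back out of the HTML without re-fetching it. A minimal sketch using only the standard library's html.parser (the sample markup and href are made up; in practice you would feed it the contents of test.html):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href attribute values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Strip stray whitespace/newlines from the scraped href.
                    self.links.append(value.strip())

parser = LinkCollector()
parser.feed('<a href="/hotel/fr/example.fr.html\n">Hotel</a>')  # sample markup
print(parser.links)
```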
Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you need to republish, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.