Unable to scrape a website using BeautifulSoup

Question

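I am trying to scrape a page with Beautiful Soup (bs4), but the request fails before I get any data. I even set the headers suggested in an answer to another Stack Overflow question. Here is my code: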

from bs4 import BeautifulSoup
import requests

headers = {
    'Referer': 'hello',
}
# The call and its arguments must stay on one logical line
r = requests.get('https://www.doamin.com/bangalore/restaurants', headers=headers)
print(r.status_code)

This is the error I get:

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

and this:

    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

I even tried setting a User-Agent:

import requests
url = 'https://www.example.com/bangalore/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.content)

But I still get the same error!

Can anyone help me?

Answer 1

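My guess is that the server inspects the User-Agent string more thoroughly: if the string claims to be Chrome, it checks the version against a list of valid Chrome releases. The version you specified (41.0.2228) is not listed in the Chrome version history. Use, for example, 41.0.2272: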

import requests
url = 'https://www.example.com/bangalore/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.0 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.content)
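
To make it easier to try different version strings, the header can be built by a small helper. This is only a sketch: the names `UA_TEMPLATE` and `build_headers` are made up for illustration, and whether the server accepts any particular version is still a guess.

```python
# Template copied from the User-Agent above, with the Chrome version left open.
UA_TEMPLATE = (
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/{version} Safari/537.36"
)


def build_headers(chrome_version="41.0.2272.0"):
    """Return request headers whose User-Agent claims the given Chrome version."""
    return {"User-Agent": UA_TEMPLATE.format(version=chrome_version)}


# Show the header that would be sent for the default version.
print(build_headers()["User-Agent"])
```

The result can be passed straight to the call from the question, e.g. `requests.get(url, headers=build_headers(), timeout=10)`; adding a `timeout` keeps the script from hanging if the server stalls instead of disconnecting.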

Answer 2

Zomato (like many other data-aggregation sites) has most likely put measures in place to block data scrapers and data miners. Just use their API: https://developers.zomato.com/api
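
A rough sketch of what an API call might look like is below. Several things here are assumptions rather than facts from this thread: the `/api/v2.1/search` path, the `q` query parameter, and the `user-key` header are recalled from the v2.1 documentation and should be checked against the current docs, and `YOUR_API_KEY` is a placeholder. The helper names are hypothetical. The sketch uses the standard library's `urllib` instead of `requests` so that it runs without extra packages.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://developers.zomato.com/api/v2.1"  # assumed base path
API_KEY = "YOUR_API_KEY"  # placeholder: replace with a real key


def build_search_request(query, api_key=API_KEY):
    """Build (but do not send) a GET request for the assumed search endpoint."""
    params = urllib.parse.urlencode({"q": query})
    url = f"{API_BASE}/search?{params}"
    headers = {"user-key": api_key, "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


def search_restaurants(query, api_key=API_KEY, timeout=10):
    """Send the search request and return the decoded JSON response."""
    request = build_search_request(query, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `search_restaurants("restaurants in bangalore")` with a valid key would return the parsed JSON, if the endpoint behaves as assumed. The same URL and headers could equally be sent with `requests.get`.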

Solution 1
1 2018-05-29 01:13:47

Solution 2
0 2018-05-24 05:15:13
