
Max retries exceeded with url requests Python

I am trying to scrape this page, and this is the code I use:

page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page")

I get this error when I run this code:

Traceback (most recent call last):
  File "/Users/lakesh/WebScraping/Gold.py", line 46, in <module>
    page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page")
  File "/Library/Python/2.7/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/adapters.py", line 511, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.uobgroup.com', port=443): Max retries exceeded with url: /online-rates/gold-and-silver-prices.page (Caused by SSLError(SSLError(1, u'[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:590)'),))

I tried this as well:

page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page",verify=False)

This doesn't work either. I need some guidance.

Full code:

from requests import get
import requests
from requests.exceptions import RequestException
from contextlib import closing
from bs4 import BeautifulSoup
from collections import defaultdict
import json

requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS = 'DES-CBC3-SHA'
page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page")
html = BeautifulSoup(page.content, 'html.parser')
result = defaultdict(list)
last_table = html.find_all('table')[-1]

I added the verify=False option and removed the line that sets the cipher. Once I did this, your code worked for me in Python 3... sometimes. It works once and then seems to stop working for a while. My guess is that the site is rate-limiting access, possibly based on the agent signature it sees, to limit bot access. I printed last_table when it worked, and here's what I got:

<table class="responsive-table-rates table table-striped table-bordered" id="nova-funds-list-table">
<tbody>
<tr>
<td style="background-color: #002265; text-align: center; color: #ffffff;">DESCRIPTION</td>
<td style="background-color: #002265; text-align: center; color: #ffffff;">CURRENCY</td>
<td style="background-color: #002265; text-align: center; color: #ffffff;">UNIT</td>
<td style="background-color: #002265; text-align: center; color: #ffffff;">BANK SELLS</td>
<td style="background-color: #002265; text-align: center; color: #ffffff;">BANK BUYS</td>
<td style="text-align: left; display: none;"> </td>
<td style="text-align: left; display: none;"> </td>
</tr>
</tbody>
</table>
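
If the intermittent failures really are rate limiting, which is only a guess, one way to cope is to retry with growing pauses until the response actually contains a table. The sketch below is a minimal illustration under that assumption. The name fetch_until_valid, its parameters, and the delay values are hypothetical choices for this example and are not part of requests.

```python
import time


def fetch_until_valid(fetch, is_valid, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until is_valid(result) is true, backing off between tries.

    Returns the first valid result, or None if no attempt passes validation.
    All names and defaults here are illustrative assumptions.
    """
    for attempt in range(attempts):
        result = fetch()
        if is_valid(result):
            return result
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    return None
```

It could be called as fetch_until_valid(lambda: get(url, verify=False).content, lambda body: b"<table" in body). Passing a browser-like User-Agent through the headers argument of get might also help if the site filters on the agent string, though nothing here confirms that it does.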

I am dumping the incoming contents to a file. When it works, I get readable HTML. When it doesn't work, I get a few readable lines at the top and then a bunch of gibberish that may be some complex JavaScript. I'm not sure what that is. In that case the script fails like this:

Traceback (most recent call last):
  File "/Users/stevenjohnson/lab/so/ReadAFile.py", line 8, in <module>
    last_table = html.find_all('table')[-1]
IndexError: list index out of range

I get back a 200 status code in either case.

Here's my version of the code:
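
Because the status code is 200 either way, it cannot be used to tell a good response from a bad one, so the parsing step has to check for the table itself. A minimal sketch of that check follows. The helper last_table_or_none and its raw_body and dump_path parameters are hypothetical names for this example. It accepts any object with a find_all method, such as the BeautifulSoup object built in the code here, and it optionally saves the raw body for inspection when no table is present.

```python
def last_table_or_none(soup, raw_body=None, dump_path=None):
    """Return the last <table> in a parsed page, or None when there is none.

    `soup` is anything with a find_all() method, such as a BeautifulSoup
    object. If no table is found and both raw_body and dump_path are given,
    the raw bytes are written to dump_path so they can be inspected later.
    """
    tables = soup.find_all('table')
    if tables:
        return tables[-1]
    if raw_body is not None and dump_path is not None:
        # Keep the unexpected response around for debugging.
        with open(dump_path, 'wb') as fh:
            fh.write(raw_body)
    return None
```

With the variables used in this thread, the call would look like last_table_or_none(html, page.content, 'failed_response.html'), where the file name is an arbitrary choice. A None result then signals the unreadable response instead of raising IndexError.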

from requests import get
from bs4 import BeautifulSoup
from collections import defaultdict

page = get("https://www.uobgroup.com/online-rates/gold-and-silver-prices.page", verify=False)
html = BeautifulSoup(page.content, 'html.parser')
result = defaultdict(list)
last_table = html.find_all('table')[-1]
print(last_table)

I'm on a Mac. Maybe you're not, and the certificate chains on your machine are different from those on mine, so you can't get as far as I can. I wanted you to know, however, that your code does work for me with just verify=False.
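
To compare the two setups, it may help to print which interpreter and OpenSSL build each machine uses. The traceback in the question comes from Python 2.7, while I ran Python 3, so the TLS libraries may differ as well as the certificates. Whether that explains the handshake failure is an assumption and not something confirmed here. The sketch below uses only the standard ssl and sys modules, and tls_environment is a hypothetical helper name.

```python
import ssl
import sys


def tls_environment():
    """Collect interpreter and OpenSSL details that can affect TLS handshakes."""
    return {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "openssl": ssl.OPENSSL_VERSION,
        # HAS_TLSv1_2 is missing on older interpreters, so fall back to
        # checking whether the protocol constant exists.
        "tls1_2": getattr(ssl, "HAS_TLSv1_2", hasattr(ssl, "PROTOCOL_TLSv1_2")),
    }


for name, value in sorted(tls_environment().items()):
    print("%s: %s" % (name, value))
```

Running it on both machines gives a quick side-by-side view. If the failing machine reports an old OpenSSL build or no TLS 1.2 support, upgrading Python would be a reasonable next thing to try.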
