cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 challenge. Error when I use the cloudscraper module with Python

So I'm trying to bypass a website's Cloudflare protection to scrape some items from it, but the cloudscraper Python module isn't working.

Whenever I run it, I get this error:

cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.

Here is the simplified code I'm using:

import cloudscraper
from bs4 import BeautifulSoup as soup


url = "http://adventurequest.life/"
scraper = cloudscraper.create_scraper()
html = scraper.get(url).text
page_soup = soup(html, "html.parser")
print(page_soup)

Do you know how to fix this?

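The cloudscraper library does not bypass Cloudflare version 2 captchas in its free (open-source) version. So, to scrape such sites, one alternative is to use a third-party captcha solver.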

cloudscraper currently supports the following providers:

You can subscribe to one of their APIs and pass the API key you are given to cloudscraper, as in the example from its README:

import cloudscraper

scraper = cloudscraper.create_scraper(
  interpreter='nodejs',  # the nodejs interpreter needs Node.js installed
  captcha={
    'provider': '2captcha',
    'api_key': 'your_2captcha_api_key'
  }
)
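To keep the key out of the source file, the same keyword arguments can be assembled from an environment variable. This is a minimal sketch, not part of cloudscraper: the helper name `captcha_scraper_kwargs` and the variable name `CAPTCHA_API_KEY` are hypothetical; only the `interpreter` and `captcha` keys (with `provider` and `api_key`) come from the README example above.

```python
import os


def captcha_scraper_kwargs(provider: str = "2captcha",
                           env_var: str = "CAPTCHA_API_KEY",  # hypothetical variable name
                           interpreter: str = "nodejs") -> dict:
    """Build keyword arguments for cloudscraper.create_scraper().

    The dict layout mirrors the README example; reading the key from
    an environment variable is an assumption made for illustration.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early instead of sending an empty key to the solver service.
        raise RuntimeError(f"set {env_var} to your {provider} API key")
    return {
        "interpreter": interpreter,
        "captcha": {"provider": provider, "api_key": api_key},
    }
```

Usage would then be `scraper = cloudscraper.create_scraper(**captcha_scraper_kwargs())`.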

If you still run into problems, you can move on to other anti-bot bypass providers. For example, you can try using a third-party proxy with requests:

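import requests

url = "https://the.url/to/scrape"
proxy = "http://subscribed.proxy/"
proxies = {"http": proxy, "https": proxy}
# verify=False disables TLS certificate checks (urllib3 then emits an
# InsecureRequestWarning); only do this if your proxy provider requires it
response = requests.get(url, proxies=proxies, verify=False)
print(response.text)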

One of the best anti-bot bypass proxy services is Bright Data Web Unlocker.
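The `proxies` dictionary in the requests snippet maps both target schemes to the same proxy, because requests chooses a proxy by the scheme of the URL being fetched. A small helper that builds that mapping and rejects obviously malformed proxy URLs might look like this; `proxy_mapping` is a hypothetical name, and the validation rule (an http/https proxy URL with a hostname) is an assumption, not something requests itself enforces.

```python
from urllib.parse import urlsplit


def proxy_mapping(proxy_url: str) -> dict:
    """Return a requests-style proxies dict that routes every request via one proxy."""
    parts = urlsplit(proxy_url)
    # Assumed rule: only plain http/https proxy URLs with a host are accepted.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not a usable proxy URL: {proxy_url!r}")
    # Keys are the schemes of the *target* URLs; both go through the same proxy.
    return {"http": proxy_url, "https": proxy_url}
```

With it, the call above becomes `requests.get(url, proxies=proxy_mapping("http://subscribed.proxy/"), verify=False)`.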

I got the same error using Scrapy + cloudscraper, but after setting COOKIES_ENABLED to True it worked:

Error:

Traceback (most recent call last):
cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 Captcha challenge, This feature is not available in the opensource (free) version.
2021-04-27 09:59:30 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.forever21.com/us/shop/catalog/category/f21/lingerie>
Traceback (most recent call last):
StopIteration: <403 https://www.forever21.com/us/shop/catalog/category/f21/lingerie>
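A 403 like the one in this traceback does not by itself say whether Cloudflare or the origin server refused the request. A rough heuristic is to look for Cloudflare's response headers. The sketch below assumes that Cloudflare-proxied responses carry `Server: cloudflare` and/or a `CF-RAY` header and that challenges come back as 403 or 503; neither assumption is guaranteed, and the function name is hypothetical.

```python
def looks_like_cloudflare_block(status: int, headers: dict) -> bool:
    """Heuristic: does this 403/503 response appear to come from Cloudflare?

    Assumption: Cloudflare responses usually include "Server: cloudflare"
    and a "CF-RAY" header. A False result does not rule Cloudflare out.
    """
    # Header names are case-insensitive in HTTP, so normalize them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    served_by_cf = (lowered.get("server", "").lower() == "cloudflare"
                    or "cf-ray" in lowered)
    return status in (403, 503) and served_by_cf
```

With requests this would be called as `looks_like_cloudflare_block(resp.status_code, resp.headers)`.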

Before:

import cloudscraper

browser = cloudscraper.create_scraper()

# in middleware
req = spider.browser.get(url,
                         proxies={'http': proxy,
                                  'https': https_proxy},
                         headers={'referer': url},
                         )

After:

'COOKIES_ENABLED': True

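Outside Scrapy, though, cookies are already handled by default (cloudscraper's scraper is a requests Session, which keeps cookies between requests; bs4 only parses the HTML), so I tried your code and found that it works: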

import cloudscraper
from bs4 import BeautifulSoup as soup

url = "http://adventurequest.life/"
scraper = cloudscraper.create_scraper()
html = scraper.get(url).text
page_soup = soup(html, "html.parser")
print(page_soup)

Output (truncated):

<!DOCTYPE doctype html>
<html lang="en" style="min-height: 100%;">
<head>
<!-- Required meta tags -->
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/>
<meta content="Auto Quest Worlds" name="twitter:title"/>
<meta content="aqw bots, adventure quest bots, aqw cheat, aqw hack, aqw exploits, grimoire download, adventure quest worlds bot, leveling bot aqw, botting mmorpg, aqw private server, aqworlds private server, aqw server, aqw ps, aqw private, skidson, aqw pirata, servidor de aqw, adventure quest worlds private, dragonfable private server, adventure quest private server, free to play mmorpg, free online games, browser games, jogos online, jogos criancas, jogos de navegador, best aqw private server, best online mmorpg, best browser mmorpg, habbo servidor privado, habbo retro, habbo private server, runescape private server, high rates aqw, aqw items, aqworlds wiki" name="keywords"/>
<meta content="https://adventurequest.life/" name="twitter:url"/>

Maybe you should check your machine's OpenSSL version, and then update or upgrade cloudscraper.

My cloudscraper version is 1.2.58.
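Both versions can be read from Python itself. A minimal sketch using only the standard library (`ssl.OPENSSL_VERSION` and `importlib.metadata.version`, Python 3.8+); the helper name `environment_report` is hypothetical:

```python
import ssl
from importlib import metadata


def environment_report() -> dict:
    """Collect the OpenSSL and cloudscraper versions for this interpreter."""
    try:
        cloudscraper_version = metadata.version("cloudscraper")
    except metadata.PackageNotFoundError:
        cloudscraper_version = None  # cloudscraper is not installed here
    return {
        # The OpenSSL build that Python's ssl module is linked against.
        "openssl": ssl.OPENSSL_VERSION,
        "cloudscraper": cloudscraper_version,
    }


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version}")
```

If cloudscraper turns out to be old, `pip install --upgrade cloudscraper` updates it.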

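Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.
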
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM