
Open Tor Browser with Python using the selenium package

I am trying to scrape websites through Tor Browser. I have done it with this code:

import webbrowser
url = 'http://www.google.com/'
webbrowser.register('firefox', None, webbrowser.BackgroundBrowser(r"C:\Users\Lenovo\Bureau\Tor Browser\Browser\firefox.exe"))
webbrowser.get('firefox').open(url)

But I'm actually more familiar with the selenium library when it comes to web scraping. I tried this code, but it raises a WebDriverException:

from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
binary = FirefoxBinary("C:/Users/Lenovo/Bureau/Tor Browser/Browser/firefox.exe")
driver = webdriver.Firefox(firefox_binary = binary)
url = 'https://www.google.com/'
driver.get(url)

I wonder what causes this error and how I can solve it.

Here is the full error I encountered:

WebDriverException                        Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17288/4279882525.py in <module>
      1 url = 'https://www.google.com/'
----> 2 driver.get(url)

~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\webdriver.py in get(self, url)
    434         Loads a web page in the current browser session.
    435         """
--> 436         self.execute(Command.GET, {'url': url})
    437 
    438     @property

~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
    422         response = self.command_executor.execute(driver_command, params)
    423         if response:
--> 424             self.error_handler.check_response(response)
    425             response['value'] = self._unwrap_value(
    426                 response.get('value', None))

~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
    245                 alert_text = value['alert'].get('text')
    246             raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 247         raise exception_class(message, screen, stacktrace)
    248 
    249     def _value_or_default(self, obj: Mapping[_KT, _VT], key: _KT, default: _VT) -> _VT:

WebDriverException: Message: Reached error page: about:neterror?e=proxyConnectFailure&u=https%3A//www.google.com/&c=UTF-8&d=Firefox%20is%20configured%20to%20use%20a%20proxy%20server%20that%20is%20refusing%20connections.
Stacktrace:
WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:181:5
UnknownError@chrome://remote/content/shared/webdriver/Errors.jsm:488:5
checkReadyState@chrome://remote/content/marionette/navigate.js:64:24
onNavigation@chrome://remote/content/marionette/navigate.js:312:39
emit@resource://gre/modules/EventEmitter.jsm:160:20
receiveMessage@chrome://remote/content/marionette/actors/MarionetteEventsParent.jsm:42:25
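
The about:neterror URL in the message above is percent-encoded, which makes the reason hard to read. A minimal sketch for decoding it with the standard library follows; the helper name `decode_neterror` is made up for illustration, and the string is copied from the traceback.

```python
from urllib.parse import parse_qs


def decode_neterror(url: str) -> dict:
    """Split an about:neterror URL into its decoded query parameters."""
    # Everything after the first "?" is an ordinary query string.
    query = url.partition("?")[2]
    # parse_qs percent-decodes the values; keep the first value of each key.
    return {key: values[0] for key, values in parse_qs(query).items()}


message = (
    "about:neterror?e=proxyConnectFailure&u=https%3A//www.google.com/"
    "&c=UTF-8&d=Firefox%20is%20configured%20to%20use%20a%20proxy%20server"
    "%20that%20is%20refusing%20connections."
)

info = decode_neterror(message)
print(info["e"])  # proxyConnectFailure
print(info["d"])  # Firefox is configured to use a proxy server that is refusing connections.
```

Read this way, the browser itself started, but the proxy it was configured to use refused the connection.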

You didn't show the error message, so I don't know what your problem is.

When I try to use Tor Browser on Linux, it opens without errors (only the warning "firefox_binary has been deprecated", which is not a problem), but later it doesn't load the page with get(url), and it doesn't show an error.
Maybe that is because Tor Browser is a hardened browser and blocks some functions that Selenium needs to control it.


But if you run the Tor network service, you can use it as a proxy server with normal Firefox.

If the page http://127.0.0.1:9050 shows "This is a SOCKs proxy, not an HTTP proxy."
then the Tor service is running and you can do:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType

proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'socksProxy': '127.0.0.1:9050',
    'socksVersion': 5,
})

options = Options()
options.proxy = proxy 
#options.binary_location = '/home/furas/bin/tor'  # doesn't work
#options.binary_location = '/path/to/normal/firefox'  # works

driver = webdriver.Firefox(options=options)  #  use path to standard `Firefox`

url = 'https://www.google.com/'
url = 'https://icanhazip.com'     # it shows your IP
#url = 'https://httpbin.org/get'  # it shows your IP and headers/cookies

driver.get(url)
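
A possible variation, as a sketch only: the same SOCKS settings can also be expressed as Firefox preferences, which additionally lets you ask Firefox to resolve hostnames through the proxy (`network.proxy.socks_remote_dns`). The function names `tor_socks_prefs` and `make_driver` are hypothetical, and the port is an assumption that has to match your local Tor setup.

```python
def tor_socks_prefs(host="127.0.0.1", port=9050):
    """Firefox preferences that route traffic through a SOCKS5 proxy."""
    return {
        "network.proxy.type": 1,                 # 1 = manual proxy configuration
        "network.proxy.socks": host,
        "network.proxy.socks_port": port,
        "network.proxy.socks_version": 5,
        "network.proxy.socks_remote_dns": True,  # resolve hostnames through the proxy
    }


def make_driver(port=9050):
    """Start regular Firefox with the preferences above (needs selenium and geckodriver)."""
    # Imported here so the preference helper can be used without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in tor_socks_prefs(port=port).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

Calling `make_driver()` and then `driver.get('https://icanhazip.com')` should show an address that differs from your own if the traffic really goes through Tor.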

PS. Sometimes Tor may use port 9150 instead of 9050 (the tor process bundled with Tor Browser normally listens on 9150, while a standalone tor service uses 9050).
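
If you are not sure which of the two ports is in use, a quick check is to try a TCP connection to each. This is a minimal sketch with a hypothetical helper `find_open_port`; it only tells you that something accepts connections on the port, not that it is actually Tor.

```python
import socket


def find_open_port(host="127.0.0.1", ports=(9050, 9150), timeout=1.0):
    """Return the first port in `ports` that accepts a TCP connection, else None."""
    for port in ports:
        try:
            # create_connection raises OSError if the connection is refused or times out.
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue
    return None


port = find_open_port()
if port is None:
    print("Nothing is listening on 9050 or 9150; is Tor running?")
else:
    print(f"Something is listening on port {port}")
```

The port that is found can then be used in the 'socksProxy' value, for example '127.0.0.1:9150'.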
