Open Tor Browser with Python using the selenium package
I am trying to scrape websites through Tor Browser. I have done it with this code:
import webbrowser
url = 'http://www.google.com/'
webbrowser.register('firefox', None, webbrowser.BackgroundBrowser(r"C:\Users\Lenovo\Bureau\Tor Browser\Browser\firefox.exe"))
webbrowser.get('firefox').open(url)
But I'm actually more familiar with the selenium library when it comes to web scraping. I tried the following code, but it raises a WebDriverException:
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
binary = FirefoxBinary("C:/Users/Lenovo/Bureau/Tor Browser/Browser/firefox.exe")
driver = webdriver.Firefox(firefox_binary = binary)
url = 'https://www.google.com/'
driver.get(url)
I wonder what causes this error and how I can solve it.
Here is the full error I encountered:
WebDriverException Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17288/4279882525.py in <module>
1 url = 'https://www.google.com/'
----> 2 driver.get(url)
~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\webdriver.py in get(self, url)
434 Loads a web page in the current browser session.
435 """
--> 436 self.execute(Command.GET, {'url': url})
437
438 @property
~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
422 response = self.command_executor.execute(driver_command, params)
423 if response:
--> 424 self.error_handler.check_response(response)
425 response['value'] = self._unwrap_value(
426 response.get('value', None))
~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
245 alert_text = value['alert'].get('text')
246 raise exception_class(message, screen, stacktrace, alert_text) # type: ignore[call-arg] # mypy is not smart enough here
--> 247 raise exception_class(message, screen, stacktrace)
248
249 def _value_or_default(self, obj: Mapping[_KT, _VT], key: _KT, default: _VT) -> _VT:
WebDriverException: Message: Reached error page: about:neterror?e=proxyConnectFailure&u=https%3A//www.google.com/&c=UTF-8&d=Firefox%20is%20configured%20to%20use%20a%20proxy%20server%20that%20is%20refusing%20connections.
Stacktrace:
WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:181:5
UnknownError@chrome://remote/content/shared/webdriver/Errors.jsm:488:5
checkReadyState@chrome://remote/content/marionette/navigate.js:64:24
onNavigation@chrome://remote/content/marionette/navigate.js:312:39
emit@resource://gre/modules/EventEmitter.jsm:160:20
receiveMessage@chrome://remote/content/marionette/actors/MarionetteEventsParent.jsm:42:25
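The about:neterror URL in the message above is percent-encoded, which makes the reported reason hard to read. A minimal sketch for decoding it with the standard library follows; the names NETERROR_URL and decode_neterror are illustrative and not part of Selenium.

```python
from urllib.parse import parse_qs

# Error page URL copied from the WebDriverException message above.
NETERROR_URL = (
    "about:neterror?e=proxyConnectFailure&u=https%3A//www.google.com/"
    "&c=UTF-8&d=Firefox%20is%20configured%20to%20use%20a%20proxy%20server"
    "%20that%20is%20refusing%20connections."
)


def decode_neterror(url):
    """Return the query fields of an about:neterror URL as a plain dict."""
    # Everything after the first '?' is an ordinary query string.
    _, _, query = url.partition("?")
    # parse_qs percent-decodes the values and returns a list per key.
    return {key: values[0] for key, values in parse_qs(query).items()}


fields = decode_neterror(NETERROR_URL)
print(fields["e"])  # error code
print(fields["d"])  # human-readable description
```

Decoded, the error code is proxyConnectFailure and the description says that Firefox is configured to use a proxy server that is refusing connections.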
You didn't show the error message, so I don't know what your problem is.
When I try to use tor on Linux, it opens tor without errors (only with the warning "firefox_binary has been deprecated", but this is not a problem), but later it doesn't load the page with get(url), and it doesn't show an error.
Maybe tor, being a security-focused browser, blocks some functions that Selenium needs to control the browser.
But if you run tor.network, you can use it as a proxy server with normal Firefox.
If the page http://127.0.0.1:9050 shows "This is a SOCKs proxy, not an HTTP proxy." then tor.network is running and you can do:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType
proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'socksProxy': '127.0.0.1:9050',
    'socksVersion': 5,
})
options = Options()
options.proxy = proxy
#options.binary_location = '/home/furas/bin/tor' # doesn't work
#options.binary_location = '/path/to/normal/firefox' # works
driver = webdriver.Firefox(options=options) # use path to standard `Firefox`
url = 'https://www.google.com/'
url = 'https://icanhazip.com' # it shows your IP
#url = 'https://httpbin.org/get' # it shows your IP and headers/cookies
driver.get(url)
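As an alternative sketch, the same SOCKS proxy can be described through Firefox preferences instead of a Proxy object. The preference names are Firefox's network.proxy.* settings; the helper names tor_proxy_prefs and make_driver are hypothetical, and the default port 9050 is taken from the code above. Resolving hostnames through the proxy (network.proxy.socks_remote_dns) is an addition that the original code does not make.

```python
def tor_proxy_prefs(host="127.0.0.1", port=9050):
    """Build Firefox preferences that send traffic through a SOCKS5 proxy."""
    return {
        "network.proxy.type": 1,                 # 1 = manual proxy configuration
        "network.proxy.socks": host,
        "network.proxy.socks_port": port,
        "network.proxy.socks_version": 5,
        "network.proxy.socks_remote_dns": True,  # resolve hostnames via the proxy
    }


def make_driver(prefs):
    """Start a standard Firefox with the given preferences applied."""
    # Imported here so tor_proxy_prefs() can be used without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in prefs.items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)


prefs = tor_proxy_prefs()
# driver = make_driver(prefs)          # needs standard Firefox and a running tor
# driver.get('https://icanhazip.com')  # shows the IP address the site sees
```

Keeping the preferences in a plain dict makes it easy to switch ports, for example tor_proxy_prefs(port=9150), without touching the driver setup.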
PS. Sometimes tor may use port 9150 instead of 9050.
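To find out which of the two ports is in use, a small probe can try a TCP connection to each one. This is only a sketch: find_open_port is a hypothetical helper, and an open port shows that something is listening there, not that it is actually tor.

```python
import socket


def find_open_port(candidates=(9050, 9150), host="127.0.0.1", timeout=1.0):
    """Return the first port in candidates that accepts a TCP connection, else None."""
    for port in candidates:
        try:
            # A successful connect means some process is listening on this port.
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            # Refused or timed out: try the next candidate.
            continue
    return None


port = find_open_port()
if port is None:
    print("nothing is listening on 9050 or 9150; is tor running?")
else:
    print(f"use 'socksProxy': '127.0.0.1:{port}'")
```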