PhantomJS from selenium - Python

I'm trying to use PhantomJS (on Windows) with Selenium in Python to do some web scraping.
I downloaded the latest PhantomJS build from the website and unzipped it. After that I tried:

from selenium import webdriver
browser = webdriver.PhantomJS()

The response was:

WebDriverException: 'phantomjs' executable needs to be in PATH.

Then I tried passing the path, like this:

browser = webdriver.PhantomJS('path_to/phantomjs.exe')

I also tried putting an 'r' before the path (a raw string). The response was an exception containing HTML:

WebDriverException: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Notification: Gateway Timeout</title>

<style type="text/css">
body {
  font-family: Arial, Helvetica, sans-serif;
  font-size: 14px;
  color:#333333;
  background-color: #ffffff;
}
h1 {
  font-size: 18px;
  font-weight: bold;
  text-decoration: none;
  padding-top: 0px;
  color: #2970A6;
}
a:link {
    color: #2970A6;
  text-decoration: none;
}
a:hover {
    color: #2970A6;
  text-decoration: underline;
}
p.buttonlink {
  margin-bottom: 24px;
}
.copyright {
  font-size: 12px;
  color: #666666;
  margin: 5px 5px 0px 30px;

}
.details {
  font-size: 14px;
  color: #969696;
  border: none;
  padding: 20px 20px 20px 20px;
  margin: 0px 10px 10px 35px;
}

.shadow {
  border: 3px solid #9f9f9f;
  padding: 10px 25px 10px 25px;
  margin: 10px 35px 0px 30px;
  background-color: #ffffff;
  width: 600px;

  -moz-box-shadow: 3px 3px 3px #cccccc;
  -webkit-box-shadow: 3px 3px 3px #cccccc;
  box-shadow: 3px 3px 3px #cccccc;
  /* For IE 8 */
  -ms-filter: "progid:DXImageTransform.Microsoft.Shadow(Strength=5, Direction=135, Color='cccccc')";
  /* For IE 5.5 - 7 */
  filter: progid:DXImageTransform.Microsoft.Shadow(Strength=5, Direction=135, Color='cccccc');
}
.logo {
  border: none;
  margin: 5px 5px 0px 30px;
}
</style>

</head>

<body>
<div class="logo"></div><p>&nbsp;</p>
<div class="shadow">
<h1>This Page Cannot Be Displayed</h1>


<p>
The system cannot communicate with the external server (&nbsp;127.0.0.1&nbsp;).
The Internet server may be busy, may be permanently down, or may be
unreachable because of network problems.
</p>

<p>
Please check the spelling of the Internet address entered.
If it is correct, try this request later.
</p>



<p>
If you have questions, please contact
your corporate network administrator 
and provide the codes shown below.
</p>

</div>

<div class="details"><p>
Date: Mon, 30 May 2016 12:30:14 CEST<br />
Username: <br />
Source IP: 10.202.210.98<br />
URL: POST http://127.0.0.1/wd/hub/session<br />
Category: Uncategorized URLs<br />
Reason: UNKNOWN<br />
Notification: GATEWAY_TIMEOUT
</p></div>
</body>
</html>

I opened this HTML in a Chrome session and it shows my company's firewall page. The message is "The system cannot communicate with the external server (127.0.0.1)." I can scrape with the Chrome or Firefox drivers, but I have this problem with PhantomJS.
Can you help me?

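Try using the absolute path to where you installed PhantomJS, as below, and also set the 'NO_PROXY' environment variable for '127.0.0.1' so that the local WebDriver requests (the POST to http://127.0.0.1/wd/hub/session in the error page) are not sent through the corporate proxy: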

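import os
from selenium import webdriver
os.environ['NO_PROXY'] = '127.0.0.1'  # set before creating the driver: skip the proxy for localhost
driver = webdriver.PhantomJS(
    executable_path=r'C:\Python\Python35-32\Lib\site-packages\phantomjs-2.1.1-windows\bin\phantomjs')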
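The fix relies on the standard proxy environment variables: hosts listed in NO_PROXY are contacted directly even when HTTP_PROXY is set. A minimal sketch, using only the standard library, of checking how urllib would treat 127.0.0.1. The proxy URL proxy.example.com:8080 is a hypothetical stand-in for the corporate proxy, proxy_bypass_environment is an undocumented (but existing) urllib helper, and whether your Selenium version sends its WebDriver traffic through urllib's proxy handling is an assumption worth verifying.

```python
import os
import urllib.request

# Hypothetical corporate proxy, for illustration only.
os.environ['HTTP_PROXY'] = 'http://proxy.example.com:8080'
# Drop any lowercase variant first: on POSIX, urllib prefers lowercase names.
os.environ.pop('no_proxy', None)
os.environ['NO_PROXY'] = '127.0.0.1'

# Same mapping urllib builds from the environment, e.g. {'http': ..., 'no': ...}.
proxies = urllib.request.getproxies_environment()

# True means requests to this host go direct instead of through the proxy.
bypass = urllib.request.proxy_bypass_environment('127.0.0.1', proxies)
print(proxies.get('no'), bypass)
```

If bypass comes out False, local requests are still eligible for the proxy, which would be consistent with the firewall page shown in the question.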

I'm trying to get this working under Windows, too.

I'm subclassing the WebDriver, and when initializing the superclass I pass the ABSOLUTE path the executable is in to the webdriver's __init__ method.

This got me further; now I'm seeing:

Exception WebDriverException: Message: 'phantomjs' executable may have wrong permissions.

This gives me the impression I'm on the right track: it suggests you can simply pass that path to the webdriver constructor as a plain string.
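In Selenium 3's common Service code, the "executable may have wrong permissions" message is raised when starting the process fails with a permission error (EACCES). On Windows, one common way to trigger that is passing the folder that contains phantomjs.exe instead of the file itself. Below is a small sketch of a hypothetical helper, resolve_phantomjs (not part of Selenium), that accepts either form and returns the path to the executable file.

```python
import os


def resolve_phantomjs(path):
    """Return the path of the phantomjs executable itself.

    Hypothetical helper: if `path` is a directory (for example the
    phantomjs-*/bin folder), the executable name is appended, because
    Selenium needs the file, not the folder that contains it.
    """
    path = os.path.abspath(path)
    if os.path.isdir(path):
        exe_name = 'phantomjs.exe' if os.name == 'nt' else 'phantomjs'
        path = os.path.join(path, exe_name)
    if not os.path.isfile(path):
        raise FileNotFoundError('phantomjs executable not found: ' + path)
    return path


# Usage (needs an older Selenium release; PhantomJS support was removed in Selenium 4):
# driver = webdriver.PhantomJS(executable_path=resolve_phantomjs(
#     r'C:\Python\Python35-32\Lib\site-packages\phantomjs-2.1.1-windows\bin'))
```

Failing early with FileNotFoundError gives a clearer message than the generic WebDriverException when the path is wrong.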
