How do I load balance phantomjs using docker-compose and haproxy?
I have an application that uses selenium webdriver to interface with PhantomJS. To scale things up, I want to run multiple instances of PhantomJS and load balance them with haproxy. This is for a local application, so I'm not concerned with deployment to a production environment or anything like that.
Here's my docker-compose.yml file:
version: '2'
services:
  app:
    build: .
    volumes:
      - .:/code
    links:
      - mongo
      - haproxy
  mongo:
    image: mongo
  phantomjs1:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  phantomjs2:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  phantomjs3:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  phantomjs4:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  haproxy:
    image: haproxy
    volumes:
      - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
    ports:
      - 8910:8910
    links:
      - phantomjs1
      - phantomjs2
      - phantomjs3
      - phantomjs4
As you can see, I've got four instances of phantomjs, one haproxy instance, and one app (written in python).
Here's my haproxy.cfg:
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 4096
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend phantomjs_front
    bind *:8910
    stats uri /haproxy?stats
    default_backend phantomjs_back

backend phantomjs_back
    balance roundrobin
    server phantomjs1 phantomjs1:8910 check
    server phantomjs2 phantomjs2:8910 check
    server phantomjs3 phantomjs3:8910 check
    server phantomjs4 phantomjs4:8910 check
I know I need to use sticky sessions or something similar in haproxy to get this to work, but I don't know how to do that.
Here's a relevant snippet of my python app code that connects to this service:
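from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

def get_page(url):
    # Each call opens a new WebDriver session through the haproxy frontend.
    driver = webdriver.Remote(
        command_executor='http://haproxy:8910',
        desired_capabilities=DesiredCapabilities.PHANTOMJS
    )
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # quit() ends the session; close() only closes the current window.
        driver.quit()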
The error I get when I try to run this code is this:
phantomjs2_1 | [ERROR - 2016-07-12T23:35:25.454Z] RouterReqHand - _handle.error - {"name":"Variable Resource Not Found","message":"{\"headers\":{\"Accept\":\"application/json\",\"Accept-Encoding\":\"identity\",\"Connection\":\"close\",\"Content-Length\":\"96\",\"Content-Type\":\"application/json;charset=UTF-8\",\"Host\":\"172.19.0.7:8910\",\"User-Agent\":\"Python-urllib/3.5\"},\"httpVersion\":\"1.1\",\"method\":\"POST\",\"post\":\"{\\\"url\\\": \\\"\\\\\\\"http://www.REDACTED.com\\\\\\\"\\\", \\\"sessionId\\\": \\\"4eff6a60-4889-11e6-b4ad-095b9e1284ce\\\"}\",\"url\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"urlParsed\":{\"anchor\":\"\",\"query\":\"\",\"file\":\"url\",\"directory\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/\",\"path\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"relative\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"port\":\"\",\"host\":\"\",\"password\":\"\",\"user\":\"\",\"userInfo\":\"\",\"authority\":\"\",\"protocol\":\"\",\"source\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"queryKey\":{},\"chunks\":[\"session\",\"4eff6a60-4889-11e6-b4ad-095b9e1284ce\",\"url\"]}}","line":80,"sourceURL":"phantomjs://code/router_request_handler.js","stack":"_handle@phantomjs://code/router_request_handler.js:80:82"}
phantomjs2_1 |
phantomjs2_1 | phantomjs://platform/console++.js:263 in error
app_1 | Traceback (most recent call last):
app_1 | File "selenium_process.py", line 69, in <module>
app_1 | main()
app_1 | File "selenium_process.py", line 61, in main
app_1 | source = get_page(args.url)
app_1 | File "selenium_process.py", line 52, in get_page
app_1 | driver.get(url)
app_1 | File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 248, in get
app_1 | self.execute(Command.GET, {'url': url})
app_1 | File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
app_1 | self.error_handler.check_response(response)
app_1 | File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 163, in check_response
app_1 | raise exception_class(value)
app_1 | selenium.common.exceptions.WebDriverException: Message: Variable Resource Not Found - {"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"96","Content-Type":"application/json;charset=UTF-8","Host":"172.19.0.7:8910","User-Agent":"Python-urllib/3.5"},"httpVersion":"1.1","method":"POST","post":"{\"url\": \"\\\"http://www.REDACTED.com\\\"\", \"sessionId\": \"4eff6a60-4889-11e6-b4ad-095b9e1284ce\"}","url":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","urlParsed":{"anchor":"","query":"","file":"url","directory":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/","path":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","relative":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","queryKey":{},"chunks":["session","4eff6a60-4889-11e6-b4ad-095b9e1284ce","url"]}}
app_1 |
So, how do I get load balancing working? What am I missing?
I figured out that I need some kind of session management in haproxy. The selenium webdriver and phantomjs communicate via sessions. The client sends a POST /session and receives a reply with the session id in the body. That reply looks something like this:
{"sessionId":"5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6","status":0,"value":{"browserName":"phantomjs","version":"2.1.1","driverName":"ghostdriver","driverVersion":"1.2.0","platform":"linux-unknown-64bit","javascriptEnabled":true,"takesScreenshot":true,"handlesAlerts":false,"databaseEnabled":false,"locationContextEnabled":false,"applicationCacheEnabled":false,"browserConnectionEnabled":false,"cssSelectorsEnabled":true,"webStorageEnabled":false,"rotatable":false,"acceptSslCerts":false,"nativeEvents":true,"proxy":{"proxyType":"direct"}}}
Then, as the session progresses, the session id is sent to the server as part of the URI in subsequent requests, such as GET /session/5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6/source. How can I grab this stuff to use it for sticky sessions in haproxy?
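The routing rule this implies can be sketched in plain Python. This is only a model of the logic, assuming a proxy that can read the body of the POST /session reply and the path of every later request; it is not haproxy configuration. The `SessionRouter` class and its method names are hypothetical and exist only for illustration.

```python
import json
import re
from itertools import cycle

# Matches the session id in WebDriver paths such as /session/<id>/source.
SESSION_PATH = re.compile(r"^/session/([^/]+)")


class SessionRouter:
    """Hypothetical helper: remembers which backend owns each session id."""

    def __init__(self, backends):
        self._next_backend = cycle(backends)  # round robin for new sessions
        self._owner = {}                      # session id -> backend name

    def pick_backend(self, path):
        """Return the backend that should receive a request for this path."""
        match = SESSION_PATH.match(path)
        if match and match.group(1) in self._owner:
            return self._owner[match.group(1)]
        # New session (POST /session) or unknown id: fall back to round robin.
        return next(self._next_backend)

    def record_reply(self, backend, body):
        """Learn the session id from the JSON body of a POST /session reply."""
        session_id = json.loads(body).get("sessionId")
        if session_id:
            self._owner[session_id] = backend


router = SessionRouter(["phantomjs1", "phantomjs2", "phantomjs3", "phantomjs4"])
first = router.pick_backend("/session")
router.record_reply(
    first, '{"sessionId": "5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6", "status": 0}'
)
later = router.pick_backend("/session/5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6/source")
print(first, later)
```

With plain round robin and no such mapping, the second request would go to a different instance that has never seen the session id, which matches the "Variable Resource Not Found" error above.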
You should be able to add cookies within the haproxy config itself:
    cookie SERVERID insert indirect nocache
    server httpd1 10.0.0.19:9443 cookie httpd1 check
    server httpd2 10.0.0.18:9443 cookie httpd2 check
Then sessions will stick to the same server through haproxy itself.
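To make the behaviour of insert-cookie persistence concrete, here is a toy Python model of it. It is a sketch under stated assumptions, not haproxy's implementation: the `CookieStickyBalancer` class is hypothetical, and the server names are taken from the snippet above. It also shows that this approach depends on the client sending the cookie back on later requests. Whether the Selenium remote client does that is not stated here and would need to be checked.

```python
from itertools import cycle


class CookieStickyBalancer:
    """Toy model of insert-cookie persistence; not haproxy's actual code."""

    def __init__(self, servers, cookie_name="SERVERID"):
        self.cookie_name = cookie_name
        self._servers = set(servers)
        self._round_robin = cycle(servers)

    def route(self, request_cookies):
        """Return (server, cookies_to_set) for one request."""
        server = request_cookies.get(self.cookie_name)
        if server in self._servers:
            return server, {}  # cookie present: stay on that server
        server = next(self._round_robin)
        return server, {self.cookie_name: server}  # no cookie: pick and tag


balancer = CookieStickyBalancer(["httpd1", "httpd2"])

# A client that keeps cookies stays on one server.
jar = {}
with_cookies = []
for _ in range(3):
    server, new_cookies = balancer.route(jar)
    jar.update(new_cookies)
    with_cookies.append(server)

# A client that drops cookies is balanced like any new visitor.
without_cookies = [balancer.route({})[0] for _ in range(3)]

print(with_cookies)
print(without_cookies)
```

If the client turns out not to return cookies, the second case applies and requests for one WebDriver session would still be spread across servers.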