How do I load balance PhantomJS using docker-compose and haproxy?

I have an application that uses Selenium WebDriver to interface with PhantomJS. To scale things up, I want to run multiple instances of PhantomJS and load balance them with haproxy. This is for a local application, so I'm not concerned with deployment to a production environment or anything like that.

Here's my docker-compose.yml file:

version: '2'
services:
  app:
    build: .
    volumes:
      - .:/code
    links:
      - mongo
      - haproxy
  mongo:
    image: mongo
  phantomjs1:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  phantomjs2:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  phantomjs3:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  phantomjs4:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  haproxy:
    image: haproxy
    volumes:
      - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
    ports:
      - 8910:8910
    links:
      - phantomjs1
      - phantomjs2
      - phantomjs3
      - phantomjs4

As you can see, I've got four instances of PhantomJS, one haproxy instance, and one app (written in Python).

Here's my haproxy.cfg:

global
    log 127.0.0.1   local0
    log 127.0.0.1   local1 notice
    maxconn 4096
    daemon

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    retries 3
    option redispatch
    maxconn 2000
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend phantomjs_front
   bind *:8910
   stats uri /haproxy?stats
   default_backend phantomjs_back

backend phantomjs_back
   balance roundrobin
   server phantomjs1 phantomjs1:8910 check
   server phantomjs2 phantomjs2:8910 check
   server phantomjs3 phantomjs3:8910 check
   server phantomjs4 phantomjs4:8910 check

I know I need to use sticky sessions or something similar in haproxy to get this to work, but I don't know how to do that.
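Why plain round robin breaks a WebDriver session can be seen in a toy model. This is plain Python, not HAProxy or GhostDriver code; `Backend`, `Router` and `get_page_calls` are hypothetical names. Each backend only knows the sessions it created, so when round robin sends the follow-up requests to other containers they answer "Variable Resource Not Found". Pinning by session ID keeps them on the creating server; note the sticky router has to learn the owner from the `POST /session` reply, because the ID does not exist before that.

```python
import uuid
from itertools import cycle


class Backend:
    """Toy stand-in for one PhantomJS container: it only knows the sessions it created."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def handle(self, method, path):
        if (method, path) == ("POST", "/session"):
            session_id = str(uuid.uuid4())
            self.sessions.add(session_id)
            return session_id
        session_id = path.split("/")[2]  # "/session/<id>/url" -> "<id>"
        return "ok" if session_id in self.sessions else "Variable Resource Not Found"


class Router:
    """Toy load balancer: round robin, optionally pinned by session ID."""

    def __init__(self, backends, sticky):
        self._cycle = cycle(backends)
        self.sticky = sticky
        self.owner = {}  # session ID -> backend that created it

    def send(self, method, path):
        parts = path.split("/")  # ["", "session", "<id>", ...]
        session_id = parts[2] if len(parts) > 2 else None
        if self.sticky and session_id in self.owner:
            backend = self.owner[session_id]  # stick to the creating server
        else:
            backend = next(self._cycle)  # plain round robin
        reply = backend.handle(method, path)
        if path == "/session":
            self.owner[reply] = backend  # remember who owns the new session
        return backend.name, reply


def get_page_calls(router):
    """Replay the first three requests get_page() sends: new session, load URL, read source."""
    _, session_id = router.send("POST", "/session")
    return [router.send("POST", f"/session/{session_id}/url"),
            router.send("GET", f"/session/{session_id}/source")]


servers = [Backend(f"phantomjs{i}") for i in range(1, 5)]
print(get_page_calls(Router(servers, sticky=False)))
# [('phantomjs2', 'Variable Resource Not Found'), ('phantomjs3', 'Variable Resource Not Found')]
print(get_page_calls(Router(servers, sticky=True)))
# [('phantomjs1', 'ok'), ('phantomjs1', 'ok')]
```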

Here's the relevant snippet of my Python app code that connects to this service:

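from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities


def get_page(url):
    # All WebDriver requests go to haproxy, which forwards them to a phantomjs container
    driver = webdriver.Remote(
        command_executor='http://haproxy:8910',
        desired_capabilities=DesiredCapabilities.PHANTOMJS
    )
    try:
        driver.get(url)
        source = driver.page_source
    finally:
        # quit() ends the whole WebDriver session; close() only closes the current window
        driver.quit()

    return source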

This is the error I get when I try to run this code:

phantomjs2_1  | [ERROR - 2016-07-12T23:35:25.454Z] RouterReqHand - _handle.error - {"name":"Variable Resource Not Found","message":"{\"headers\":{\"Accept\":\"application/json\",\"Accept-Encoding\":\"identity\",\"Connection\":\"close\",\"Content-Length\":\"96\",\"Content-Type\":\"application/json;charset=UTF-8\",\"Host\":\"172.19.0.7:8910\",\"User-Agent\":\"Python-urllib/3.5\"},\"httpVersion\":\"1.1\",\"method\":\"POST\",\"post\":\"{\\\"url\\\": \\\"\\\\\\\"http://www.REDACTED.com\\\\\\\"\\\", \\\"sessionId\\\": \\\"4eff6a60-4889-11e6-b4ad-095b9e1284ce\\\"}\",\"url\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"urlParsed\":{\"anchor\":\"\",\"query\":\"\",\"file\":\"url\",\"directory\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/\",\"path\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"relative\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"port\":\"\",\"host\":\"\",\"password\":\"\",\"user\":\"\",\"userInfo\":\"\",\"authority\":\"\",\"protocol\":\"\",\"source\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"queryKey\":{},\"chunks\":[\"session\",\"4eff6a60-4889-11e6-b4ad-095b9e1284ce\",\"url\"]}}","line":80,"sourceURL":"phantomjs://code/router_request_handler.js","stack":"_handle@phantomjs://code/router_request_handler.js:80:82"}
phantomjs2_1  | 
phantomjs2_1  |   phantomjs://platform/console++.js:263 in error
app_1         | Traceback (most recent call last):
app_1         |   File "selenium_process.py", line 69, in <module>
app_1         |     main()
app_1         |   File "selenium_process.py", line 61, in main
app_1         |     source = get_page(args.url)
app_1         |   File "selenium_process.py", line 52, in get_page
app_1         |     driver.get(url)
app_1         |   File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 248, in get
app_1         |     self.execute(Command.GET, {'url': url})
app_1         |   File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
app_1         |     self.error_handler.check_response(response)
app_1         |   File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 163, in check_response
app_1         |     raise exception_class(value)
app_1         | selenium.common.exceptions.WebDriverException: Message: Variable Resource Not Found - {"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"96","Content-Type":"application/json;charset=UTF-8","Host":"172.19.0.7:8910","User-Agent":"Python-urllib/3.5"},"httpVersion":"1.1","method":"POST","post":"{\"url\": \"\\\"http://www.REDACTED.com\\\"\", \"sessionId\": \"4eff6a60-4889-11e6-b4ad-095b9e1284ce\"}","url":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","urlParsed":{"anchor":"","query":"","file":"url","directory":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/","path":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","relative":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","queryKey":{},"chunks":["session","4eff6a60-4889-11e6-b4ad-095b9e1284ce","url"]}}
app_1         |

So, how do I get load balancing working? What am I missing?

UPDATE

I figured out that I need some kind of session management in haproxy. Selenium WebDriver and PhantomJS communicate via sessions. The client sends a POST /session and receives a reply with the session ID in the body. That reply looks something like this:

{"sessionId":"5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6","status":0,"value":{"browserName":"phantomjs","version":"2.1.1","driverName":"ghostdriver","driverVersion":"1.2.0","platform":"linux-unknown-64bit","javascriptEnabled":true,"takesScreenshot":true,"handlesAlerts":false,"databaseEnabled":false,"locationContextEnabled":false,"applicationCacheEnabled":false,"browserConnectionEnabled":false,"cssSelectorsEnabled":true,"webStorageEnabled":false,"rotatable":false,"acceptSslCerts":false,"nativeEvents":true,"proxy":{"proxyType":"direct"}}}
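Pulling the session ID out of that reply is a single JSON lookup, sketched below in Python. The function name `session_id_from_reply` is hypothetical and the body is a trimmed copy of the reply above. Since the ID only appears in the response body, whatever does the routing has to read this reply before it can pin the session to a server.

```python
import json

# Trimmed copy of the POST /session reply quoted above
reply_body = ('{"sessionId":"5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6","status":0,'
              '"value":{"browserName":"phantomjs","version":"2.1.1"}}')


def session_id_from_reply(body):
    """Return the session ID from the JSON body of a POST /session reply."""
    return json.loads(body)["sessionId"]


print(session_id_from_reply(reply_body))  # 5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6
```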

Then, as the session progresses, the client sends the session ID to the server as part of the URI in subsequent requests, such as GET /session/5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6/source. How can I grab this to use it for sticky sessions in haproxy?
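Every follow-up request carries the ID as the second segment of its path, so it can be recovered with a small regular expression. The Python sketch below is illustrative only (`session_id_from_path` is a hypothetical helper, not a Selenium or HAProxy API); the value it returns is the key a sticky mapping would look up, and the initial `POST /session` has no ID yet.

```python
import re

# "/session/<id>" followed by "/..." or the end of the path; <id> is any non-slash text
SESSION_PATH = re.compile(r"^/session/([^/]+)")


def session_id_from_path(path):
    """Return the session ID embedded in a WebDriver request path, or None."""
    match = SESSION_PATH.match(path)
    return match.group(1) if match else None


print(session_id_from_path("/session/5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6/source"))
# 5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6
print(session_id_from_path("/session"))  # None: the session is still being created
```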

You should be able to add cookies in the haproxy config itself:

cookie SERVERID insert indirect nocache
server  httpd1 10.0.0.19:9443 cookie httpd1 check 
server  httpd2 10.0.0.18:9443 cookie httpd2 check 

Sessions will then stick to a single server through haproxy itself.
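One caveat worth checking: cookie insertion only gives stickiness if the HTTP client stores the `Set-Cookie` header HAProxy adds and sends the cookie back on later requests. The toy model below is plain Python, not HAProxy's implementation; `CookieBalancer` and `send_requests` are hypothetical names, and the cookie value is simply the server name. It shows both cases. The `Python-urllib/3.5` user agent in the error log suggests the Selenium client is using `urllib`, whose default opener does not handle cookies unless an `HTTPCookieProcessor` is installed, so whether the cookie is echoed back is an assumption to verify.

```python
from itertools import cycle


class CookieBalancer:
    """Toy model of `cookie SERVERID insert`: a request without the cookie is
    balanced round robin and tagged with SERVERID; a request that sends the
    cookie back goes to the same server."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._cycle = cycle(self._servers)

    def route(self, cookies):
        """Return (server, cookies the response sets) for a request's cookies."""
        pinned = cookies.get("SERVERID")
        if pinned in self._servers:
            return pinned, {}  # known cookie: stick to that server
        server = next(self._cycle)
        return server, {"SERVERID": server}  # no cookie: insert one


def send_requests(balancer, n, keep_cookies):
    """Send n requests; keep_cookies decides whether the client echoes Set-Cookie."""
    jar = {}
    hits = []
    for _ in range(n):
        server, set_cookies = balancer.route(jar if keep_cookies else {})
        jar.update(set_cookies)
        hits.append(server)
    return hits


names = ["phantomjs1", "phantomjs2", "phantomjs3", "phantomjs4"]
print(send_requests(CookieBalancer(names), 3, keep_cookies=True))
# ['phantomjs1', 'phantomjs1', 'phantomjs1']
print(send_requests(CookieBalancer(names), 3, keep_cookies=False))
# ['phantomjs1', 'phantomjs2', 'phantomjs3']
```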

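Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.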

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM