
Running a Selenium web scraper with nohup on Ubuntu EC2

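I have a web scraper that uses Selenium, and I want it to keep running in the background on an Ubuntu EC2 instance after I log out, so I tried using nohup. My current code is: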

webscrape.py

from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait # available since 2.4.0
from selenium.webdriver.support import expected_conditions as EC 

def main():

    display = Display(visible=0, size=(800, 600))
    display.start()  # start the virtual display

    driver = webdriver.Firefox()

    # ... do the web scraping ...

    driver.close()
    display.stop()

if __name__ == "__main__": main()

When I log in to the EC2 instance and run python webscrape.py, it works fine. However, when I run nohup python webscrape.py and log out, it stops. The nohup.out log shows the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/Cruz/Scripts/WebScrape/google_brand_web_scraper.py", line 175, in <module>
    if __name__ == "__main__": main()
  File "/usr/local/lib/python2.7/dist-packages/Cruz/Scripts/WebScrape/google_brand_web_scraper.py", line 120, in main
    website = GoogleBrandWebsiteScraper().brand_url_pull_from_google(i,driver) # get website for a brand
  File "/usr/local/lib/python2.7/dist-packages/Cruz/Scripts/WebScrape/google_brand_web_scraper.py", line 34, in brand_url_pull_from_google
    s = BeautifulSoup(driver.page_source)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 436, in page_source
    return self.execute(Command.GET_PAGE_SOURCE)['value']
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 171, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 349, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 379, in _request
    self._conn.request(method, parsed_url.path, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 973, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1007, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 969, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 829, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 791, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 772, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
socket.error: [Errno 111] Connection refused

So clearly something breaks after I log out. Any tips are appreciated.
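
A note on reading the traceback: `[Errno 111] Connection refused` is raised from `socket.create_connection`, which means nothing was accepting connections on the port Selenium uses to talk to the browser at the moment `page_source` was requested. One plausible reading, not confirmed anywhere in the question, is that Firefox or the virtual display had already exited after logout. The sketch below shows how such a condition could be checked; `is_port_open` is a hypothetical helper name, and the host and port to probe are assumptions that depend on the Selenium setup.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds, else False."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Connection refused, timed out, or otherwise unreachable.
        return False
    conn.close()
    return True
```

Calling a check like this just before `driver.page_source` and logging the result to nohup.out could help narrow down when the browser side goes away.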

You may want to try screen. I'm not familiar enough with nohup to work out what problem you are running into there, but screen should work.

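  1. Run screen to create a new terminal in which you can work.
  2. Run your code.
  3. Press Ctrl + a, then d, to detach from that terminal (it will keep running in the background).
  4. Run screen -r to reattach to that terminal.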

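Once you have detached from the terminal, you can disconnect from the system and the detached terminal will keep running. So you can disconnect at any point between steps 3 and 4.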
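
The steps above are interactive. As a minimal sketch of the same idea done in one shot, screen can start a command in an already-detached session with its `-d -m` options and name it with `-S`. The helper names `build_screen_command` and `start_detached`, and the session name used below, are hypothetical and only for illustration; the sketch assumes screen is installed and that `python` on the PATH is the interpreter the scraper needs.

```python
import subprocess


def build_screen_command(session_name, script_path, python_exe="python"):
    """Return the argv list that runs script_path in a detached screen session."""
    # -d -m: start the session detached; -S: give the session a name.
    return ["screen", "-dmS", session_name, python_exe, script_path]


def start_detached(session_name, script_path):
    """Launch the script inside a detached screen session and return the argv used."""
    cmd = build_screen_command(session_name, script_path)
    # Raises CalledProcessError if screen exits with a non-zero status.
    subprocess.check_call(cmd)
    return cmd
```

For example, `start_detached("scraper", "webscrape.py")` would start the script without attaching to it, and `screen -r scraper` would reattach later. This replaces steps 1 to 3 only; whether the scraper survives logout still depends on the browser and virtual display staying alive.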
