python2.7 + multiprocessing + selenium: restart process on exception

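I seem to be having an issue with a Python script that uses multiprocessing. It takes a list of ID codes and starts processes that use Selenium, with PhantomJS as the driver, to navigate to a URL containing each ID code. Each process extracts data into its own csv file, and once all processes have finished, another csv file is compiled from them. Everything runs well, except that sometimes one of the processes raises an exception that reads: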

Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "modtest.py", line 11, in worker
    do_work(item)
  File "/home/mdrouin/Dropbox/Work/Dev/Python/WynInvScrape/items.py", line 14, in do_work
    driver = webdriver.PhantomJS()
  File "/usr/lib/python2.7/site-packages/selenium/webdriver/phantomjs/webdriver.py", line 50, in __init__
    self.service.start()
  File "/usr/lib/python2.7/site-packages/selenium/webdriver/phantomjs/service.py", line 72, in start
    raise WebDriverException("Can not connect to GhostDriver")

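I have tried to get the process to restart when an exception is encountered, but whatever I do, once the processes finish the program hangs and does not move on or do anything else. Essentially, I want to retry the ID number being searched when its process crashes, and move on once all the processes have finished. Here is an extremely abridged version of the code: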

from selenium import webdriver
from time import sleep
from bs4 import BeautifulSoup as bs
import multiprocessing
import datetime, time, csv, glob


num_procs = 8

def do_work(rsrt):

        driver = webdriver.PhantomJS()

        try:
            driver.get('http://www.example.com/get.php?resort=' + rsrt)

            rows = []

            for row in soup.find_all('tr'):
                if row.find('input', {'name': 'booksubmit'}):
                    wyncheckin = row.find('td', {'class': 'searchAvailDate'}).string
                    wynnights = row.find('td', {'class': 'searchAvailNights'}).string
                    wynroom = row.find('td', {'class': 'searchAvailUnitType'}).string
                    rows.append([wynresort, wyncheckin, wynroom])


            driver.quit()

            with open('/home/mdrouin/Dropbox/Work/Dev/Python/WynInvScrape/availability/'+rsrt+'.csv', 'wb') as f:
                writer = csv.writer(f)
                writer.writerows(row for row in rows if row)

            print 'Process ' + rsrt + ' End: ' + str(time.strftime('%c'))


        except:
            driver.quit()



def worker():
    for item in iter( q.get, None ):
        do_work(item)
        q.task_done()
    q.task_done()


q = multiprocessing.JoinableQueue()

procs = []

for i in range(num_procs):
    procs.append( multiprocessing.Process(target=worker) )
    procs[-1].daemon = True
    procs[-1].start()

source = ['0017', '0113', '0020', '0013', '0038', '1028', '0115', '0105', '0041', '0037', '0043', '2026', '0165', '0164',
        '0033', '0126', '0116', '0103', '9135', '0185', '0206', '0053', '0062', '1020', '0019', '0042', '2028', '0213',
        '0211', '0163', '0073', '2020', '0214', '2140', '0084', '0193', '0095', '0064', '0196', '0028', '0068', '0074']

for item in source:
    q.put(item)

q.join()

for p in procs:
    q.put( None )

q.join()

for p in procs:
    p.join()

print "Finished"
print 'Writing core output: ' + str(time.strftime('%c'))
with open('availability.csv', 'wb') as outfile:
    for csvfile in glob.glob('/home/mdrouin/Dropbox/Work/Dev/Python/WynInvScrape/availability/*.csv'):
        for line in open(csvfile, 'r'):
            outfile.write(line)

print 'Process End: ' + str(time.strftime('%c'))

One way to tackle a problem like this is to have the function call itself again when it fails, similar to the following:

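def do_work(rsrt):
    try:
        pass  # the original scraping code for rsrt goes here
    except Exception:
        # the attempt failed, so call do_work again with the same ID
        return do_work(rsrt)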

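Of course, this will keep running until the call succeeds, so you may want to pass in a counter and return False once it goes above a certain value.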
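The counter idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the `scrape` parameter (a callable standing in for the Selenium code) and the `attempts_left` parameter are hypothetical names that do not appear in the question. Note that the traceback in the question comes from `webdriver.PhantomJS()`, which sits outside the `try` block in the posted code, so the driver start-up would also have to move inside the `try` for any retry to catch it.

```python
def do_work(scrape, rsrt, attempts_left=3):
    """Run scrape(rsrt), retrying on failure; return True on success, else False.

    `scrape` and `attempts_left` are hypothetical additions for this sketch.
    """
    if attempts_left <= 0:
        return False  # counter exhausted: give up on this ID
    try:
        # Everything that can fail goes inside the try, including driver
        # start-up such as webdriver.PhantomJS().
        scrape(rsrt)
        return True
    except Exception:
        # Same idea as the recursive call above, with one attempt used up.
        return do_work(scrape, rsrt, attempts_left - 1)


def worker(q, scrape):
    """Consume IDs from q until the None sentinel arrives."""
    for item in iter(q.get, None):
        try:
            do_work(scrape, item)
        finally:
            # Always mark the item done so q.join() cannot wait on it forever.
            q.task_done()
    q.task_done()  # account for the None sentinel
```

The `try`/`finally` in `worker` is probably what addresses the hang itself: `q.join()` blocks until `task_done()` has been called once for every item that was put on the queue, so a worker that dies on an uncaught exception leaves its item unaccounted for. With this version the processes would be started with something like `multiprocessing.Process(target=worker, args=(q, scrape))`, where `scrape` is a module-level function.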

