I wrote a Python daemon that parses some web pages, but sometimes it raises errors because some of the pages are not compatible with the parser.
My question: how do I make the script keep working when an error occurs instead of stopping, and, if possible, record all the errors to a log file?
Thanks.
Part of my code:
# result - array of link rows
for row in result:
    page_html = getPage(row['url'])
    self.page_data = row
    if page_html is False:  # fetch failed: drop the page and move on
        self.deletePageFromIndex(row['id'])
        continue
    parser.mainlink = row['url']
    parser.feed(page_html)
    links = parser.links  # links from page
    words = wordParser(page_html)  # words from page
    # insert data into the DB
    self.insertWords(words)
    self.insertLinks(links)
    # print row['url'] + ' parsed. sleep... '
    self.markAsIndexed(row['id'])
    sleep(uniform(1, 3))  # throttle between pages
Here's what you can do:
import logging

should_abort = False

def do_stuff():
    global should_abort
    ...

def main():
    while not should_abort:  # your main loop
        try:
            do_stuff()
        except MyException1 as e:
            logging.exception('GOT MyException1 %s', e)
        except MyException2 as e:
            logging.exception('GOT MyException2 %s', e)
        except Exception as e:
            logging.exception('UNKNOWN EXCEPTION %s', e)
This still allows you to stop the script with Ctrl-C, because KeyboardInterrupt derives from BaseException, not Exception.