
Continue on exception in Python

I'm working on a series of scripts that pull URLs from a database and use the textstat package to calculate the readability of each page based on a set of predefined calculations. The function below takes a URL (from a CouchDB database), calculates the defined readability scores, and then saves the scores back to the same CouchDB document.

The issue I'm having is with error handling. As an example, the Flesch Reading Ease score calculation requires a count of the total number of sentences on the page. If this count comes back as zero, an exception is thrown. Is there a way to catch this exception, save a note of the exception in the database, and move on to the next URL in the list? Can I do this in the function below (preferred), or will I need to edit the package itself?

I know variations of this question have been asked before. If you know of one that might answer my question, please point me in that direction. My search has been fruitless thus far. Thanks in advance.

def get_readability_data(db, url, doc_id, rank, index):
    readability_data = {}
    readability_data['url'] = url
    readability_data['rank'] = rank
    user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    headers = { 'User-Agent' : user_agent }
    try:
        req = urllib.request.Request(url, headers=headers)  # actually send the User-Agent header defined above
        response = urllib.request.urlopen(req)
        content = response.read()
        readable_article = Document(content).summary()
        soup = BeautifulSoup(readable_article, "lxml")
        text = soup.body.get_text()
        try:
            readability_data['flesch_reading_ease'] = textstat.flesch_reading_ease(text)
            readability_data['smog_index'] = textstat.smog_index(text)
            readability_data['flesch_kincaid_grade'] = textstat.flesch_kincaid_grade(text)
            readability_data['coleman_liau'] = textstat.coleman_liau_index(text)
            readability_data['automated_readability_index'] = textstat.automated_readability_index(text)
            readability_data['dale_chall_score'] = textstat.dale_chall_readability_score(text)
            readability_data['linear_write_formula'] = textstat.linsear_write_formula(text)
            readability_data['gunning_fog'] = textstat.gunning_fog(text)
            readability_data['total_words'] = textstat.lexicon_count(text)
            readability_data['difficult_words'] = textstat.difficult_words(text)
            readability_data['syllables'] = textstat.syllable_count(text)
            readability_data['sentences'] = textstat.sentence_count(text)
            readability_data['readability_consensus'] = textstat.text_standard(text)
            readability_data['readability_scores_date'] = time.strftime("%a %b %d %H:%M:%S %Y")

            # use the doc_id to make sure we're saving this in the appropriate place
            readability = json.dumps(readability_data, sort_keys=True, indent=4 * ' ')
            doc = db.get(doc_id)
            data = json.loads(readability)
            doc['search_details']['search_details'][index]['readability'] = data
            #print(doc['search_details']['search_details'][index])
            db.save(doc)
            time.sleep(.5)

        except: # catch *all* exceptions
            e = sys.exc_info()[0]
            write_to_page( "<p>---ERROR---: %s</p>" % e )

    except urllib.error.HTTPError as err:
        print(err.code)

This is the error I receive:

Error(ASL): Sentence Count is Zero, Cannot Divide
Error(ASyPW): Number of words are zero, cannot divide
Traceback (most recent call last):
  File "new_get_readability.py", line 114, in get_readability_data
    readability_data['flesch_reading_ease'] = textstat.flesch_reading_ease(text)
  File "/Users/jrs/anaconda/lib/python3.5/site-packages/textstat/textstat.py", line 118, in flesch_reading_ease
    FRE = 206.835 - float(1.015 * ASL) - float(84.6 * ASW)
TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

This is the code that calls the function:

if __name__ == '__main__':
    db = connect_to_db(parse_args())
    print("~~~~~~~~~~" + " GETTING IDs " + "~~~~~~~~~~")
    ids = get_ids(db)
    for i in ids:
        details = get_urls(db, i)
        for d in details:
            get_readability_data(db, d['url'], d['id'], d['rank'], d['index'])

It is generally good practice to keep try/except blocks as small as possible. I would wrap your textstat functions in a decorator that catches the exception you expect (a TypeError, going by your traceback) and returns both the function's output and any exception caught.

For example:

def catchExceptions(exception):  #decorator with args (sorta boilerplate)
    def decorator(func):
        def wrapper(*args, **kwargs):
            try:
                retval = func(*args, **kwargs)
            except exception as e:
                return None, e
            else:
                return retval, None
        return wrapper
    return decorator

@catchExceptions(ZeroDivisionError)
def testfunc(x):
    return 11/x

print(testfunc(0))
print('-----')
print(testfunc(3))

On Python 3.7 or later, this prints:

(None, ZeroDivisionError('division by zero'))
-----
(3.6666666666666665, None)
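
To tie this back to your function, here is a minimal sketch of how the same idea could record a note for each failed score and carry on. The two `fake_*` functions are hypothetical stand-ins I made up so the snippet runs without textstat installed; in your code the dictionary would map your keys to the real calls (`textstat.flesch_reading_ease`, `textstat.lexicon_count`, and so on). The `readability_errors` key is also just an assumed name for where the note gets stored.

```python
import functools


def catch_exceptions(*exceptions):
    """Decorator factory: the wrapped function returns (value, None) on
    success and (None, error) when one of the listed exceptions is raised."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs), None
            except exceptions as e:
                return None, e
        return wrapper
    return decorator


# Hypothetical stand-in for a textstat metric. It fails with a TypeError
# on text without sentences, mimicking the traceback in the question.
def fake_reading_ease(text):
    sentences = text.count('.')
    asl = len(text.split()) / sentences if sentences else None
    return 206.835 - 1.015 * asl


# Hypothetical stand-in for a metric that never fails.
def fake_word_count(text):
    return len(text.split())


METRICS = {
    'flesch_reading_ease': fake_reading_ease,
    'total_words': fake_word_count,
}


def collect_scores(text, metrics):
    """Run every metric, keep the ones that succeed, and note the failures."""
    scores, errors = {}, {}
    for name, func in metrics.items():
        safe = catch_exceptions(TypeError, ZeroDivisionError)(func)
        value, err = safe(text)
        if err is None:
            scores[name] = value
        else:
            errors[name] = repr(err)  # plain string, so it survives json.dumps
    if errors:
        scores['readability_errors'] = errors  # assumed key name
    return scores


if __name__ == '__main__':
    for page_text in ["", "One short sentence."]:
        print(collect_scores(page_text, METRICS))
```

Because each score is wrapped on its own, one failing calculation does not stop the others, and the dictionary that comes back can be saved to the CouchDB document exactly as before. The function then returns normally, so the loop in your `__main__` block moves on to the next URL.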
