Continue on exception in Python
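I'm working on a series of scripts that pull URLs from a database and use the textstat package to calculate the readability of each page based on a set of predefined calculations. The function below takes a URL (from CouchDB), calculates the defined readability scores, and then saves the scores back to the same CouchDB document.
The problem I'm having is error handling. For example, the Flesch Reading Ease calculation needs the total number of sentences on the page. If that count comes back as zero, an exception is thrown. Is there a way to catch this exception, save a record of it in the database, and then move on to the next URL in the list? Can I do this in the function below (preferred), or do I need to edit the package itself?
I know this question has been asked before. If you know of an answer that covers it, please point me in that direction. My searches so far haven't turned up anything. Thanks in advance.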
import json
import sys
import time
import urllib.error
import urllib.request

import textstat  # older textstat releases: from textstat.textstat import textstat
from bs4 import BeautifulSoup
from readability import Document  # provided by the readability-lxml package


def get_readability_data(db, url, doc_id, rank, index):
    readability_data = {}
    readability_data['url'] = url
    readability_data['rank'] = rank
    user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    headers = {'User-Agent': user_agent}
    try:
        # Pass the headers so the custom User-Agent is actually sent
        req = urllib.request.Request(url, headers=headers)
        response = urllib.request.urlopen(req)
        content = response.read()
        # Extract the main article body, then its plain text
        readable_article = Document(content).summary()
        soup = BeautifulSoup(readable_article, "lxml")
        text = soup.body.get_text()
        try:
            readability_data['flesch_reading_ease'] = textstat.flesch_reading_ease(text)
            readability_data['smog_index'] = textstat.smog_index(text)
            readability_data['flesch_kincaid_grade'] = textstat.flesch_kincaid_grade(text)
            readability_data['coleman_liau'] = textstat.coleman_liau_index(text)
            readability_data['automated_readability_index'] = textstat.automated_readability_index(text)
            readability_data['dale_chall_score'] = textstat.dale_chall_readability_score(text)
            readability_data['linear_write_formula'] = textstat.linsear_write_formula(text)
            readability_data['gunning_fog'] = textstat.gunning_fog(text)
            readability_data['total_words'] = textstat.lexicon_count(text)
            readability_data['difficult_words'] = textstat.difficult_words(text)
            readability_data['syllables'] = textstat.syllable_count(text)
            readability_data['sentences'] = textstat.sentence_count(text)
            readability_data['readability_consensus'] = textstat.text_standard(text)
            readability_data['readability_scores_date'] = time.strftime("%a %b %d %H:%M:%S %Y")
            # use the doc_id to make sure we're saving this in the appropriate place
            readability = json.dumps(readability_data, sort_keys=True, indent=4 * ' ')
            doc = db.get(doc_id)
            data = json.loads(readability)
            doc['search_details']['search_details'][index]['readability'] = data
            #print(doc['search_details']['search_details'][index])
            db.save(doc)
            time.sleep(.5)
        except:  # catch *all* exceptions
            e = sys.exc_info()[0]
            write_to_page("<p>---ERROR---: %s</p>" % e)  # write_to_page is defined elsewhere in the script
    except urllib.error.HTTPError as err:
        print(err.code)
This is the error I'm getting:
Error(ASL): Sentence Count is Zero, Cannot Divide
Error(ASyPW): Number of words are zero, cannot divide
Traceback (most recent call last):
  File "new_get_readability.py", line 114, in get_readability_data
    readability_data['flesch_reading_ease'] = textstat.flesch_reading_ease(text)
  File "/Users/jrs/anaconda/lib/python3.5/site-packages/textstat/textstat.py", line 118, in flesch_reading_ease
    FRE = 206.835 - float(1.015 * ASL) - float(84.6 * ASW)
TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'
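The traceback shows that the exception actually raised here is a `TypeError`, not a `ZeroDivisionError`: judging from the two `Error(...)` lines, textstat prints a message and ends up with `None` for the average sentence length instead of dividing by zero, and the following `1.015 * ASL` multiplication is what fails. A handler meant for this case therefore has to catch `TypeError` (or check the counts before scoring). The minimal sketch below reproduces that pattern; `avg_sentence_length`, `reading_ease_like` and `safe_score` are hypothetical stand-ins, not textstat's real code.

```python
def avg_sentence_length(word_count, sentence_count):
    # Hypothetical stand-in mimicking the behaviour seen in the traceback:
    # print a message and return None instead of raising.
    if sentence_count == 0:
        print("Error(ASL): Sentence Count is Zero, Cannot Divide")
        return None
    return word_count / sentence_count


def reading_ease_like(word_count, sentence_count):
    asl = avg_sentence_length(word_count, sentence_count)
    # When asl is None this multiplication raises TypeError, as in the traceback
    return 206.835 - 1.015 * asl


def safe_score(word_count, sentence_count):
    """Return (score, error) so the caller can record the error and carry on."""
    try:
        return reading_ease_like(word_count, sentence_count), None
    except (TypeError, ZeroDivisionError) as e:
        return None, e


score, err = safe_score(0, 0)
print(score, type(err).__name__)  # None TypeError
score, err = safe_score(20, 2)
print(round(score, 3), err)       # 196.685 None
```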
This is the code that calls the function:
if __name__ == '__main__':
    db = connect_to_db(parse_args())
    print("~~~~~~~~~~" + " GETTING IDs " + "~~~~~~~~~~")
    ids = get_ids(db)
    for i in ids:
        details = get_urls(db, i)
        for d in details:
            get_readability_data(db, d['url'], d['id'], d['rank'], d['index'])
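Independently of where the textstat calls are wrapped, the loop itself can guard each URL so that a failing page is recorded and skipped instead of stopping the run. The sketch below is only an illustration: a plain dict stands in for the CouchDB database, the hypothetical `process_url` stands in for `get_readability_data`, and the `readability_error` field name is an assumption rather than part of the original document schema.

```python
import time


def process_url(url):
    # Hypothetical stand-in for get_readability_data: fails on some pages
    if "empty" in url:
        raise TypeError("unsupported operand type(s) for *: 'float' and 'NoneType'")
    return {"url": url, "flesch_reading_ease": 60.0}


def run(details, db):
    for d in details:
        try:
            db[d["id"]] = process_url(d["url"])
        except Exception as e:
            # Record the failure for this URL; the loop then simply
            # continues with the next entry in details
            db[d["id"]] = {
                "url": d["url"],
                "readability_error": "%s: %s" % (type(e).__name__, e),
                "readability_scores_date": time.strftime("%a %b %d %H:%M:%S %Y"),
            }


db = {}
run([{"id": "a", "url": "http://example.com/ok"},
     {"id": "b", "url": "http://example.com/empty"},
     {"id": "c", "url": "http://example.com/also-ok"}], db)
print(sorted(db))                    # ['a', 'b', 'c']
print(db["b"]["readability_error"])  # TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'
```

Catching the broad `Exception` at this level is a deliberate choice: it is the last line of defence for one URL, while the narrower handlers around individual calls decide what counts as an expected failure.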
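In general, keep `try: except:` blocks as small as possible. I would wrap your textstat functions in some kind of decorator that catches the exception you expect and returns both the function's output and the caught exception.
For example: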
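import functools


def catchExceptions(exception):  # decorator with args (sorta boilerplate)
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            try:
                retval = func(*args, **kwargs)
            except exception as e:
                # Expected failure: no result, hand back the exception instead
                return None, e
            else:
                # Success: return the result and no exception
                return retval, None
        return wrapper
    return decorator


@catchExceptions(ZeroDivisionError)
def testfunc(x):
    return 11 // x  # integer division, so testfunc(3) returns 3


print(testfunc(0))
print('-----')
print(testfunc(3))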
Output:
(None, ZeroDivisionError('integer division or modulo by zero',))
-----
(3, None)
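Applied to the question, the same decorator can wrap each textstat function once, so every metric yields either a score or an exception that can be stored with the other results. This sketch catches `TypeError` as well as `ZeroDivisionError`, because the traceback in the question shows a `TypeError`. To keep it runnable without textstat installed, `reading_ease` and `sentence_count` are hypothetical stand-ins for `textstat.flesch_reading_ease` and `textstat.sentence_count`, and the `errors` key is an assumed field name.

```python
import functools


def catchExceptions(exception):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                retval = func(*args, **kwargs)
            except exception as e:
                return None, e
            else:
                return retval, None
        return wrapper
    return decorator


def sentence_count(text):
    # Hypothetical stand-in for textstat.sentence_count
    return text.count(".")


def reading_ease(text):
    # Hypothetical stand-in for textstat.flesch_reading_ease; like the
    # traceback, it multiplies by None when there are no sentences
    n = sentence_count(text)
    asl = len(text.split()) / n if n else None
    return 206.835 - 1.015 * asl


# ZeroDivisionError covers a bare division, TypeError the None case above
safe = catchExceptions((TypeError, ZeroDivisionError))
METRICS = {
    "flesch_reading_ease": safe(reading_ease),
    "sentences": safe(sentence_count),
}


def score_text(text):
    data, errors = {}, {}
    for name, func in METRICS.items():
        value, err = func(text)
        if err is None:
            data[name] = value
        else:
            errors[name] = "%s: %s" % (type(err).__name__, err)
    if errors:
        data["errors"] = errors  # saved with the document instead of aborting
    return data


print(score_text("Short text. Two sentences."))
print(score_text("no sentence terminator here"))
```

The returned dict can then be saved to the CouchDB document as before, so a page with no detectable sentences still gets a record explaining why some scores are missing.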