
How do I reduce the number of try/catch statements here?

I'm currently using Scrapy to pull company information from a website. However, the amount of data provided varies widely from page to page: one company lists three of its team members while another lists only two, or one company lists where it's located while another doesn't. As a result, some XPaths match nothing, so indexing into the extracted list raises an IndexError:

try: 
    item['industry'] = hxs.xpath('//*[@id="overview"]/div[2]/div[2]/p/text()[2]').extract()[0]
except IndexError:
    item['industry'] = "None provided"
try:
    item['URL'] = hxs.xpath('//*[@id="ContentPlaceHolder_lnkWebsite"]/text()').extract()[0]
except IndexError:
    item['URL'] = "None provided"
try:
    item['desc'] = hxs.xpath('//*[@id="overview"]/div[2]/div[4]/p/text()[1]').extract()[0]
except IndexError:
    item['desc'] = "None provided"
try:
    item['founded'] = hxs.xpath('//*[@id="ContentPlaceHolder_updSummary"]/div/div[2]/table/tbody/tr/td[1]/text()').extract()[0]
except IndexError:
    item['founded'] = "None provided"

My code uses many try/except blocks. Since each exception is specific to the field I am trying to populate, is there a cleaner way to handle this?

Use the TakeFirst() output processor. The Scrapy documentation describes it as follows:

Returns the first non-null/non-empty value from the values received, so it's typically used as an output processor to single-valued fields.

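from scrapy import Item, Field
# Older Scrapy releases: from scrapy.contrib.loader.processor import TakeFirst
from itemloaders.processors import TakeFirst

class MyItem(Item):
    # TakeFirst() collapses the list of extracted values to a single value
    industry = Field(output_processor=TakeFirst())
    ...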

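Output processors are applied when the item is populated through an ItemLoader (imported from scrapy.loader), so inside the spider you load the field through the loader and need no try/except at all. If the XPath matches nothing, the field is simply left unset on the item:

loader = ItemLoader(item=MyItem(), selector=hxs)
loader.add_xpath('industry', '//*[@id="overview"]/div[2]/div[2]/p/text()[2]')
item = loader.load_item()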
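
To make the quoted TakeFirst() description concrete, here is a minimal pure-Python sketch of what it amounts to. The function name take_first is hypothetical, and this only illustrates the documented behaviour; it is not Scrapy's actual source:

```python
def take_first(values):
    """Return the first value that is neither None nor an empty string.

    Hypothetical stand-in for illustration. It falls through to None when
    nothing qualifies, so an empty extract() result never raises IndexError.
    """
    for value in values:
        if value is not None and value != "":
            return value
    return None


# Stand-ins for what hxs.xpath(...).extract() might return on two pages.
print(take_first(["", "Software"]))  # a page that lists an industry
print(take_first([]))                # a page where the XPath matched nothing
```

The point is that the "is there anything at index 0?" check lives in one place instead of being repeated around every field.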

In recent versions of Scrapy, extract_first() is used for this. It returns None if the XPath doesn't match anything, so no error is raised. It also accepts a default argument, e.g. extract_first(default="None provided").
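
Another way to cut the repetition, independent of which Scrapy version you have, is to drive the extraction from a table of field names and XPaths. The following is only a sketch under assumptions: populate, FIELD_XPATHS, DEFAULT and fake_page are hypothetical names, and fake_page stands in for a real response so the snippet runs without Scrapy installed:

```python
DEFAULT = "None provided"

# Field name -> XPath, copied from the question.
FIELD_XPATHS = {
    "industry": '//*[@id="overview"]/div[2]/div[2]/p/text()[2]',
    "URL": '//*[@id="ContentPlaceHolder_lnkWebsite"]/text()',
    "desc": '//*[@id="overview"]/div[2]/div[4]/p/text()[1]',
    "founded": '//*[@id="ContentPlaceHolder_updSummary"]/div/div[2]/table/tbody/tr/td[1]/text()',
}


def populate(item, query, field_xpaths=FIELD_XPATHS, default=DEFAULT):
    """Fill `item` from `query(xpath)`, which must return a list of strings.

    In a spider, `query` could be: lambda xp: hxs.xpath(xp).extract()
    """
    for field, xpath in field_xpaths.items():
        values = query(xpath)
        # Use the first match if there is one, otherwise the default.
        item[field] = values[0] if values else default
    return item


# Hypothetical page data: only the website link is present on this page.
fake_page = {FIELD_XPATHS["URL"]: ["http://example.com"]}
item = populate({}, lambda xp: fake_page.get(xp, []))
print(item)
```

Adding a new field then means adding one line to the table rather than another try/except block.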
