简体   繁体   中英

looking for another way to avoid if statements

I'm new to python language and I started working with Scrapy which is amazing, and I would like to know if there is any way to write less and avoid lots of if statements also avoid repeaters in my code:

titleText = response.css('title::text').get()

    if response.status == 200:
        if len(response.css('title').extract()) == 0:
            self.data['crawl']['meta']['missingTitle'].append({
                'url': response.url,
                'latency': response.meta.get('download_latency')
            })
        elif len(response.css('title').extract()) == 1:
            if len(titleText) > 70:
                self.data['crawl']['meta']['longTitle'].append({
                    'url': response.url,
                    'text': titleText,
                    'latency': response.meta.get('download_latency')
                })
            elif len(titleText) < 50:
                self.data['crawl']['meta']['shortTitle'].append({
                    'url': response.url,
                    'text': titleText,
                    'latency': response.meta.get('download_latency')
                })
        else:
            self.data['crawl']['meta']['multipleTitleTag'].append({
                'url': response.url,
                'text': titleText,
                'latency': response.meta.get('download_latency')
            })
    elif response.status in self.handle_httpstatus_list:
        self.data['crawl']['blockers']['url'].append({
            'url': response.url,
            'code':response.status,
            'text': titleText,
            'latency': response.meta.get('download_latency')
    })

Thank you

You have a lot of repeating code, but all of that is hard to remove if you have such different possibilities.

All in one function

titleText = response.css('title::text').get()
dataObj = {
    'url': response.url,
    'latency': response.meta.get('download_latency')
}
title = response.css('title').extract()
tagName = 'missingTitle' #default
if response.status == 200:
    if len(title) == 0:
        tagName = 'missingTitle'
    else:
        dataObj['text'] = titleText
    if len(title) > 1:
        tagName = 'multipleTitleTag'
    elif len(title) ==1:
        if len(titleText) > 70:
            tagName = 'longTitle'
        elif len(titleText) < 50:
            tagName = 'shortTitle'
        else:
            #Really nothing done if title between 50 and 70?
            pass
    self.data['crawl']['meta'][tagName].append(dataObj)
    
elif response.status in self.handle_httpstatus_list:
    dataObj['code'] = response.status
    dataObj['text'] = titleText
    self.data['crawl']['meta']['multipleTitleTag'].append(dataObj)

Tag name extracted to another function

The title tag functionality could be moved to a helper function to make things a bit cleaner. There the tag name choice can be well structured with early returns. Altogether it could look sth like this (adjust details accordingly):

def get_tag_name(self, title, titleText):
    if len(title) == 0:
        return 'missingTitle'
    if len(title) > 1:
        return 'multipleTitleTag'
    if len(titleText) > 70:
        return 'longTitle'
    if len(titleText) < 50:
        return 'longTitle'
    return 'DEFAULT' #???

def the_main_one(self, response):
    titleText = response.css('title::text').get()
    dataObj = {
        'url': response.url,
        'latency': response.meta.get('download_latency')
    }
    tagName = 'multipleTitleTag' #default
    title = response.css('title').extract()
    if response.status == 200:
        tagName = self.get_tag_name(title, titleText)
        if len(title) == 0:
            dataObj['text'] = titleText
    elif response.status in self.handle_httpstatus_list:
        dataObj['code'] = response.status
        dataObj['text'] = titleText
    self.data['crawl']['meta'][tagName].append(dataObj) 

It might not be a lot shorter, but it definitely removes some clutter and helps in maintaining of the code later
If you had a lot of IFs in the same level, you could think about different approach. Now this seems about right (at least for me).

titleText = response.css('title::text').get()
    metaResponse = {'text':titleText, 'url':response.url, 'httpCode':response.status, 'latency':response.meta.get('download_latency')}

    if response.status == 200:
        if len(response.css('title').extract()) == 0:
            self.data['crawl']['meta']['missingTitle'].append(metaResponse)
        elif len(response.css('title').extract()) == 1:
            if len(titleText) > 70:
                self.data['crawl']['meta']['longTitle'].append(metaResponse)
            elif len(titleText) < 40:
                self.data['crawl']['meta']['shortTitle'].append(metaResponse)
        else:
            self.data['crawl']['meta']['multipleTitleTag'].append(metaResponse)
    elif response.status in self.handle_httpstatus_list:
        self.data['crawl']['blockers']['url'].append(metaResponse)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM