TypeError: cannot concatenate 'str' and 'NoneType' objects when placing the custom url in scrapy.Request()

Question

The URL I extract is relative, so it cannot be used on its own to fetch data from the next page. I therefore created a variable base_url = 'http://www.marinetraffic.com' and prepend it before passing the result to the Scrapy request:

    port_homepage_url = base_url + port_homepage_url

This works fine when I yield the result as an item, like this:

    yield {'a': port_homepage_url, 'b': item['port_name']}

and I get the result I wanted:

http://www.marinetraffic.com/en/ais/index/ships/range/port_id:20585/port_name:FUJAIRAH%20ANCH,FUJAIRAH ANCH

However, if I pass it to a Scrapy request instead,

    yield scrapy.Request(port_homepage_url, callback=self.parse, meta={'item': item})

I get this error:

port_homepage_url = base_url +  port_homepage_url
TypeError: cannot concatenate 'str' and 'NoneType' objects

Here is the code:

import scrapy


class GetVessel(scrapy.Spider):
    name = "getvessel"
    allowed_domains = ["marinetraffic.com"]
    start_urls = [
        'http://www.marinetraffic.com/en/ais/index/ports/all/flag:AE',
    ]


    def parse(self, response):
        item = VesseltrackerItem()
        base_url = 'http://www.marinetraffic.com'
        for ports in response.xpath('//table/tr[position()>1]'):
            item['port_name'] = ports.xpath('td[2]/a/text()').extract_first()
            port_homepage_url = ports.xpath('td[7]/a/@href').extract_first()
            port_homepage_url = base_url +  port_homepage_url
            yield scrapy.Request(port_homepage_url, callback=self.parse, meta={'item': item})

Answer 1

The problem does not happen on the initial start URL page; it happens later, when the subsequent requests are processed. Because every port request is sent with callback=self.parse, the same XPath expressions are run against the port pages too. Take, for example, this page. There are no links in the 7th td element, so ports.xpath('td[7]/a/@href').extract_first() returns None, which makes the port_homepage_url = base_url + port_homepage_url line fail.
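A minimal sketch of the guard, assuming Python 3 (in Python 2 the import is from urlparse import urljoin, and the TypeError message reads as in the question). The helper name build_port_url is hypothetical; it checks for the None that extract_first() returns when nothing matches, and uses the standard library's urljoin instead of plain string concatenation:

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.marinetraffic.com'


def build_port_url(href, base_url=BASE_URL):
    """Return an absolute URL for href, or None if the cell had no link.

    extract_first() returns None when the XPath matches nothing, so check
    for that before joining instead of concatenating blindly.
    """
    if href is None:
        return None
    # urljoin also copes with hrefs that are already absolute
    return urljoin(base_url, href)


# Reproduce the original failure: str + None raises TypeError
try:
    BASE_URL + None
except TypeError as exc:
    print('TypeError:', exc)

print(build_port_url('/en/ais/index/ships/range/port_id:20585'))
print(build_port_url(None))
```

The caller can then skip rows for which the helper returns None rather than building a request from them.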

How to approach the problem depends on what you were planning to do on the "port" pages. From what I understand, you did not mean to handle the "port" page requests with self.parse at all; you need a separate callback with its own logic.

solution1
2 ACCEPTED 2016-09-30 15:20:21