Scraping multiple different URLs simultaneously using Scrapy Splash in Python

Question

I need to scrape multiple URLs concurrently using Scrapy and Splash. I tried the code below, but it does not work.
These are the URLs:
'https://wunderground.com/forecast/us/ny/brooklyn/',
'https://www.wunderground.com/forecast/us/pa/california/',
'https://www.wunderground.com/forecast/us/ny/boston'

I need to iterate over these URLs and scrape each one with Scrapy.
With multiple URLs I am unable to get the data; the spider shows an error. Please help.
My question is: how can I scrape this list of URLs?

import scrapy
from scrapy_splash import SplashRequest
import scrapy_proxies

class WundergroundSpider(scrapy.Spider):
    name = 'wunderground'
    #allowed_domains = ['www.wunderground.com/forecast/us/ny/brooklyn']
    start_urls = []

    script = '''
    function main(splash, args)
        splash.private_mode_enabled = false
        assert(splash:go(args.url))
        assert(splash:wait(10))
        return splash:html()
    end
    '''
    
    def start_requests(self):
        urls = [
        'https://wunderground.com/forecast/us/ny/brooklyn/',
        'https://www.wunderground.com/forecast/us/pa/california/',
        'https://www.wunderground.com/forecast/us/ny/boston'
        ]
        for url in urls:
            yield SplashRequest(url, self.parse,  args={'wait': 8})

    def parse(self, response):
        tmps= {
            'tempHigh': response.xpath("//div[@class='forecast']/a[@class='navigate-to ng-star-inserted']/div[@class='obs-forecast']/span/span[@class='temp-hi']/text()")[0],
            'templow': response.xpath("//div[@class='forecast']/a[@class='navigate-to ng-star-inserted']/div[@class='obs-forecast']/span/span[@class='temp-lo']/text()")[0],
            'obsphs' : response.xpath("//div[@class='forecast']/a[@class='navigate-to ng-star-inserted']/div[@class='obs-forecast']/div[@class='obs-phrase']/text()")[0]
            }
        yield tmps

Answer 1

You defined your Lua script but never used it. The request goes to SplashRequest's default endpoint, so the script is never sent to Splash.

Try this: yield SplashRequest(url=url, callback=self.parse, endpoint='execute', args={'lua_source': self.script})
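
To show how that one-line fix fits into the loop over all three URLs, here is a minimal sketch. The helper name splash_kwargs and the wait parameter are illustrative assumptions, not part of scrapy-splash; the helper only builds the keyword arguments, so it can be checked without a running Splash instance. Scrapy schedules the requests yielded from start_requests concurrently on its own, so no extra threading should be needed.

```python
# Sketch only: builds the keyword arguments for scrapy_splash.SplashRequest.
# splash_kwargs and its wait parameter are hypothetical names for illustration.

LUA_SCRIPT = """
function main(splash, args)
    splash.private_mode_enabled = false
    assert(splash:go(args.url))
    assert(splash:wait(args.wait))
    return splash:html()
end
"""

URLS = [
    'https://wunderground.com/forecast/us/ny/brooklyn/',
    'https://www.wunderground.com/forecast/us/pa/california/',
    'https://www.wunderground.com/forecast/us/ny/boston',
]


def splash_kwargs(url, callback, wait=10):
    """Return kwargs that send LUA_SCRIPT to Splash's 'execute' endpoint."""
    return {
        'url': url,
        'callback': callback,
        # 'execute' runs a Lua script; the default endpoint would ignore it.
        'endpoint': 'execute',
        # Everything in args is visible to the Lua script as args.<name>.
        'args': {'lua_source': LUA_SCRIPT, 'wait': wait},
    }


# Inside the spider (requires scrapy and scrapy_splash to be installed):
#
#     def start_requests(self):
#         for url in URLS:
#             yield SplashRequest(**splash_kwargs(url, self.parse))
```

Passing the wait time through args rather than hard-coding it in the script is a design choice made here for illustration; the original script with a fixed splash:wait(10) would work the same way once it is actually sent.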

solution1
0 2022-10-08 23:41:42
