Python Scrapy: different result when running from the shell vs. as a script

Question

When I run my script, the print statement outputs None. When I run the same selector from the Scrapy shell, I get the result I want. What could cause such different results? The code is below.

import scrapy
from scrapy.crawler import CrawlerProcess

class TestSpiderSpider(scrapy.Spider):
    name = 'test_spider'
    allowed_domains = ['dvlaregistrations.direct.gov.uk']
    start_urls = ['https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD"&"action=index"&"pricefrom=0"&"priceto="&"prefixmatches="&"currentmatches="&"limitprefix="&"limitcurrent="&"limitauction="&"searched=true"&"openoption="&"language=en"&"prefix2=Search"&"super="&"super_pricefrom="&"super_priceto='
]

    def parse(self, response):
        price=response.css('div.resultsstrip p::text').get()
        print(price)
        print('---+---')
        all_prices = response.css('div.resultsstrip p::text')
        for element in all_prices:
            yield print(element.css('::text').get())


process = CrawlerProcess()
process.crawl(TestSpiderSpider)
process.start()

When run, this script prints None, but in the shell response.css('div.resultsstrip p::text').get() returns '£250', which is the value shown on the page.

Answer 1

You should edit your original post to use the new URL from your comment, because the one in your question doesn't point to the same address.
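As a side note on the URL: the string in the question's start_urls has literal " characters around each &. In a shell those quotes only stop the & from being interpreted and are then dropped, but inside a Python string they are ordinary characters, so the script may be requesting a different query string than the shell did. Here is a minimal sketch with the standard library, using shortened copies of the two URLs from this thread; the helper name query_params is just for illustration, and I have not checked how the site actually responds to either form.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copies of the two URLs in this thread (first three parameters only).
quoted_url = 'https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD"&"action=index"&"pricefrom=0'
clean_url = "https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0"


def query_params(url):
    """Hypothetical helper: map each query parameter name to its list of values."""
    return parse_qs(urlsplit(url).query)


quoted_params = query_params(quoted_url)
clean_params = query_params(clean_url)

# The literal quotes end up inside the parameter names and values.
print(quoted_params)
print(clean_params)
```

If the printed dictionaries differ, the two requests are not asking the server for the same search, which is worth ruling out before debugging the selectors.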

Also, you are trying to extract ::text from a selector that already points to a text node, which is why it returns None.

In the following line, your selector list already targets the ::text of the element:

all_prices = response.css('div.resultsstrip p::text')

That is why extracting ::text a second time doesn't work:

print(element.css('::text').get())

What would have worked is simply calling get() on the element itself:

print(element.get())

Try this:

import scrapy
from scrapy.crawler import CrawlerProcess

class TestSpiderSpider(scrapy.Spider):
    name = 'test_spider'
    allowed_domains = ['dvlaregistrations.dvla.gov.uk']  # match the domain used in start_urls
    start_urls = ["https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto="]

    def parse(self, response):
        for row in response.css('div.resultsstrip'):
            plate = row.css('a::text').get()
            price = row.css('p::text').get()
            yield {"plate": plate.strip(), "price": price.strip()}


process = CrawlerProcess()
process.crawl(TestSpiderSpider)
process.start()

output:

&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'CO02 CTO', 'price': '£250'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=
&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'CO03 CTO', 'price': '£250'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=
&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'CO04 CTO', 'price': '£250'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=
&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'CO05 CTO', 'price': '£250'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=
&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'B14 CCD', 'price': '£399'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=
&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'B15 CCD', 'price': '£399'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=
&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'B17 CCD', 'price': '£399'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=
&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'B18 CCD', 'price': '£399'}

1 answer

solution1
3 ACCEPTED 2023-01-14 23:26:03