
Python Scrapy: different result when running from the shell vs. as a script

When I run my script, my print statement outputs None. But when I run the same thing in the Scrapy shell, I get the result I want. What could be the reason for these different results? The code is below:

import scrapy
from scrapy.crawler import CrawlerProcess

class TestSpiderSpider(scrapy.Spider):
    name = 'test_spider'
    allowed_domains = ['dvlaregistrations.direct.gov.uk']
    start_urls = ['https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD"&"action=index"&"pricefrom=0"&"priceto="&"prefixmatches="&"currentmatches="&"limitprefix="&"limitcurrent="&"limitauction="&"searched=true"&"openoption="&"language=en"&"prefix2=Search"&"super="&"super_pricefrom="&"super_priceto='
]

    def parse(self, response):
        price=response.css('div.resultsstrip p::text').get()
        print(price)
        print('---+---')
        all_prices = response.css('div.resultsstrip p::text')
        for element in all_prices:
            yield print(element.css('::text').get())


process = CrawlerProcess()
process.crawl(TestSpiderSpider)
process.start()

When run as a script, this prints None, but running response.css('div.resultsstrip p::text').get() in the Scrapy shell returns '£250', which is the value I'm after.

You should edit your original post to use the new URL from your comment, because the one in your question doesn't point to the same address.
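One way to see the mismatch is to compare the query strings of the two URLs. The sketch below uses only the standard library's urllib.parse; both URLs are trimmed to a few of their parameters for brevity, and the helper names query_params and diff_query_params are hypothetical. The "&" fragments in the question's URL leave literal double quotes inside the query string, so the search value becomes CO11CTD" and the other parameter names gain a leading quote (for example "action instead of action).

```python
from urllib.parse import parse_qs, urlsplit

# Trimmed versions of the two URLs (only a few parameters kept for brevity).
QUESTION_URL = ('https://dvlaregistrations.dvla.gov.uk/search/results.html'
                '?search=CO11CTD"&"action=index"&"pricefrom=0"&"searched=true')
ANSWER_URL = ('https://dvlaregistrations.dvla.gov.uk/search/results.html'
              '?search=CO11CTD&action=index&pricefrom=0&searched=true')


def query_params(url):
    """Parse the query string of `url` into a dict of value lists, keeping blank values."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


def diff_query_params(url_a, url_b):
    """Return {name: (values_in_a, values_in_b)} for every parameter that differs."""
    a, b = query_params(url_a), query_params(url_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    for name, (q, ans) in diff_query_params(QUESTION_URL, ANSWER_URL).items():
        print(f"{name!r}: question={q} answer={ans}")
```

Every parameter shows up as different, which is consistent with the script requesting a different search than the one run in the shell.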

Also, you are trying to extract the text from a selector that already points only to the text, which is why it returns None.

In the following line your selector list already targets the ::text of the element.

all_prices = response.css('div.resultsstrip p::text')

That is why extracting ::text a second time doesn't work:

print(element.css('::text').get())

What would have worked is simply calling get() on the element itself:

print(element.get())

Try this:

import scrapy
from scrapy.crawler import CrawlerProcess

class TestSpiderSpider(scrapy.Spider):
    name = 'test_spider'
    allowed_domains = ['dvlaregistrations.dvla.gov.uk']  # must match the domain of start_urls
    start_urls = ["https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto="]

    def parse(self, response):
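        # One selector per result row; the selectors below are relative to that row.
        for row in response.css('div.resultsstrip'):
            # default='' avoids calling .strip() on None if a row lacks a link or price
            plate = row.css('a::text').get(default='')
            price = row.css('p::text').get(default='')
            yield {"plate": plate.strip(), "price": price.strip()}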


process = CrawlerProcess()
process.crawl(TestSpiderSpider)
process.start()

Output:

&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'CO02 CTO', 'price': '£250'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'CO03 CTO', 'price': '£250'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'CO04 CTO', 'price': '£250'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'CO05 CTO', 'price': '£250'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'B14 CCD', 'price': '£399'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'B15 CCD', 'price': '£399'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'B17 CCD', 'price': '£399'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'B18 CCD', 'price': '£399'}
