Python Scrapy: different results when run as a script versus from the Scrapy shell

When I run my script, the print statement outputs None. When I run the same selector from the Scrapy shell, I get the result I want. What could cause such different results? The code is below.

import scrapy
from scrapy.crawler import CrawlerProcess

class TestSpiderSpider(scrapy.Spider):
    name = 'test_spider'
    allowed_domains = ['dvlaregistrations.direct.gov.uk']
    start_urls = ['https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD"&"action=index"&"pricefrom=0"&"priceto="&"prefixmatches="&"currentmatches="&"limitprefix="&"limitcurrent="&"limitauction="&"searched=true"&"openoption="&"language=en"&"prefix2=Search"&"super="&"super_pricefrom="&"super_priceto='
]

    def parse(self, response):
        price=response.css('div.resultsstrip p::text').get()
        print(price)
        print('---+---')
        all_prices = response.css('div.resultsstrip p::text')
        for element in all_prices:
            yield print(element.css('::text').get())

process = CrawlerProcess()
process.crawl(TestSpiderSpider)
process.start()

When run, this script prints None, but in the Scrapy shell response.css('div.resultsstrip p::text').get() returns '£250', which is the value shown on the page.

You should edit your original post to use the new URL from your comment, because the URL in your question does not point to the same address.
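To see why the two URLs differ, it can help to parse their query strings. The sketch below uses shortened versions of the question's URL (only the first three parameters, an abbreviation made here for readability) and the standard library's urllib.parse. The double quotes around each & in the question's URL are literal characters inside the Python string, so they end up in the parameter names and values.

```python
from urllib.parse import urlsplit, parse_qsl

BASE = "https://dvlaregistrations.dvla.gov.uk/search/results.html"

# Shortened form of the URL in the question: note the literal " characters.
quoted_url = BASE + '?search=CO11CTD"&"action=index"&"pricefrom=0'
# Shortened form of the cleaned URL.
clean_url = BASE + "?search=CO11CTD&action=index&pricefrom=0"


def query_pairs(url):
    """Return the query string of `url` as a list of (name, value) pairs."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


quoted_pairs = query_pairs(quoted_url)
clean_pairs = query_pairs(clean_url)

print(quoted_pairs)
# [('search', 'CO11CTD"'), ('"action', 'index"'), ('"pricefrom', '0')]
print(clean_pairs)
# [('search', 'CO11CTD'), ('action', 'index'), ('pricefrom', '0')]
```

The server receives a search term of CO11CTD" and parameter names such as "action, so it is likely to return a different page from the one seen in the shell session.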

Also, you are trying to extract text from a selector that already points at a text node, which is why it returns None.

In the following line, your selector list already targets the ::text of each element:

all_prices = response.css('div.resultsstrip p::text')

That is why extracting ::text from it a second time does not work:

print(element.css('::text').get())

What would have worked is simply calling get() on the element itself:

print(element.get())

Try this:

import scrapy
from scrapy.crawler import CrawlerProcess

class TestSpiderSpider(scrapy.Spider):
    name = 'test_spider'
    # Match the host used in start_urls so follow-up requests to it are not filtered as offsite.
    allowed_domains = ['dvlaregistrations.dvla.gov.uk']
    start_urls = ["https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto="]

    def parse(self, response):
        for row in response.css('div.resultsstrip'):
            plate = row.css('a::text').get()
            price = row.css('p::text').get()
            yield {"plate": plate.strip(), "price": price.strip()}


process = CrawlerProcess()
process.crawl(TestSpiderSpider)
process.start()

Output:

&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'CO02 CTO', 'price': '£250'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=
&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'CO03 CTO', 'price': '£250'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=
&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'CO04 CTO', 'price': '£250'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=
&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'CO05 CTO', 'price': '£250'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=
&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'B14 CCD', 'price': '£399'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=
&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'B15 CCD', 'price': '£399'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=
&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'B17 CCD', 'price': '£399'}
2023-01-14 15:22:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://dvlaregistrations.dvla.gov.uk/search/results.html?search=CO11CTD&action=index&pricefrom=0&priceto=&prefixmatches=&currentmatches=
&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto=>
{'plate': 'B18 CCD', 'price': '£399'}
