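Scrapy - ValueError: Missing scheme in request url: h
When I run the command "scrapy crawl weather_spider2 -o output.json", I get the error "[scrapy.core.engine] ERROR: Error while obtaining start requests",
followed by "ValueError: Missing scheme in request url: h". I have read several posts on Stack Overflow and tried their fixes, but nothing worked.
My code: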
import scrapy
import re
from weather_parent.weather_spider.items import WeatherItem


class WeatherSpiderSpider(scrapy.Spider):
    name = "weather_spider2"
    allowed_domains = 'https://weather.com'
    start_urls = ['https://weather.com/en-MT/weather/today/l/bf01d09009561812f3f95abece23d16e123d8c08fd0b8ec7ffc9215c0154913c']

    def parse_url(self, response):
        city = response.xpath('//h1[contains(@class,"location")].text()').get()
        temp = response.xpath('//span[@data-testid="TemperatureValue"]/text()').get()
        air_quality = response.xpath('//span[@data-testid="AirQualityCategory"]/text()').get()
        cond = response.xpath('//div[@data-testid="wxPhrase"]/text()').get()
        item = WeatherItem()
        item["city"] = city
        item["temp"] = temp
        item["air_quality"] = air_quality
        item["cond"] = cond
        yield item
Error:
[]
2021-10-25 22:07:39 [scrapy.core.engine] INFO: Spider opened
2021-10-25 22:07:39 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-10-25 22:07:39 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-10-25 22:07:39 [scrapy.core.engine] ERROR: Error while obtaining start requests
Traceback (most recent call last):
  File "D:\ca nhan\Anaconda\lib\site-packages\scrapy\core\engine.py", line 129, in _next_request
    request = next(slot.start_requests)
  File "D:\ca nhan\Anaconda\weather_parent\weather_spider\spiders\crawl_weather.py", line 12, in start_requests
    yield scrapy.Request(url = url, callback= self.parse_url)
  File "D:\ca nhan\Anaconda\lib\site-packages\scrapy\http\request\__init__.py", line 25, in __init__
    self._set_url(url)
  File "D:\ca nhan\Anaconda\lib\site-packages\scrapy\http\request\__init__.py", line 73, in _set_url
    raise ValueError(f'Missing scheme in request url: {self._url}')
ValueError: Missing scheme in request url: h
2021-10-25 22:07:39 [scrapy.core.engine] INFO: Closing spider (finished)
2021-10-25 22:07:39 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'elapsed_time_seconds': 0.015959,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2021, 10, 25, 15, 7, 39, 874679),
'log_count/ERROR': 1,
'log_count/INFO': 10,
'start_time': datetime.datetime(2021, 10, 25, 15, 7, 39, 858720)}
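Rename your parse_url function to parse. That is the default callback Scrapy uses to process downloaded responses when their requests do not specify a callback. The city XPath is wrong: use /text() instead of .text().
In allowed_domains, list only domain names: if your target URL is https://www.example.com/1.html, add "example.com" to the list. It has to be a list, not a string. Everything else is fine.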
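To illustrate the allowed_domains point, here is a minimal sketch that pulls the bare domain out of a URL with Python's standard urllib.parse. The helper name domain_for_allowed_domains is hypothetical (Scrapy does not provide it), and stripping a leading "www." is an assumption that matches the example.com case above.

```python
from urllib.parse import urlparse


def domain_for_allowed_domains(url):
    """Return the host of `url` without scheme, path, or a leading 'www.'.

    Hypothetical helper for illustration only; it is not part of Scrapy.
    """
    host = urlparse(url).hostname or ""
    # Assumption: dropping "www." is wanted, so the bare domain is listed.
    if host.startswith("www."):
        host = host[len("www."):]
    return host


# allowed_domains is a list of such hosts, not a URL string.
allowed_domains = [domain_for_allowed_domains("https://www.example.com/1.html")]
print(allowed_domains)  # ['example.com']
```

For the spider in the question, the same idea gives "weather.com" rather than "https://weather.com".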
from scrapy.crawler import CrawlerProcess
import scrapy
import re
# from weather_parent.weather_spider.items import WeatherItem


class WeatherSpiderSpider(scrapy.Spider):
    name = "weather_spider2"
    # A list of bare domain names, not a URL string
    allowed_domains = ['weather.com']
    start_urls = ['https://weather.com/en-MT/weather/today/l/bf01d09009561812f3f95abece23d16e123d8c08fd0b8ec7ffc9215c0154913c']

    # parse() is the default callback for requests built from start_urls
    def parse(self, response):
        city = response.xpath('//h1[contains(@class,"location")]/text()').get()
        temp = response.xpath('//span[@data-testid="TemperatureValue"]/text()').get()
        air_quality = response.xpath('//span[@data-testid="AirQualityCategory"]/text()').get()
        cond = response.xpath('//div[@data-testid="wxPhrase"]/text()').get()
        # A plain dict is used here instead of WeatherItem
        item = {}
        item["city"] = city
        item["temp"] = temp
        item["air_quality"] = air_quality
        item["cond"] = cond
        yield item


# Run the spider directly as a script instead of via "scrapy crawl"
process = CrawlerProcess()
process.crawl(WeatherSpiderSpider)
process.start()
{'city': 'Chennai, Tamil Nadu, India Weather', 'temp': '29°', 'air_quality': 'Unhealthy for Sensitive Groups', 'cond': 'Partly Cloudy'}
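As for why the message ends in a lone "h": that is the first character of "https://...". One common way to end up there is looping over a string instead of a list of URLs. This is only a guess for this case, because the start_requests method that appears in the traceback is not included in the posted code. The sketch below reproduces the effect in plain Python; has_scheme is a hypothetical stand-in for the kind of check Scrapy performs, not its actual code.

```python
from urllib.parse import urlparse


def has_scheme(url):
    """Rough stand-in (not Scrapy's own code) for a 'URL has a scheme' check."""
    return bool(urlparse(url).scheme)


urls_as_string = "https://weather.com"    # a plain string
urls_as_list = ["https://weather.com"]    # a one-element list

# Iterating a string yields single characters, so the first "URL" is just "h".
first_from_string = next(iter(urls_as_string))
first_from_list = next(iter(urls_as_list))

print(repr(first_from_string), has_scheme(first_from_string))  # 'h' False
print(repr(first_from_list), has_scheme(first_from_list))      # 'https://weather.com' True
```

If your own start_requests iterates over something like this, make sure it is a list of full URLs.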