Why isn't scrapy finding anything on my local site?

Hello, I am fairly new to scrapy. I am trying to scrape a particular site, but my scrapy program isn't returning anything.
I have a local site running at http://service.localhost:8021, and I am trying to scrape the image links (the `src` attribute) from it. When I crawl it, scrapy does seem to reach the site (I get a 200 response), but no links come back.
My script is:
```python
# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from bs4 import BeautifulSoup
import urllib

class crawlImages(CrawlSpider):
    name = 'crawlImages'
    allowed_domains = ["service.localhost"]
    start_urls = ['http://service.localhost:8021']

def parse(self, response):
    titles = response.css('img::attr(alt)').extract()
    links = response.css('img::attr(src)').extract()
    print('##########')
    for item in zip(titles, links):
        all_items = {
            'title': BeautifulSoup(item[0]).text,
            'link': item[1]
        }
        print(item[1])
        yield all_items
```
I run it like this:

```shell
scrapy runspider crawlImages.py -s USER_AGENT="Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36" -s ROBOTSTXT_OBEY=False
```
The output I got was: removed, as I am not allowed to post it here.
Any tips?
You can print `titles` and `links` for debugging, and your function's indentation is incorrect.
Change

```python
def parse(self, response):
    titles = response.css('img::attr(alt)').extract()
    links = response.css('img::attr(src)').extract()
    print('##########')
    for item in zip(titles, links):
        all_items = {
            'title': BeautifulSoup(item[0]).text,
            'link': item[1]
        }
        print(item[1])
        yield all_items
```
to

```python
    def parse(self, response):
        titles = response.css('img::attr(alt)').extract()
        links = response.css('img::attr(src)').extract()
        print('##########')
        for item in zip(titles, links):
            all_items = {
                'title': BeautifulSoup(item[0]).text,
                'link': item[1]
            }
            print(item[1])
            yield all_items
```

so that `parse` is a method of the spider class rather than a module-level function.
Check `response.body` with:

```python
print(response.body)
```
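If the body does show `<img>` tags but the selectors still return nothing, you can sanity-check the extraction logic offline against a saved copy of the page. Here is a minimal sketch using only the standard library (the sample HTML is made up for illustration, standing in for `response.body`):

```python
from html.parser import HTMLParser

class ImgCollector(HTMLParser):
    """Collect the alt/src attribute pairs from every <img> tag,
    mirroring what the spider's CSS selectors should match."""
    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            d = dict(attrs)
            self.items.append({'title': d.get('alt'), 'link': d.get('src')})

# Hypothetical sample page standing in for response.body
sample = '<html><body><img alt="logo" src="/img/logo.png"></body></html>'
parser = ImgCollector()
parser.feed(sample)
print(parser.items)  # [{'title': 'logo', 'link': '/img/logo.png'}]
```

If this finds the images but the spider does not, the problem is on the request side (e.g. the server returning different HTML to the crawler) rather than in the selectors.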
Try rewriting the loop:

```python
def parse(self, response):
    print(response.body)
    for img in response.xpath('//img'):
        title = img.xpath('./@alt').get()
        link = img.xpath('./@src').get()
        item = {}
        item['title'] = title
        item['link'] = link
        print(item)
        yield item
```
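One more thing worth noting: `src` attributes are often relative paths, so if you intend to follow or download the links you will usually want to make them absolute with `response.urljoin(link)`. Under the hood that behaves like the standard library's `urllib.parse.urljoin` applied to `response.url`, as this small sketch shows (the base URL is the one from the question):

```python
from urllib.parse import urljoin

# response.urljoin(link) is equivalent to urljoin(response.url, link)
base = 'http://service.localhost:8021'
print(urljoin(base, '/img/logo.png'))  # http://service.localhost:8021/img/logo.png
print(urljoin(base, 'img/logo.png'))   # http://service.localhost:8021/img/logo.png
```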