Python Scrapy unexpected indent error
We're trying to crawl items such as 'product', 'price', etc., but we keep getting an indentation error.
The code we're using (crawlproduct.py):
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from productcrawl.items import ProductCrawlItem
class MySpider(BaseSpider):
name = "crawlproduct"
allowed_domains = ["yorcom.nl"]
f = open("items.txt")
start_urls = [url.strip() for url in f.readlines()]
f.close()
def parse(self, response):
hxs = HtmlXPathSelector(response)
events = hxs.select("//div[@class='productOverview']")
items = []
for event in events:
item = ProductCrawlItem()
item ["product"] = events.select("table/tbody/tr/td[@class='productTitle']/a/text()").extract()
item ["price"] = events.select("table/tbody/tr/td[@class='productPrice']/a/text()").extract()
item ["stock"] = events.select("table/tbody/tr/td[@class='productStock voorraad']/a/text()").extract()
item ["link"] = events.select("table/tbody/tr/td[@class='productTitle']/a").extract()
yield item
and items.py:
from scrapy.item import Item, Field

class ProductCrawlItem(Item):
    product = Field()
    price = Field()
    stock = Field()
    link = Field()
When we use only one field, it works. Does anyone know what the problem is?
Thanks in advance,
Dean
With the following indentation, this is probably what you intended:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from productcrawl.items import ProductCrawlItem

class MySpider(BaseSpider):
    name = "crawlproduct"
    allowed_domains = ["yorcom.nl"]
    # Read the start URLs, one per line, from items.txt
    f = open("items.txt")
    start_urls = [url.strip() for url in f.readlines()]
    f.close()

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        events = hxs.select("//div[@class='productOverview']")
        items = []
        for event in events:
            item = ProductCrawlItem()
            # Query the current node (event), not the whole list (events),
            # so each item holds only its own product's data
            item["product"] = event.select("table/tbody/tr/td[@class='productTitle']/a/text()").extract()
            item["price"] = event.select("table/tbody/tr/td[@class='productPrice']/a/text()").extract()
            item["stock"] = event.select("table/tbody/tr/td[@class='productStock voorraad']/a/text()").extract()
            item["link"] = event.select("table/tbody/tr/td[@class='productTitle']/a").extract()
            # yield must sit inside the for loop, at the same level as the assignments
            yield item
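If it is not obvious which line is mis-indented, one option is to let Python point at it before running the spider. The snippet below is only a minimal sketch using the standard library: `find_indentation_error` is a hypothetical helper name, and the two sample sources are made up for illustration, not taken from the spider above. It compiles a source string and reports the line number and message of the first `IndentationError`.

```python
import textwrap


def find_indentation_error(source, filename="<spider>"):
    """Return (line_number, message) for the first indentation error, or None.

    Hypothetical helper: compiles the source without executing it.
    Other syntax errors are not handled here and will propagate.
    """
    try:
        compile(source, filename, "exec")
    except IndentationError as exc:  # also covers TabError (mixed tabs/spaces)
        return exc.lineno, exc.msg
    return None


# Consistently indented loop body: compiles cleanly.
good = textwrap.dedent('''
    for n in range(3):
        total = n
        print(total)
''')

# The last line is indented deeper than its sibling without opening a new block.
bad = textwrap.dedent('''
    for n in range(3):
        total = n
            print(total)
''')

print(find_indentation_error(good))
print(find_indentation_error(bad))
```

To check a real file, you could pass the result of `open("crawlproduct.py").read()` as `source`. Running `python -m tabnanny crawlproduct.py` is another quick check, aimed specifically at ambiguous mixes of tabs and spaces.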
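Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.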