
Python Scrapy unexpected indent error

We're trying to crawl items such as 'product', 'price', etc., but we keep getting an indentation error.

The code we're using (crawlproduct.py):

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from productcrawl.items import ProductCrawlItem

class MySpider(BaseSpider):
name = "crawlproduct"
allowed_domains = ["yorcom.nl"]
f = open("items.txt")
start_urls = [url.strip() for url in f.readlines()]
f.close()


def parse(self, response):
hxs = HtmlXPathSelector(response)
events = hxs.select("//div[@class='productOverview']")
items = []
for event in events:
item = ProductCrawlItem()
item ["product"] = events.select("table/tbody/tr/td[@class='productTitle']/a/text()").extract()
item ["price"] = events.select("table/tbody/tr/td[@class='productPrice']/a/text()").extract()
item ["stock"] = events.select("table/tbody/tr/td[@class='productStock   voorraad']/a/text()").extract()
item ["link"] = events.select("table/tbody/tr/td[@class='productTitle']/a").extract()
yield item

And items.py:

from scrapy.item import Item, Field

    class ProductCrawlItem(Item):
        product = Field()
        price = Field()
        stock = Field()
        link = Field()

When we use only one field, it works. Does anyone know what the problem is?

Thanks in advance,

Dean

With the following indentation, this is probably what you intended:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from productcrawl.items import ProductCrawlItem

class MySpider(BaseSpider):
    name = "crawlproduct"
    allowed_domains = ["yorcom.nl"]
    f = open("items.txt")
    start_urls = [url.strip() for url in f.readlines()]
    f.close()

    # parse() is indented one level so that it is a method of MySpider.
    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        events = hxs.select("//div[@class='productOverview']")
        for event in events:
            item = ProductCrawlItem()
            # Query the current node (event), not the whole result list (events).
            item["product"] = event.select("table/tbody/tr/td[@class='productTitle']/a/text()").extract()
            item["price"] = event.select("table/tbody/tr/td[@class='productPrice']/a/text()").extract()
            item["stock"] = event.select("table/tbody/tr/td[@class='productStock   voorraad']/a/text()").extract()
            item["link"] = event.select("table/tbody/tr/td[@class='productTitle']/a").extract()
            yield item
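
To find out which file and line trigger the error before running the crawl, you can ask Python to compile the source without executing it. The following is a minimal sketch, not part of the original answer: the helper name find_indentation_error and the two sample strings are made up for illustration. The first string mirrors the items.py layout as posted in the question, where the class statement is indented although nothing above it opens a block.

```python
def find_indentation_error(source, filename="<snippet>"):
    """Return (line_number, message) for an indentation problem, or None.

    compile() only parses the source; it does not run it, so the imports
    inside the snippet do not need to be installed. Other syntax errors
    are not caught here and will propagate as SyntaxError.
    """
    try:
        compile(source, filename, "exec")
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        return exc.lineno, exc.msg
    return None


# Hypothetical sample: class statement indented under a plain import line.
bad_items = (
    "from scrapy.item import Item, Field\n"
    "\n"
    "    class ProductCrawlItem(Item):\n"
    "        product = Field()\n"
)

# Same content with the class statement moved back to column 0.
good_items = (
    "from scrapy.item import Item, Field\n"
    "\n"
    "class ProductCrawlItem(Item):\n"
    "    product = Field()\n"
)

print(find_indentation_error(bad_items))
print(find_indentation_error(good_items))
```

For files on disk, running python -m py_compile items.py crawlproduct.py performs the same kind of check from the command line.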
