Why are all my Scrapy items the same?
I am new to Scrapy and I am stuck on an issue. I want to create a unique item for each cell of a table on this website, but every item comes out the same and I don't know why. Here is my code:
from scrapy import Spider
from scrapy.selector import Selector
from petroleo.items import PetroleoItem


class PetroleoSpider(Spider):
    name = "petroleo"
    site = "http://www.glossary.oilfield.slb.com/"
    allowed_domains = [site]
    start_urls = [site + 'en/Terms.aspx?filter=sym&LookIn=term%20name&searchtype=starts%20with',]

    def parse(self, response):
        words = Selector(response).xpath("//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td")
        for word in words:
            item = PetroleoItem()
            if word.xpath("//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/em").extract():
                item['title'] = word.xpath(
                    "//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/em/text()").extract()[0]
                item['title'] += word.xpath(
                    "//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/sub/text()").extract()[0]
            if word.xpath("//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/i").extract():
                item['title'] = {'en': word.xpath(
                    "//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/i/text()").extract()}
                item['title']['en'][0] += word.xpath(
                    "//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/i/sub/text()").extract()[0]
            if word.xpath("//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/text()").extract():
                item['title'] = {'en': word.xpath(
                    "//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td/a/text()").extract()}
            yield item
An XPath expression that starts with // is evaluated from the document root, even when you call it on a nested selector, so every iteration of your loop matches the same nodes and extract()[0] always returns the first one. Make your expressions context-specific by prepending a dot, and don't repeat the //table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td part:
words = response.xpath("//table[@id='pagecolumns_0_columncontent_0__rptLetter_ctl00__dlTerms']//td")
for word in words:
    item = PetroleoItem()
    if word.xpath("./a/em").extract():
        item['title'] = word.xpath("./a/em/text()").extract()[0]
        item['title'] += word.xpath("./a/sub/text()").extract()[0]
    if word.xpath("./a/i").extract():
        item['title'] = {'en': word.xpath("./a/i/text()").extract()}
        item['title']['en'][0] += word.xpath("./a/i/sub/text()").extract()[0]
    if word.xpath("./a/text()").extract():
        item['title'] = {'en': word.xpath("./a/text()").extract()}
    yield item
I don't particularly like, or fully understand, what you are trying to do in the loop, but this should at least solve the problem you've described in the question.