Scrapy Only Returning First Result in Loop
I have a loop (shown below) that executes twice (range(1, 3) gives x = 1 and x = 2), but Scrapy only returns the first track name in both results. The print item line shows different values as str_selector changes, so I know the loop works, but Scrapy doesn't seem to see the changing value of x.
Any idea what mistake I have made?
# Fragment of a Scrapy callback; hxs is the page's selector
items = []
item = scrapyItem()
for x in range(1, 3):
    str_selector = '//tr[@name="tracks-grid-browse_track_{0}"]/td[contains(@class,"secondColumn")]/a/text()'.format(x)
    item['trackname'] = hxs.select(str_selector).extract()
    print(item)
    items.append(item)
return items
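The behaviour can be reproduced without Scrapy. This is a minimal sketch, assuming a plain dict stands in for scrapyItem (Scrapy items are dict-like) and a hypothetical fake_extract() function stands in for hxs.select(...).extract(); the track names are made up for illustration.

```python
# Minimal reproduction of the bug without Scrapy.
# Assumptions: a plain dict stands in for scrapyItem, and the hypothetical
# fake_extract() stands in for hxs.select(str_selector).extract().

FAKE_TRACKS = {1: ["Track A"], 2: ["Track B"]}  # made-up page data


def fake_extract(x):
    """Hypothetical stand-in for the XPath lookup of track number x."""
    return FAKE_TRACKS[x]


def build_items_buggy():
    items = []
    item = {}  # created ONCE, outside the loop
    for x in range(1, 3):
        item["trackname"] = fake_extract(x)
        print(item)  # shows the correct value for this iteration
        items.append(item)  # appends a reference to the same dict
    return items


items = build_items_buggy()
print(items)
print(items[0] is items[1])  # True: both entries are the same object
```

Inside the loop each print shows the expected value, yet the returned list holds the value from the last iteration twice, because both entries are one and the same dict.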
It's just that you should build a new item for each iteration instead of reusing the same one. You append the same object to items on every pass, and that object is mutable (as instances of user-defined classes are by default in Python), so when you update item['trackname'], every entry in the list changes: they are all references to one object!
Here is some code to illustrate:
>>> class C(object):
...     # Basic user-defined class
...     def __init__(self):
...         self.test = None
...
>>> c = C()
>>> items = []
>>> for x in range(1, 3):
...     c.test = x
...     print(c, c.test)
...     items.append(c)
...
<__main__.C object at 0x01CEB130> 1
<__main__.C object at 0x01CEB130> 2
>>> items  # All objects contained are the same!
[<__main__.C object at 0x01CEB130>, <__main__.C object at 0x01CEB130>]
>>> for c in items:
...     print(c.test)
...
2
2
Now create a new object each time:
>>> items = []
>>> for x in range(1, 3):
...     c = C()
...     c.test = x
...     print(c, c.test)
...     items.append(c)
...
<__main__.C object at 0x01CEB110> 1
<__main__.C object at 0x011F2270> 2
The objects are now different!
>>> for c in items:
...     print(c.test)
...
1
2
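If the item really must be created once outside the loop, another option is to append a copy of it on each pass. This is a minimal sketch using the standard copy module, assuming a plain dict stands in for the Scrapy item; the answer itself recommends creating a new object, and this is only an alternative.

```python
import copy

# Sketch: append a shallow copy so each list entry is a separate object.
# Assumption: a plain dict stands in for scrapyItem.
results = {1: ["Track A"], 2: ["Track B"]}  # made-up extract() results

items = []
item = {}  # still created once, outside the loop
for x in range(1, 3):
    item["trackname"] = results[x]
    items.append(copy.copy(item))  # snapshot of item at this iteration

print(items)
```

A shallow copy creates a new dict but still shares the values it contains. That is safe here only because each iteration assigns a new list to item["trackname"]; mutating that list in place would affect every copy.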
What you are actually doing right now is creating one item object and changing its value in the loop; you need to create the item inside the loop.
items = []
# item = scrapyItem()  # moved inside the loop
for x in range(1, 3):
    item = scrapyItem()  # a fresh item on every iteration
    str_selector = '//tr[@name="tracks-grid-browse_track_{0}"]/td[contains(@class,"secondColumn")]/a/text()'.format(x)
    item['trackname'] = hxs.select(str_selector).extract()
    print(item)
    items.append(item)
return items
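In a Scrapy callback, a common alternative to collecting items in a list is to yield each freshly created item. The sketch below shows that pattern in plain Python; it assumes a dict stands in for scrapyItem and uses a hypothetical extract_trackname() in place of hxs.select(...).extract(), so it is not Scrapy's API.

```python
# Sketch of the "new item per iteration" pattern as a generator.
# Assumptions: a dict stands in for scrapyItem; extract_trackname() is a
# hypothetical stand-in for hxs.select(str_selector).extract().


def extract_trackname(x):
    """Hypothetical stand-in for the XPath lookup of track number x."""
    return [f"Track {x}"]


def parse_tracks(indexes):
    """Yield one new dict per track index (mirrors a Scrapy callback)."""
    for x in indexes:
        item = {}  # a fresh object on every iteration
        item["trackname"] = extract_trackname(x)
        yield item


items = list(parse_tracks(range(1, 3)))
print(items)  # [{'trackname': ['Track 1']}, {'trackname': ['Track 2']}]
```

Because the item is created inside the loop body, each yielded object is independent, so later iterations cannot overwrite earlier results.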