While using Scrapy, it ignores blank values when extracting data
from scrapy.spider import BaseSpider
from scrapy.selector import Selector
from sample3.items import taamaaItem

class taamaaSpider(BaseSpider):
    name = "taamaa"
    allowed_domains = ["taamaa.com"]
    start_urls = [
        "http://www.taamaa.com/store-directory/"]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//div/div[@class="section clearfix col-md-12"]')
        items = []
        list1 = []
        list2 = []
        for site in sites:
            list1 = sites[0].xpath('//div[@class="pull-left col-md-3 merchant"]/div[@class="name"]/a/text()').extract()
            list2 = sites[0].xpath('//div[@class="pull-left col-md-3 merchant"]/div[@class="url"]/a/text()').extract()
            for index in range(len(list2)):
                td = taamaaItem()
                td['name'] = list1[index]
                td['link'] = list2[index]
                items.append(td)
        return items
While extracting the data, it skips the blank value and fetches the next link instead, which misaligns my data.
For example, if the source data is A = a, B = (blank), C = c, D = d, E = e,
it fetches the output A = a, B = c, C = d, D = e, E = a,
and I want the output to be like this:
A = a, B = (blank), C = c, D = d, E = e.
How can I achieve this?
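To make the misalignment concrete, here is a minimal sketch in plain Python (with hypothetical data, not taken from the actual site): extracting names and links into two flat lists and pairing them by index only works when every record has both fields, because a missing value shifts everything after it.

```python
# Hypothetical merchant records; merchant "B" has no link.
merchants = [
    {"name": "A", "link": "a"},
    {"name": "B", "link": None},
    {"name": "C", "link": "c"},
]

# Flat-list extraction (what the spider above effectively does):
# empty values are simply absent, so the lists end up with
# different lengths.
names = [m["name"] for m in merchants]
links = [m["link"] for m in merchants if m["link"] is not None]

# Pairing by index misaligns everything after the gap.
misaligned = list(zip(names, links))
print(misaligned)  # [('A', 'a'), ('B', 'c')] -- B got C's link

# Per-record extraction keeps blanks in place.
aligned = [(m["name"], m["link"] or "") for m in merchants]
print(aligned)  # [('A', 'a'), ('B', ''), ('C', 'c')]
```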
I see 2 strange things: your XPath expressions inside the loop are absolute (they start with //, so they select from the whole document), and you apply them to sites[0] in your loop for each iteration.
For your problem of grouping 2 lists with some empty text elements, you can use the same structure with a loop on sites, but extracting name and link in each iteration, so you don't need intermediate lists:
from scrapy.spider import BaseSpider
from scrapy.selector import Selector
from sample3.items import taamaaItem

class taamaaSpider(BaseSpider):
    name = "taamaa"
    allowed_domains = ["taamaa.com"]
    start_urls = [
        "http://www.taamaa.com/store-directory/"]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//div/div[@class="section clearfix col-md-12"]')
        items = []
        for site in sites:
            td = taamaaItem()
            td['name'] = site.xpath("""
                .//div[@class="pull-left col-md-3 merchant"]
                    /div[@class="name"]/a/text()""").extract()
            td['link'] = site.xpath("""
                .//div[@class="pull-left col-md-3 merchant"]
                    /div[@class="url"]/a/text()""").extract()
            items.append(td)
        return items
See how I use relative XPath expressions (.//div......).
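The relative form matters because an absolute //... expression searches the whole document regardless of which node you call it on, while .//... searches only under the current node. A self-contained sketch of the same idea using the standard library's xml.etree.ElementTree instead of Scrapy's Selector, with made-up markup mirroring the structure in the question:

```python
import xml.etree.ElementTree as ET

# Made-up markup: the second merchant has no url element.
html = """
<root>
  <div class="merchant"><div class="name"><a>A</a></div><div class="url"><a>a</a></div></div>
  <div class="merchant"><div class="name"><a>B</a></div></div>
  <div class="merchant"><div class="name"><a>C</a></div><div class="url"><a>c</a></div></div>
</root>
"""
root = ET.fromstring(html)

items = []
for merchant in root.findall('.//div[@class="merchant"]'):
    # Relative lookups (".//") stay inside this merchant node,
    # so a missing url yields an empty string instead of
    # borrowing the next merchant's value.
    name = merchant.find('.//div[@class="name"]/a')
    link = merchant.find('.//div[@class="url"]/a')
    items.append((name.text if name is not None else "",
                  link.text if link is not None else ""))

print(items)  # [('A', 'a'), ('B', ''), ('C', 'c')]
```

The blank stays attached to merchant B because each lookup is scoped to one merchant at a time, which is exactly what the per-site loop in the answer above achieves.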