I am creating a spider in Scrapy, and I want to scrape a table so that, for each <tr>, the <th> is used as the key and the <td> as the content.
The code I came up with is this:
def parse(self, response):
    item = {}
    item['code'] = response.xpath('//meta[@itemprop="sku"]/@content').extract_first()
    tables = response.css('.technical-specs')
    for table in tables:
        # Query rows relative to the current table, not the whole selection
        specs = table.xpath('tbody/tr')
        for s in specs:
            # Normalize the header text into a dict key
            key = s.xpath('th/text()').extract_first().replace(" ", "_").replace("(", "_").replace(")", "_").replace("/", "").lower()
            value = s.xpath('td/text()').extract_first()
            item[key] = value
    return item
But it is not working. Is this possible to achieve?
You need to create a dict instance and then add the items inside the loop. E.g.:
my_dict = dict()  # {} works too
for item in items:  # items: any iterable of objects with key and value attributes
    key = item.key
    value = item.value
    my_dict[key] = value
Regards
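As a minimal sketch of the same pattern applied to table rows, the example below builds a dict from (header, value) pairs and normalizes each header with the same replacements used in the question. The `rows` data and the `normalize_key` helper are hypothetical, introduced only for illustration.

```python
def normalize_key(header):
    """Turn a table header into a dict key, mirroring the question's chained replaces."""
    return (header.replace(" ", "_")
                  .replace("(", "_")
                  .replace(")", "_")
                  .replace("/", "")
                  .lower())


# Hypothetical (header, value) pairs standing in for the scraped <th>/<td> text
rows = [("Net Weight (kg)", "1.2"), ("Width/Height", "30 x 40")]

item = {}
for header, value in rows:
    item[normalize_key(header)] = value

print(item)
```

Keeping the normalization in one helper makes it easier to adjust which characters are replaced without touching the loop.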
The now-working code of the parse function has been updated in my question details. The problem was not in the way the loop or dictionary was implemented, but in how I extracted the data. I was using .extract(), which turns the selectors into plain unicode strings that can no longer be queried ("unscrapable"). Removing .extract() was the fix.