I need help cleaning Python Scrapy output. I have the following simple spider which fetches the content of an element.
    class ScrapyscrapSpider(BaseSpider):
        name = "ss"
        allowed_domains = ["purecss.io"]
        start_urls = ['http://purecss.io/tables/']

        def parse(self, response):
            sel = Selector(response)
            item = ScrapscrapyItem()
            item['Heading'] = str(sel.xpath('/html/body/div[2]/div/div[1]/div/div[1]/h1').extract())
            item['Content'] = str(sel.xpath('//table[@class = "pure-table"]//tr[1]/td[2]').extract())
            item['Source_Website'] = "http://purecss.io"
            return item
Command:
    scrapy crawl ss -o data.csv -t csv
Output:
    Content,Heading,Source_Website
    "[u'<td>Honda</td>', u'<td>Honda</td>']",,
I just want "Honda" to be printed to the csv file and everything else deleted.
Using extract()[1] still gives me "[u'Honda', u'Honda']",,
You can append /text() to each XPath so that it selects only the text inside the element rather than the element with its tags:
    item['Heading'] = str(sel.xpath('/html/body/div[2]/div/div[1]/div/div[1]/h1/text()').extract())
    item['Content'] = str(sel.xpath('//table[@class = "pure-table"]//tr[1]/td[2]/text()').extract())
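Note that extract() returns a list of every match, so wrapping it in str() writes the list's repr (brackets and quotes included) to the CSV even with /text() in place. Below is a minimal sketch of one way around that, assuming you only want the first match. first_text is a hypothetical helper, not part of Scrapy, and the extracted list is a stand-in for what the Content XPath returns on this page.

```python
# Stand-in for sel.xpath('.../td[2]/text()').extract() on this page (assumed values).
extracted = [u'Honda', u'Honda']


def first_text(values, default=''):
    """Return the first extracted string, stripped of surrounding
    whitespace, or `default` when the XPath matched nothing."""
    return values[0].strip() if values else default


# str() on the list gives the whole list's repr, which is what ended up in the CSV.
print(str(extracted))

# Taking the first element gives just the cell text.
print(first_text(extracted))  # Honda
```

In the spider this would read item['Content'] = first_text(sel.xpath('...').extract()). Newer Scrapy releases also provide extract_first() (and get()) on the selector result, which should do much the same thing without a helper.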