I'm trying to parse a web table and export certain data into a CSV file.
I'm not sure how to combine the two XPaths with a single for loop (or maybe two loops is the right way?).
Current Spider:
class MySpider(BaseSpider):
    symbols = ["SCMP"]
    name = "dozen"
    allowed_domains = ["yahoo.com"]
    start_urls = ["http://finance.yahoo.com/q/is?s=SCMP&annual"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        revenue = response.xpath('//td[@align="right"]/strong/text()')
        date = response.xpath('//tr[@class="yfnc_modtitle1"]/th/text()')
        items = []
        for rev in revenue:
            item = DozenItem()
            item["Revenue"] = rev.re('\d*,\d*')
            items.append(item)
        return items[:3]
        days = []
        for day in dates:
            item = DozenItem()
            item["Date"] = day.re('\d*')
            days.append(item)
        return items[:3]
I know this needs work, I'm just not sure which direction to go...?
In the output, the revenue figures come through, but I can't get the dates to fill in.
Here is the html I'm parsing the dates from:
<TR class="yfnc_modtitle1" style="border-top:none;">
<th scope="col" style="border-top:2px solid #000;text-align:right; font-weight:bold">Dec 31, 2014</th>
<th scope="col" style="border-top:2px solid #000;text-align:right; font-weight:bold">Dec 31, 2013</th>
<th scope="col" style="border-top:2px solid #000;text-align:right; font-weight:bold">Dec 31, 2012</th>
</TR>
<tr>
<td colspan="2">
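For reference, the date XPath itself can be sanity-checked against that header row outside the spider. This is a minimal sketch, assuming only that Scrapy is installed; the regex is just one way to keep the whole date string from each cell:

from scrapy.selector import Selector

# The header row from above, wrapped in a <table> so the HTML parser keeps it.
html = """
<table>
  <tr class="yfnc_modtitle1">
    <th scope="col">Dec 31, 2014</th>
    <th scope="col">Dec 31, 2013</th>
    <th scope="col">Dec 31, 2012</th>
  </tr>
</table>
"""

sel = Selector(text=html)
# Same XPath as in the spider, plus a regex that keeps the full date text.
dates = sel.xpath('//tr[@class="yfnc_modtitle1"]/th/text()').re(r'\w+ \d+, \d+')
print(dates)  # ['Dec 31, 2014', 'Dec 31, 2013', 'Dec 31, 2012']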
The simplest fix is to walk both node lists in step with zip() and build one item per (revenue, date) pair:

for rev, day in zip(revenue, dates):
    pass  # build a single DozenItem from each pair here
from scrapy.spider import BaseSpider
from dozen.items import DozenItem  # adjust the import to wherever DozenItem lives in your project


class MySpider(BaseSpider):
    symbols = ["SCMP"]
    name = "dozen"
    allowed_domains = ["yahoo.com"]
    start_urls = ["http://finance.yahoo.com/q/is?s=SCMP&annual"]

    def parse(self, response):
        # response.xpath() already returns selector lists; no HtmlXPathSelector needed
        revenue = response.xpath('//td[@align="right"]/strong/text()')
        dates = response.xpath('//tr[@class="yfnc_modtitle1"]/th/text()')
        items = []
        for rev, day in zip(revenue, dates):
            item = DozenItem()
            item["Revenue"] = rev.re(r'\d*,\d*')
            item["Date"] = day.re(r'\w+ \d+, \d+')  # keeps the full "Dec 31, 2014" string
            items.append(item)
        return items[:3]
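For completeness, the item class referenced above only needs the two field names the spider sets. A minimal sketch (your actual items.py may already define DozenItem with different or additional fields):

# items.py -- minimal sketch; adjust to match your real project
from scrapy.item import Item, Field

class DozenItem(Item):
    Date = Field()
    Revenue = Field()

Once the spider returns items, Scrapy's built-in feed export writes the CSV for you, so no extra export code is needed, e.g.:

scrapy crawl dozen -o output.csv -t csv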