Python: If statements and Scrapy XPath selector

I'm trying to select the values contained in the last column of a table at: https://ca.finance.yahoo.com/q/hp?s=bmo.TO&a=02&b=2&c=2005&d=02&e=2&f=2015&g=m

Usually, this would be quite simple. Something like:

response.xpath('//table//table//tr[::6]/text()').extract()

However, the position of the nth element keeps changing because of the dividend rows Yahoo inserts into the table. I did notice that in every row I want to select data from, the first td contains:

Feb 2, 2015

instead of:

2015-01-29

Therefore, I am trying to write code that follows this logic: if the first cell of a row contains ANY letters, select the last column and append it to a list. The code I have is below:

returns = []
trows = response.xpath('//table//table//tr')
for tr in trows:
    # don't know why I need to use "2" in the following line, but that's what gives me the first value.
    check = response.xpath('//td[2]/text()').extract()
    if any(c.isalpha() for c in check) == True:
        these = tr[6]
        returns.append(these)

This has all sorts of problems, as I am sure you can imagine. It gives me the value of the first td, repeated as many times as there are tr elements in the table, when the end result I need is the last td.

Very grateful for any help received! I'm trying to do this for a finance class project, to learn Python instead of inputting the values manually.

Cheers!

I would check whether the date matches the %b %d, %Y format with the help of strptime() and exception handling. In other words, follow the EAFP principle.
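As a minimal sketch of that check on its own, the helper below wraps strptime() in a try/except. The function name is_month_day_year is made up for illustration, and the sketch assumes an English locale so that %b matches abbreviations such as "Feb".

```python
from datetime import datetime


def is_month_day_year(text, fmt="%b %d, %Y"):
    """Return True if text parses with fmt (for example 'Feb 2, 2015')."""
    try:
        # EAFP: attempt the parse and treat a failure as "not this format".
        datetime.strptime(text.strip(), fmt)
    except ValueError:
        return False
    return True


# The two date styles quoted in the question.
print(is_month_day_year("Feb 2, 2015"))  # True
print(is_month_day_year("2015-01-29"))   # False
```

Compared with testing for letters via isalpha(), this only accepts strings that are real dates in the expected layout.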

Demo from the Scrapy shell:

In [1]: from datetime import datetime
In [2]: rows = response.xpath('//table[@class="yfnc_datamodoutline1"]//table/tr')[1:]
In [3]: for row in rows:
            cells = row.xpath('.//td/text()').extract()
            try:
                datetime.strptime(cells[0], "%b %d, %Y")
                print(cells[-1])
            except ValueError:
                continue
77.15
77.46
72.93
81.33
82.99
80.88
...
44.12
42.46
39.00
42.20

I've also improved the XPath expressions to focus more on the desired table data.
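The key difference from the code in the question is that the cells are looked up relative to each row (row.xpath('.//td/text()')), rather than with response.xpath('//td[2]/text()'), which searches the whole document again on every iteration. A rough way to try this without Scrapy is sketched below; it swaps in the standard library's xml.etree.ElementTree and runs on a small made-up table fragment (the SAMPLE markup, its values, and the function name last_column_values are assumptions for illustration, not the real page).

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Made-up, well-formed fragment standing in for the real table.
SAMPLE = """
<table>
  <tr><th>Date</th><th>Open</th><th>Adj Close</th></tr>
  <tr><td>Feb 2, 2015</td><td>73.00</td><td>77.15</td></tr>
  <tr><td>2015-01-29</td><td>0.80 Dividend</td></tr>
  <tr><td>Jan 2, 2015</td><td>81.00</td><td>77.46</td></tr>
</table>
"""


def last_column_values(markup, fmt="%b %d, %Y"):
    """Collect the last cell of every row whose first cell parses with fmt."""
    root = ET.fromstring(markup)
    values = []
    for row in root.findall(".//tr")[1:]:  # skip the header row
        # "./td" is evaluated relative to this row, not the whole document.
        cells = [td.text for td in row.findall("./td")]
        try:
            datetime.strptime(cells[0], fmt)
        except (IndexError, TypeError, ValueError):
            continue  # empty row, empty cell, or a date in another format
        values.append(cells[-1])
    return values


returns = last_column_values(SAMPLE)
print(returns)  # ['77.15', '77.46']
```

In a real spider the same loop would append cells[-1] to the returns list instead of printing it.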
