
Python: If statements and Scrapy XPath selector

I'm trying to select the values contained in the last column of a table at: https://ca.finance.yahoo.com/q/hp?s=bmo.TO&a=02&b=2&c=2005&d=02&e=2&f=2015&g=m

Usually, this would be quite simple. Something like:


However, the nth element keeps changing because of the dividend rows Yahoo inserts into the table. I noticed, though, that for every row I want to select data from, the first td contains:

Feb 2, 2015

instead of:


So I am trying to write code that follows this logic: if the first cell of a row contains any letters, select the last column and append it to a list. The code I have is below:

returns = []
trows = response.xpath('//table//table//tr')
for tr in trows:
    # don't know why I need to use "2" in the following line, but that's what gives me the first value.
    check = response.xpath('//td[2]/text()').extract()
    if any(c.isalpha() for c in check) == True:
        these = tr[6]

As I'm sure you can imagine, this has all sorts of problems. It gives me the value of the first td repeated once for every tr in the table, when what I need is the last td.
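The repetition comes from querying response (the whole page) inside the loop instead of the current tr; in Scrapy, a relative expression such as tr.xpath('./td/text()') stays inside the row. Separately, check is a list of strings, so c.isalpha() tests whole cell strings rather than the characters of the first cell. The sketch below reproduces both points offline, using the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors. The table, its values, and the row with an empty first cell are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified table (well-formed XML so ElementTree can parse it).
TABLE = """
<table>
  <tr><td>Feb 2, 2015</td><td>68.90</td><td>66.20</td></tr>
  <tr><td></td><td>0.80</td></tr>
  <tr><td>Jan 2, 2015</td><td>67.40</td><td>64.75</td></tr>
</table>
"""

root = ET.fromstring(TABLE.strip())

repeated = []  # what the question's loop effectively collects
per_row = []   # what was intended
for tr in root.findall("tr"):
    # Bug pattern: searching from the root inside the loop ignores tr,
    # so every iteration sees the first cell of the whole table.
    repeated.append(root.findall(".//td")[0].text)
    # Fix: search relative to the current row, then test the
    # characters of that row's first cell for letters.
    cells = [td.text or "" for td in tr.findall("td")]
    if cells and any(ch.isalpha() for ch in cells[0]):
        per_row.append(cells[-1])

print(repeated)  # ['Feb 2, 2015', 'Feb 2, 2015', 'Feb 2, 2015']
print(per_row)   # ['66.20', '64.75']
```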

I'm very grateful for any help! This is for a finance class project: I want to learn Python instead of inputting the values manually.


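I would check whether the date matches the %b %d, %Y format with the help of strptime() and exception handling. In other words, follow the EAFP ("easier to ask for forgiveness than permission") principle: attempt the parse and treat a ValueError as "not a date row".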
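A minimal sketch of that EAFP check as a standalone helper. The name is_price_date is hypothetical, and it assumes the default C/English locale, since %b matches locale-specific month abbreviations such as "Feb".

```python
from datetime import datetime


def is_price_date(text, fmt="%b %d, %Y"):
    """Return True if text parses as a date like 'Feb 2, 2015' (EAFP style)."""
    try:
        # Just attempt the parse; a mismatch raises ValueError.
        datetime.strptime(text.strip(), fmt)
    except ValueError:
        return False
    return True


print(is_price_date("Feb 2, 2015"))    # True
print(is_price_date("0.80 Dividend"))  # False
```

Compared with looking for letters, this rejects any first cell that is not a complete date in exactly this format.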

Demo from the Scrapy shell:

In [1]: from datetime import datetime
In [2]: rows = response.xpath('//table[@class="yfnc_datamodoutline1"]//table/tr')[1:]
In [3]: for row in rows:
   ...:     cells = row.xpath('.//td/text()').extract()
   ...:     try:
   ...:         datetime.strptime(cells[0], "%b %d, %Y")
   ...:         print(cells[-1])
   ...:     except (ValueError, IndexError):
   ...:         continue  # skip rows whose first cell is not a date
   ...:

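I've also tightened the XPath expressions so they target the data table directly: the outer table is selected by its yfnc_datamodoutline1 class, the [1:] slice skips the first row, and the row-level .//td/text() query is relative to each row.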
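For trying the approach without a live page, here is an offline sketch of the same steps (select the table by class, skip the first row, keep rows whose first cell parses as a date, take the last cell). It swaps in the standard library's xml.etree.ElementTree, whose limited XPath subset supports the class predicate and // used here, for Scrapy selectors. The markup, the values, and the function name last_column_values are hypothetical, not Yahoo's real page.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical markup loosely shaped like the answer's XPath expects:
# an outer table selected by class, an inner table, a first (header) row,
# price rows, and a non-price row. Not real Yahoo data.
PAGE = """
<html><body>
<table class="yfnc_datamodoutline1"><tr><td>
  <table>
    <tr><th>Date</th><th>Adj Close</th></tr>
    <tr><td>Feb 2, 2015</td><td>66.20</td></tr>
    <tr><td>0.80 Dividend</td></tr>
    <tr><td>Jan 2, 2015</td><td>64.75</td></tr>
  </table>
</td></tr></table>
</body></html>
"""


def last_column_values(page):
    root = ET.fromstring(page.strip())
    # Same shape as the answer's XPath; [1:] skips the first row.
    rows = root.findall(".//table[@class='yfnc_datamodoutline1']//table/tr")[1:]
    values = []
    for row in rows:
        # Relative query: only this row's cells.
        cells = [td.text or "" for td in row.findall("td")]
        try:
            datetime.strptime(cells[0], "%b %d, %Y")
        except (ValueError, IndexError):
            continue  # first cell missing or not a date
        values.append(cells[-1])
    return values


print(last_column_values(PAGE))  # ['66.20', '64.75']
```

ElementTree needs well-formed XML, so real-world HTML would usually go through an HTML-tolerant parser (such as Scrapy's own selectors) instead.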

