Python - error thrown when separating data - list index out of range

I'm scraping a Yahoo table of earnings dates with BeautifulSoup 4 (bs4). My code works until I try to separate the data into cells. The exact error is:

    ticker = cells[1].get_text()
IndexError: list index out of range

I thought it was because that cell contains a link (an 'a href'), but the cell has text too.

Ideally the format should look something like:

{'company': '2U Inc', 'ticker': 'TWOU', 'eps_est': '-0.04', 'time': 'after market close'}

How can I get output like the above? What am I missing?

from urlparse import urljoin
from urllib2 import urlopen
import requests
from bs4 import BeautifulSoup
import MySQLdb
import re

#mysql portion
mydb = MySQLdb.connect(host='localhost',
user= '####',
passwd='#####',
db='testdb')
cur = mydb.cursor()

#def store (company, ticker, eps_est, time):
#    cur.execute('INSERT IGNORE INTO EARN (company, ticker, eps_est, time)  VALUES ( \"%s\", \"%s\", \"%s\", \"%s\")',(company, ticker, eps_est, time))
#    cur.connection.commit()

base_url = "https://biz.yahoo.com/research/earncal/today.html"
html = urlopen(base_url)
soup = BeautifulSoup(html.read().decode('utf-8'),"lxml")
table = soup.find_all('table')
rows = table[6].find_all('tr')

for row in rows[2:]:
    cells = row.find_all('td')
    company = cells[0].get_text()
    ticker =  cells[1].get_text()
    eps_est = cells[2].get_text()
    time =    cells[3].get_text()
    #    store(company, ticker, eps_est, time)
data = {
    'company': cells[0].get_text(),
    'ticker': cells[1].get_link('href'),
    'eps_est': cells[2].get_text(),
    'time': cells[3].get_text(),
}
print data
print '\n'

BeautifulSoup tags have no get_link() method. Use dot notation to reach an element nested inside another element. Replace:

cells[1].get_link('href')

with:

cells[1].a.get_text()

which reads as, and is equivalent to, cells[1].find("a").get_text().
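
As a small illustration of that equivalence, the sketch below parses a made-up table cell and reads the ticker both ways. The markup and the link URL are invented for the example, not copied from the Yahoo page. It is written for Python 3 and uses the built-in "html.parser" so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical cell markup, invented for illustration only.
html = '<table><tr><td>2U Inc</td><td><a href="/q?s=twou">TWOU</a></td></tr></table>'
soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

via_dot = cells[1].a.get_text()           # attribute access: first <a> inside the cell
via_find = cells[1].find("a").get_text()  # the explicit equivalent
href = cells[1].a["href"]                 # the link target, if that is what you need

print(via_dot, via_find, href)

# A cell without a link: .a is None, so calling .get_text() on it would fail.
print(cells[0].a)
```

Note that dot notation returns None when the cell has no matching child, so it is worth checking the result before calling get_text() on cells that may not contain a link.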

You also need to skip the last "empty" row, which does not have the cells the loop expects and is what raises the IndexError:

for row in rows[2:-1]:
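
Putting both changes together, here is a rough sketch of the loop run against a small inline table instead of the live page. The sample HTML, the parse_rows helper and its skip_header parameter are assumptions made for illustration, and the real page's layout may differ. Rather than hard-coding rows[2:-1], it skips any row with fewer than four cells, which also covers the trailing row. It targets Python 3 and the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Yahoo table: two header rows, one data row,
# and a trailing row with a single cell (the kind that triggers the IndexError).
SAMPLE_HTML = """
<table>
  <tr><td colspan="4">Earnings Announcements</td></tr>
  <tr><td>Company</td><td>Symbol</td><td>EPS Estimate</td><td>Time</td></tr>
  <tr><td>2U Inc</td><td><a href="/q?s=twou">TWOU</a></td><td>-0.04</td><td>After Market Close</td></tr>
  <tr><td>&nbsp;</td></tr>
</table>
"""


def parse_rows(html, skip_header=2):
    """Return one dict per data row, ignoring rows that lack four cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    results = []
    for row in rows[skip_header:]:
        cells = row.find_all("td")
        if len(cells) < 4:
            # Spacer or footer row: nothing to extract.
            continue
        link = cells[1].a  # None when the ticker cell has no <a>
        ticker_tag = link if link is not None else cells[1]
        results.append({
            "company": cells[0].get_text(strip=True),
            "ticker": ticker_tag.get_text(strip=True),
            "eps_est": cells[2].get_text(strip=True),
            "time": cells[3].get_text(strip=True),
        })
    return results


earnings = parse_rows(SAMPLE_HTML)
for entry in earnings:
    print(entry)
```

Building the dict inside the loop gives one record per row, which matches the format asked for in the question. The length check is a design choice: it keeps working if the page adds or drops spacer rows, whereas a fixed slice would silently break.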
