
Python 2.7 web-scraping - list index out of range

New to Python and web scraping. I'm trying to scrape http://www.basketball-reference.com/awards/all_league.html for some analysis and have only gotten so far. With the code below I'm able to scrape only 3 rows, and then I get a 'list index out of range' error while assigning the year. Any help/tips appreciated.

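import requests
import pandas as pd
from bs4 import BeautifulSoup

r = requests.get('http://www.basketball-reference.com/awards/all_league.html')
# Strip HTML entities and non-ASCII characters before parsing
soup = BeautifulSoup(r.text.replace('&nbsp;', '').replace('&gt;', '').encode('ascii', 'ignore'), "html.parser")
all_league_data = pd.DataFrame(columns=['year', 'team', 'player'])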


stw_list = soup.findAll('div', attrs={'class': 'stw'}) # Find all 'stw's'
for stw in stw_list:
    table = stw.find('table', attrs = {'class':'no_highlight stats_table'})
    for row in table.findAll('tr'):
        col = row.findAll('td')
        year = col[0].find(text=True)
        print year

Some of the rows don't contain any td cells, so findAll('td') returns an empty list and col[0] raises 'list index out of range'.

Check that col is non-empty before indexing:

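import requests
import pandas as pd
from bs4 import BeautifulSoup

r = requests.get('http://www.basketball-reference.com/awards/all_league.html')
# Strip HTML entities and non-ASCII characters before parsing
soup = BeautifulSoup(r.text.replace('&nbsp;', '').replace('&gt;', '').encode('ascii', 'ignore'), "html.parser")
all_league_data = pd.DataFrame(columns=['year', 'team', 'player'])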

stw_list = soup.findAll('div', attrs={'class': 'stw'}) # Find all 'stw's'
for stw in stw_list:
    table = stw.find('table', attrs = {'class':'no_highlight stats_table'})
    for row in table.findAll('tr'):
        col = row.findAll('td')
        if col:
            year = col[0].find(text=True)
            print year
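
If you want to see the behaviour without hitting the site, here is a small offline sketch. The markup in SAMPLE_HTML and the helper name extract_years are made up for illustration (the real page's rows may look different); the point is only that rows built from th cells, or empty spacer rows, give an empty list from find_all('td').

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not copied from the real page: a header row built
# from <th> cells, two data rows, and an empty spacer row between them.
SAMPLE_HTML = """
<table class="no_highlight stats_table">
  <tr><th>Season</th><th>Lg</th><th>Tm</th></tr>
  <tr><td>2014-15</td><td>NBA</td><td>1st</td></tr>
  <tr class="blank_table"></tr>
  <tr><td>2013-14</td><td>NBA</td><td>1st</td></tr>
</table>
"""


def extract_years(html):
    """Return the first-cell text of every row that has at least one <td>."""
    soup = BeautifulSoup(html, "html.parser")
    years = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if not cells:
            # Header and spacer rows give an empty list here;
            # cells[0] on them would raise IndexError.
            continue
        years.append(cells[0].get_text(strip=True))
    return years


years = extract_years(SAMPLE_HTML)
```

Skipping with continue is equivalent to the if col: check above; it just keeps the loop body flat when more columns are read from each row.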

This happens because the grey separator line in the table is a tr that contains no td cells. Check that col is non-empty before indexing:

col = row.findAll('td')
if col:
    year = col[0].find(text=True)
    print year

This gives the correct results:

2014-15
2014-15
2014-15
2013-14
2013-14
2013-14
2012-13
2012-13
etc
