IndexError: list index out of range?

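I keep getting a list index out of range error when I try to run this code, which parses a site's table by going through its pages and enters the data into an Excel worksheet.

The error is raised at revenue = cols[0].string: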

from urllib.request import urlopen
from bs4 import BeautifulSoup
from openpyxl import Workbook
from openpyxl.cell import get_column_letter
import datetime


now = datetime.datetime.now()

wb = Workbook()
dest_filename = r'iOS Top Grossing Data.xlsx'
ws = wb.active
ws = wb.create_sheet()
ws.title = now.strftime("%m-%d-%y")
sh = wb.get_sheet_by_name('Sheet')
wb.remove_sheet(sh)  

ws['A1'] = "REVENUE"
ws.column_dimensions['A'].width = 11
ws.cell('A1').style.alignment.horizontal = 'center'
ws.cell('A1').style.font.bold = True

ws['B1'] = "FREE"
ws.column_dimensions['B'].width = 7
ws.cell('B1').style.alignment.horizontal = 'center'
ws.cell('B1').style.font.bold = True

ws['C1'] = "PAID"
ws.column_dimensions['C'].width = 7
ws.cell('C1').style.alignment.horizontal = 'center'
ws.cell('C1').style.font.bold = True

ws['D1'] = "GAME"
ws.column_dimensions['D'].width = 27
ws.cell('D1').style.alignment.horizontal = 'center'
ws.cell('D1').style.font.bold = True

ws['E1'] = "PRICE"
ws.column_dimensions['E'].width = 7
ws.cell('E1').style.alignment.horizontal = 'center'
ws.cell('E1').style.font.bold = True

ws['F1'] = "REVENUE"
ws.column_dimensions['F'].width = 11
ws.cell('F1').style.alignment.horizontal = 'center'
ws.cell('F1').style.font.bold = True

ws['G1'] = "ARPU INDEX"
ws.column_dimensions['G'].width = 15
ws.cell('G1').style.alignment.horizontal = 'center'
ws.cell('G1').style.font.bold = True

ws['H1'] = "DAILY NEW USERS"
ws.column_dimensions['H'].width = 17
ws.cell('H1').style.alignment.horizontal = 'center'
ws.cell('H1').style.font.bold = True

ws['I1'] = "DAILY ACTIVE USERS"
ws.column_dimensions['I'].width = 19
ws.cell('I1').style.alignment.horizontal = 'center'
ws.cell('I1').style.font.bold = True

ws['J1'] = "ARPU"
ws.column_dimensions['J'].width = 7
ws.cell('J1').style.alignment.horizontal = 'center'
ws.cell('J1').style.font.bold = True

ws['K1'] = "RANK CHANGE"
ws.column_dimensions['K'].width = 14
ws.cell('K1').style.alignment.horizontal = 'center'
ws.cell('K1').style.font.bold = True

page = 0

while page < 6:
        page += 1
        url = "http://thinkgaming.com/app-sales-data/?page=" + str(page) 
        html = str(urlopen(url).read()) 

        soup = BeautifulSoup(html) 
        table = soup.find("table")

        counter = 0

        while counter < 51:      
                        rows = table.findAll('tr')[counter]
                        cols = rows.findAll('td')

                        revenue = cols[0].string
                        revenue = revenue.replace('\\n', '')
                        revenue = revenue.replace(' ', '') 

                        free = cols[1].string
                        free = free.replace('\\n', '')
                        free = free.replace(' ', '') 

                        paid = cols[2].string
                        paid = paid.replace('\\n', '')
                        paid = paid.replace(' ', '') 

                        game = cols[3].string

                        price = cols[4].string
                        price = price.replace('\\n', '')
                        price = price.replace(' ', '') 

                        revenue2 = cols[5].string
                        revenue2 = revenue2.replace('\\n', '')
                        revenue2 = revenue2.replace(' ', '') 

                        dailynewusers = cols[6].string
                        dailynewusers = dailynewusers.replace('\\n', '')
                        dailynewusers = dailynewusers.replace(' ', '') 

                        cell_location = counter
                        cell_location += 1

                        ws['A'+str(cell_location)] = revenue

                        counter += 1

wb.save(filename = dest_filename)             

Here is the traceback:

Traceback (most recent call last):
  File "C:\Users\shiver_admin\Desktop\script.py", line 89, in <module>
    revenue = cols[0].string
IndexError: list index out of range

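As noted in the comments, you are not getting any <td> tags simply because they do not exist in that row, so index [0] fails. The first <tr> tag in that table is this one:

[image: the table's first <tr> row]

If you look closely, it holds the headers. Basically, you should start counter from 1 rather than from 0.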
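To make the cause concrete, here is a minimal sketch that runs against a small inline table instead of the live page. The markup and its values are made up for illustration, and it assumes the header row uses <th> cells, which would explain why no <td> tags are found there. Slicing with rows[1:] plays the same role as starting counter at 1.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: a header row, then data rows.
SAMPLE_HTML = """
<table>
  <tr><th>Revenue</th><th>Free</th><th>Game</th></tr>
  <tr><td>1</td><td>10</td><td>Clash of Clans</td></tr>
  <tr><td>2</td><td>5</td><td>Example Game</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = soup.find("table").find_all("tr")

# The header row has no <td> cells, so cols[0] on it raises IndexError.
header_cells = rows[0].find_all("td")
print(len(header_cells))

data = []
for row in rows[1:]:  # skip the header row, like starting counter at 1
    cols = row.find_all("td")
    if not cols:  # extra guard for any other row without data cells
        continue
    data.append([td.get_text(strip=True) for td in cols])

print(data)
```

The empty-list check is an extra safeguard beyond what the answer suggests; it keeps the loop from failing if another row without <td> cells appears further down the table.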

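Another way to make sure you get the right rows is to check whether they have a class. If you look, the proper <tr> rows have classes on them (odd, even). You can use something like table.find_all("tr", class_=True) to get them.

Sample code (note: written for Python 2.7, but easily modified to suit Python 3.x):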

import requests as rq
from bs4 import BeautifulSoup as bsoup

url = "http://thinkgaming.com/app-sales-data/?page=1"
r = rq.get(url)
soup = bsoup(r.content)

table = soup.find("table", class_="table")

rows = table.find_all("tr", class_=True)
cols = [td.get_text().strip().encode("utf-8") for td in rows[0].find_all("td")]

print cols

Result:

['1', '10', '-', 'Clash of Clans', 'Free', 'n/a', '44,259']
[Finished in 2.8s]
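Since the snippet above targets Python 2.7, here is one possible Python 3 adaptation, offered as a sketch rather than a definitive version. It parses an inline sample so it runs without network access. The helper name extract_rows and the sample markup are hypothetical, and the second data row is invented. In Python 3, print is a function and get_text() already returns str, so the encode("utf-8") call is dropped.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the real page would be fetched with requests or urlopen.
SAMPLE_HTML = """
<table class="table">
  <tr><th>Revenue</th><th>Free</th><th>Paid</th><th>Game</th>
      <th>Price</th><th>Revenue</th><th>Daily New Users</th></tr>
  <tr class="odd"><td>1</td><td>10</td><td>-</td><td>Clash of Clans</td>
      <td>Free</td><td>n/a</td><td>44,259</td></tr>
  <tr class="even"><td>2</td><td>5</td><td>-</td><td>Example Game</td>
      <td>Free</td><td>n/a</td><td>1,000</td></tr>
</table>
"""


def extract_rows(html):
    """Return the cell texts of every <tr> that carries a class attribute."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="table")
    # class_=True keeps only rows that have a class, which skips the header.
    rows = table.find_all("tr", class_=True)
    return [[td.get_text().strip() for td in row.find_all("td")] for row in rows]


for cols in extract_rows(SAMPLE_HTML):
    print(cols)
```

Because only rows with a class are kept, every list returned here has the full set of cells, so indexing cols[0] through cols[6] is safe for this sample.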

Let us know if this helps.
