
IndexError: string index out of range [python, scraping]

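I want to scrape a website, but only write specific rows to the final csv file. When I try to select those rows, I get

IndexError: string index out of range.

When I run this code, I do not get the error: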

rows = [
["The Conservation Fund",2014,"","","","Program Services: ","$174,530,077"],
["The Conservation Fund",2014,"","","","Administration: ","$2,810,944"],
["The Conservation Fund",2014,"","","","Fundraising: ","$2,144,456"],
["The Conservation Fund",2013,"$480,674","$55,266","$0","LAWRENCE A SELZER","PRESIDENT & CEO"],
["The Conservation Fund",2013,"$369,848","$54,856","$0","RICHARD L ERDMANN","EXECUTIVE VICE PRESIDENT"],
["The Conservation Fund",2013,"$312,232","$44,386","$0","DAVID K PHILLIPS JR","EXECUTIVE VP AND CFO"],
["The Conservation Fund",2013,"$251,615","$16,125","$0","DEAN H CANNON","SENIOR VP/GENERAL COUNSEL"]]

rows1 = [x for x in rows if x[6][0] != '$']
print(rows1)

I get exactly what I expect:

[['The Conservation Fund', 2013, '$480,674', '$55,266', '$0', 'LAWRENCE A SELZER', 'PRESIDENT & CEO'], ['The Conservation Fund', 2013, '$369,848', '$54,856', '$0', 'RICHARD L ERDMANN', 'EXECUTIVE VICE PRESIDENT'], ['The Conservation Fund', 2013, '$312,232', '$44,386', '$0', 'DAVID K PHILLIPS JR', 'EXECUTIVE VP AND CFO'], ['The Conservation Fund', 2013, '$251,615', '$16,125', '$0', 'DEAN H CANNON', 'SENIOR VP/GENERAL COUNSEL']]

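Now, when I try to run a similar list comprehension in my scraper (I am pasting only part of the code here, because I am legally not allowed to post the whole thing):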

for page in eins:
    rows =[]
    driver.get(page)
    print("Getting {}".format(page))
    soup = BeautifulSoup(driver.page_source, "lxml")
    name = soup.find("h1", {"class" : "centered"})
    print(name.text)
    members = soup.findAll("g", { "transform" : "translate(0,0)"})
    time = soup.find("option", {"selected" : "selected"}).text
    time = int(time)
    for year in members[2:]:
        column = year.find_all("g")
        for thing in column:
            row_info = [name.text, time]
            entries = thing.find_all("text")
            if len(entries) != 5:
                row_info.extend((5 - len(entries)) * [""])
            for entry in entries:
                row_info.append(entry.text)
            rows.append(row_info)
        time = time - 1
        rows1 = [x for x in rows if x[6][0] != "$"]

Now I suddenly get the following error:

Traceback (most recent call last):
  File "Board_members.py", line 53, in <module>
    rows1 = [x for x in rows if x[6][0] != "$"]
  File "Board_members.py", line 53, in <listcomp>
    rows1 = [x for x in rows if x[6][0] != "$"]
IndexError: string index out of range

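Isn't the format of the rows list the same in both cases? What am I doing wrong here? I have tried a for loop with a continue statement, and an earlier, simpler if statement, but everything comes down to the same error.

I am still very much a beginner, so please forgive my shaky code. I have looked around here for answers to my problem, but if they are there, I could not understand them. Thank you very much!

Edit: Just for context, the rows in the first example come from a csv file that I managed to create with the scraper, and in the csv they look like this.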

organization,year,compensation,other,related,name,position
The Conservation Fund,2015,,,,Total Revenue: ,"$215,096,466"
The Conservation Fund,2015,,,,Contributions: ,"$114,351,967"
The Conservation Fund,2015,,,,Gov't Grants: ,"$9,723,802"
The Conservation Fund,2015,,,,Program Services: ,"$90,762,036"
The Conservation Fund,2015,,,,Investments: ,"$220,002"
The Conservation Fund,2015,,,,Special Events: ,$0
The Conservation Fund,2015,,,,Sales: ,$0
The Conservation Fund,2015,,,,Other: ,"$38,659"
The Conservation Fund,2014,,,,Total Expenses: ,"$179,485,477"
The Conservation Fund,2014,,,,Program Services: ,"$174,530,077"
The Conservation Fund,2014,,,,Administration: ,"$2,810,944"
The Conservation Fund,2014,,,,Fundraising: ,"$2,144,456"
The Conservation Fund,2013,"$480,674","$55,266",$0,LAWRENCE A SELZER,PRESIDENT & CEO
The Conservation Fund,2013,"$369,848","$54,856",$0,RICHARD L ERDMANN,EXECUTIVE VICE PRESIDENT
The Conservation Fund,2013,"$312,232","$44,386",$0,DAVID K PHILLIPS JR,EXECUTIVE VP AND CFO

Edit 2: This is the output I get from printing rows just before rows1:

[['The Conservation Fund', 2015, '', '', '', 'Total Revenue: ', '$215,096,466'], ['The Conservation Fund', 2015, '', '', '', 'Contributions: ', '$114,351,967'], ['The Conservation Fund', 2015, '', '', '', "Gov't Grants: ", '$9,723,802'], ['The Conservation Fund', 2015, '', '', '', 'Program Services: ', '$90,762,036'], ['The Conservation Fund', 2015, '', '', '', 'Investments: ', '$220,002'], ['The Conservation Fund', 2015, '', '', '', 'Special Events: ', '$0'], ['The Conservation Fund', 2015, '', '', '', 'Sales: ', '$0'], ['The Conservation Fund', 2015, '', '', '', 'Other: ', '$38,659'], ['The Conservation Fund', 2014, '', '', '', 'Total Expenses: ', '$179,485,477'], ['The Conservation Fund', 2014, '', '', '', 'Program Services: ', '$174,530,077']]

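The error you are getting is

IndexError: string index out of range

which means you are trying to access a string index that does not exist.

See the examples below for what can cause IndexError: string index out of range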

test = 'abc'
test[2] # Output : c
test[3] # Output :  IndexError: string index out of range

test1 = ''
test1[0] # Output :  IndexError: string index out of range
test1[1] # Output :  IndexError: string index out of range

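In your case, in the line rows1 = [x for x in rows if x[6][0] != "$"], x[6] is an empty string for at least one row, so the expression x[6][0] tries to read index 0 of an empty string. (The message says string index, not list index, so x[6] itself exists; it just has no characters.)

Use the following code to fix the error. It first checks that x is not empty, then that x[6] is not empty, and only then looks at x[6][0]: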
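To see which rows are responsible, it may help to list the rows whose seventh field is missing or empty before filtering. The following is only a sketch: the helper name find_bad_rows and the sample data are made up for illustration, and the second sample row is an assumed example of what the scraper might produce, not actual output from it.

```python
def find_bad_rows(rows, index=6):
    """Return (position, row) pairs whose field at `index` is missing or empty."""
    bad = []
    for pos, row in enumerate(rows):
        # A row that is too short would raise "list index out of range";
        # an empty string at `index` is what raises "string index out of range".
        if len(row) <= index or row[index] == "":
            bad.append((pos, row))
    return bad


# Hypothetical sample data: the second row ends in an empty string.
sample_rows = [
    ["The Conservation Fund", 2014, "", "", "", "Fundraising: ", "$2,144,456"],
    ["The Conservation Fund", 2014, "", "", "", "", ""],
    ["The Conservation Fund", 2013, "$480,674", "$55,266", "$0",
     "LAWRENCE A SELZER", "PRESIDENT & CEO"],
]

bad_rows = find_bad_rows(sample_rows)
for pos, row in bad_rows:
    print(pos, row)
```

Running something like this on the real rows list right before the list comprehension should show which scraped entries have no text in the last column.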

rows1 = [x for x in rows if x and x[6] and x[6][0] != "$"]
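As an alternative sketch, str.startswith can be used instead of indexing, since it returns False on an empty string rather than raising. The function name keep_non_dollar_rows and the sample data below are hypothetical. Like the line above, this version drops rows whose seventh field is empty, and it additionally skips rows that have fewer than seven fields, which is an added assumption about what you want.

```python
def keep_non_dollar_rows(rows, index=6):
    """Keep rows whose field at `index` exists, is non-empty, and does not start with '$'."""
    return [
        x for x in rows
        if len(x) > index and x[index] and not x[index].startswith("$")
    ]


# Hypothetical sample data, including an empty last field and a short row.
sample_rows = [
    ["The Conservation Fund", 2014, "", "", "", "Fundraising: ", "$2,144,456"],
    ["The Conservation Fund", 2014, "", "", "", "", ""],
    ["The Conservation Fund", 2013],
    ["The Conservation Fund", 2013, "$480,674", "$55,266", "$0",
     "LAWRENCE A SELZER", "PRESIDENT & CEO"],
]

filtered = keep_non_dollar_rows(sample_rows)
print(filtered)
```

If rows with an empty last field should be kept rather than dropped, the x[index] check can be removed; startswith alone will not raise on an empty string.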

