Why am I getting a "list index out of range" error?

Question

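I have a list of files that I want to read through and print information from. It keeps giving me the error list index out of range. I'm not sure what is wrong. For line 2, if I add matches[:10] it works for the first 10 files, but I need it to handle all of the files. I checked some old posts but still can't get my code to work.

re.findall worked earlier when I wrote this code. I'm not sure why it no longer works. Thanks.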

import re, os
topdir = r'E:\Grad\LIS\LIS590 Text mining\Part1\Part1' # Topdir has to be an object rather than a string, which means that there is no paranthesis.
matches = []
for root, dirnames, filenames in os.walk(topdir):
    for filename in filenames:
        if filename.endswith(('.txt','.pdf')):
            matches.append(os.path.join(root, filename))

capturedorgs = []
capturedfiles = []
capturedabstracts = []
orgAwards={}
for filepath in matches:
    with open (filepath,'rt') as mytext:
        mytext=mytext.read()

        matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)[0]
        capturedorgs.append(matchOrg)

        # code to capture files
        matchFile=re.findall(r'File\s+\:\s+(\w\d{7})',mytext)[0]
        capturedfiles.append(matchFile)

        # code to capture abstracts
        matchAbs=re.findall(r'Abstract\s+\:\s+(\w.+)',mytext)[0]
        capturedabstracts.append(matchAbs)

        # total awarded money
        matchAmt=re.findall(r'Total\s+Amt\.\s+\:\s+\$(\d+)',mytext)[0]

        if matchOrg not in orgAwards:
            orgAwards[matchOrg]=[]
        orgAwards[matchOrg].append(int(matchAmt))

for each in capturedorgs:
    print(each,"\n")
for each in capturedfiles:
    print(each,"\n")
for each in capturedabstracts:
    print (each,"\n")

# add code to print what is in your other two lists
from collections import Counter
countOrg=Counter(capturedorgs)
print (countOrg)

for each in orgAwards:
    print(each,sum(orgAwards[each]))

Error message:

Traceback (most recent call last):
  File "C:\Python32\Assignment1.py", line 17, in <module>
    matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)[0]
IndexError: list index out of range

Answer 1

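If findall finds no match, it returns an empty list []. The error occurs when you try to take the first item from that empty list, which raises the exception: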

>>> import re
>>> i = 'hello'
>>> re.findall('abc', i)
[]
>>> re.findall('abc', i)[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

To make sure your code doesn't stop when no match is found, you need to catch the exception that is raised:

try:
    matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)[0]
    capturedorgs.append(matchOrg)
except IndexError:
    print('No organization match for {}'.format(filepath))

You will have to do this for each re.findall statement.
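Since the same check would otherwise be repeated for every pattern, one option is to wrap the lookup in a small helper. This is only a sketch: first_match and its default parameter are hypothetical names that do not appear in the original code, and the sample text is invented for illustration.

```python
import re

def first_match(pattern, text, default=None):
    """Return the first re.findall result for pattern in text, or default if there is none."""
    found = re.findall(pattern, text)
    return found[0] if found else default

# Invented sample text; a real file may be laid out differently.
sample = "NSF Org     : DMS\nTotal Amt.  : $12345\n"

org = first_match(r'NSF\s+Org\s+\:\s+(\w+)', sample)
missing = first_match(r'File\s+\:\s+(\w\d{7})', sample)

print(org)      # the captured organization code
print(missing)  # None, because the sample has no "File :" line
```

The caller can then test for None and skip or report the file, instead of wrapping each statement in its own try/except.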

Answer 2

The problem is here:

matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)[0]

Evidently, at least one of your files doesn't contain this pattern at all, so when you index item [0] of the result, it isn't there.

You will need to handle that case.

One way is simply not to include it if it isn't found:

for filepath in matches:
    with open (filepath,'rt') as mytext:
        mytext=mytext.read()

        matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)
        if len(matchOrg) > 0:
            capturedorgs.append(matchOrg[0])

Also, if a file may contain more than one match and you want to capture all of them, you may want to use extend(matchOrg) instead.
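The difference between the two approaches can be seen in a short sketch. The sample text and the list names captured_first and captured_all are made up for illustration and are not part of the original code.

```python
import re

# Invented sample containing two organization lines.
text = "NSF Org : DMS\nNSF Org : CHE\n"

captured_first = []
captured_all = []

found = re.findall(r'NSF\s+Org\s+\:\s+(\w+)', text)
if found:
    captured_first.append(found[0])  # keeps only the first match
captured_all.extend(found)           # keeps every match; does nothing if found is empty

print(captured_first)
print(captured_all)
```

Because extending with an empty list leaves the target unchanged, the extend version needs no length check.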
