How to search a list of files using a for loop in Python

Question

Please see the code below. Whenever I run it, I keep getting this error:

"IndexError: list index out of range"

Code:

for x in range(0, numFiles):
    print(fileList[x])

for x in range(0, numFiles):
    f = open(dirName + "/" + fileList[x], 'r')  # open the file for reading
    fileText = f.read()                         # read file contents into string
    f.close()                                   # close file
    if fileText.find(tagName) == -1:            # if the file text doesn't contain the tag
        fileList.remove(fileList[x])            # then remove the file from the file list

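The first for loop is only there for debugging and works as expected, but the second for loop, where I try to actually open the files, gives the index out of range error. Any help would be appreciated.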

Answer 1

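Whenever fileList.remove is executed (that is, whenever fileText.find(tagName) == -1), the list gets shorter. You are changing the length of the list you are iterating over inside the for loop, while range(0, numFiles) still counts up to the original length.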

See the following simplified example:

test_list = [1, 2, 3, 4, 5]
num_items = len(test_list)

for i in range(0, num_items):
    print("Dealing with i=%s" % i)
    data = test_list[i]
    if data == 2 or data == 3 or data == 4:
        print("Removing i=%s (data=%s)" % (i, data))
        test_list.remove(data)
    print("Now test_list=%s, with %s items" % (test_list, len(test_list)))

Which outputs:

Dealing with i=0
Now test_list=[1, 2, 3, 4, 5], with 5 items
Dealing with i=1
Removing i=1 (data=2)
Now test_list=[1, 3, 4, 5], with 4 items
Dealing with i=2
Removing i=2 (data=4)
Now test_list=[1, 3, 5], with 3 items
Dealing with i=3
Traceback (most recent call last):
  File "./stack_101.py", line 25, in <module>
    data = test_list[i]
IndexError: list index out of range

Since you only need to "visit" each file once, I'd suggest changing the loop to a while:

test_list = [1, 2, 3, 4, 5]
num_items = len(test_list)

i = 0
while i < len(test_list):
    data = test_list[i]
    print("Dealing with i=%s (data=%s)" % (i, data))
    if data == 2 or data == 3 or data == 4:
        print("Removing i=%s, data=%s. NOT advancing" % (i, data))
        test_list.remove(data)
    else:
        i += 1
        print("Advancing counter to i=%s because we didn't remove the entry" % i)
    print("Now test_list=%s, with %s items" % (test_list, len(test_list)))
print("After the loop, test_list=%s" % test_list)

Which outputs correctly:

Dealing with i=0 (data=1)
Advancing counter to i=1 because we didn't remove the entry
Now test_list=[1, 2, 3, 4, 5], with 5 items
Dealing with i=1 (data=2)
Removing i=1, data=2. NOT advancing
Now test_list=[1, 3, 4, 5], with 4 items
Dealing with i=1 (data=3)
Removing i=1, data=3. NOT advancing
Now test_list=[1, 4, 5], with 3 items
Dealing with i=1 (data=4)
Removing i=1, data=4. NOT advancing
Now test_list=[1, 5], with 2 items
Dealing with i=1 (data=5)
Advancing counter to i=2 because we didn't remove the entry
Now test_list=[1, 5], with 2 items
After the loop, test_list=[1, 5]
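
If the list really has to be changed in place (for example because other code holds a reference to the same list object), one possible alternative to the index bookkeeping above is slice assignment. This is only a sketch on the same toy data; the name same_object is made up for the illustration.

```python
test_list = [1, 2, 3, 4, 5]
same_object = test_list  # second reference, to show the list is updated in place

# Slice assignment replaces the contents of the existing list object
# instead of rebinding the name test_list to a new list.
test_list[:] = [data for data in test_list if data not in (2, 3, 4)]

print(test_list)                 # [1, 5]
print(same_object is test_list)  # True
```

The filtered list is built completely before the contents are replaced, so no index ever points past the end of the list.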

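But: do you really need to modify the list? As you can see, doing so clutters the code and makes things more complicated. How about just creating a new list containing only the files that were not removed?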

Something like:

test_list = [1, 2, 3, 4, 5]
num_items = len(test_list)
new_list = []
for i in range(0, num_items):
    data = test_list[i]
    print("Dealing with i=%s (data=%s)" % (i, data))
    if not(data == 2 or data == 3 or data == 4):
        print("Keeping i=%s (data=%s)" % (i, data))
        new_list.append(data)
print("After the loop, new_list=%s" % new_list)

This leaves the "right" values in new_list:

Dealing with i=0 (data=1)
Keeping i=0 (data=1)
Dealing with i=1 (data=2)
Dealing with i=2 (data=3)
Dealing with i=3 (data=4)
Dealing with i=4 (data=5)
Keeping i=4 (data=5)
After the loop, new_list=[1, 5]
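
The same "build a new list" idea can be written more compactly as a list comprehension. A minimal sketch on the same toy data; the set name unwanted is introduced here only for illustration.

```python
test_list = [1, 2, 3, 4, 5]
unwanted = {2, 3, 4}  # illustrative name for the values to drop

# Keep every item that is not in the unwanted set; test_list is left untouched.
new_list = [data for data in test_list if data not in unwanted]

print("After the comprehension, new_list=%s" % new_list)  # new_list=[1, 5]
```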

Applied to your code, I imagine it would look something like this (untested):

found_files = []
for x in range(0, numFiles):
    f = open(dirName + "/" + fileList[x], 'r')  # open the file for reading
    fileText = f.read()                         # read file contents into string
    f.close()                                   # close file
    if fileText.find(tagName) >= 0:             # if the file text contains the tag
        found_files.append(fileList[x])         # then add it to the new list
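
A slightly more idiomatic variant of the same approach could look like the sketch below. The function name files_containing_tag and its parameter names are hypothetical, not from the question; it assumes the files are readable as text with the default encoding.

```python
import os


def files_containing_tag(dir_name, file_names, tag_name):
    """Return the entries of file_names whose contents include tag_name."""
    found_files = []
    for name in file_names:
        path = os.path.join(dir_name, name)
        # The with-statement closes the file even if reading raises an error.
        with open(path, "r") as f:
            file_text = f.read()
        # "in" does the same substring test as find(...) >= 0.
        if tag_name in file_text:
            found_files.append(name)
    return found_files
```

With the variables from the question, the call would be found_files = files_containing_tag(dirName, fileList, tagName). The original fileList is never modified, so no index can go out of range.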

Answer 2

To avoid the out-of-range error, try this:

for f in fileList:
    pass  # your code here; f will be the actual file name now

If you want to keep the index, try this:

for i, f in enumerate(fileList):
    pass  # your code here; i will be a counter and f will be the actual file name

Edit: Sorry, I hadn't even noticed that you are changing the list size on the fly. That is where the index error comes from!
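
To illustrate why that matters even without indices: removing from a list while iterating directly over it raises no error, but it silently skips entries. Looping over a copy is one way around that. The file names below are made up for the sketch.

```python
names = ["a.txt", "b.txt", "c.txt", "d.txt"]  # hypothetical file names
unwanted = {"b.txt", "c.txt"}

# Removing while iterating over the same list skips the entry that
# slides into the slot of each removed item.
skipping = list(names)
for name in skipping:
    if name in unwanted:
        skipping.remove(name)
print(skipping)  # ['a.txt', 'c.txt', 'd.txt'] ("c.txt" was never visited)

# Iterating over a shallow copy visits every original entry.
safe = list(names)
for name in list(safe):
    if name in unwanted:
        safe.remove(name)
print(safe)  # ['a.txt', 'd.txt']
```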

Answer 3

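As others have pointed out, you are removing elements from the list, so the index ends up exceeding the list's current size, which is smaller than numFiles.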

One workaround is:

for x in range(0, numFiles):
    print(fileList[x])
last_index = numFiles
x = 0
while x < last_index:
    f = open(dirName + "/" + fileList[x], 'r')
    fileText = f.read()
    f.close()
    if fileText.find(tagName) == -1:
        fileList.pop(x)   # pop is better: less ambiguous if there are duplicates
        last_index -= 1   # shrink the loop bound to match the shorter list
    else:
        x += 1  # go to the next index only if you didn't remove an item
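
Another way to delete by index without adjusting the loop bound is to walk the indices from last to first. This is a different technique from the while loop above, shown here as a sketch on toy data rather than on the question's files.

```python
test_list = [1, 2, 3, 4, 5]

# Going from the last index down to 0 means a deletion only shifts
# items that have already been visited.
for i in range(len(test_list) - 1, -1, -1):
    if test_list[i] in (2, 3, 4):
        del test_list[i]

print(test_list)  # [1, 5]
```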

Answer 4

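Thanks to BorrajaX and the other similar suggestions above, I decided to take another stab at the solution, this time with a slightly different approach. Instead of removing from the copied list, I created a new empty list and appended to it whenever the tag name was found. And it works great! Thanks, everyone, for the help! In case anyone is interested, here is the modified code.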

for x in range(0, numFiles):
    print(fileList[x])

resultFileList = []

for x in range(0, numFiles):
    f = open(dirName + "/" + fileList[x], 'r')  # open the file for reading
    fileText = f.read()                         # read file contents into string
    f.close()                                   # close file
    if fileText.find(tagName) >= 0:             # if the file text contains the tag
        resultFileList.append(fileList[x])      # then add the file to the result list

4 answers

Answer 1
1 Accepted 2018-01-03 06:31:44

Answer 2
0 2018-01-03 06:20:12

Answer 3
0 2018-01-03 06:29:24

Answer 4
0 2018-01-03 21:47:33
