CS50 Problem Set 6, IndexError: list index out of range
I don't know what is going wrong here, but I keep getting this error whenever I try the large database. For example:
dna/ $ python dna.py databases/large.csv sequences/10.txt
Traceback (most recent call last):
  File "/workspaces/103840690/dna/dna.py", line 104, in <module>
    main()
  File "/workspaces/103840690/dna/dna.py", line 47, in main
    check[i][j] = False
IndexError: list index out of range
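I know this type of error means I'm trying to access an index that doesn't exist, but nothing I've tried seems to work. What is also strange is that I only get it with the large database.
The problem is probably in lines 40 - 49, under the comment "Check database for matching profiles". I've pasted the whole code for context.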
import csv
import sys


def main():

    # Check for command-line usage
    if len(sys.argv) != 3:
        print("Two command-line arguments needed. ")
        return 1

    # Read database file into a variable
    with open(sys.argv[1], "r") as csv_file:
        csv_database = csv.DictReader(csv_file)
        # create a list where we can put dictionaries
        database = []
        for lines in csv_database:
            database.append(lines)

    # create a keys list where we can put STRs
    STRs = []
    for key in database[0].keys():
        STRs.append(key)
    STRs.remove("name")

    # Read DNA sequence file into a variable
    with open(sys.argv[2], "r") as txt_file:
        sequence = txt_file.read()

    # Find longest match of each STR in DNA sequence
    matches = {}
    for i in range(len(STRs)):
        matches[STRs[i]] = longest_match(sequence, STRs[i])

    # Check database for matching profiles
    check = [[0]*len(database)]*len(STRs)
    match = None
    for i in range(len(database)):
        for j in range(len(STRs)):
            if matches[STRs[j]] == int(database[i][STRs[j]]):
                check[i][j] = True
            else:
                check[i][j] = False
        if False not in check[i]:
            match = i
    if match != None:
        print(database[match]["name"])
    else:
        print("No match")

    return


def longest_match(sequence, subsequence):
    """Returns length of longest run of subsequence in sequence."""

    # Initialize variables
    longest_run = 0
    subsequence_length = len(subsequence)
    sequence_length = len(sequence)

    # Check each character in sequence for most consecutive runs of subsequence
    for i in range(sequence_length):

        # Initialize count of consecutive runs
        count = 0

        # Check for a subsequence match in a "substring" (a subset of characters) within sequence
        # If a match, move substring to next potential match in sequence
        # Continue moving substring and checking for matches until out of consecutive matches
        while True:

            # Adjust substring start and end
            start = i + count * subsequence_length
            end = start + subsequence_length

            # If there is a match in the substring
            if sequence[start:end] == subsequence:
                count += 1

            # If there is no match in the substring
            else:
                break

        # Update most consecutive matches found
        longest_run = max(longest_run, count)

    # After checking for runs at each character in sequence, return longest run found
    return longest_run


main()
Your indices are in the wrong order. check is a list of len(STRs) elements, and each of those elements is a list of len(database) elements.
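A small sketch may help show why the error only appears with the larger file. The helper names build_check and fits are hypothetical, and the sizes (3 people with 3 STRs, 23 people with 8 STRs) are assumptions standing in for a small and a large database. The point is only that check[i] stops existing once i reaches the number of STRs.

```python
# Hypothetical helpers and sizes, used only to illustrate the shape mismatch.
def build_check(num_people, num_strs):
    # Same expression as in the question: num_strs rows, num_people columns.
    return [[0] * num_people] * num_strs


def fits(num_people, num_strs):
    """Return True if check[i][j] exists for every person i and STR j."""
    check = build_check(num_people, num_strs)
    try:
        for i in range(num_people):
            for j in range(num_strs):
                check[i][j]
    except IndexError:
        return False
    return True


print(fits(3, 3))   # True: as many rows as people, so every index exists
print(fits(23, 8))  # False: only 8 rows, so check[8] is out of range
```

When the number of people equals the number of STRs, the swapped indices happen to stay in range, which would explain why a small square database hides the bug.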
# Check database for matching profiles
check = [[0]*len(database)]*len(STRs)
match = None
for i in range(len(database)):
    for j in range(len(STRs)):
        if matches[STRs[j]] == int(database[i][STRs[j]]):
            check[i][j] = True
        else:
            check[i][j] = False
    if False not in check[i]:
        match = i
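You iterate over the database with the variable i and over the STRs with the variable j. To match the way check was initialized, the result should be stored in check[j][i]. Note that the later test on check[i] then needs the same adjustment, because check[i] is no longer the row for person i.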
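An alternative to swapping the indices is to swap the dimensions in the initialization, so that check has one row per database entry and the existing loops can stay as they are. The following is a minimal sketch under that assumption. The function name find_match and the sample rows are made up for illustration, and the rows are built with a list comprehension so that each one is a separate list.

```python
def find_match(database, strs, matches):
    """Return the name of the last matching profile, or None."""
    # One row per person, one column per STR. The comprehension creates
    # a separate inner list for every row.
    check = [[False] * len(strs) for _ in range(len(database))]
    match = None
    for i in range(len(database)):
        for j in range(len(strs)):
            check[i][j] = matches[strs[j]] == int(database[i][strs[j]])
        if False not in check[i]:
            match = i
    return None if match is None else database[match]["name"]


# Made-up rows in the shape csv.DictReader would produce (values are strings).
database = [
    {"name": "Alice", "AGATC": "2", "AATG": "8"},
    {"name": "Bob", "AGATC": "4", "AATG": "1"},
    {"name": "Charlie", "AGATC": "3", "AATG": "2"},
]
strs = ["AGATC", "AATG"]

print(find_match(database, strs, {"AGATC": 4, "AATG": 1}))  # Bob
print(find_match(database, strs, {"AGATC": 9, "AATG": 9}))  # None
```

The sample deliberately has more people than STRs, which is the situation that raised the IndexError in the question.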
When you multiply a list, the list as a whole is repeated, not its individual elements. See this example:
a = [[0]*2]*5
print(a)
> [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]
print(a[4][1])
> 0
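One further consequence of list multiplication is worth checking in this context: the outer multiplication repeats references to the same inner list, so writing to one row changes all of them. The short sketch below (variable names a and b are only for illustration) contrasts this with a list comprehension, which builds independent rows.

```python
a = [[0] * 2] * 5                 # five references to one inner list
a[0][0] = 1
print(a)                          # [[1, 0], [1, 0], [1, 0], [1, 0], [1, 0]]
print(a[0] is a[4])               # True

b = [[0] * 2 for _ in range(5)]   # five independent inner lists
b[0][0] = 1
print(b)                          # [[1, 0], [0, 0], [0, 0], [0, 0], [0, 0]]
print(b[0] is b[4])               # False
```

For a table like check, where each row is meant to hold results for a different item, the comprehension form is likely the one that is wanted.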
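When you write check = [[0]*len(database)]*len(STRs), the outer index of the list is bounded by len(STRs), and the index one level deeper is bounded by len(database). You need to modify your code accordingly: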
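# Build check with one independent row per STR and one column per person
check = [[False] * len(database) for _ in range(len(STRs))]
for i in range(len(STRs)):
    for j in range(len(database)):
        # i indexes the STR, j indexes the person
        check[i][j] = matches[STRs[i]] == int(database[j][STRs[i]])
# A person matches only if their column is True for every STR
match = None
for j in range(len(database)):
    if all(check[i][j] for i in range(len(STRs))):
        match = j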
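It may also be possible to avoid the check table altogether, since only a yes or no per person is needed. The sketch below is one way to do that with the built-in all(). The function name find_match_simple and the sample rows are hypothetical, and unlike the question's loop it returns the first matching profile rather than the last.

```python
def find_match_simple(database, strs, matches):
    """Return the name of the first person whose STR counts all match."""
    for person in database:
        # all() stops at the first STR whose count differs.
        if all(matches[s] == int(person[s]) for s in strs):
            return person["name"]
    return None


# Made-up rows in the shape csv.DictReader would produce (values are strings).
people = [
    {"name": "Alice", "AGATC": "2", "AATG": "8"},
    {"name": "Bob", "AGATC": "4", "AATG": "1"},
    {"name": "Charlie", "AGATC": "3", "AATG": "2"},
]
str_names = ["AGATC", "AATG"]

print(find_match_simple(people, str_names, {"AGATC": 3, "AATG": 2}))  # Charlie
print(find_match_simple(people, str_names, {"AGATC": 0, "AATG": 0}))  # None
```

With no two-dimensional list involved, there is no index order to get wrong.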