CS50 Problem Set 6, IndexError: list index out of range

I don't know what is wrong here, but an error keeps popping up whenever I try to use the large database. For example:

dna/ $ python dna.py databases/large.csv sequences/10.txt
Traceback (most recent call last):
  File "/workspaces/103840690/dna/dna.py", line 104, in <module>
    main()
  File "/workspaces/103840690/dna/dna.py", line 47, in main
    check[i][j] = False
IndexError: list index out of range

I know this type of error means that I am trying to access an index that doesn't exist, but nothing I try seems to work. It is also weird that I only get it when using the large database.

The problem is probably in lines 40 - 49, under the comment "Check database for matching profiles". I pasted the whole code for context.

import csv
import sys


def main():

    # Check for command-line usage
    if len(sys.argv) != 3:
        print("Two command-line arguments needed. ")
        return 1


    # Read database file into a variable
    with open(sys.argv[1], "r") as csv_file:
        csv_database = csv.DictReader(csv_file)

        # create a list where we can put dictionaries
        database = []
        for lines in csv_database:
            database.append(lines)

        # create a keys list where we can put STRs
        STRs = []
        for key in database[0].keys():
            STRs.append(key)
        STRs.remove("name")


    # Read DNA sequence file into a variable
    with open(sys.argv[2], "r") as txt_file:
        sequence = txt_file.read()


    # Find longest match of each STR in DNA sequence
    matches = {}
    for i in range(len(STRs)):
        matches[STRs[i]] = longest_match(sequence, STRs[i])

    # Check database for matching profiles
    check = [[0]*len(database)]*len(STRs)
    match = None
    for i in range(len(database)):
        for j in range(len(STRs)):
            if matches[STRs[j]] == int(database[i][STRs[j]]):
                check[i][j] = True
            else:
                check[i][j] = False
        if False not in check[i]:
            match = i

    if match != None:
        print(database[match]["name"])
    else:
        print("No match")

    return


def longest_match(sequence, subsequence):
    """Returns length of longest run of subsequence in sequence."""

    # Initialize variables
    longest_run = 0
    subsequence_length = len(subsequence)
    sequence_length = len(sequence)

    # Check each character in sequence for most consecutive runs of subsequence
    for i in range(sequence_length):

        # Initialize count of consecutive runs
        count = 0

        # Check for a subsequence match in a "substring" (a subset of characters) within sequence
        # If a match, move substring to next potential match in sequence
        # Continue moving substring and checking for matches until out of consecutive matches
        while True:

            # Adjust substring start and end
            start = i + count * subsequence_length
            end = start + subsequence_length

            # If there is a match in the substring
            if sequence[start:end] == subsequence:
                count += 1

            # If there is no match in the substring
            else:
                break

        # Update most consecutive matches found
        longest_run = max(longest_run, count)

    # After checking for runs at each character in sequence, return longest run found
    return longest_run


main()

Your indices are in the wrong order. check is a list of len(STRs) elements, and each of those elements is a list with len(database) elements.

    # Check database for matching profiles
    check = [[0]*len(database)]*len(STRs)
    match = None
    for i in range(len(database)):
        for j in range(len(STRs)):
            if matches[STRs[j]] == int(database[i][STRs[j]]):
                check[i][j] = True
            else:
                check[i][j] = False
        if False not in check[i]:
            match = i

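You are iterating over the database rows with the variable i and over the STRs with the variable j. To match the way check was initialized, the result should be stored in check[j][i]. The later test, False not in check[i], indexes check by database row as well, so it needs the same adjustment. The mismatch only raises an error when the database has more rows than there are STRs, which would explain why only the large database triggers it.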
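The shape mismatch can be reproduced in isolation. The following is a minimal sketch: the helper name fill and the sizes (3, 3) and (5, 2) are made up for illustration and are not read from the actual CSV files.

```python
def fill(num_people, num_strs):
    """Mimic the question's indexing on a check list built the same way."""
    # Outer list has num_strs items; each inner list has num_people items
    check = [[0] * num_people] * num_strs
    for i in range(num_people):
        for j in range(num_strs):
            # The first index is only valid while i < num_strs
            check[i][j] = True
    return check


# Same number of people and STRs: the swapped indices go unnoticed
fill(3, 3)

# More people than STRs: check[i] runs past the end of the outer list
try:
    fill(5, 2)
    failed = False
except IndexError:
    failed = True

print(failed)
```

With equal counts both indices stay in range, so the bug stays hidden until the database has more rows than STRs.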

When you multiply a list, the list as a whole is repeated, not its individual elements. In the example below, the outer list has 5 items and each inner list has 2 items, so the first index can go up to 4 and the second up to 1.

a = [[0]*2]*5
print(a)
> [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]
print(a[4][1])
> 0
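A related point about list multiplication, shown here as a small sketch: the outer multiplication repeats references to one inner list rather than copying it, so writing through one row is visible in every row. The names a, b and shared are only illustrative.

```python
# Multiplying the outer list repeats the SAME inner list object 5 times
a = [[0] * 2] * 5
a[0][0] = 1
print(a)  # all five rows show the change

# Every row is the very same object as the first row
shared = all(row is a[0] for row in a)
print(shared)

# A list comprehension builds a separate inner list for each row
b = [[0] * 2 for _ in range(5)]
b[0][0] = 1
print(b)  # only the first row changes
```

For a table like check, where each row must hold its own results, the list comprehension form is the safer way to build it.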

You build check with check = [[0]*len(database)]*len(STRs), so the first index runs over len(STRs) and the second index, one level deeper, runs over len(database). Your loops need to follow the same order:

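# Rows are indexed by STR, columns by database entry. The comprehension
# gives each row its own inner list instead of repeating one shared list.
check = [[False] * len(database) for _ in range(len(STRs))]
for i in range(len(STRs)):
    for j in range(len(database)):
        check[i][j] = matches[STRs[i]] == int(database[j][STRs[i]])

# A person matches only if every STR count matched (a whole column is True)
match = None
for j in range(len(database)):
    if all(check[i][j] for i in range(len(STRs))):
        match = j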
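As an alternative, the check table can be avoided altogether by testing each person directly. This is only a sketch under assumptions: find_match is a hypothetical helper, and the sample rows are invented data in the shape csv.DictReader would produce. Unlike the loop in the question, it returns the first matching person rather than the last.

```python
def find_match(database, matches, strs):
    """Return the name of the first person whose STR counts all match, else None."""
    for person in database:
        # CSV values are strings, so convert before comparing with the counts
        if all(int(person[s]) == matches[s] for s in strs):
            return person["name"]
    return None


# Hypothetical sample data, not taken from the CS50 files
database = [
    {"name": "Alice", "AGATC": "2", "AATG": "8"},
    {"name": "Bob", "AGATC": "4", "AATG": "1"},
]
strs = ["AGATC", "AATG"]

print(find_match(database, {"AGATC": 4, "AATG": 1}, strs))
print(find_match(database, {"AGATC": 9, "AATG": 9}, strs))
```

Because no two-dimensional list is involved, there is no index order to get wrong and no shared inner list to worry about.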
