How do I print the person's name in DNA PSET5 CS50x

I don't know how to print the name of the person whose STR counts match the numbers (as strings) in list4 (sorry for my bad English). When I use print(list4), I get the right values, but I don't know how to get the person's name from them. Example: list4 = ['4', '1', '5'], so how do I get 'Bob'? I would appreciate any help!

import csv
import sys
import itertools
import re
import collections
import json
import functools

def main():

    # TODO: Check for command-line usage
    # not done yet
    filecsv = sys.argv[1]
    filetext = sys.argv[2]
    names = []
    # TODO: Read database file into a variable
    with open(filecsv, "r") as csvfile:
        reader = csv.reader(csvfile)
        dict_list = list(reader)
        names.append(dict_list)
    # Open sequences file and read it into a string
    with open(filetext, "r") as file:
        sequence = file.read()
    # TODO: Find longest match of each STR in DNA sequence
    find_STR = []
    for i in range(1, len(dict_list[0])):
        find_STR.append(longest_match(sequence, dict_list[0][i]))

    #TODO: Check database for matching profiles
    #convert dict_list to a string
    listToStr = ' '.join([str(elem) for elem in dict_list])
    #convert find_STR to a string
    A = [str(x) for x in find_STR]   
    # compare both strings
    list3 = set(A)&set(listToStr)
    list4 = sorted(list3, key = lambda k : A.index(k))
    if list4:
        print(name)  # how???
    return


def longest_match(sequence, subsequence):
    """Returns length of longest run of subsequence in sequence."""

    # Initialize variables
    longest_run = 0
    subsequence_length = len(subsequence)
    sequence_length = len(sequence)

    # Check each character in sequence for most consecutive runs of subsequence
    for i in range(sequence_length):

        # Initialize count of consecutive runs
        count = 0

        # Check for a subsequence match in a "substring" (a subset of characters) within sequence
        # If a match, move substring to next potential match in sequence
        # Continue moving substring and checking for matches until out of consecutive matches
        while True:

            # Adjust substring start and end
            start = i + count * subsequence_length
            end = start + subsequence_length

            # If there is a match in the substring
            if sequence[start:end] == subsequence:
                count += 1

            # If there is no match in the substring
            else:
                break

        # Update most consecutive matches found
        longest_run = max(longest_run, count)

    
    return longest_run


main()

Start by inspecting the values in your variables. For example, look at dict_list and names for the small.csv file, and you will find:

dict_list:
[['name', 'AGATC', 'AATG', 'TATC'], ['Alice', '2', '8', '3'], ['Bob', '4', '1', '5'], ['Charlie', '3', '2', '5']]
names:
[[['name', 'AGATC', 'AATG', 'TATC'], ['Alice', '2', '8', '3'], ['Bob', '4', '1', '5'], ['Charlie', '3', '2', '5']]]

First observation: dict_list is a list of lists (not dictionaries). This happens when you set dict_list = list(reader). Use csv.DictReader() if you want to create a list of dictionaries. You don't have to create a list of dictionaries, but you will find it makes the data much easier to work with. Also, nothing is gained by appending dict_list to another list (names).
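As a minimal sketch of the csv.DictReader suggestion: the CSV text below is the small.csv content shown in the dict_list output above, wrapped in io.StringIO so the example runs without the file; in the real program you would pass the file object from open(filecsv). The helper name load_database is hypothetical, and slicing fieldnames[1:] assumes the first column is name, as it is in small.csv.

```python
import csv
import io

# small.csv as shown in the dict_list output; io.StringIO stands in for
# open("small.csv") so this sketch runs on its own.
SMALL_CSV = """name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5
"""


def load_database(csvfile):
    """Return the STR names (header minus 'name') and one dict per person."""
    reader = csv.DictReader(csvfile)
    str_names = reader.fieldnames[1:]  # assumes 'name' is the first column
    rows = list(reader)                # each row maps column name -> string value
    return str_names, rows


str_names, rows = load_database(io.StringIO(SMALL_CSV))
print(str_names)                        # ['AGATC', 'AATG', 'TATC']
print(rows[1]["name"], rows[1]["AATG"])  # Bob 1
```

Each value can then be looked up by column name, e.g. rows[1]["TATC"] instead of dict_list[2][3], which is what makes the matching step easier.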

Next, look at find_STR. For sequence 1, it is [4, 1, 5]. However, you didn't save each longest-match value together with its STR name. As a result, to know which count belongs to which STR, you have to refer back to the header row, the first list item in dict_list.
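One way to keep each count attached to its STR name is a dictionary keyed by STR. The sketch below uses a condensed copy of the question's longest_match logic and a hypothetical sequence built by hand so the counts come out as 4, 1, 5; it is not the contents of the pset's actual sequence file.

```python
def longest_match(sequence, subsequence):
    """Length of the longest run of consecutive copies of subsequence in sequence."""
    longest_run = 0
    n = len(subsequence)
    for i in range(len(sequence)):
        count = 0
        # Keep stepping forward one copy at a time while the next slice still matches.
        while sequence[i + count * n : i + (count + 1) * n] == subsequence:
            count += 1
        longest_run = max(longest_run, count)
    return longest_run


str_names = ["AGATC", "AATG", "TATC"]
# Hypothetical sequence constructed to give counts 4, 1, 5;
# in the real program this string comes from the sequence file.
sequence = "AGATC" * 4 + "AATG" + "TATC" * 5

# Store each count next to the STR it belongs to.
str_counts = {name: longest_match(sequence, name) for name in str_names}
print(str_counts)  # {'AGATC': 4, 'AATG': 1, 'TATC': 5}
```

With str_counts in hand, you no longer need to remember that position 0 of find_STR means AGATC.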

Once you have the longest match values (in find_STR), you need to compare them to the names and sequence counts in dict_list and find the one row that matches. (For this sequence, it will be ['Bob', '4', '1', '5'].) Once you find the match, the first item in that list is the name you want: Bob.
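The comparison can be sketched directly against the question's list-of-lists dict_list and find_STR. This assumes the header row comes first and that the STR columns appear in the same order as the values in find_STR, which holds in the question's code because find_STR is built by iterating over dict_list[0]. find_name is a hypothetical helper name; "No match" follows the pset's wording for the case where nobody matches.

```python
dict_list = [['name', 'AGATC', 'AATG', 'TATC'],
             ['Alice', '2', '8', '3'],
             ['Bob', '4', '1', '5'],
             ['Charlie', '3', '2', '5']]
find_STR = [4, 1, 5]


def find_name(dict_list, find_STR):
    """Return the name in the row whose counts equal find_STR, else 'No match'."""
    for row in dict_list[1:]:                      # skip the header row
        counts = [int(value) for value in row[1:]]  # CSV values are strings
        if counts == find_STR:
            return row[0]                          # first item is the name
    return "No match"


print(find_name(dict_list, find_STR))   # Bob
print(find_name(dict_list, [1, 1, 1]))  # No match
```

Converting the CSV strings to int (rather than converting find_STR to strings) keeps the comparison correct for multi-digit counts and avoids surprises such as '05' versus '5'.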

None of the code that creates A, list3, or list4 does this. It simply returns '4', '1', '5' as different objects: A is a list of strings, list3 is an unordered set of strings, and list4 is a list of those strings sorted into the same order as A. There isn't a name there to print.
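To see why list4 can never hold a name, the question's A / list3 / list4 lines can be rerun on the small.csv data. Note also that set(listToStr) is a set of single characters, not of whole values, so the intersection only appears to work for single-digit counts, and it returns digits even for counts that belong to nobody. question_approach is a hypothetical wrapper around the question's own lines.

```python
dict_list = [['name', 'AGATC', 'AATG', 'TATC'],
             ['Alice', '2', '8', '3'],
             ['Bob', '4', '1', '5'],
             ['Charlie', '3', '2', '5']]


def question_approach(find_STR):
    """Reproduce the question's A / list3 / list4 steps for a given find_STR."""
    listToStr = ' '.join([str(elem) for elem in dict_list])
    A = [str(x) for x in find_STR]
    list3 = set(A) & set(listToStr)  # set(listToStr) is a set of single characters
    return sorted(list3, key=lambda k: A.index(k))


print(question_approach([4, 1, 5]))   # ['4', '1', '5']  digits only, no name
print(question_approach([5, 2, 8]))   # ['5', '2', '8']  nobody has these counts
print(question_approach([12, 1, 5]))  # ['1', '5']  '12' is not a single character
```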

Good luck.
