How do I print the person's name in DNA (PSET5, CS50x)?
I don't know how to print the name of the person whose counts match the numbers (as strings) in `list4` (sorry for my bad English). When I use `print(list4)` I get the right values, but I don't know how to get the person's name from them. Example: `list4 = ['4', '1', '5']`, so how do I get `'Bob'`? I would appreciate any help!
```python
import csv
import sys
import itertools
import re
import collections
import json
import functools


def main():

    # TODO: Check for command-line usage
    # not done yet
    filecsv = sys.argv[1]
    filetext = sys.argv[2]
    names = []

    # TODO: Read DNA sequence file into a variable
    with open(filecsv, "r") as csvfile:
        reader = csv.reader(csvfile)
        dict_list = list(reader)
        names.append(dict_list)

    # Open sequences file and convert to list
    with open(filetext, "r") as file:
        sequence = file.read()

    # TODO: Find longest match of each STR in DNA sequence
    find_STR = []
    for i in range(1, len(dict_list[0])):
        find_STR.append(longest_match(sequence, dict_list[0][i]))

    # TODO: Check database for matching profiles
    # convert dict_list to a string
    listToStr = ' '.join([str(elem) for elem in dict_list])
    # convert find_STR to a string
    A = [str(x) for x in find_STR]
    # compare both strings
    list3 = set(A) & set(listToStr)
    list4 = sorted(list3, key=lambda k: A.index(k))
    if list4:
        print(name)  # how???

    return


def longest_match(sequence, subsequence):
    """Returns length of longest run of subsequence in sequence."""

    # Initialize variables
    longest_run = 0
    subsequence_length = len(subsequence)
    sequence_length = len(sequence)

    # Check each character in sequence for most consecutive runs of subsequence
    for i in range(sequence_length):

        # Initialize count of consecutive runs
        count = 0

        # Check for a subsequence match in a "substring" (a subset of characters) within sequence
        # If a match, move substring to next potential match in sequence
        # Continue moving substring and checking for matches until out of consecutive matches
        while True:

            # Adjust substring start and end
            start = i + count * subsequence_length
            end = start + subsequence_length

            # If there is a match in the substring
            if sequence[start:end] == subsequence:
                count += 1

            # If there is no match in the substring
            else:
                break

        # Update most consecutive matches found
        longest_run = max(longest_run, count)

    return longest_run


main()
```
Start by inspecting the values in your variables. For example, look at `dict_list` and `names` for the `small.csv` file and you will find:
dict_list:
[['name', 'AGATC', 'AATG', 'TATC'], ['Alice', '2', '8', '3'], ['Bob', '4', '1', '5'], ['Charlie', '3', '2', '5']]
names:
[[['name', 'AGATC', 'AATG', 'TATC'], ['Alice', '2', '8', '3'], ['Bob', '4', '1', '5'], ['Charlie', '3', '2', '5']]]
First observation: `dict_list` is a list of lists (not dictionaries). This happens when you set `dict_list = list(reader)`. Use `csv.DictReader()` if you want to create a list of dictionaries. You don't have to create a list of dictionaries, but you will find it makes the data much easier to work with. Also, nothing is gained by appending `dict_list` to another list (`names`).
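A minimal sketch of the `csv.DictReader()` approach. The CSV text is reconstructed from the `dict_list` output for `small.csv`, and `io.StringIO` stands in for `open("small.csv")` only so the example runs on its own; `SMALL_CSV`, `people` and `strs` are illustrative names, not part of the problem set.

```python
import csv
import io

# small.csv, reconstructed from the dict_list output (assumption: same rows and order)
SMALL_CSV = """name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5
"""

# io.StringIO stands in for open("small.csv") so the sketch is self-contained
reader = csv.DictReader(io.StringIO(SMALL_CSV))
people = list(reader)          # one dict per row, keyed by the header row
strs = reader.fieldnames[1:]   # STR names: every column after "name"

print(people[1])          # {'name': 'Bob', 'AGATC': '4', 'AATG': '1', 'TATC': '5'}
print(people[1]["name"])  # Bob
print(strs)               # ['AGATC', 'AATG', 'TATC']
```

Each row now carries its own name and its counts keyed by STR, so you no longer have to line values up by index with the header row. (On Python 3.8+ each row is a plain `dict`; older versions return `OrderedDict`.)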
Next, look at `find_STR`. For sequence 1 it is `[4, 1, 5]`. However, you didn't store each longest-match value together with the name of its STR. As a result, you have to refer back to the first list item in `names` (or `dict_list`) to know which count belongs to which STR.
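One way to keep each count with its STR is a dictionary keyed by STR name. This sketch re-declares a compact version of the question's `longest_match` (same algorithm) so it runs on its own; the DNA sequence is hypothetical, built so the runs come out as 4, 1 and 5, and `str_counts` is an illustrative name.

```python
def longest_match(sequence, subsequence):
    """Length of the longest run of consecutive repeats of subsequence in sequence."""
    longest_run = 0
    size = len(subsequence)
    for i in range(len(sequence)):
        count = 0
        # Step forward one STR length at a time while the repeats continue
        while sequence[i + count * size:i + (count + 1) * size] == subsequence:
            count += 1
        longest_run = max(longest_run, count)
    return longest_run


# STR names: the header row of small.csv without the "name" column
strs = ["AGATC", "AATG", "TATC"]

# Hypothetical sequence constructed so the longest runs are 4, 1 and 5
sequence = "AGATC" * 4 + "AATG" + "TATC" * 5

# Store each count next to the STR it belongs to
str_counts = {s: longest_match(sequence, s) for s in strs}
print(str_counts)  # {'AGATC': 4, 'AATG': 1, 'TATC': 5}
```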
Once you have the longest match values (in `find_STR`), you need to compare them to the names and sequence counts in `dict_list` and find the one row that matches. (For this sequence, it will be `['Bob', '4', '1', '5']`.) Once you find the match, the first item in that list is the name you want: `Bob`.
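A minimal sketch of that comparison, keeping the list-of-lists layout that `csv.reader` produces. `find_match` is a hypothetical helper name, and the `"No match"` fallback assumes the output the problem set expects when no profile matches.

```python
def find_match(dict_list, find_STR):
    """Return the name whose STR counts equal find_STR, or None if nobody matches."""
    # csv.reader yields strings, so convert the computed counts to strings too
    target = [str(count) for count in find_STR]
    # Skip the header row; each remaining row is [name, count1, count2, ...]
    for row in dict_list[1:]:
        if row[1:] == target:
            return row[0]
    return None


dict_list = [['name', 'AGATC', 'AATG', 'TATC'], ['Alice', '2', '8', '3'],
             ['Bob', '4', '1', '5'], ['Charlie', '3', '2', '5']]
find_STR = [4, 1, 5]

name = find_match(dict_list, find_STR)
print(name if name else "No match")  # Bob
```

Comparing the whole slice `row[1:]` at once checks every count in column order, which is exactly the positional information a set intersection discards.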
None of the code that creates `A`, `list3` or `list4` does this. It simply returns `'4', '1', '5'` as different objects: `A` is a list of strings, `list3` is an unsorted set of strings, and `list4` is a sorted list of strings that matches `A`. There isn't a name there to print.
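To see this concretely, the sketch below reruns the question's `A`/`list3`/`list4` steps on the `small.csv` data. Note that `set(listToStr)` is a set of single characters, so the intersection only checks whether each digit appears somewhere in the file; `question_logic` is an illustrative wrapper name.

```python
dict_list = [['name', 'AGATC', 'AATG', 'TATC'], ['Alice', '2', '8', '3'],
             ['Bob', '4', '1', '5'], ['Charlie', '3', '2', '5']]
listToStr = ' '.join([str(elem) for elem in dict_list])


def question_logic(find_STR):
    """The question's comparison steps, unchanged apart from being wrapped in a function."""
    A = [str(x) for x in find_STR]
    # set(listToStr) holds individual characters such as '[', 'B', '4', ...
    list3 = set(A) & set(listToStr)
    return sorted(list3, key=lambda k: A.index(k))


print(question_logic([4, 1, 5]))  # ['4', '1', '5']
print(question_logic([8, 2, 3]))  # ['8', '2', '3'], although no row is 8, 2, 3
```

Both calls just echo the counts back, whether or not anyone in the database has that profile.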
Good luck.