This is for the CS50 assignment called DNA, in Python. I've struggled for days researching and trying to work out how to get the final section to work. I'm a newbie. I've loaded a database of people and their DNA into memory as a list of dicts. I've then read a test sample of DNA into memory as a string and searched it for the DNA sequences (STRs). So what I now have is a dict called str_test and a list of dicts called data holding everyone's DNA. I need to somehow compare str_test with data to see if anyone matches, and return that person's name.
Like I said, I'm struggling with this. I have worked out how to loop over and address the values in data (the list of dicts) and in str_test (the dict of results), but I can't blend the two together. I apologise for the number of commented-out lines; they are there for my own testing. Any guidance would be appreciated. The last 5 to 7 lines are my attempt at looping through both, but that's wrong; there has to be a simpler, better way. Thanks.
import csv
import sys


def main():
    # Ensure correct usage
    if len(sys.argv) != 3:
        sys.exit("Usage: python dna.py data.csv sequence.txt")

    data = []
    with open(sys.argv[1], "r") as csvfile:  # open the database file named on the command line
        reader = csv.DictReader(csvfile)
        for row in reader:  # let's go loopy
            data.append(row)
            #print(row)
        #print(data)
        #print(reader.fieldnames)
    #print(data)

    with open(sys.argv[2], "r") as file:  # open the sample file named on the command line
        sequence = file.read()  # read it into memory as a string
    print(reader.fieldnames)  # test print
    print(sequence)  # test print

    # two files now opened
    # sequence is the test DNA sequence
    # we want to loop through the STRs here, take each one, and loop it through the sequence
    str_test = {}  # a dictionary for the counts
    for i in range(1, len(reader.fieldnames)):  # header field names (the STRs), starting after the name field
        sample = reader.fieldnames[i]
        print(reader.fieldnames)
        print(sample)  # STR to look for in the test string called sequence
        str_test[sample] = 0
        for j in range(len(sequence)):  # loop through the long string to be tested for the STR (sample)
            step = 0
            max_count = 0
            while sequence[j + step:j + step + len(sample)] == sample:
                step = step + len(sample)
                max_count += 1
                print(max_count)
            j = j + step  # note: reassigning j here does not change the for loop's next value
            #test = (str_test.get(sample))
            if max_count > str_test.get(sample):  # get the existing value for the sample and compare
                str_test[sample] = max_count  # if the count is larger, update the field; if not, continue
    print(str_test)  # test print to see what's in the dictionary
    print(str_test.values())
    #print(key.values())
    #print(data)
    #print(len(data))
    #print(type(str_test))

    for d in data:  # data is a list of dictionaries, so this cycles through the list
        for values in str_test:  # loop through the dictionary of test-string DNA results
            for key in d:  # cycle through the dictionary that is part of the list of dictionaries
                strvalue = str_test.get(values)
                datavalue = int(d.get(values))
                #print("string value is ", strvalue)
                #print("dictionary value is", d[key])  # test print to see what we get


main()
The information contained in str_test looks like this:
{'AGATC': 4, 'TTTTTTCT': 0, 'AATG': 1, 'TCTAG': 0, 'GATA': 1, 'TATC': 5, 'GAAA': 1, 'TCTG': 0}
The information in the list of dicts that I have to search for a match against str_test looks like this (this is a short extract):
[{'name': 'Albus', 'AGATC': '15', 'TTTTTTCT': '49', 'AATG': '38', 'TCTAG': '5', 'GATA': '14', 'TATC': '44', 'GAAA': '14', 'TCTG': '12'},
 {'name': 'Cedric', 'AGATC': '31', 'TTTTTTCT': '21', 'AATG': '41', 'TCTAG': '28', 'GATA': '30', 'TATC': '9', 'GAAA': '36', 'TCTG': '44'},
 {'name': 'Draco', 'AGATC': '9', 'TTTTTTCT': '13', 'AATG': '8', 'TCTAG': '26', 'GATA': '15', 'TATC': '25', 'GAAA': '41', 'TCTG': '39'},
 {'name': 'Fred', 'AGATC': '37', 'TTTTTTCT': '40', 'AATG': '10', 'TCTAG': '6', 'GATA': '5', 'TATC': '10', 'GAAA': '28', 'TCTG': '8'},
 {'name': 'Ginny', 'AGATC': '37', 'TTTTTTCT': '47', 'AATG': '10', 'TCTAG': '23', 'GATA': '5', 'TATC': '48', 'GAAA': '28', 'TCTG': '23'},
 {'name': 'Hagrid', 'AGATC': '25', 'TTTTTTCT': '38', 'AATG': '45', 'TCTA
This does what you asked, but I'm 100% certain what you asked is not the problem you were asked to solve. As I mentioned, every name in your name list contains every sequence in your search list. It's pretty easy to process, because the keys in your database are the exact sequences, so you don't even have to do string searches.
searches = {'AGATC': 4, 'TTTTTTCT': 0, 'AATG': 1, 'TCTAG': 0, 'GATA': 1, 'TATC': 5, 'GAAA': 1, 'TCTG': 0}
database = [
    {'name': 'Albus', 'AGATC': '15', 'TTTTTTCT': '49', 'AATG': '38', 'TCTAG': '5', 'GATA': '14', 'TATC': '44', 'GAAA': '14', 'TCTG': '12'},
    {'name': 'Cedric', 'AGATC': '31', 'TTTTTTCT': '21', 'AATG': '41', 'TCTAG': '28', 'GATA': '30', 'TATC': '9', 'GAAA': '36', 'TCTG': '44'},
    {'name': 'Draco', 'AGATC': '9', 'TTTTTTCT': '13', 'AATG': '8', 'TCTAG': '26', 'GATA': '15', 'TATC': '25', 'GAAA': '41', 'TCTG': '39'},
    {'name': 'Fred', 'AGATC': '37', 'TTTTTTCT': '40', 'AATG': '10', 'TCTAG': '6', 'GATA': '5', 'TATC': '10', 'GAAA': '28', 'TCTG': '8'},
    {'name': 'Ginny', 'AGATC': '37', 'TTTTTTCT': '47', 'AATG': '10', 'TCTAG': '23', 'GATA': '5', 'TATC': '48', 'GAAA': '28', 'TCTG': '23'},
    {'name': 'Hagrid', 'AGATC': '25', 'TTTTTTCT': '38', 'AATG': '45'}]
for row in database:
    for search in searches.keys():
        if search in row:
            print( row['name'], 'matches', search )
Output:
Albus matches AGATC
Albus matches TTTTTTCT
Albus matches AATG
Albus matches TCTAG
Albus matches GATA
Albus matches TATC
Albus matches GAAA
Albus matches TCTG
Cedric matches AGATC
Cedric matches TTTTTTCT
Cedric matches AATG
Cedric matches TCTAG
Cedric matches GATA
Cedric matches TATC
Cedric matches GAAA
Cedric matches TCTG
Draco matches AGATC
Draco matches TTTTTTCT
Draco matches AATG
Draco matches TCTAG
Draco matches GATA
Draco matches TATC
Draco matches GAAA
Draco matches TCTG
Fred matches AGATC
Fred matches TTTTTTCT
Fred matches AATG
Fred matches TCTAG
Fred matches GATA
Fred matches TATC
Fred matches GAAA
Fred matches TCTG
Ginny matches AGATC
Ginny matches TTTTTTCT
Ginny matches AATG
Ginny matches TCTAG
Ginny matches GATA
Ginny matches TATC
Ginny matches GAAA
Ginny matches TCTG
Hagrid matches AGATC
Hagrid matches TTTTTTCT
Hagrid matches AATG
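If the real goal is to find the person whose count for every sequence equals the count found in the sample, a minimal sketch might look like the following. The function name find_match and the shortened two-sequence database are my own illustration, not part of your code, and the sketch assumes the database values are strings (as csv.DictReader produces) while the sample counts are integers.

```python
def find_match(database, counts):
    """Return the name of the first row whose counts all equal `counts`, else None."""
    for row in database:
        # CSV values are strings, so convert each one before comparing
        # it with the integer count found in the sample.
        if all(int(row[seq]) == n for seq, n in counts.items()):
            return row["name"]
    return None


# Shortened, illustrative database: two rows and two sequences only.
database = [
    {"name": "Albus", "AGATC": "15", "AATG": "38"},
    {"name": "Draco", "AGATC": "9", "AATG": "8"},
]

print(find_match(database, {"AGATC": 9, "AATG": 8}) or "No match")  # Draco
print(find_match(database, {"AGATC": 4, "AATG": 1}) or "No match")  # No match
```

Iterating over the sample counts rather than over each row's keys avoids having to skip the 'name' column, and all() stops at the first sequence that differs.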