
Lists, Dict and Lists of Dict - CS50 and Python

This is for the CS50 course, an assignment called DNA in Python. I've struggled for days researching and trying to work out how to get the final section to work; I'm a newbie. I've loaded a database of people and their DNA into memory as a list of dicts. I've then read a test sample of DNA into memory as a string, and searched and filtered it looking for DNA sequences. So what I now have is a dict called str_test AND a list of dicts called data holding everyone's DNA. I need to compare str_test with data to see if anyone matches, and return that person's name.

Like I said, I'm struggling with this. I have worked out how to loop over and address the values in data (the list of dicts) and also in str_test (the dict of results), but I can't combine them. I apologise for the number of commented-out (#) lines; they are there for my own testing. Any guidance would be appreciated. The last 5 to 7 lines are my attempt to loop through both, but that's wrong; there has to be a simpler, better way. Thanks.

import csv
import sys


def main():

    # Ensure correct usage
    if len(sys.argv) != 3:
        sys.exit("Usage: python dna.py data.csv sequence.txt")

    data = []

    with open(sys.argv[1], "r") as csvfile:                      # open the database file named in the first command-line argument
        reader = csv.DictReader(csvfile)

        for row in reader:                                     # let's go loopy: each row is one person's dict
            data.append(row)

            #print(row)
            #print(data)
            #print(reader.fieldnames)
            #print(data)

    with open(sys.argv[2], "r") as file:                       # open the sample file named in the second command-line argument

        sequence = file.read()                                  # read the whole DNA sample into one string

        print(reader.fieldnames)            #test print
        print(sequence)                     #test print

    # two files now opened
    # sequence is the test DNA sequence


    # we want to loop through the STRs here, take each one in turn, and then search the sequence for it


    str_test = {}           # a dictionary for the counts

    for sample in reader.fieldnames[1:]:                        # the header field names (the STRs), skipping the "name" field
        print(reader.fieldnames)
        print(sample)                                           # DNA type to compare to the test string called sequence
        str_test[sample] = 0

        for j in range(len(sequence)):                          # loop through the long string, testing each position for a run of sample

            step = 0
            max_count = 0

            while sequence[j + step:j + step + len(sample)] == sample:
                step = step + len(sample)                       # move forward by one whole repeat
                max_count += 1
                print(max_count)

                                                                #test = (str_test.get(sample))
            if max_count > str_test[sample]:                    # compare with the longest run found so far for this STR
                str_test[sample] = max_count                    # if this run is longer, update the count; otherwise continue


    print(str_test)                                             #test print to see whats in the dictionary
    print(str_test.values())
    #print(key.values())
    #print(data)
    #print(len(data))
    #print(type(str_test))


    for d in data:                                  # data is a list of dictionaries, so this cycles through the list
        for values in str_test:                     # loop through the STR names in the test-string results

            for key in d:                           # cycle through the dictionary that is part of the list of dictionaries

                strvalue = str_test.get(values)
                datavalue = int(d.get(values))

                #print ("string value is  ", strvalue)
                #print ("dictionary value is", d[key])              #test print to see what we get




main()
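The STR-counting loop in the code above (the j/step loop) can also be written as a small standalone function. This is a minimal sketch, not the CS50 reference solution: longest_run and count_strs are hypothetical names, and the STR being searched for is assumed to be non-empty (an empty string would match forever, so the sketch rejects it).

```python
def longest_run(sequence: str, subsequence: str) -> int:
    """Return the largest number of times subsequence repeats back to back in sequence."""
    if not subsequence:
        raise ValueError("subsequence must not be empty")
    size = len(subsequence)
    longest = 0
    for start in range(len(sequence)):
        count = 0
        # Step forward one whole repeat at a time while the next chunk still matches.
        while sequence[start + count * size:start + (count + 1) * size] == subsequence:
            count += 1
        longest = max(longest, count)
    return longest


def count_strs(sequence: str, strs) -> dict:
    """Map each STR name to its longest consecutive run in sequence."""
    return {s: longest_run(sequence, s) for s in strs}


print(count_strs("AAGATCAGATCAGATCTTTAGATC", ["AGATC", "TTTTTTCT"]))  # {'AGATC': 3, 'TTTTTTCT': 0}
```

The result has the same shape as str_test: one int count per STR, keyed by the STR itself.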

The information contained in str_test looks like this: {'AGATC': 4, 'TTTTTTCT': 0, 'AATG': 1, 'TCTAG': 0, 'GATA': 1, 'TATC': 5, 'GAAA': 1, 'TCTG': 0}

The list of dicts that I have to search for a match against the str_test above looks like this (this is a short extract):

[{'name': 'Albus', 'AGATC': '15', 'TTTTTTCT': '49', 'AATG': '38', 'TCTAG': '5', 'GATA': '14', 'TATC': '44', 'GAAA': '14', 'TCTG': '12'}, {'name': 'Cedric', 'AGATC': '31', 'TTTTTTCT': '21', 'AATG': '41', 'TCTAG': '28', 'GATA': '30', 'TATC': '9', 'GAAA': '36', 'TCTG': '44'}, {'name': 'Draco', 'AGATC': '9', 'TTTTTTCT': '13', 'AATG': '8', 'TCTAG': '26', 'GATA': '15', 'TATC': '25', 'GAAA': '41', 'TCTG': '39'}, {'name': 'Fred', 'AGATC': '37', 'TTTTTTCT': '40', 'AATG': '10', 'TCTAG': '6', 'GATA': '5', 'TATC': '10', 'GAAA': '28', 'TCTG': '8'}, {'name': 'Ginny', 'AGATC': '37', 'TTTTTTCT': '47', 'AATG': '10', 'TCTAG': '23', 'GATA': '5', 'TATC': '48', 'GAAA': '28', 'TCTG': '23'}, {'name': 'Hagrid', 'AGATC': '25', 'TTTTTTCT': '38', 'AATG': '45', 'TCTA
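One detail matters before any comparison: the counts that csv.DictReader returns are strings ('15'), while str_test holds ints (4), and in Python '4' == 4 is False. A minimal sketch of converting one database row, using a hypothetical helper called row_counts and the Albus row from the extract:

```python
def row_counts(row: dict) -> dict:
    """Return the STR counts from one csv.DictReader row as ints, dropping the 'name' column."""
    return {key: int(value) for key, value in row.items() if key != "name"}


albus = {'name': 'Albus', 'AGATC': '15', 'TTTTTTCT': '49', 'AATG': '38', 'TCTAG': '5',
         'GATA': '14', 'TATC': '44', 'GAAA': '14', 'TCTG': '12'}

print(row_counts(albus)["AGATC"] == 15)  # True
print("15" == 15)                        # False: a string never equals an int
```

Once a row is converted this way, it has the same keys and value types as str_test, so the two dicts can be compared directly with ==.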

This does what you asked, but I'm 100% certain that what you asked is not the problem you were set. As I mentioned, every name in your name list contains every sequence in your search list, so checking whether a key is present matches everyone. It's easy to process, though, because the keys in your database are the exact sequences, so you don't even have to do string searches.

searches = {'AGATC': 4, 'TTTTTTCT': 0, 'AATG': 1, 'TCTAG': 0, 'GATA': 1, 'TATC': 5, 'GAAA': 1, 'TCTG': 0}

database = [
    {'name': 'Albus', 'AGATC': '15', 'TTTTTTCT': '49', 'AATG': '38', 'TCTAG': '5', 'GATA': '14', 'TATC': '44', 'GAAA': '14', 'TCTG': '12'}, 
    {'name': 'Cedric', 'AGATC': '31', 'TTTTTTCT': '21', 'AATG': '41', 'TCTAG': '28', 'GATA': '30', 'TATC': '9', 'GAAA': '36', 'TCTG': '44'}, 
    {'name': 'Draco', 'AGATC': '9', 'TTTTTTCT': '13', 'AATG': '8', 'TCTAG': '26', 'GATA': '15', 'TATC': '25', 'GAAA': '41', 'TCTG': '39'}, 
    {'name': 'Fred', 'AGATC': '37', 'TTTTTTCT': '40', 'AATG': '10', 'TCTAG': '6', 'GATA': '5', 'TATC': '10', 'GAAA': '28', 'TCTG': '8'}, 
    {'name': 'Ginny', 'AGATC': '37', 'TTTTTTCT': '47', 'AATG': '10', 'TCTAG': '23', 'GATA': '5', 'TATC': '48', 'GAAA': '28', 'TCTG': '23'}, 
    {'name': 'Hagrid', 'AGATC': '25', 'TTTTTTCT': '38', 'AATG': '45'}]

for row in database:
    for search in searches.keys():
        if search in row:
            print(row['name'], 'matches', search)

Output:

Albus matches AGATC
Albus matches TTTTTTCT
Albus matches AATG
Albus matches TCTAG
Albus matches GATA
Albus matches TATC
Albus matches GAAA
Albus matches TCTG
Cedric matches AGATC
Cedric matches TTTTTTCT
Cedric matches AATG
Cedric matches TCTAG
Cedric matches GATA
Cedric matches TATC
Cedric matches GAAA
Cedric matches TCTG
Draco matches AGATC
Draco matches TTTTTTCT
Draco matches AATG
Draco matches TCTAG
Draco matches GATA
Draco matches TATC
Draco matches GAAA
Draco matches TCTG
Fred matches AGATC
Fred matches TTTTTTCT
Fred matches AATG
Fred matches TCTAG
Fred matches GATA
Fred matches TATC
Fred matches GAAA
Fred matches TCTG
Ginny matches AGATC
Ginny matches TTTTTTCT
Ginny matches AATG
Ginny matches TCTAG
Ginny matches GATA
Ginny matches TATC
Ginny matches GAAA
Ginny matches TCTG
Hagrid matches AGATC
Hagrid matches TTTTTTCT
Hagrid matches AATG
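What the DNA task most likely needs is a comparison of the counts rather than the keys: a person matches only if every STR count in str_test equals the count in their row. A minimal sketch under that assumption follows. find_match is a hypothetical name, every row is assumed to contain every STR column (as rows from one CSV header do), and "No match" is an assumed fallback message.

```python
def find_match(database, str_test):
    """Return the name whose STR counts all equal those in str_test, or "No match"."""
    for row in database:
        # CSV values are strings such as '15', so convert each one before comparing to the int count.
        if all(int(row[str_name]) == count for str_name, count in str_test.items()):
            return row["name"]
    return "No match"


# Two complete rows copied from the extract in the question.
database = [
    {'name': 'Albus', 'AGATC': '15', 'TTTTTTCT': '49', 'AATG': '38', 'TCTAG': '5',
     'GATA': '14', 'TATC': '44', 'GAAA': '14', 'TCTG': '12'},
    {'name': 'Cedric', 'AGATC': '31', 'TTTTTTCT': '21', 'AATG': '41', 'TCTAG': '28',
     'GATA': '30', 'TATC': '9', 'GAAA': '36', 'TCTG': '44'},
]

# The str_test result from the question: no row in this extract has these counts.
str_test = {'AGATC': 4, 'TTTTTTCT': 0, 'AATG': 1, 'TCTAG': 0, 'GATA': 1, 'TATC': 5, 'GAAA': 1, 'TCTG': 0}
# A made-up sample whose counts were chosen to equal Albus's row.
albus_sample = {'AGATC': 15, 'TTTTTTCT': 49, 'AATG': 38, 'TCTAG': 5, 'GATA': 14, 'TATC': 44, 'GAAA': 14, 'TCTG': 12}

print(find_match(database, str_test))      # No match
print(find_match(database, albus_sample))  # Albus
```

Because str_test drives the inner loop, the 'name' column is never compared, and all() stops at the first mismatching STR, so most rows are rejected after a single comparison.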
