About CS50 Pset6 DNA: it overcounts STRs for large.csv

Question

I'm working on the DNA problem from CS50 pset6. My code works for small.csv, but when I try large.csv it overestimates the STR counts. I suspect the problem is in how it compares strings, but I still don't know how to fix it. I checked that the count is correct for the "TTTTTTCT" STR, but for every other STR the count is larger than it should be.

import sys
import csv

def main():
    while (len(sys.argv) != 3):
        print ("ERROR. Usage: python dna.py data.csv sequence.txt")
        break

    list_str = {}

#load the STRs to analyse
    with open(sys.argv[1]) as csvfile:
        readcsv = csv.reader (csvfile)
        ncol = len(next(readcsv))
        csvfile.seek(0)
        header = list()

        for line in readcsv:
            a = sum(1 for line in readcsv)
        for i in range(ncol):
            list_str[line[i]] = 0
            header.insert (i, line [i])
            print (f"{header[i]}")

#open and work with the sequence file
    sequence = open(sys.argv[2], 'r')
    seq_r = sequence.read()

    for k in list_str.keys():
        #print (f"key {k}")
        p = 0
        seq = len(seq_r)

        while p < seq:
            if seq_r[p:(p + len(k))] == k: 
                list_str[k] += 1
                p += len(k) 
            else: p += 1
                #print (f"sequence found {list_str[k]} and {k}")

        print (f"count of {k}: {list_str[k]}")

    with open(sys.argv[1]) as csvfile:
        readcsv = csv.reader (csvfile)
        next(csvfile)

        find = False

        for row in readcsv:
            for j in range(1,ncol):
                #print(f"header :{header[j]}")
                if int(row [j]) == int(list_str[header[j]]): 
                    print (f"row {row[j]} list {list_str[header[j]]}")
                    find = True
                else: 
                    find = False
                    break

            if find == True: print (f"{row [0]}")
main()

Answer 1

The same thing happened to me, and then I reread the specification of the pset.

We need to find the "longest run of consecutive repeats of the STR", not the total number of times the STR occurs. Your loop increments list_str[k] on every match anywhere in the sequence, so separate runs get added together. My version passed small.csv too, as yours does, so change the code to search for the longest run of consecutive occurrences of each STR.
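
To make the difference concrete, here is a minimal sketch of counting the longest consecutive run instead of the total. The function name longest_run and the sample sequence are my own, not from the pset or from your code, so treat it as one possible approach rather than the reference solution.

```python
def longest_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence.

    Hypothetical helper for illustration; not part of the pset's distribution code.
    """
    if not motif:
        return 0  # an empty motif would otherwise match forever

    best = 0
    n = len(motif)
    for start in range(len(sequence)):
        # Count how many copies of motif follow each other starting here.
        run = 0
        pos = start
        while sequence[pos:pos + n] == motif:
            run += 1
            pos += n
        best = max(best, run)
    return best


# Made-up sequence: runs of 2, 3 and 1 copies of "AGAT".
sample = "AGATAGAT" + "CC" + "AGATAGATAGAT" + "TT" + "AGAT"

total = sample.count("AGAT")          # total non-overlapping matches: 6
longest = longest_run(sample, "AGAT")  # longest consecutive run: 3
print(total, longest)
```

With this sample the total is 6 but the longest run is 3, which is the kind of gap you would see on large.csv. In your program you would store longest_run(seq_r, k) in list_str[k] for each STR instead of incrementing a counter on every match.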

solution1
0 ACCEPTED 2020-06-24 16:44:20
