I'm working on pset6, DNA problem. This code is working for small.cvs but when I try the large one it overestimates the STR count. I guess the problem is when it tries to compare strings. But still don't know how to fix it. I checked that the counting is correct for the "TTTTTTCT" sequence but for the remaining STRs, the counting is in all cases larger than it should.
import sys
import csv
def main():
while (len(sys.argv) != 3):
print ("ERROR. Usage: python dna.py data.csv sequence.txt")
break
list_str = {}
#load the STRs to analyse
with open(sys.argv[1]) as csvfile:
readcsv = csv.reader (csvfile)
ncol = len(next(readcsv))
csvfile.seek(0)
header = list()
for line in readcsv:
a = sum(1 for line in readcsv)
for i in range(ncol):
list_str[line[i]] = 0
header.insert (i, line [i])
print (f"{header[i]}")
#open an work with the sequence file
sequence = open(sys.argv[2], 'r')
seq_r = sequence.read()
for k in list_str.keys():
#print (f"keu {k}")
p = 0
seq = len(seq_r)
while p < seq:
if seq_r[p:(p + len(k))] == k:
list_str[k] += 1
p += len(k)
else: p += 1
#print (f" sequenci encontrada{list_str[k]} y {k}")
print (f"nro de {k} {list_str[k]}")
with open(sys.argv[1]) as csvfile:
readcsv = csv.reader (csvfile)
next(csvfile)
find = False
for row in readcsv:
for j in range(1,ncol):
#print(f"header :{header[j]}")
if int(row [j]) == int(list_str[header[j]]):
print (f"row {row[j]} list {list_str[header[j]]}")
find = True
else:
find = False
break
if find == True: print (f"{row [0]}")
main()
The same thing happened to me, and then I saw the specifications of the pset.
We need to find the " longest run of consecutive repeats of the STR ". Not the total count of STRs. It works for the small.csv as in my case too, so try to search for the longest consecutive occurrences of the specific STR.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.