I'm looping through the lines of a file to build a list of dicts holding start/stop positions, but I'm getting far too many results and I'm not sure why. It looks like each pair of ref_start and ref_end values is being added to the list multiple times.
def main():
    #initialize variables for counts
    gb_count = 0
    glimmer_count = 0
    exact_count = 0
    five_prime_count = 0
    three_prime_count = 0
    no_matches_count = 0
    #protein_id list
    protein_id = []
    #initialize lists for start/stop coordinates
    reference = []
    prediction = []
    #read in GeneBank file
    for line in open('file'):
        line = line.rstrip()
        if "protein_id=" in line:
            pro_id = line.split("=")
            pro_id = pro_id[1].replace('"','')
            protein_id.append(pro_id)
        elif "CDS" in line:
            if "join" in line:
                continue
            elif "/translation" in line:
                continue
            elif "P" in line:
                continue
            elif "complement" in line:
                value = " ".join(line.split()).replace('CDS','').replace("(",'').replace(")",'').split("complement")
                newValue = value[1].split("..")
                ref_start = newValue[1]
                ref_end = newValue[0]
                gb_count += 1
            else:
                test = " ".join(line.split()).replace('CDS','').split("..")
                ref_start = test[0]
                ref_end = test[1]
                gb_count += 1
        reference.append({'refstart': ref_start, 'refend': ref_end})
    print(reference)
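For comparison, here is a minimal sketch of the same extraction with the append done only when a line actually yields coordinates. It assumes the goal is one dict per simple CDS line and that join(...) locations should be skipped, as in the code above. The names CDS_PATTERN, parse_cds_line, and collect_reference, as well as the sample lines, are hypothetical and not part of the original code.

```python
import re

# Hypothetical pattern for a simple GenBank-style CDS line, e.g.
# "CDS   190..255" or "CDS   complement(337..2799)".
CDS_PATTERN = re.compile(r"^\s*CDS\s+(complement\()?<?(\d+)\.\.>?(\d+)\)?\s*$")


def parse_cds_line(line):
    """Return a refstart/refend dict, or None if the line is not a simple CDS."""
    match = CDS_PATTERN.match(line)
    if match is None:
        return None  # not a CDS line, or a join(...) location that we skip
    is_complement, first, second = match.groups()
    if is_complement:
        # Mirror the original code: on the complement strand the
        # start is the second coordinate and the end is the first.
        return {'refstart': second, 'refend': first}
    return {'refstart': first, 'refend': second}


def collect_reference(lines):
    reference = []
    for line in lines:
        entry = parse_cds_line(line.rstrip())
        if entry is not None:
            # Append only when this particular line produced coordinates.
            reference.append(entry)
    return reference  # inspect or print once, after the loop has finished


# Made-up sample lines for illustration only.
sample = [
    "     CDS             190..255",
    '                     /protein_id="AAA00001.1"',
    "     CDS             complement(337..2799)",
    "     CDS             join(1..10,20..30)",
    '                     /translation="MKRISTTITTTITITTGNGAG"',
]
print(collect_reference(sample))
```

Because the append lives next to the code that sets the coordinates, a protein_id or translation line can never add a second copy of the previous pair.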
I initially posted something else that was wrong, so I copied your code and ran it against a dummy file. The loop header, for line in open('file'), is not the problem: iterating over a file object gives you one line at a time, not one character at a time. You would only see 'line' = "p", then 'line' = "r", and so on if you were iterating over a string, for example the result of open('file').read().
Binding the file to a name first is a little tidier, but it behaves exactly the same way:
file = open('file')
for line in file:
I'm not 100% sure of the cause without your input file, but I would check where reference.append(...) sits. If it runs on every pass through the loop rather than only in the branches that set ref_start and ref_end, the most recent pair is appended again for every line that is not a new CDS (each protein_id line, for instance), which would look like the same entry being added multiple times. Hope this helps.
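A quick way to check what the loop variable actually holds is to iterate over a small throwaway file. This is only a sketch: the file contents are made up, and the variable names (path, direct, chars) are hypothetical.

```python
import os
import tempfile

# Write a tiny two-line file to iterate over (contents are made up).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write('/protein_id="X"\nCDS 1..9\n')
    path = handle.name

# Iterating over the file object itself yields whole lines.
with open(path) as f:
    direct = [line.rstrip() for line in f]

# Iterating over the string returned by read() yields single characters.
with open(path) as f:
    chars = list(f.read())

os.remove(path)  # clean up the throwaway file

print(direct)     # one item per line of the file
print(chars[:3])  # the first three characters, one per item
```

Whether the file is opened inline in the for statement or bound to a name first, the loop receives the same file object, so both forms iterate line by line.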