I have come back to Python after far too long a break from it, and now I am struggling with a simple task: comparing the numbers in file A to all the numbers in file B, looping through file A so that the number on each line is checked. The numbers in file A are in column 2 (split by \t), and the numbers to be returned must be greater than exonStart (column 4 of file B) and less than exonStop (column 5 of file B). Eventually I want to write the matching lines (the complete line of file A appended to each line in file B that satisfies that condition) to a new file.
fileA (trimmed for relevant info and truncated):
1 10678 12641
1 14810 14929
1 14870 14969
fileB (trimmed for relevant info and truncated):
1 processed_transcript exon 10000 12000 2
1 processed_transcript exon 10500 12000 2
1 processed_transcript exon 12613 12721 3
1 processed_transcript exon 14821 14899 4
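The matching rule described above (ExonStart < ReadStart < ExonStop, with both bounds exclusive) maps directly onto Python's chained comparison. Below is a minimal sketch that hard-codes the truncated sample rows as tab-separated strings, assuming the real files are tab-delimited with ReadStart in column 2 and the exon bounds in columns 4 and 5. The helper name `matches` and the constants are hypothetical, for illustration only.

```python
# Truncated sample rows from the question, assumed to be tab-separated.
FILE_A_LINES = [
    "1\t10678\t12641",
    "1\t14810\t14929",
    "1\t14870\t14969",
]
FILE_B_LINES = [
    "1\tprocessed_transcript\texon\t10000\t12000\t2",
    "1\tprocessed_transcript\texon\t10500\t12000\t2",
    "1\tprocessed_transcript\texon\t12613\t12721\t3",
    "1\tprocessed_transcript\texon\t14821\t14899\t4",
]


def matches(a_lines, b_lines):
    """Return (read_start, exon_start, exon_stop) for every A/B pair where
    exon_start < read_start < exon_stop (both bounds exclusive)."""
    # Parse file B once into (start, stop) pairs; columns 4 and 5 are indices 3 and 4.
    exons = [(int(f[3]), int(f[4])) for f in (b.split("\t") for b in b_lines)]
    result = []
    for a_line in a_lines:
        read_start = int(a_line.split("\t")[1])  # column 2 -> index 1
        for exon_start, exon_stop in exons:
            if exon_start < read_start < exon_stop:
                result.append((read_start, exon_start, exon_stop))
    return result


for row in matches(FILE_A_LINES, FILE_B_LINES):
    print(*row)
```

On these rows it prints the three lines listed as the expected output (10678 twice, then 14870); 14810 matches nothing because it is not greater than 14821.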
My attempt at the code may explain it in more detail:
f = open('fileA')
f2 = open('fileB')
for line in f:
    splitLine = line.split("\t")
    ReadStart = int(splitLine[1])
    print ReadStart
    for line2 in f2:
        splitLine2 = line2.split("\t")
        ExonStart = int(splitLine2[3])
        ExonStop = int(splitLine2[4])
        if ReadStart < ExonStop and ReadStart > ExonStart:
            print ReadStart, ExonStart, ExonStop
        else:
            print "BOO"
f.close()
What I expect from my code is the following, where the first column is ReadStart from file A and the next two are ExonStart and ExonStop from file B:
10678 10000 12000
10678 10500 12000
14870 14821 14899
My code only returns results for the first line of file A.
Maybe the problem is here:
splitLine2=line.split("\t")
If you are using file 2, it would be
splitLine2=line2.split("\t")
The problem is your file pointer. You open file B at the top of your code, then iterate all the way through it while handling the first line from file A. That means that at the end of the first iteration of your outer loop, your file pointer is now pointed at the end of file B. On the next iteration, there are no more lines to read from file B because the pointer is at the end of the file, so the inner loop is skipped.
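A small way to see this behaviour in isolation: iterating an open file advances its position, and a second loop over the same object starts from wherever the first one stopped. The sketch below uses `io.StringIO` as a stand-in for file B (an assumption for self-containment; a real file opened with `open()` behaves the same way).

```python
import io

# io.StringIO behaves like an open text file: iterating it advances a position
# that does not reset by itself. It stands in for file B here.
file_b = io.StringIO("exon 1\nexon 2\n")

first_pass = [line for line in file_b]   # reads every line; position ends at EOF
second_pass = [line for line in file_b]  # nothing left to read

print(len(first_pass), len(second_pass))  # prints "2 0"
```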
One option is to use the seek function on file B at the end of the outer loop to reset the file pointer to the top of the file:
f2.seek(0)
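Here is the question's loop with only that change applied, as a minimal sketch. It uses `io.StringIO` objects holding the truncated sample rows (assumed tab-separated) in place of `open('fileA')` and `open('fileB')` so that it runs on its own; the name `found` is hypothetical.

```python
import io

# Stand-ins for open('fileA') / open('fileB'); assumed tab-separated like the samples.
file_a = io.StringIO("1\t10678\t12641\n1\t14810\t14929\n1\t14870\t14969\n")
file_b = io.StringIO(
    "1\tprocessed_transcript\texon\t10000\t12000\t2\n"
    "1\tprocessed_transcript\texon\t10500\t12000\t2\n"
    "1\tprocessed_transcript\texon\t12613\t12721\t3\n"
    "1\tprocessed_transcript\texon\t14821\t14899\t4\n"
)

found = []
for line in file_a:
    read_start = int(line.split("\t")[1])
    for line2 in file_b:
        fields = line2.split("\t")
        exon_start, exon_stop = int(fields[3]), int(fields[4])
        if exon_start < read_start < exon_stop:
            found.append((read_start, exon_start, exon_stop))
    file_b.seek(0)  # rewind file B so the next line of file A sees all of it again

for row in found:
    print(*row)
```

With real files, `seek(0)` works the same way on a file opened in text mode, but file B is still re-read once per line of file A, which is what the in-memory approach below avoids.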
However, I would advocate you change your approach and read file B into memory instead, so you're not reading a file over and over again:
# use context managers instead of bare open()/close() calls, so the files
# are closed automatically even if an exception is raised
with open('f2.txt') as f2:
    exon_points = []
    for line in f2:
        split_line = line.split()  # notice that the split function will split on
                                   # whitespace by default, so "\t" is not necessary
        # append a tuple of the information we care about to the list
        exon_points.append((int(split_line[3]), int(split_line[4])))

with open('f1.txt') as f1:
    for line in f1:
        read_start = int(line.split()[1])
        # loop over the (start, stop) tuples collected from file B above
        for exon_start, exon_stop in exon_points:
            if read_start < exon_stop and read_start > exon_start:
                print("{} {} {}".format(read_start, exon_start, exon_stop))
            else:
                print("BOO")
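Output (the 14830 row shows that the test copy of file A had 14830 on its second line; with the question's 14810, that comparison prints BOO instead):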
10678 10000 12000
10678 10500 12000
BOO
BOO
BOO
BOO
BOO
14830 14821 14899
BOO
BOO
BOO
14870 14821 14899
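For the eventual goal in the question (writing each matching line of file B with the complete line of file A appended to it into a new file), the in-memory approach extends naturally. This is a minimal sketch: the function name `write_matches`, the output file name `matches.txt`, and the choice to join the two lines with a tab are all assumptions for illustration. The demo writes the truncated sample rows to a temporary directory so it runs on its own.

```python
import os
import tempfile


def write_matches(path_a, path_b, path_out):
    """Write one output line per match: the full file B line, a tab, then the
    full file A line (an assumed layout). Returns the number of lines written."""
    # Load file B once, keeping the original line text alongside the bounds.
    exons = []
    with open(path_b) as fb:
        for line in fb:
            fields = line.split()
            if len(fields) < 5:
                continue  # skip blank or short lines
            exons.append((int(fields[3]), int(fields[4]), line.rstrip("\n")))

    written = 0
    with open(path_a) as fa, open(path_out, "w") as out:
        for line in fa:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or short lines
            read_start = int(fields[1])
            for exon_start, exon_stop, b_line in exons:
                if exon_start < read_start < exon_stop:
                    out.write(b_line + "\t" + line.rstrip("\n") + "\n")
                    written += 1
    return written


# Demo on the truncated sample rows, written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "fileA")
    path_b = os.path.join(tmp, "fileB")
    path_out = os.path.join(tmp, "matches.txt")
    with open(path_a, "w") as f:
        f.write("1\t10678\t12641\n1\t14810\t14929\n1\t14870\t14969\n")
    with open(path_b, "w") as f:
        f.write("1\tprocessed_transcript\texon\t10000\t12000\t2\n"
                "1\tprocessed_transcript\texon\t10500\t12000\t2\n"
                "1\tprocessed_transcript\texon\t12613\t12721\t3\n"
                "1\tprocessed_transcript\texon\t14821\t14899\t4\n")
    count = write_matches(path_a, path_b, path_out)
    with open(path_out) as f:
        output_lines = f.read().splitlines()

print(count)
print("\n".join(output_lines))
```

File B is read into memory once, so each line of file A is checked against every exon without rewinding anything.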