
Python: Comparing numbers in two files

I have come back to Python after far too long a break and am now struggling with a simple task: comparing each number in file A to all the numbers in file B, looping through file A line by line. The numbers in file A are in column 2 (split by \t), and a number should be returned only if it is greater than exonStart (column 4 of file B) and less than exonStop (column 5 of file B). Eventually I want to write the matching lines (the complete line of file A appended to each line of file B that satisfies that condition) to a new file.

fileA (trimmed for relevant info and truncated):
    1       10678   12641
    1       14810   14929 
    1       14870   14969  

fileB (trimmed for relevant info and truncated):
    1       processed_transcript    exon    10000   12000  2
    1       processed_transcript    exon    10500   12000  2
    1       processed_transcript    exon    12613   12721  3     
    1       processed_transcript    exon    14821   14899  4

My attempt at the code may explain it in more detail.

f = open('fileA')
f2 =open('fileB')

for line in f:
    splitLine= line.split("\t")
    ReadStart= int(splitLine[1])
    print ReadStart
    for line2 in f2:
        splitLine2=line2.split("\t")
        ExonStart = int(splitLine2[3])
        ExonStop = int(splitLine2[4])
        if ReadStart < ExonStop and ReadStart > ExonStart:
            print ReadStart, ExonStart, ExonStop
        else:
            print "BOO"   
f.close()

What I expect (from my code), where the first column is ReadStart from file A and the next two are from file B:

    10678   10000   12000
    10678   10500   12000
    14870   14821   14899

My code will only return the first line.

Maybe the problem is here:

splitLine2=line.split("\t")

If you are using file 2, it would be

splitLine2=line2.split("\t")

The problem is your file pointer. You open file B at the top of your code, then iterate all the way through it while handling the first line from file A. That means that at the end of the first iteration of your outer loop, your file pointer is now pointed at the end of file B. On the next iteration, there are no more lines to read from file B because the pointer is at the end of the file, so the inner loop is skipped.
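The exhaustion described above can be seen in a minimal sketch. Here `io.StringIO` stands in for an open file (an assumption for illustration, so the snippet runs without any files on disk); a real file object behaves the same way when iterated twice.

```python
import io

# io.StringIO stands in for an open file; the two lines are placeholder content.
file_b = io.StringIO("line 1\nline 2\n")

first_pass = [line.strip() for line in file_b]   # consumes every line
second_pass = [line.strip() for line in file_b]  # pointer is at the end: nothing left

print(first_pass)   # ['line 1', 'line 2']
print(second_pass)  # []
```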

One option is to use the seek function on file B at the end of the outer loop to reset the file pointer to the top of the file:

f2.seek(0)
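To show where that call belongs, here is a minimal sketch of the nested loop with the rewind added. The function name `find_overlaps` and the `io.StringIO` stand-ins for the two files are assumptions for illustration; the sample rows are taken from the question.

```python
import io

def find_overlaps(reads, exons):
    """Yield (read_start, exon_start, exon_stop) for each read starting inside an exon."""
    for line in reads:
        read_start = int(line.split("\t")[1])
        for line2 in exons:
            fields = line2.split("\t")
            exon_start, exon_stop = int(fields[3]), int(fields[4])
            if exon_start < read_start < exon_stop:
                yield read_start, exon_start, exon_stop
        # Rewind file B so the next line of file A sees every exon again.
        exons.seek(0)

# Stand-ins for fileA and fileB, using the rows from the question.
reads = io.StringIO(
    "1\t10678\t12641\n"
    "1\t14810\t14929\n"
    "1\t14870\t14969\n"
)
exons = io.StringIO(
    "1\tprocessed_transcript\texon\t10000\t12000\t2\n"
    "1\tprocessed_transcript\texon\t10500\t12000\t2\n"
    "1\tprocessed_transcript\texon\t12613\t12721\t3\n"
    "1\tprocessed_transcript\texon\t14821\t14899\t4\n"
)

matches = list(find_overlaps(reads, exons))
for read_start, exon_start, exon_stop in matches:
    print(read_start, exon_start, exon_stop)
```

Without the `seek(0)` line, only the first read would ever be compared against the exons.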

However, I would advocate you change your approach and read file B into memory instead, so you're not reading a file over and over again:

# use context managers (with blocks) to open your files so they are closed
# automatically, even if an exception is raised
with open('f2.txt') as f2:  # file B

    exon_points = []

    for line in f2:
        split_line = line.split() # notice that the split function will split on
                                  # whitespace by default, so "\t" is not necessary

        # append a tuple of the information we care about to the list
        exon_points.append((int(split_line[3]), int(split_line[4])))

with open('f1.txt') as f1:  # file A

    for line in f1:
        read_start = int(line.split()[1])

        # exon_points holds (start, stop) tuples, so unpack them directly
        for exon_start, exon_stop in exon_points:

            if read_start < exon_stop and read_start > exon_start:
                print("{} {} {}".format(read_start, exon_start, exon_stop))

            else:
                print("BOO")

Output:

10678 10000 12000
10678 10500 12000
BOO
BOO
BOO
BOO
BOO
BOO
BOO
BOO
BOO
14870 14821 14899
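The question's eventual goal, writing each matching line of file B with the complete line of file A appended, could be sketched along the same lines. This is only one possible reading of that goal: the function name `write_matches`, the tab used to join the two lines, and the `io.StringIO` stand-ins for the files are all assumptions for illustration.

```python
import io

def write_matches(reads, exons, out):
    """Write each matching exon line with the full read line appended (tab-separated)."""
    # Load file B once, keeping the parsed bounds next to the original line text.
    exon_rows = []
    for line in exons:
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        exon_rows.append((int(fields[3]), int(fields[4]), line.rstrip("\n")))

    written = 0
    for line in reads:
        if not line.strip():
            continue  # skip blank lines
        read_line = line.rstrip("\n")
        read_start = int(read_line.split()[1])
        for exon_start, exon_stop, exon_line in exon_rows:
            if exon_start < read_start < exon_stop:
                out.write(exon_line + "\t" + read_line + "\n")
                written += 1
    return written

# Stand-ins for fileA, fileB and the output file, using the rows from the question.
reads = io.StringIO(
    "1\t10678\t12641\n"
    "1\t14810\t14929\n"
    "1\t14870\t14969\n"
)
exons = io.StringIO(
    "1\tprocessed_transcript\texon\t10000\t12000\t2\n"
    "1\tprocessed_transcript\texon\t10500\t12000\t2\n"
    "1\tprocessed_transcript\texon\t12613\t12721\t3\n"
    "1\tprocessed_transcript\texon\t14821\t14899\t4\n"
)
out = io.StringIO()

count = write_matches(reads, exons, out)
print(count)  # 3
print(out.getvalue(), end="")
```

With real files, the three arguments would be handles opened in a with statement, for example open('fileA'), open('fileB') and open('matches.txt', 'w'), where matches.txt is a made-up output name.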

