Python: comparing numbers in two files

Question

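I'm returning to Python after a long hiatus, and right now I'm struggling with a simple task: comparing a number in file A against all the numbers in file B, looping through file A so that this is done for the number on every line. The number in file A is in column 2 (tab-separated, "\t"), and for a number to be returned it must be greater than exonStart (column 4 of file B) and less than exonStop (column 5 of file B). Ultimately, I'd like to write these lines (the full line of file A appended to the line of file B that matches the condition) to a new file.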

fileA (trimmed for relevant info and truncated):
    1       10678   12641
    1       14810   14929 
    1       14870   14969  

fileB (trimmed for relevant info and truncated):
    1       processed_transcript    exon    10000   12000  2
    1       processed_transcript    exon    10500   12000  2
    1       processed_transcript    exon    12613   12721  3     
    1       processed_transcript    exon    14821   14899  4

My code attempt, which shows in more detail what I'm trying to do:

f = open('fileA')
f2 =open('fileB')

for line in f:
    splitLine= line.split("\t")
    ReadStart= int(splitLine[1])
    print ReadStart
    for line2 in f2:
        splitLine2=line2.split("\t")
        ExonStart = int(splitLine2[3])
        ExonStop = int(splitLine2[4])
        if ReadStart < ExonStop and ReadStart > ExonStart:
            print ReadStart, ExonStart, ExonStop
        else:
            print "BOO"   
f.close()

What I'm expecting (from my code), where the first column is ReadStart from file A and the next two are ExonStart and ExonStop from file B:

    10678   10000   12000
    10678   10500   12000
    14870   14821   14899

My code only returns the first line.

Answer 1

Maybe the problem is here:

splitLine2=line.split("\t")

If you are working with file 2, it should be:

splitLine2=line2.split("\t")

Answer 2

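The problem is your file pointer. You open file B at the top of your code and then iterate all the way through it while processing the first line of file A. This means that at the end of the first iteration of your outer loop, the file pointer is at the end of file B. On the next iteration there are no more lines to read from file B, because the pointer is already at the end of the file, so the inner loop is skipped.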
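The exhaustion described above can be reproduced in a few lines. This is a minimal sketch in Python 3 that uses an in-memory `io.StringIO` object as a stand-in for file B (an assumption for illustration; a file returned by `open()` behaves the same way when iterated), with two made-up rows.

```python
import io

# Hypothetical in-memory stand-in for fileB; the rows are illustrative only.
exons = io.StringIO(
    "1\tprocessed_transcript\texon\t10000\t12000\t2\n"
    "1\tprocessed_transcript\texon\t10500\t12000\t2\n"
)

# The first pass consumes every line and leaves the pointer at the end.
first_pass = [line for line in exons]

# The second pass starts at the end, so there is nothing left to read.
second_pass = [line for line in exons]

print(len(first_pass))   # 2
print(len(second_pass))  # 0
```

The second loop runs zero times, which is exactly what happens to the inner loop for every line of file A after the first.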

One option is to use the seek function on file B at the end of the outer loop to reset the file pointer to the top of the file:

f2.seek(0)
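To show where the `seek(0)` call goes, here is a minimal sketch in Python 3. The function name `matches_with_seek` is hypothetical, and the inputs are in-memory `io.StringIO` copies of the sample rows from the question rather than real files; any object returned by `open()` could be passed instead.

```python
import io

def matches_with_seek(reads, exons):
    """Return (read_start, exon_start, exon_stop) triples where
    exon_start < read_start < exon_stop, rewinding `exons` after each read."""
    found = []
    for line in reads:
        read_start = int(line.split()[1])
        for line2 in exons:
            fields = line2.split()
            exon_start, exon_stop = int(fields[3]), int(fields[4])
            if exon_start < read_start < exon_stop:
                found.append((read_start, exon_start, exon_stop))
        # Rewind file B so the next line of file A sees every exon again.
        exons.seek(0)
    return found

# In-memory copies of the sample rows shown in the question.
reads = io.StringIO(
    "1\t10678\t12641\n"
    "1\t14810\t14929\n"
    "1\t14870\t14969\n"
)
exons = io.StringIO(
    "1\tprocessed_transcript\texon\t10000\t12000\t2\n"
    "1\tprocessed_transcript\texon\t10500\t12000\t2\n"
    "1\tprocessed_transcript\texon\t12613\t12721\t3\n"
    "1\tprocessed_transcript\texon\t14821\t14899\t4\n"
)

result = matches_with_seek(reads, exons)
for triple in result:
    print(triple)
```

With these rows the function returns the three matches the question expects. The cost is that file B is re-read once per line of file A.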

However, I'd suggest changing your approach and reading file B into memory instead, so that you aren't reading the file over and over:

# use context managers (with statements) to open your files so they are
# closed automatically, even if an exception is raised
with open('f2.txt') as f2:

    exon_points = []

    for line in f2:
        split_line = line.split() # notice that the split function will split on
                                  # whitespace by default, so "\t" is not necessary

        # append a tuple of the information we care about to the list
        exon_points.append((int(split_line[3]), int(split_line[4])))

with open('f1.txt') as f1:

    for line in f1:
        read_start = int(line.split()[1])

        # each entry of exon_points is an (exon_start, exon_stop) tuple
        for exon_start, exon_stop in exon_points:

            if read_start < exon_stop and read_start > exon_start:
                print("{} {} {}".format(read_start, exon_start, exon_stop))

            else:
                print("BOO")

Output:

10678 10000 12000
10678 10500 12000
BOO
BOO
BOO
BOO
BOO
14830 14821 14899
BOO
BOO
BOO
14870 14821 14899
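The question's final goal, writing the matching lines to a new file, is not covered by the code above. The following is a minimal sketch in Python 3 of one way it might look. The function name `write_matches`, the output file name, and the output layout (the file B line followed by the file A line, tab-separated) are assumptions, since the question does not pin down the exact format. The demo writes the question's sample rows to a temporary directory so the block runs on its own.

```python
import os
import tempfile

def write_matches(reads_path, exons_path, out_path):
    """Write one line per (exon, read) pair with exon_start < read_start < exon_stop.
    Returns the number of lines written."""
    # Load file B once, keeping the parsed bounds next to the original line.
    exons = []
    with open(exons_path) as exons_file:
        for line in exons_file:
            fields = line.split()
            if len(fields) < 5:
                continue  # skip blank or short lines
            exons.append((int(fields[3]), int(fields[4]), line.strip()))

    written = 0
    with open(reads_path) as reads_file, open(out_path, "w") as out:
        for line in reads_file:
            fields = line.split()
            if len(fields) < 2:
                continue
            read_start = int(fields[1])
            for exon_start, exon_stop, exon_line in exons:
                if exon_start < read_start < exon_stop:
                    # Assumed layout: file B line, then the full file A line.
                    out.write(exon_line + "\t" + line.strip() + "\n")
                    written += 1
    return written

# Demo with the sample rows from the question, written to a temp directory.
workdir = tempfile.mkdtemp()
reads_path = os.path.join(workdir, "fileA")
exons_path = os.path.join(workdir, "fileB")
out_path = os.path.join(workdir, "matches.tsv")  # hypothetical output name

with open(reads_path, "w") as f:
    f.write("1\t10678\t12641\n1\t14810\t14929\n1\t14870\t14969\n")
with open(exons_path, "w") as f:
    f.write(
        "1\tprocessed_transcript\texon\t10000\t12000\t2\n"
        "1\tprocessed_transcript\texon\t10500\t12000\t2\n"
        "1\tprocessed_transcript\texon\t12613\t12721\t3\n"
        "1\tprocessed_transcript\texon\t14821\t14899\t4\n"
    )

count = write_matches(reads_path, exons_path, out_path)
print(count)
```

Keeping the original file B line alongside its parsed bounds avoids re-reading or re-splitting file B when a match is found.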
