
Efficient way to compare elements between very long lists

I have two lists stored in files. File 1 has 200,000 lines and looks like this:

MAP2K4  FLNC
MYPN    ACTN2
ACVR1   FNTA
UGT2A1  HPGDS
RPA2    STAT3
ARF1    GGA3
ARF3    ARFIP2
ARF3    ARFIP1
AKR1A1  EXOSC4
RPA2    GAS7
APP     APPBP2
APLP1   DAB1
CITED2  TFAP2A
EP300   TFAP2A
APOB    MTTP
ARRB2   RALGDS
ARRB2   ZNF807

File 2 has 700,000 lines and looks like this:

MAP2K4  FLNC
MAP2K4  rs10036867
MAP2K4  ACTN2
MAP2K4  TEP1
ACTN2   MYPN
UGT2A1  NDUFAF6
RPA2    rs10109257
RPA2    rs10151961
GAS7    RPA2
APOB    PDZRN4
APOB    BICD1
ARRB2   ZNF807
ARRB2   FAM107B

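I need to get the lines that match between these two lists, regardless of the order of the two elements on a line. For the sample above, the result should look like this: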

MAP2K4  FLNC
ACTN2   MYPN
RPA2    GAS7
ARRB2   ZNF807

I wrote the following, but it seems to take forever!

col0_file1 = []
col1_file1 = []
col0_file2 = []
col1_file2 = []
with open('File1') as f1, open('File2') as f2:
    for line in f1:
        col0,col1 = line.split()
        col0_file1.append(col0)
        col1_file1.append(col1)
    for line in f2:
        col0,col1 = line.split()
        col0_file2.append(col0)
        col1_file2.append(col1)

result = []
for x in range(len(col0_file1)):
    # zip() pairs the two columns; map(None, ...) only works on Python 2
    for i, j in zip(col0_file2, col1_file2):
        if i == col0_file1[x] and j == col1_file1[x]:
            result.append([i,j])
        elif j == col0_file1[x] and i == col1_file1[x]:
            result.append([i,j])

with open('matching', 'w') as out:
    for elem in result:
        out.write('{a} \n'.format(a = '\t'.join(elem)))

Is there any way to reduce the complexity? Or is there a better approach altogether?

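I'd say: build two sets and take their intersection. Sorting each pair before storing it makes the comparison independent of column order: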

with open('File1') as f1, open('File2') as f2:
    columns_a = set(tuple(sorted(l.split())) for l in f1)
    columns_b = set(tuple(sorted(l.split())) for l in f2)

with open('matching', 'w') as out:
    for elem in columns_a  & columns_b:
        out.write('{a} \n'.format(a = '\t'.join(elem)))
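For scale, the nested loops in the question perform roughly 200,000 × 700,000 = 1.4 × 10^11 comparisons, while a set lookup takes constant time on average, so the set-based version needs only one pass over each file. One side effect of intersecting sorted tuples is that every pair is written in sorted order (for example `GAS7 RPA2` rather than `RPA2 GAS7`) and in arbitrary sequence. If that matters, a variation along the following lines might help. It is only a sketch, assuming Python 3 and exactly two whitespace-separated fields per line; `pair_key` and `matching_lines` are hypothetical helper names, not part of the original answer.

```python
def pair_key(line):
    """Return an order-independent key for a two-column line, or None if malformed."""
    fields = line.split()
    if len(fields) != 2:
        return None  # skip blank or malformed lines (an assumption about the data)
    return tuple(sorted(fields))


def matching_lines(lines_a, lines_b):
    """Yield pairs from lines_a that also occur in lines_b, in either column order.

    Pairs keep the column order and sequence they have in lines_a;
    a pair that appears more than once in lines_a is yielded once.
    """
    keys_b = {pair_key(line) for line in lines_b}
    keys_b.discard(None)
    seen = set()
    for line in lines_a:
        key = pair_key(line)
        if key is not None and key in keys_b and key not in seen:
            seen.add(key)
            yield tuple(line.split())


# Small in-memory demo using a few rows from the question's samples.
file1 = ["MAP2K4\tFLNC\n", "MYPN\tACTN2\n", "ACVR1\tFNTA\n",
         "RPA2\tGAS7\n", "ARRB2\tZNF807\n"]
file2 = ["MAP2K4\tFLNC\n", "MAP2K4\tTEP1\n", "ACTN2\tMYPN\n",
         "GAS7\tRPA2\n", "ARRB2\tZNF807\n"]

matches = list(matching_lines(file1, file2))
for a, b in matches:
    print(a + "\t" + b)
```

Both arguments can be any iterables of lines, so open file objects for `File1` and `File2` could be passed directly. Only the keys of the second input are held in memory; the first is streamed line by line.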
