
Getting intersection of two lists in Python

I have two lists of genes that I'm analyzing. Essentially, I want to sort the elements of these lists much as a Venn diagram would, i.e. elements that occur only in list 1 are placed in one list, those only in list 2 go in another, and those occurring in both go in a third.

My code so far:

from Identify_Gene import Retrieve_Data #custom class
import argparse
import os

#enable use from command line
parser = argparse.ArgumentParser(description='''\n\nFind the intersection between two lists of genes\n ''')
parser.add_argument('filename1',help='first list of genes to compare')
parser.add_argument('filename2',help='second list of genes to compare')
parser.add_argument('--output_path',help='provide an output filename')
args = parser.parse_args()

os.chdir(args.output_path)

a = Retrieve_Data() # custom class, simply produces a python list
list1 = a.parse_gene_list(args.filename1)
list2 = a.parse_gene_list(args.filename2)

intersection = []
list1_only = []
list2_only = []
if len(list1)>len(list2):
    for i in range(0,len(list1)):
        if list1[i] in list2:
            intersection.append(list1[i])
        else:
            list1_only.append(list1[i])
    for i in range(0,len(list2)):
        if list2[i] not in list1:
            list2_only.append(list2[i])
else:
    for i in range(0,len(list2)):
        if list2[i] in list1:
            intersection.append(list2[i])
        else:
            list2_only.append(list2[i])
    for i in range(0,len(list1)):
        if list1[i] not in list2:
            list1_only.append(list2[i])




filenames = {}
filenames['filename1'] = 'list1_only.txt'
filenames['filename2'] = 'list2_only.txt'
filenames['intersection'] = 'intersection.txt'                

with open(filenames['filename1'],'w') as f:
    for i in range(0,len(list1_only)):
        f.write(list1_only[i]+'\n')

with open(filenames['filename2'],'w') as f:
    for i in range(0,len(list2_only)):
        f.write(list2_only[i]+'\n')

with open(filenames['intersection'],'w') as f:
    for i in range(0,len(intersection)):
        f.write(intersection[i]+'\n')

This program currently gives me two identical lists as list1_only and list2_only, where they should be mutually exclusive. The intersection file produced is different, though I don't feel it can be trusted since the other two lists are not behaving as expected.

I have been informed (since posting this question) that this operation can easily be done with Python sets; however, for educational purposes, I'd still quite like to fix this program.

There is a bug in the construction of the lists.

In the section:

for i in range(0,len(list1)):
    if list1[i] not in list2:
        list1_only.append(list2[i])

the last line should be:

        list1_only.append(list1[i])
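
With that fix in place, the `if len(list1)>len(list2)` branching looks unnecessary: both branches collect the same three groups of elements, apart from the ordering of the intersection and how repeated entries are counted. Below is a minimal sketch of the corrected logic without the branching, assuming the inputs are plain Python lists of strings. The function name `venn_split` and the sample gene names are made up for illustration and are not part of the original program.

```python
def venn_split(list1, list2):
    """Split two lists into (list1_only, list2_only, intersection)."""
    intersection = []
    list1_only = []
    list2_only = []
    # Elements of list1 go either to the intersection or to list1_only.
    for gene in list1:
        if gene in list2:
            intersection.append(gene)
        else:
            list1_only.append(gene)
    # Elements of list2 that never appear in list1 go to list2_only.
    for gene in list2:
        if gene not in list1:
            list2_only.append(gene)
    return list1_only, list2_only, intersection


# Hypothetical sample data, for illustration only.
only1, only2, both = venn_split(["BRCA1", "TP53", "EGFR"], ["TP53", "KRAS"])
print(only1)  # ['BRCA1', 'EGFR']
print(only2)  # ['KRAS']
print(both)   # ['TP53']
```

Iterating over the elements directly, rather than over `range(0,len(...))`, also removes the index that caused the original mix-up between `list1[i]` and `list2[i]`.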

You might also want to check out this handy website:

http://jura.wi.mit.edu/bioc/tools/compare.php
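
For comparison, the set-based approach mentioned in the question could look roughly like this. It is only a sketch: it assumes the gene names are hashable strings and that neither duplicates nor the original order need to be preserved. The name `venn_split_sets` and the sample data are hypothetical.

```python
def venn_split_sets(list1, list2):
    """Return (list1_only, list2_only, intersection) as sorted lists."""
    set1, set2 = set(list1), set(list2)
    list1_only = sorted(set1 - set2)    # set difference: only in list1
    list2_only = sorted(set2 - set1)    # set difference: only in list2
    intersection = sorted(set1 & set2)  # set intersection: in both
    return list1_only, list2_only, intersection


# Hypothetical sample data, for illustration only.
result = venn_split_sets(["BRCA1", "TP53", "EGFR"], ["TP53", "KRAS"])
print(result)  # (['BRCA1', 'EGFR'], ['KRAS'], ['TP53'])
```

Membership tests on a set take roughly constant time on average, so this avoids the repeated list scans performed by `in` on a list, which may matter for long gene lists.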
