
Getting the intersection of two lists in Python

I have two lists of genes that I'm analyzing. Essentially, I want to sort the elements of these lists in much the same way as a Venn diagram, i.e. elements that occur only in list 1 are placed in one list, those only in list 2 go in another, and those occurring in both go in a third.

My code so far:

from Identify_Gene import Retrieve_Data #custom class
import argparse
import os

#enable use from command line
parser = argparse.ArgumentParser(description='''\n\nFind the intersection between two lists of genes\n ''')
parser.add_argument('filename1',help='first list of genes to compare')
parser.add_argument('filename2',help='second list of genes to compare')
parser.add_argument('--output_path',help='provide an output filename')
args = parser.parse_args()

os.chdir(args.output_path)

a = Retrieve_Data() # custom class, simply produces a python list
list1 = a.parse_gene_list(args.filename1)
list2 = a.parse_gene_list(args.filename2)

intersection = []
list1_only = []
list2_only = []
if len(list1)>len(list2):
    for i in range(0,len(list1)):
        if list1[i] in list2:
            intersection.append(list1[i])
        else:
            list1_only.append(list1[i])
    for i in range(0,len(list2)):
        if list2[i] not in list1:
            list2_only.append(list2[i])
else:
    for i in range(0,len(list2)):
        if list2[i] in list1:
            intersection.append(list2[i])
        else:
            list2_only.append(list2[i])
    for i in range(0,len(list1)):
        if list1[i] not in list2:
            list1_only.append(list2[i])




filenames = {}
filenames['filename1'] = 'list1_only.txt'
filenames['filename2'] = 'list2_only.txt'
filenames['intersection'] = 'intersection.txt'                

with open(filenames['filename1'],'w') as f:
    for i in range(0,len(list1_only)):
        f.write(list1_only[i]+'\n')

with open(filenames['filename2'],'w') as f:
    for i in range(0,len(list2_only)):
        f.write(list2_only[i]+'\n')

with open(filenames['intersection'],'w') as f:
    for i in range(0,len(intersection)):
        f.write(intersection[i]+'\n')

This program currently gives me two identical lists as list1_only and list2_only, where they should be mutually exclusive. The intersection file produced is different, though I don't feel it can be trusted since the other two lists are not behaving as expected.

I have been informed (since posting this question) that this operation can easily be done with Python's sets. However, for educational purposes, I'd still quite like to fix this program.

There is a bug in the construction of the lists.

In the last loop of the else branch (the one that runs when list1 is not longer than list2):

for i in range(0,len(list1)):
    if list1[i] not in list2:
        list1_only.append(list2[i])

the last line should be:

        list1_only.append(list1[i])
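
Once that line is fixed, the length check is arguably unnecessary: both branches build the same three groups, differing only in the order (and duplicate handling) of the intersection. A minimal sketch of the simplified loops follows. The function name `venn_split` and the gene names are made up for illustration; they are not part of the original program.

```python
def venn_split(list1, list2):
    """Split two lists into (list1_only, list2_only, intersection)."""
    intersection = []
    list1_only = []
    list2_only = []
    # Walk list1 once: each element is either shared or unique to list1.
    for gene in list1:
        if gene in list2:
            intersection.append(gene)
        else:
            list1_only.append(gene)
    # Walk list2 once: keep only the elements missing from list1.
    for gene in list2:
        if gene not in list1:
            list2_only.append(gene)
    return list1_only, list2_only, intersection


# Made-up gene names, for illustration only
only1, only2, both = venn_split(['BRCA1', 'TP53', 'EGFR'], ['TP53', 'MYC'])
print(only1)  # ['BRCA1', 'EGFR']
print(only2)  # ['MYC']
print(both)   # ['TP53']
```

Iterating over the elements directly, rather than over `range(0, len(...))`, also makes it much harder to index the wrong list by accident.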

You might also want to check out this handy website:

http://jura.wi.mit.edu/bioc/tools/compare.php
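
For comparison, the set-based approach mentioned in the question can be sketched with Python's built-in `set` type. This is only an illustration: the function name `venn_split_sets` and the gene names are made up, and note that sets discard duplicates and ordering, so `sorted()` is used here to get a stable result.

```python
def venn_split_sets(list1, list2):
    """Return (list1_only, list2_only, intersection) as sorted lists."""
    set1, set2 = set(list1), set(list2)
    list1_only = sorted(set1 - set2)    # elements in list1 but not list2
    list2_only = sorted(set2 - set1)    # elements in list2 but not list1
    intersection = sorted(set1 & set2)  # elements in both lists
    return list1_only, list2_only, intersection


# Made-up gene names, for illustration only
only1, only2, both = venn_split_sets(['BRCA1', 'TP53', 'EGFR'], ['TP53', 'MYC'])
print(only1)  # ['BRCA1', 'EGFR']
print(only2)  # ['MYC']
print(both)   # ['TP53']
```

Set membership tests are constant time on average, so for long gene lists this should be noticeably faster than repeated `in` checks against a list.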
