
Comparing elements from different lists using Python

I have generated two multi-component lists with the following script:

list1 = list()
for line in infile1.readlines():
    list1.append(line.split('\t'))

list2 = list()
for line in infile2.readlines():
    list2.append(line.split('\t'))

The lists look like this:

list1 = ('1960', 'chr17', '+', 'RNF213'), ('1963', 'chr16', '+', 'SF3B3'), ('1964', 'chr4', '-', 'GPRIN3')...

list2 = ('1482', 'miR-K12-1'), ('1018', 'miR-K12-4-5p'), ('1960', 'miR-K12-12')...

The first element from the first entry in list1 (in this case "1960") will match the first element of one or more entries in list2. What I would like to do is locate each match and then add the last element of the list2 entry to the list1 entry. An example of the desired output would be:

('1960', 'chr17', '+', 'RNF213', 'miR-K12-12')

I have tried this, but it returns nothing:

result = []
for list1[0] in list1:
    if list1[0] == list2[0]:
        result.append((list1[0:], list2[1]))

Put the values from list2 into a dictionary, with each unique value in the first column mapping to a list of the values from the second column. Because your data is tab-separated, you should really use the csv module here:

import csv

lines2 = {}

with open(filename2, 'rb') as infile2:
    reader = csv.reader(infile2, delimiter='\t')
    for row in reader:
        lines2.setdefault(row[0], []).append(row[1])

dict.setdefault() sets a default value (a list object here) if the key is not yet present in the dictionary. This allows us to append to an empty list for the first value, then subsequently to the already-existing list for the rest.
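To see the grouping behaviour in isolation, here is a minimal sketch in Python 3 syntax. The `pairs` data is made up for illustration (it reuses IDs from the question but gives '1960' two values so the appending is visible), and `collections.defaultdict` is shown only as a common alternative, not as something the answer above uses.

```python
from collections import defaultdict

# Hypothetical sample data: '1960' appears twice to show values accumulating.
pairs = [('1482', 'miR-K12-1'), ('1960', 'miR-K12-12'), ('1960', 'miR-K12-4-5p')]

# dict.setdefault(): store an empty list the first time a key is seen,
# then append to whichever list is stored under that key.
with_setdefault = {}
for key, value in pairs:
    with_setdefault.setdefault(key, []).append(value)

# defaultdict(list): the empty list is created automatically on first access.
with_defaultdict = defaultdict(list)
for key, value in pairs:
    with_defaultdict[key].append(value)

print(with_setdefault)
print(dict(with_defaultdict))
```

Both loops build the same mapping; `setdefault()` keeps a plain dict, while `defaultdict` avoids repeating the default at every call.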

Now you can trivially look up matching lines when processing the other file:

with open(filename1, 'rb') as infile1:
    reader = csv.reader(infile1, delimiter='\t')
    for row in reader:
        row += lines2.get(row[0], [])
        print row

Demo:

>>> import csv
>>> list1 = ['\t'.join(r) for r in [('1960', 'chr17', '+', 'RNF213'), ('1963', 'chr16', '+', 'SF3B3'), ('1964', 'chr4', '-', 'GPRIN3')]]
>>> list2 = ['\t'.join(r) for r in [('1482', 'miR-K12-1'), ('1018', 'miR-K12-4-5p'), ('1960', 'miR-K12-12')]]
>>> lines2 = {}
>>> reader = csv.reader(list2, delimiter='\t')
>>> for row in reader:
...     lines2.setdefault(row[0], []).append(row[1])
... 
>>> lines2
{'1482': ['miR-K12-1'], '1960': ['miR-K12-12'], '1018': ['miR-K12-4-5p']}
>>> reader = csv.reader(list1, delimiter='\t')
>>> for row in reader:
...     row += lines2.get(row[0], [])
...     print row
... 
['1960', 'chr17', '+', 'RNF213', 'miR-K12-12']
['1963', 'chr16', '+', 'SF3B3']
['1964', 'chr4', '-', 'GPRIN3']
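The code above is written for Python 2 (`print row`, files opened with 'rb'). As a rough sketch of the same approach in Python 3, the following uses `io.StringIO` objects as stand-ins for the two files so that it runs on its own. The helper names `build_index` and `merge_rows` are illustrative and not part of the original answer. With real files in Python 3 you would open them in text mode, for example `open(filename2, newline='')`, before passing them to `csv.reader`.

```python
import csv
import io

# In-memory stand-ins for the two tab-separated files (sample rows from the question).
file1 = io.StringIO("1960\tchr17\t+\tRNF213\n1963\tchr16\t+\tSF3B3\n1964\tchr4\t-\tGPRIN3\n")
file2 = io.StringIO("1482\tmiR-K12-1\n1018\tmiR-K12-4-5p\n1960\tmiR-K12-12\n")


def build_index(infile):
    """Map each first-column value to the list of its second-column values."""
    index = {}
    for row in csv.reader(infile, delimiter='\t'):
        index.setdefault(row[0], []).append(row[1])
    return index


def merge_rows(infile, index):
    """Return every row, extended with all matching values found in the index."""
    return [row + index.get(row[0], []) for row in csv.reader(infile, delimiter='\t')]


lines2 = build_index(file2)
merged = merge_rows(file1, lines2)
for row in merged:
    print(row)
```

Rows without a match are kept unchanged, as in the demo above.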

EDIT: Don't use this method. I'm leaving it up though because someone else might be able to learn from @Martijn's comments.

list1 = [('1960', 'chr17', '+', 'RNF213'), ('1963', 'chr16', '+', 'SF3B3'), ('1964', 'chr4', '-', 'GPRIN3')]
list2 = [('1482', 'miR-K12-1'), ('1018', 'miR-K12-4-5p'), ('1960', 'miR-K12-12')]

results = []
for x in list1:
    for y in list2:
        if x[0] == y[0]:
            results.append( x + (y[-1], ))
print results
>>>
[('1960', 'chr17', '+', 'RNF213', 'miR-K12-12')]
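One reason to avoid the nested loop is that it compares every entry of list1 with every entry of list2, so the work grows with len(list1) * len(list2). A minimal sketch, in Python 3 syntax, of getting the same result with a single pass over each list and a dictionary lookup; `index` and `name` are illustrative names, and this is one possible rewrite rather than a definitive one.

```python
list1 = [('1960', 'chr17', '+', 'RNF213'), ('1963', 'chr16', '+', 'SF3B3'), ('1964', 'chr4', '-', 'GPRIN3')]
list2 = [('1482', 'miR-K12-1'), ('1018', 'miR-K12-4-5p'), ('1960', 'miR-K12-12')]

# Index list2 once: first element -> list of last elements.
index = {}
for y in list2:
    index.setdefault(y[0], []).append(y[-1])

# One dictionary lookup per list1 entry; like the nested loop, this emits
# one result tuple per match and skips entries that have no match.
results = [x + (name,) for x in list1 for name in index.get(x[0], [])]
print(results)
```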

