
Python merge lists by common element

I'm trying to merge two lists that share a common element (in this case, the id). I have something like this:

list1=[(id1,host1),(id2,host2),(id1,host5),(id3,host4),(id4,host6),(id5,host8)]

list2=[(id1,IP1),(id2,IP2),(id3,IP3),(id4,IP4),(id5,IP5)]

Each host is unique, but an id in list1 can be repeated, as you can see. I want output that relates the two lists through the id they have in common:

Some output like:

IP1(host1,host5), IP2(host2), IP3(host4), IP4(host6), IP5(host8)

As you can see, IP1 has two hosts associated with it.

Is there a fast way to do this?

Thank you

>>> from collections import defaultdict
>>> list1 = [('id1','host1'),('id2','host2'),('id1','host5'),('id3','host4'),('id4','host6'),('id5','host8')]
>>> list2 = [('id1','IP1'),('id2','IP2'),('id3','IP3'),('id4','IP4'),('id5','IP5')]
>>> d1 = defaultdict(list)
>>> for k,v in list1:
...     d1[k].append(v)
... 

You can print the items like this

>>> for k, s in list2:
...     print(s, d1[k])
... 
IP1 ['host1', 'host5']
IP2 ['host2']
IP3 ['host4']
IP4 ['host6']
IP5 ['host8']

You can use a list comprehension to put the results into a list

>>> res = [(s, d1[k]) for k, s in list2]
>>> res
[('IP1', ['host1', 'host5']), ('IP2', ['host2']), ('IP3', ['host4']), ('IP4', ['host6']), ('IP5', ['host8'])]
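
To get a string in the shape the question asks for, the same grouping can be joined with str.format. This is only a minimal sketch, assuming the sample list1 and list2 from above; the name line is illustrative.

```python
from collections import defaultdict

list1 = [('id1', 'host1'), ('id2', 'host2'), ('id1', 'host5'),
         ('id3', 'host4'), ('id4', 'host6'), ('id5', 'host8')]
list2 = [('id1', 'IP1'), ('id2', 'IP2'), ('id3', 'IP3'),
         ('id4', 'IP4'), ('id5', 'IP5')]

# Group hosts by id, keeping the order in which they appear in list1
d1 = defaultdict(list)
for k, v in list1:
    d1[k].append(v)

# d1.get(k, []) avoids creating empty entries for ids that are missing from list1
line = ', '.join('{}({})'.format(ip, ','.join(d1.get(k, [])))
                 for k, ip in list2)
print(line)
```

Using d1.get rather than d1[k] is a design choice here: indexing a defaultdict inserts a new key as a side effect, which may not be wanted when only reading.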
  1. Use collections.defaultdict to map id -> hosts
  2. Then map id -> ip
>>> d = defaultdict(set)
>>> d['id'].add('host1')
>>> d['id'].add('host2')
>>> d['id'].add('host1')
>>> d
defaultdict(<type 'set'>, {'id': set(['host2', 'host1'])})
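
Put together, the two steps might look roughly like the sketch below. It assumes the sample data from the question; hosts_by_id, ip_by_id and merged are illustrative names, and sorted() is used only to give the set contents a predictable order.

```python
from collections import defaultdict

list1 = [('id1', 'host1'), ('id2', 'host2'), ('id1', 'host5'),
         ('id3', 'host4'), ('id4', 'host6'), ('id5', 'host8')]
list2 = [('id1', 'IP1'), ('id2', 'IP2'), ('id3', 'IP3'),
         ('id4', 'IP4'), ('id5', 'IP5')]

# Step 1: id -> set of hosts (a set silently drops duplicate hosts)
hosts_by_id = defaultdict(set)
for id_, host in list1:
    hosts_by_id[id_].add(host)

# Step 2: id -> ip
ip_by_id = dict(list2)

# Join the two mappings on id; ids without hosts get an empty list
merged = {ip: sorted(hosts_by_id.get(id_, ()))
          for id_, ip in ip_by_id.items()}
print(merged)
```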

Maybe something like this?

#!/usr/local/cpython-3.3/bin/python

import pprint
import collections

class Host_data:
    def __init__(self, ip_address, hostnames):
        self.ip_address = ip_address
        self.hostnames = hostnames

    def __str__(self):
        return '{}({})'.format(self.ip_address, ','.join(self.hostnames))

    __repr__ = __str__

    # The python 2.x way
    def __cmp__(self, other):
        if self.ip_address < other.ip_address:
            return -1
        elif self.ip_address > other.ip_address:
            return 1
        else:
            if self.hostnames < other.hostnames:
                return -1
            elif self.hostnames > other.hostnames:
                return 1
            else:
                return 0

    # The python 3.x way
    def __lt__(self, other):
        if self.__cmp__(other) < 0:
            return True
        else:
            return False


def main():
    list1=[('id1','host1'),('id2','host2'),('id1','host5'),('id3','host4'),('id4','host6'),('id5','host8')]

    list2=[('id1','IP1'),('id2','IP2'),('id3','IP3'),('id4','IP4'),('id5','IP5')]

    keys1 = set(tuple_[0] for tuple_ in list1)
    keys2 = set(tuple_[0] for tuple_ in list2)
    keys = keys1 | keys2

    dict1 = collections.defaultdict(list)
    dict2 = {}

    for tuple_ in list1:
        id_str = tuple_[0]
        hostname = tuple_[1]
        dict1[id_str].append(hostname)

    for tuple_ in list2:
        id_str = tuple_[0]
        ip_address = tuple_[1]
        dict2[id_str] = ip_address

    result_dict = {}
    for key in keys:
        hostnames = []
        ip_address = ''
        if key in dict1:
            hostnames = dict1[key]
        if key in dict2:
            ip_address = dict2[key]
        host_data = Host_data(ip_address, hostnames)
        result_dict[key] = host_data

    pprint.pprint(result_dict)
    print('actual output:')
    values = list(result_dict.values())
    values.sort()
    print(', '.join(str(value) for value in values))

    print('desired output:')
    print('IP1(host1,host5), IP2(host2), IP3(host4), IP4(host6), IP5(host8)')


main()

Code:

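list1 = [('id1','host1'),('id2','host2'),('id1','host5'),('id3','host4'),('id4','host6'),('id5','host8')]
list2 = [('id1','IP1'),('id2','IP2'),('id3','IP3'),('id4','IP4'),('id5','IP5')]

# Convert the tuples to lists so they can be extended in place.
# A list comprehension is used because map() returns an iterator in Python 3.
list1 = [list(t) for t in list1]
list2 = [list(t) for t in list2]

# Append the matching IP to every [id, host] entry
for item in list1:
    item += [x[1] for x in list2 if x[0] == item[0]]

# Add entries from list2 whose id never appears in list1
list1 += [x for x in list2 if not any(i for i in list1 if x[0] == i[0])]

print(list1)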

Output:

[['id1', 'host1', 'IP1'], ['id2', 'host2', 'IP2'], ['id1', 'host5', 'IP1'], ['id3', 'host4', 'IP3'], ['id4', 'host6', 'IP4'], ['id5', 'host8', 'IP5']]  

Hope this helps :)

from collections import defaultdict
list1 = [("id1","host1"),("id2","host2"),("id1","host5"),("id3","host4"),("id4","host6"),("id5","host8")]
list2 = [("id1","IP1"),("id2","IP2"),("id3","IP3"),("id4","IP4"),("id5","IP5")]
host = defaultdict(list)
IP4id = {}
for k, v in list2:
    IP4id[v] = {"id" : k, "host" : []}

for k, v in list1:
    host[k].append(v)

for item in IP4id:
    IP4id[item]["host"] = host[IP4id[item]["id"]]
print(IP4id)

You'll want to go through the list of (id, host) tuples and add its contents to a new defaultdict whose values are of type list.

This will have the effect of creating a dictionary with contents like {id1: [host1, host5], id2: [host2], ...} .

You can then go through and map the id values to their corresponding IP values.
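
As a rough sketch of that approach, the helper below also shows one possible way to treat ids that appear in only one of the lists. The function name merge_by_id and the extra sample ids are assumptions made for illustration, not part of the question.

```python
from collections import defaultdict

def merge_by_id(hosts, ips):
    """Join two lists of (id, value) pairs on id and return {ip: [host, ...]}."""
    grouped = defaultdict(list)
    for id_, host in hosts:
        grouped[id_].append(host)
    # Ids that have no IP are skipped; IPs without any host get an empty list
    return {ip: grouped.get(id_, []) for id_, ip in ips}

# 'id9' has no IP and 'id2' has no host in this made-up sample
result = merge_by_id(
    [('id1', 'host1'), ('id1', 'host5'), ('id9', 'hostX')],
    [('id1', 'IP1'), ('id2', 'IP2')],
)
print(result)
```

Whether unmatched ids should be dropped, kept with an empty value, or reported as errors depends on the data; the choice made here is just one option.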

Note that in order for this to work, the id values have to be hashable . Strings, numbers, and other basic types are hashable.

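If the id values are instances of a class you've defined, they are hashable by default (hashed by identity). If the class defines __eq__, it should also define a consistent __hash__ so that equal ids end up under the same key. The collections.abc.Hashable abstract base class (collections.Hashable in older Python versions) can be used with isinstance to check whether an object is hashable.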
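
For a custom id class, one way to make equal ids land in the same dictionary entry is to define __eq__ and __hash__ together. The sketch below uses a made-up class, DeviceId, purely for illustration.

```python
from collections import defaultdict
from collections.abc import Hashable

class DeviceId:
    """Hypothetical id type, used only for illustration."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, DeviceId) and self.value == other.value

    def __hash__(self):
        # Must agree with __eq__: equal ids produce equal hashes
        return hash(self.value)

hosts = defaultdict(list)
hosts[DeviceId(1)].append('host1')
hosts[DeviceId(1)].append('host5')  # a distinct but equal object, same entry

print(isinstance(DeviceId(1), Hashable))
print(hosts[DeviceId(1)])
```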
