
Merge tuples that share at least one common element into a single tuple

I have a list of tuples like this:

l = [('hin1','ie2',2),('hin1','ie3',2),('hin4','ie5',2),('hin6','ie22',2),('hin1','ie32',2),('hin31','ie2',2),('hin61','ie62',2)]

I want to merge any tuples that share at least one common element.

For example, two tuples like these:

('hin1','ie2',2),('hin1','ie3',2) should result in
('hin1','ie2','ie3')

For the list l above, my final output should look like this:

output = [('hin1','ie2','ie3','ie32','hin31'),('hin4','ie5'),('hin6','ie22'),('hin61','ie62')]

Note: the third element of every tuple can be ignored.

Any starting points?

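This is a graph problem: treat each tuple as an edge between its first two elements, and the merged tuples are the connected components. You can use the igraph package if you do not intend to write your own algorithm to solve it: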

import igraph

# build the graph object
g = igraph.Graph()
edges, vertices = set(), set()
for e in l:
    vertices.update(e[:2])
    edges.add(e[:2])

g.add_vertices(list(vertices))
g.add_edges(list(edges))

# decompose the graph into its weakly connected components
components = [[v['name'] for v in sg.vs()] for sg in g.decompose(mode="weak")]
print(components)

#[['ie2', 'hin1', 'ie32', 'hin31', 'ie3'],
# ['hin6', 'ie22'],
# ['hin61', 'ie62'],
# ['hin4', 'ie5']]
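
If you would rather not depend on igraph, the same connected-components idea can be sketched in plain Python with a union-find (disjoint-set) structure. This is only a minimal sketch and not part of the answer above: the name merge_tuples and its helper functions are made up for illustration, and it assumes the first two fields of each tuple are the elements to link.

```python
def merge_tuples(pairs):
    """Group elements linked, directly or transitively, by a shared tuple."""
    parent = {}  # maps each element to its parent in the disjoint-set forest

    def find(x):
        # Register unseen elements as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for t in pairs:
        union(t[0], t[1])  # the third field is ignored, as the question asks

    # Collect the elements under their root to form the merged groups.
    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    return [tuple(members) for members in groups.values()]


l = [('hin1', 'ie2', 2), ('hin1', 'ie3', 2), ('hin4', 'ie5', 2),
     ('hin6', 'ie22', 2), ('hin1', 'ie32', 2), ('hin31', 'ie2', 2),
     ('hin61', 'ie62', 2)]
for group in merge_tuples(l):
    print(group)
```

For the list in the question this yields four groups with the same members as the requested output; the order of groups and of elements within a group should not be relied on.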

Here is a starting point rather than a complete function. Take a look at the approach; I think you can extrapolate from it. It pairs up tuples that share the same first element, which gives the correct output when only two tuples overlap. That covers every case except 'hin1', which has four different overlaps. If you repeat the general concept and tweak it a little, I think you can figure out the rest!

tuples_list = [('hin1','ie2',2),('hin1','ie3',2),('hin4','ie5',2),('hin6','ie22',2),('hin1','ie32',2),('hin31','ie2',2),('hin61','ie62',2)]

for first in tuples_list:
    for second in tuples_list:
        # pair up tuples that share the same first element
        if first[0] == second[0]:
            new_tup = (first[0], first[1], second[1])
            print(new_tup)

This prints the following tuples:

('hin1', 'ie2', 'ie2')
('hin1', 'ie2', 'ie3')
('hin1', 'ie2', 'ie32')
('hin1', 'ie3', 'ie2')
('hin1', 'ie3', 'ie3')
('hin1', 'ie3', 'ie32')
('hin4', 'ie5', 'ie5')
('hin6', 'ie22', 'ie22')
('hin1', 'ie32', 'ie2')
('hin1', 'ie32', 'ie3')
('hin1', 'ie32', 'ie32')
('hin31', 'ie2', 'ie2')
('hin61', 'ie62', 'ie62')

You can then use this output as the input to a second pass. This is just one approach. I'm sure there are more elegant solutions out there, but I hope it's a decent start for you!
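
One way the "repeat the pass" idea could be carried through is to keep merging overlapping sets until a full pass makes no change. The following is a rough sketch under that reading, not the answerer's own code: the function name merge_until_stable is hypothetical, and it compares both of the first two fields rather than only the first one, so that ('hin31', 'ie2') ends up joined with the 'hin1' group.

```python
def merge_until_stable(tuples_list):
    """Repeatedly merge overlapping sets until one pass changes nothing."""
    # Start with one set per tuple, ignoring the third field.
    groups = [set(t[:2]) for t in tuples_list]
    merged = True
    while merged:
        merged = False
        result = []
        for group in groups:
            for existing in result:
                if existing & group:       # at least one shared element
                    existing |= group      # absorb it into the earlier set
                    merged = True
                    break
            else:
                result.append(group)       # no overlap found in this pass
        groups = result
    return [tuple(sorted(g)) for g in groups]


tuples_list = [('hin1', 'ie2', 2), ('hin1', 'ie3', 2), ('hin4', 'ie5', 2),
               ('hin6', 'ie22', 2), ('hin1', 'ie32', 2), ('hin31', 'ie2', 2),
               ('hin61', 'ie62', 2)]
for group in merge_until_stable(tuples_list):
    print(group)
```

A pass that performs no merge means every set is disjoint from the ones before it, so the loop stops with the final groups. This is quadratic in the worst case, which is fine for short lists; for large inputs a graph or union-find approach is likely the better fit.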
