Say I have for example the following nested list:
L = [['John','Sayyed'], ['John' , 'Simon'] ,['bush','trump'],
['Sam','Suri','NewYork'],['Suri','Orlando','Canada']]
How can I group these sublists, by getting the union of sublists which have a common element with at least another sublist within the group? So for the previous example, the result should be:
[['John','Sayyed','Simon'] ,['bush','trump'],
['Sam','Suri','NewYork','Orlando','Canada']]
Thus, the first two sublists are joined as they share 'John'
. Could someone please share their valuable thoughts?
In many cases, modeling a problem as a graph, can make make fairly complicated tasks much easier. In this case, what we'd be looking for from a graph theory point of view, are the connected components of the graph.
So a simple way to go about this, is to generate a graph with NetworkX , and add your list as the graph edges using add_edges_from
. Then use connected_components
, which will precisely give you a list of sets of the connected components in the graph:
import networkx as nx
L = [['John','Sayyed'], ['John' , 'Simon'] ,['bush','trump']]
G=nx.Graph()
G.add_edges_from(L)
list(nx.connected_components(G))
[{'John', 'Sayyed', 'Simon'}, {'bush', 'trump'}]
In the case of having sublists with more than 2
elements, you can add them as paths instead of nodes using nx.add_path
, since they can connect multiple nodes:
L = [['John','Sayyed'], ['John' , 'Simon'] ,['bush','trump'],
['Sam','Suri','NewYork'],['Suri','Orlando','Canada']]
G=nx.Graph()
for l in L:
nx.add_path(G, l)
list(nx.connected_components(G))
[{'John', 'Sayyed', 'Simon'},
{'bush', 'trump'},
{'Canada', 'NewYork', 'Orlando', 'Sam', 'Suri'}]
We can also vivisualize these connected components with nx.draw
:
pos = nx.spring_layout(G, scale=20, k=2/np.sqrt(G.order()))
nx.draw(G, pos, node_color='lightgreen', node_size=1000, with_labels=True)
More detailed explanation on connected components :
In graph theory, a connected component (or just component) of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the supergraph
So essentially, this code creates a graph, with edges from the list, where each edge is composed by two values u,v
where u
and v
will be nodes connected by this edge.
And hence, the union of sublists with at least one sublist with a common element can be translated into a Graph Theory problem as all nodes that are reachable between each other through the existing paths.
A simple approach
L = [['John','Sayyed'], [ 'John' , 'Simon'] ,['bush','trump']]
L[0].extend([x for x in L[1] if x not in L[0]])
L.pop(1)
print(L)
See
You can use the function connected_components
in networkx
:
import networkx as nx
L = [['John','Sayyed'], ['John' , 'Simon'] ,['bush','trump'],
['Sam','Suri','NewYork'],['Suri','Orlando','Canada']]
G = nx.Graph()
for i in L:
G.add_path(i)
lst = list(nx.connected_components(G))
print(lst)
Output:
[{'John', 'Sayyed', 'Simon'},
{'bush', 'trump'},
{'Canada', 'NewYork', 'Orlando', 'Sam', 'Suri'}]
If order is important and the list are large, you can use this two pronged method:
l = [['john', 'sayyid'], ['john', 'simon'], ['b', 't']]
def join(l1, l2):
mset = set(l1)
result = l1[:] # deep copy
for each in l2:
if each in mset:
continue
else:
result.append(each)
return result
To merge within the master list, you can just call the list by their rank and pop the original list:
l1 = l.pop(0)
l2 = l.pop(0)
l.insert(0, join(l1, l2))
>>> l:
[['john', 'sayyid', 'simon'], ['b', 't']]
To merge 2 lists:
merge = lambda l1, l2: l1 + [ x for x in l2 if x not in l1 ]
To be more efficient, create a set
on l1
;
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.