I have a list of lists of the following form:
list = [ [['item1'], ['property1','property2','property3']], [['item2'],['property1', 'property4']], ..., [['itemN'],['property5']]]
I want to construct another list of lists in which each item is grouped with the items it shares at least one property with. For example:
new_list = [['item1','item2'], .., ['itemN']]
Note that items should be grouped together even if they share properties only indirectly. For example, if item1 has a common property with item2, and item2 has a common property with item3, then all three should be grouped together even though item1 and item3 share no property.
My attempt adds a boolean to each entry of the original list (so that I do not re-iterate when not needed) and uses the function below:
list = [ [['item1'], ['property1','property2','property3'], True], [['item2'],['property1', 'property4'], True], [['itemN'],['property5'], True]]
def group_correlates(list):
    result = []
    for i, entry in enumerate(list):
        correlates = []
        items = entry[0]
        properties = entry[1]
        if entry[2]: # if not already grouped (True)
            correlates.append(items)
            for j, other_entry in enumerate(list):
                flag = other_entry[2]
                if not i == j:
                    if flag:
                        other_properties = other_entry[1]
                        other_items = other_entry[0]
                        for property in properties:
                            if property in other_properties:
                                other_entry[2] = False # do not visit again
                                correlates.append(other_items)
            result.append(correlates)
    return result
but I get this:
[[['item1'], ['item2']], [['item1']]]
And even if I could do it this way, I am sure there is a much more elegant way to accomplish the same thing.
Why not use a dict and then groupby from the itertools module?
This is an example of how you can do it:
from itertools import groupby
data = [[['item1'], ['property1','property2','property3']], [['item2'],['property1', 'property4']], [['itemN'],['property5']]]
aa = {}
for k, v in data:
    for j in v:
        try:
            aa[j] += k  # extends the stored list in place
        except KeyError:
            aa[j] = k  # stores a reference to k itself, not a copy
new_list = [k for k,_ in groupby(sorted(aa.values()), lambda x: x)]
print(new_list)
Or, you can use defaultdict from the collections module:
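from collections import defaultdict
from itertools import groupby
data = [[['item1'], ['property1','property2','property3']], [['item2'],['property1', 'property4']], [['itemN'],['property5']]]
bb = defaultdict(list)  # missing keys start as an empty list
for k, v in data:
    for j in v:
        bb[j].extend(k)  # accumulate every item that has property j
new_list = [k for k,_ in groupby(sorted(bb.values()), lambda x: x)]
print(new_list)
The first version outputs:
[['item1', 'item2'], ['item2'], ['itemN']]
The defaultdict version keeps a separate list per property, so it outputs:
[['item1'], ['item1', 'item2'], ['item2'], ['itemN']]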
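The grouping printed above still places item2 in more than one sublist, and items linked only through a chain of shared properties would not be merged. One possible way to get the grouping the question asks for is to keep the property index idea but merge items with a small union-find (disjoint-set) structure. This is a swapped-in technique, not part of the code above. A minimal sketch, assuming every entry has the shape [[item], [properties...]]; the names group_by_shared_property, find, union and first_owner are illustrative:

```python
from collections import defaultdict

def group_by_shared_property(data):
    """Group items that share a property, directly or through a chain of items."""
    parent = {}  # item -> parent item in the disjoint-set forest

    def find(x):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_owner = {}  # property -> first item seen with that property
    for (item,), properties in data:
        parent.setdefault(item, item)
        for prop in properties:
            if prop in first_owner:
                union(first_owner[prop], item)
            else:
                first_owner[prop] = item

    groups = defaultdict(list)
    for item in parent:  # dicts keep insertion order (Python 3.7+)
        groups[find(item)].append(item)
    return list(groups.values())

data = [
    [['item1'], ['property1', 'property2', 'property3']],
    [['item2'], ['property1', 'property4']],
    [['item3'], ['property4', 'property6']],
    [['itemN'], ['property5']],
]
print(group_by_shared_property(data))
```

Here item1 and item3 share no property, yet they end up in the same group because each is linked to item2.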
First convert your list to a dictionary as mentioned.
list1 = [ [['item1'], ['property1','property2','property3']],
          [['item2'], ['property1', 'property4']],
          [['item3'], ['property5', 'property6']]
        ]
dict1 = {item[0][0]: item[1] for item in list1}
Then:
new_list = []
for key in dict1:
    target = dict1[key]
    for k, v in dict1.items():
        if k != key and len(set(target).intersection(set(v))) != 0:
            new_list.append([key, k])
new_list = [sorted(i) for i in new_list] # sort sublists
new_list = [list(t) for t in set(map(tuple, new_list))] # remove dupes
flat = [item for sublist in new_list for item in sublist] # flatten list
unique = list(set(dict1.keys()).difference(set(flat)))
new_list.append(unique) # add unique keys
new_list
Out[76]: [['item1', 'item2'], ['item3']]
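The pairwise comparison above links only items that share a property directly, and it collects all unmatched items into one final sublist. A hedged sketch of one way to extend the same set-intersection idea to indirect links: keep a list of (items, properties) groups and fold each new entry into every existing group it overlaps. The helper name merge_overlapping is hypothetical, and the input is assumed to be a dict shaped like dict1:

```python
def merge_overlapping(dict1):
    """Merge items whose property sets overlap, directly or indirectly."""
    groups = []  # list of (set of items, set of properties), pairwise disjoint
    for item, properties in dict1.items():
        items, props = {item}, set(properties)
        remaining = []
        for group_items, group_props in groups:
            if props & group_props:
                # The new entry touches this group: absorb it.
                items |= group_items
                props |= group_props
            else:
                remaining.append((group_items, group_props))
        groups = remaining + [(items, props)]
    return [sorted(group_items) for group_items, _ in groups]

dict1 = {
    'item1': ['property1', 'property2', 'property3'],
    'item2': ['property1', 'property4'],
    'item3': ['property5', 'property6'],
    'item4': ['property4', 'property6'],
    'item5': ['property7'],
}
print(merge_overlapping(dict1))
```

Because existing groups never share a property with each other, a single pass over them per new entry is enough; in this example item4 bridges the item1/item2 group and item3, while item5 stays in its own sublist.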
"Bipartite" is mainly nomenclature. The main point is that you're finding sub-graphs that are connected.
Put all of your nested lists into an "open" list ... you need to process everything in this list. When it's empty, you're done. Start a new list of sub-graphs -- this is the "list of lists" you mentioned.
Initialize an item list and a properties list to empty lists.
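Pick an item and put it into a sub-graph list. Now, alternate between properties and items until nothing gets added: add every property of the items collected so far, then add every item that has any of those properties.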
At this point, the items list and properties list describe a closed sub-graph. Add the pair to your master list of sub-graphs.
Go back, reset your items and properties lists to empty lists, and start with a new initial item. Continue this until you've exhausted all items. The "open" list is now empty; all items and properties are now represented in the sub-graphs list.
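The procedure above could be sketched roughly as follows. This is only one reading of the description: the names connected_subgraphs, open_entries, items and properties are illustrative, and each entry is assumed to have the [[item], [properties...]] shape used in the question.

```python
def connected_subgraphs(entries):
    """Return a list of (items, properties) pairs, one per connected sub-graph."""
    # The "open" list: entries not yet assigned to any sub-graph.
    open_entries = [(entry[0][0], set(entry[1])) for entry in entries]
    subgraphs = []
    while open_entries:
        # Start a new sub-graph from the next remaining item.
        item, props = open_entries.pop(0)
        items, properties = [item], set(props)
        added = True
        while added:  # alternate until nothing gets added
            added = False
            still_open = []
            for other_item, other_props in open_entries:
                if properties & other_props:
                    # Shares a property: pull in the item and its properties.
                    items.append(other_item)
                    properties |= other_props
                    added = True
                else:
                    still_open.append((other_item, other_props))
            open_entries = still_open
        subgraphs.append((items, sorted(properties)))
    return subgraphs

entries = [
    [['item1'], ['property1', 'property2', 'property3']],
    [['item2'], ['property1', 'property4']],
    [['item3'], ['property4']],
    [['itemN'], ['property5']],
]
for items, properties in connected_subgraphs(entries):
    print(items, properties)
```

Taking only the first element of each pair gives the list of lists the question asks for.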