
Group correlated items in a Python list of lists

I have a list of lists of the following form:

list = [ [['item1'], ['property1','property2','property3']], [['item2'],['property1', 'property4']], ..., [['itemN'],['property5']]]

I want to build another list of lists in which each item is grouped with every item it shares at least one property with. For example:

new_list = [['item1','item2'], .., ['itemN']]

Note that items should be grouped together even if they share properties only indirectly. For example, if item1 shares a property with item2, and item2 shares a property with item3, all three should end up in the same group even though item1 and item3 have no property in common.

My attempt adds a boolean flag to each entry of the original list (so that I do not revisit entries unnecessarily) and uses the function below:

list = [ [['item1'], ['property1','property2','property3'], True], [['item2'],['property1', 'property4'], True], [['itemN'],['property5'], True]]

def group_correlates(list):
    result = []
    for i, entry in enumerate(list):
        correlates = []
        items = entry[0]
        properties = entry[1]
        if entry[2]: # if not already grouped (True)
            correlates.append(items)
        for j, other_entry in enumerate(list):
            flag = other_entry[2]
            if not i == j:
                if flag:
                    other_properties = other_entry[1]
                    other_items = other_entry[0]
                    for property in properties:
                        if property in other_properties:
                            other_entry[2] = False # do not visit again
                            correlates.append(other_items)
                            result.append(correlates)
    return result

but I get this:

[[['item1'], ['item2']], [['item1']]]

And even if I could do it this way, I am sure there is a much more elegant way to accomplish the same thing.

Why not use a dict and then groupby from the itertools module?

This is an example of how you can do it:

from itertools import groupby

data = [[['item1'], ['property1','property2','property3']], [['item2'],['property1', 'property4']], [['itemN'],['property5']]]

aa = {}
for k, v in data:
    for j in v:
        try:
            aa[j] += k
        except KeyError:
            aa[j] = k


new_list = [k for k,_ in groupby(sorted(aa.values()), lambda x: x)]
print(new_list)

Or you can use defaultdict from the collections module:

from collections import defaultdict
from itertools import groupby

data = [[['item1'], ['property1','property2','property3']], [['item2'],['property1', 'property4']], [['itemN'],['property5']]]

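bb = defaultdict(list)  # a missing property starts with an empty list

for k, v in data:
    for j in v:
        bb[j].extend(k)  # collect every item that has property j


new_list = [k for k,_ in groupby(sorted(bb.values()), lambda x: x)]
print(new_list)

The first version prints:

[['item1', 'item2'], ['item2'], ['itemN']]

and the defaultdict version prints:

[['item1'], ['item1', 'item2'], ['item2'], ['itemN']]

Both group items per property, so an item can appear in more than one group, and items that are linked only indirectly are not merged.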

First convert your list to a dictionary as mentioned.

list1 = [ [['item1'], ['property1','property2','property3']], 
          [['item2'], ['property1', 'property4']],
          [['item3'], ['property5', 'property6']]
        ]

dict1 = {item[0][0]: item[1] for item in list1}

Then:

new_list = []

for key in dict1:
    target = dict1[key]
    for k, v in dict1.items():
        if k != key and len(set(target).intersection(set(v))) != 0:
            new_list.append([key, k])
    new_list = [sorted(i) for i in new_list] # sort sublists
    new_list = [list(t) for t in set(map(tuple, new_list))] # remove dupes

flat = [item for sublist in new_list for item in sublist] # flatten list
unique = sorted(set(dict1).difference(flat)) # keys that share no property
if unique: # avoid appending an empty group
    new_list.append(unique)

new_list
Out[76]: [['item1', 'item2'], ['item3']]
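
The pairwise comparison above only links items that share a property directly. As a rough sketch of one way to also merge indirect links, here is a small union-find over the same kind of dictionary. This is a different technique from the answer above, and the function name merge_linked_items and the sample data are made up for illustration:

```python
def merge_linked_items(props_by_item):
    """Merge items that share a property, directly or through a chain of items."""
    parent = {item: item for item in props_by_item}

    def find(item):
        # Walk up to the group's representative, shortening the path as we go
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    first_owner = {}  # property -> first item seen with that property
    for item, props in props_by_item.items():
        for prop in props:
            if prop in first_owner:
                # Same property seen before: join the two groups
                parent[find(item)] = find(first_owner[prop])
            else:
                first_owner[prop] = item

    groups = {}
    for item in props_by_item:
        groups.setdefault(find(item), []).append(item)
    return list(groups.values())


# Hypothetical sample data: item3 is linked to item1 only through item2
dict1 = {
    'item1': ['property1', 'property2', 'property3'],
    'item2': ['property1', 'property4'],
    'item3': ['property4', 'property6'],
    'item4': ['property5'],
}
print(merge_linked_items(dict1))
# [['item1', 'item2', 'item3'], ['item4']]
```

Each property is touched once, so this avoids comparing every pair of items.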

"Bipartite" is mainly nomenclature. The main point is that you're finding sub-graphs that are connected.

Put all of your nested lists into an "open" list ... you need to process everything in this list. When it's empty, you're done. Start a new list of sub-graphs -- this is the "list of lists" you mentioned.

Initialize an item list and a properties list to empty lists.

Pick an item and put it into a sub-graph list. Now, alternate between properties and items until nothing gets added:

  1. For each new item you just added (only that initial one the first time), add (to the properties list) all properties of that item. Keep a list of which properties are new to the properties list.
  2. Delete those items (and their properties) from your "open" list.
  3. For each property just added, add (to the items list) each item that has that property. Keep a list of the items just added.
  4. Repeat steps 1-3 until nothing new is added.

At this point, the items list and properties list describe a closed sub-graph. Add the pair to your master list of sub-graphs.

Go back, reset your items and properties lists to empty lists, and start with a new initial item. Continue this until you've exhausted all items. The "open" list is now empty; all items and properties are now represented in the sub-graphs list.
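
The steps above could be sketched roughly as follows. This is only one possible reading of the description: it assumes item names are unique, uses a "seen" set instead of physically deleting entries from the "open" list, and returns only the item groups (not the properties) since that is what the question asks for. The function name group_by_shared_properties is made up for illustration:

```python
from collections import defaultdict


def group_by_shared_properties(entries):
    """Group items linked, directly or indirectly, by shared properties.

    entries: list of [[item], [property, ...]] pairs, as in the question.
    """
    # Lookup tables: item -> its properties, property -> items that have it
    props_of = {entry[0][0]: set(entry[1]) for entry in entries}
    items_with = defaultdict(set)
    for item, props in props_of.items():
        for prop in props:
            items_with[prop].add(item)

    seen_items = set()  # items already placed in some sub-graph
    groups = []

    for start in props_of:  # the "open" list, in input order
        if start in seen_items:
            continue
        seen_items.add(start)
        group_items = [start]
        seen_props = set()
        new_items = [start]
        while new_items:
            # Step 1: properties of the items just added that are new
            new_props = set()
            for item in new_items:
                new_props |= props_of[item] - seen_props
            seen_props |= new_props
            # Step 3: items that have one of those new properties
            new_items = []
            for prop in sorted(new_props):
                for item in sorted(items_with[prop]):
                    if item not in seen_items:
                        seen_items.add(item)
                        new_items.append(item)
            group_items.extend(new_items)
        groups.append(group_items)
    return groups


# Sample data in the question's format; item3 is an added, hypothetical entry
data = [
    [['item1'], ['property1', 'property2', 'property3']],
    [['item2'], ['property1', 'property4']],
    [['item3'], ['property4', 'property6']],
    [['itemN'], ['property5']],
]
print(group_by_shared_properties(data))
# [['item1', 'item2', 'item3'], ['itemN']]
```

The loop stops for a given starting item as soon as a round adds no new items, which matches step 4.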
