
python itertools groupby several attributes

I'm trying to group objects by attributes. I know how to do it for one attribute, but how do I include several? For example:

class Node:
    def __init__(self, name1, name2, id_):
        self.name1 = name1
        self.name2 = name2
        self.id_ = id_

    def __repr__(self):
        return f"<{self.id_}>{self.name1}-{self.name2}"

l = [Node('WWA', 'KATO', 1), Node('WWA', 'POZ', 2), Node('SZCZ', 'GDN', 3), Node('GDN', 'BYD', 4), Node('BIA', 'OLS', 5), Node('LUB', 'WWA',6 )]

from itertools import groupby
from operator import attrgetter

get_attr = attrgetter("name1")
l = [list(g) for k, g in groupby(sorted(l, key=get_attr), get_attr)]

result:

[[<5>BIA-OLS],
 [<4>GDN-BYD],
 [<6>LUB-WWA],
 [<3>SZCZ-GDN],
 [<1>WWA-KATO, <2>WWA-POZ]]

I'd like to group by name1 or by name2 and get this:

[[<5>BIA-OLS],
 [<4>GDN-BYD, <3>SZCZ-GDN],
 [<1>WWA-KATO, <2>WWA-POZ, <6>LUB-WWA]]

You won't be able to do that using groupby, because you'd need two identifiers for each group and groupby only computes one key per item.
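For illustration, here is a minimal sketch of what groupby does when it is given several attributes at once. attrgetter with two names returns a tuple, so the key becomes the (name1, name2) pair and nodes only end up together when both attributes match, which is not the "either attribute" grouping asked for. The Node class is copied from the question; the names `nodes`, `pair_key` and `groups` are only for this example.

```python
from itertools import groupby
from operator import attrgetter

class Node:
    def __init__(self, name1, name2, id_):
        self.name1 = name1
        self.name2 = name2
        self.id_ = id_

    def __repr__(self):
        return f"<{self.id_}>{self.name1}-{self.name2}"

nodes = [Node('WWA', 'KATO', 1), Node('WWA', 'POZ', 2), Node('LUB', 'WWA', 6)]

# attrgetter with two names returns a (name1, name2) tuple for each node
pair_key = attrgetter("name1", "name2")
groups = [list(g) for _, g in groupby(sorted(nodes, key=pair_key), pair_key)]

# every (name1, name2) pair here is distinct, so each node is alone in its group
print(groups)
```

All three nodes mention WWA, yet each lands in its own group, because a single tuple key cannot express "name1 or name2".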

To have multiple keys for each group, you can use a dictionary where each list of nodes is associated with two keys (name1 and name2). Adding each node to the group corresponding to one of its names produces a (redundant) list of groups as the dictionary's values(): several keys can point to the same list, and some lists stay empty. You can then extract the distinct, non-empty groups from those:

d = dict()
for n in l:
    d.setdefault(n.name1, d.setdefault(n.name2,[])).append(n) # link both names
l = [*{id(g):g for g in d.values() if g}.values()]            # distinct groups

print(l)

[[<1>WWA-KATO, <2>WWA-POZ, <6>LUB-WWA], 
 [<3>SZCZ-GDN, <4>GDN-BYD], 
 [<5>BIA-OLS]]
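The nested setdefault call is dense, so here is the same logic unrolled into separate steps, as a sketch for reading rather than a replacement. The names `nodes`, `by_name`, `fallback`, `group` and `distinct` are mine; the behaviour is meant to match the two-line version above.

```python
class Node:
    def __init__(self, name1, name2, id_):
        self.name1 = name1
        self.name2 = name2
        self.id_ = id_

    def __repr__(self):
        return f"<{self.id_}>{self.name1}-{self.name2}"

nodes = [Node('WWA', 'KATO', 1), Node('WWA', 'POZ', 2), Node('SZCZ', 'GDN', 3),
         Node('GDN', 'BYD', 4), Node('BIA', 'OLS', 5), Node('LUB', 'WWA', 6)]

by_name = {}  # name -> list of nodes; several names may share one list object
for n in nodes:
    # the list already linked to name2, or a fresh one registered under name2
    fallback = by_name.setdefault(n.name2, [])
    # name1's existing list wins; otherwise name1 gets linked to the fallback
    group = by_name.setdefault(n.name1, fallback)
    group.append(n)

# keep one entry per list object (by identity) and drop lists that stayed empty
distinct = {id(g): g for g in by_name.values() if g}
groups = list(distinct.values())
print(groups)
```

When name1 already has a list, the fresh list registered under name2 is never filled (for example the one under 'POZ'), which is why the `if g` filter is needed.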

Note that this doesn't formally address my original question about overlapping group attributes: if we add Node('LUB', 'GDN', 7) to the list, it will end up in one of the two groups it could belong to, which may or may not be the one you want:

l = [Node('WWA', 'KATO', 1), Node('WWA', 'POZ', 2), Node('SZCZ', 'GDN', 3), 
     Node('GDN', 'BYD', 4), Node('BIA', 'OLS', 5), Node('LUB', 'WWA',6 ), 
     Node('LUB', 'GDN',7 )]

d = dict()
for n in l:
    d.setdefault(n.name1, d.setdefault(n.name2,[])).append(n) 
l = [*{id(g):g for g in d.values() if g}.values()]            

print(l)

[[<1>WWA-KATO, <2>WWA-POZ, <6>LUB-WWA, <7>LUB-GDN], 
 [<3>SZCZ-GDN, <4>GDN-BYD], 
 [<5>BIA-OLS]]

This can be addressed by selecting the attribute with the largest number of nodes as the grouping key. The Counter class (from collections) can help with determining the attribute frequencies.

from collections import Counter
f = Counter(n.name1 for n in l) + Counter(n.name2 for n in l) # frequencies
d = dict()
for n in l:
    k = (n.name1,n.name2)[(f[n.name2],n.name2)>(f[n.name1],n.name1)]
    d.setdefault(k,[]).append(n)  # group with most frequent attrib.
l = list(d.values())

print(l)

[[<1>WWA-KATO, <2>WWA-POZ, <6>LUB-WWA],
 [<3>SZCZ-GDN, <4>GDN-BYD, <7>LUB-GDN],
 [<5>BIA-OLS]]

Note that I also include the attributes themselves in the frequency comparison, so that the tie-breaker is applied consistently: when two names have the same frequency, the name that compares larger as a string is chosen.
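To make the tie-breaker concrete, here is a small sketch of the key selection on its own. It uses max() on (frequency, name) tuples, which should pick the same key as the indexing expression above but is not the exact code from the answer; `pairs`, `freq` and `pick_key` are hypothetical names for this example.

```python
from collections import Counter

# (name1, name2) pairs of the seven nodes from the example above
pairs = [('WWA', 'KATO'), ('WWA', 'POZ'), ('SZCZ', 'GDN'), ('GDN', 'BYD'),
         ('BIA', 'OLS'), ('LUB', 'WWA'), ('LUB', 'GDN')]

# how often each name occurs, counting both attribute positions
freq = Counter(a for a, _ in pairs) + Counter(b for _, b in pairs)

def pick_key(name1, name2):
    """Return the more frequent name; on equal counts, the larger string."""
    # tuples compare element by element: the count first, then the name itself
    return max((freq[name1], name1), (freq[name2], name2))[1]

print(pick_key('LUB', 'GDN'))  # GDN occurs 3 times, LUB twice
print(pick_key('BIA', 'OLS'))  # both occur once, so the larger string decides
```

Because the name is part of the compared tuple, the result does not depend on which attribute a name happens to be stored in: pick_key('BIA', 'OLS') and pick_key('OLS', 'BIA') give the same key.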
