python itertools groupby several attributes

I'm trying to group objects by their attributes. I know how to do it for one attribute, but how do I include several? For example:

from itertools import groupby
from operator import attrgetter

class Node:
    def __init__(self, name1, name2, id_):
        self.name1 = name1
        self.name2 = name2
        self.id_ = id_

    def __repr__(self):
        return f"<{self.id_}>{self.name1}-{self.name2}"

l = [Node('WWA', 'KATO', 1), Node('WWA', 'POZ', 2), Node('SZCZ', 'GDN', 3),
     Node('GDN', 'BYD', 4), Node('BIA', 'OLS', 5), Node('LUB', 'WWA', 6)]

get_attr = attrgetter("name1")
l = [list(g) for k, g in groupby(sorted(l, key=get_attr), get_attr)]

Result:

[[<5>BIA-OLS],
 [<4>GDN-BYD],
 [<6>LUB-WWA],
 [<3>SZCZ-GDN],
 [<1>WWA-KATO, <2>WWA-POZ]]

I'd like to group by name1 or by name2 and get this:

[[<5>BIA-OLS],
 [<4>GDN-BYD, <3>SZCZ-GDN],
 [<1>WWA-KATO, <2>WWA-POZ, <6>LUB-WWA]]

You won't be able to do that with groupby: it assigns each item a single key, whereas here each group needs two identifiers.
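To illustrate the limitation, here is a minimal sketch (the tuple-key call and the names `nodes`, `pair_key` and `pair_groups` are my own illustration, not part of the question). Passing both attribute names to attrgetter makes groupby key on the (name1, name2) pair, so two nodes land together only when both names match. That is an "and" grouping rather than the "or" grouping asked for.

```python
from itertools import groupby
from operator import attrgetter

class Node:
    def __init__(self, name1, name2, id_):
        self.name1 = name1
        self.name2 = name2
        self.id_ = id_

    def __repr__(self):
        return f"<{self.id_}>{self.name1}-{self.name2}"

# Three nodes that all mention 'WWA', either as name1 or as name2
nodes = [Node('WWA', 'KATO', 1), Node('WWA', 'POZ', 2), Node('LUB', 'WWA', 6)]

# attrgetter with two names returns a (name1, name2) tuple for each node
pair_key = attrgetter("name1", "name2")
pair_groups = [list(g) for _, g in groupby(sorted(nodes, key=pair_key), pair_key)]

# Every (name1, name2) pair is distinct, so each node ends up alone
print(pair_groups)  # [[<6>LUB-WWA], [<1>WWA-KATO], [<2>WWA-POZ]]
```

Even though all three nodes share 'WWA', none of them are grouped together, which is why a different data structure is needed.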

To have multiple keys for each group, you can use a dictionary in which each list of nodes is associated with two keys (name1 and name2). Adding each node to the group that corresponds to one of its names produces a (redundant) list of groups as the dictionary's values(). You can then extract the distinct (non-empty) groups from those:

d = dict()
for n in l:
    d.setdefault(n.name1, d.setdefault(n.name2,[])).append(n) # link both names
l = [*{id(g):g for g in d.values() if g}.values()]            # distinct groups

print(l)

[[<1>WWA-KATO, <2>WWA-POZ, <6>LUB-WWA], 
 [<3>SZCZ-GDN, <4>GDN-BYD], 
 [<5>BIA-OLS]]
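To see why the dictionary's values are redundant and why the `if g` filter is needed, the sketch below rebuilds the dictionary for the same six nodes and inspects it. The `Node` class is copied from the question so the block runs on its own; the names `nodes`, `shared` and `empty_keys` are mine.

```python
class Node:
    def __init__(self, name1, name2, id_):
        self.name1 = name1
        self.name2 = name2
        self.id_ = id_

    def __repr__(self):
        return f"<{self.id_}>{self.name1}-{self.name2}"

nodes = [Node('WWA', 'KATO', 1), Node('WWA', 'POZ', 2), Node('SZCZ', 'GDN', 3),
         Node('GDN', 'BYD', 4), Node('BIA', 'OLS', 5), Node('LUB', 'WWA', 6)]

d = dict()
for n in nodes:
    # The inner setdefault runs first and may create a fresh list for name2;
    # the outer one reuses an existing name1 group if there is one.
    d.setdefault(n.name1, d.setdefault(n.name2, [])).append(n)

# Keys that refer to the very same list object as 'WWA'
shared = [k for k, g in d.items() if g is d['WWA']]
# Keys whose list was created by the inner setdefault but never filled
empty_keys = [k for k, g in d.items() if not g]

print(shared)      # ['KATO', 'WWA', 'LUB']
print(empty_keys)  # ['POZ', 'BYD']
```

Several keys point at one list, so the same group shows up more than once in values(); deduplicating by id() collapses those, and the empty leftovers are dropped by the filter.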

Note that this doesn't formally address my original question about overlapping group attributes: if we add Node('LUB', 'GDN', 7) to the list, it will end up in one of the groups, which may or may not be where you want it:

l = [Node('WWA', 'KATO', 1), Node('WWA', 'POZ', 2), Node('SZCZ', 'GDN', 3), 
     Node('GDN', 'BYD', 4), Node('BIA', 'OLS', 5), Node('LUB', 'WWA',6 ), 
     Node('LUB', 'GDN',7 )]

d = dict()
for n in l:
    d.setdefault(n.name1, d.setdefault(n.name2,[])).append(n) 
l = [*{id(g):g for g in d.values() if g}.values()]            

print(l)

[[<1>WWA-KATO, <2>WWA-POZ, <6>LUB-WWA, <7>LUB-GDN], 
 [<3>SZCZ-GDN, <4>GDN-BYD], 
 [<5>BIA-OLS]]

This can be addressed by using the attribute shared by the largest number of nodes as the grouping key. The Counter class (from collections) can help determine the attribute frequencies.

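from collections import Counter

# l must be the flat list of Node objects (re-create it if it was overwritten above)
f = Counter(n.name1 for n in l) + Counter(n.name2 for n in l)  # frequencies
d = dict()
for n in l:
    # pick name2 when its (frequency, name) pair is larger, otherwise name1
    k = (n.name1, n.name2)[(f[n.name2], n.name2) > (f[n.name1], n.name1)]
    d.setdefault(k, []).append(n)  # group with most frequent attribute
l = list(d.values())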

print(l)

[[<1>WWA-KATO, <2>WWA-POZ, <6>LUB-WWA],
 [<3>SZCZ-GDN, <4>GDN-BYD, <7>LUB-GDN],
 [<5>BIA-OLS]]

Note that I also include the attribute names themselves in the frequency comparison, so that ties are broken consistently (the lexicographically larger name wins).
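The indexing expression in the loop is compact but easy to misread, since it uses the boolean result of a tuple comparison as an index. A more explicit sketch of the same selection follows. The function names `pick_key` and `group_by_most_frequent` are hypothetical names of mine, and `Node` is repeated from the question so the block runs on its own.

```python
from collections import Counter

class Node:
    def __init__(self, name1, name2, id_):
        self.name1 = name1
        self.name2 = name2
        self.id_ = id_

    def __repr__(self):
        return f"<{self.id_}>{self.name1}-{self.name2}"

def pick_key(node, freq):
    """Return the more frequent of the node's two names; ties go to the larger string."""
    # Tuples compare element by element: frequency first, then the name itself
    return max((freq[node.name1], node.name1), (freq[node.name2], node.name2))[1]

def group_by_most_frequent(nodes):
    # Count how often each name occurs in either attribute
    freq = Counter(n.name1 for n in nodes) + Counter(n.name2 for n in nodes)
    groups = {}
    for n in nodes:
        groups.setdefault(pick_key(n, freq), []).append(n)
    return list(groups.values())

nodes = [Node('WWA', 'KATO', 1), Node('WWA', 'POZ', 2), Node('SZCZ', 'GDN', 3),
         Node('GDN', 'BYD', 4), Node('BIA', 'OLS', 5), Node('LUB', 'WWA', 6),
         Node('LUB', 'GDN', 7)]

result = group_by_most_frequent(nodes)
print(result)
```

For node 5, 'BIA' and 'OLS' each occur once, so the tie-breaker selects 'OLS' because it compares larger as a string. Node 7 goes to the 'GDN' group because 'GDN' occurs three times and 'LUB' only twice.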
