Subset of dictionary keys

Question

I've got a Python dictionary of the form {'ip1:port1' : <value>, 'ip1:port2' : <value>, 'ip2:port1' : <value>, ...}. The dictionary keys are strings consisting of ip:port pairs. The values are not important for this task.

I need a list of ip:port combinations with unique IP addresses; the port can be any of those that appear among the original keys. For the example above, two variants are acceptable: ['ip1:port1', 'ip2:port1'] and ['ip1:port2', 'ip2:port1'].

What is the most Pythonic way to do it?

Currently my solution is:

def get_uniq_worker_ips(workers):
    wip = set(w.split(':')[0] for w in workers.iterkeys())
    return [[worker for worker in workers.iterkeys() if worker.startswith(w)][0] for w in wip]

I don't like it, because it creates additional lists and then discards them.

Answer 1

You can use itertools.groupby to group the keys by IP address:

import itertools

data = {'ip1:port1' : "value1", 'ip1:port2' : "value2", 'ip2:port1' : "value3", 'ip2:port2': "value4"}
by_ip = {k: list(g) for k, g in itertools.groupby(sorted(data), key=lambda s: s.split(":")[0])}
by_ip
# {'ip1': ['ip1:port1', 'ip1:port2'], 'ip2': ['ip2:port1', 'ip2:port2']}

Then just pick any one key from each group of IPs.

{v[0]: data[v[0]] for v in by_ip.values()}
# {'ip1:port1': 'value1', 'ip2:port1': 'value3'}

Or, shorter, use a generator expression that yields just the first key from each group:

one_by_ip = (next(g) for k, g in itertools.groupby(sorted(data), key=lambda s: s.split(":")[0]))
{key: data[key] for key in one_by_ip}
# {'ip1:port1': 'value1', 'ip2:port1': 'value3'}

However, note that groupby only groups adjacent items, so it requires the input data to be sorted. If you want to avoid sorting all the keys in the dict, you should instead just use a set of already seen IPs.

seen = set()
not_seen = lambda x: not(x in seen or seen.add(x))
{key: data[key] for key in data if not_seen(key.split(":")[0])}
# {'ip1:port1': 'value1', 'ip2:port1': 'value3'}

This is similar to your solution, but instead of looping over the unique IPs and finding a matching key in the dict for each one, you loop over the keys and check whether you've already seen the IP.
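The same seen-set idea can be written as a plain function that returns the list the question asks for. This is only a sketch: the name get_uniq_worker_ips is borrowed from the question, and which port is kept for each IP depends on the dict's iteration order.

```python
def get_uniq_worker_ips(workers):
    """Return one 'ip:port' key per distinct IP, keeping the first key seen."""
    seen_ips = set()
    result = []
    for key in workers:
        ip = key.partition(':')[0]  # text before the first colon
        if ip not in seen_ips:
            seen_ips.add(ip)
            result.append(key)
    return result


workers = {'ip1:port1': 'a', 'ip1:port2': 'b', 'ip2:port1': 'c'}
print(get_uniq_worker_ips(workers))
```

This makes a single pass over the keys, needs no sorting, and builds no throwaway lists.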

Answer 2

One way to do this is to transform your keys into a custom class that only looks at the IP part of the string when doing an equality test. It also needs to supply an appropriate __hash__ method.

The logic here is that the set constructor will "see" keys with the same IP as identical, ignoring the port part in the comparison, so it will not add a key to the set if a key with that IP is already present.

Here's some code that runs on Python 2 or Python 3.

class IPKey(object):
    def __init__(self, s):
        self.key = s
        self.ip, self.port = s.split(':', 1)

    def __eq__(self, other):
        return self.ip == other.ip

    def __hash__(self):
        return hash(self.ip)

    def __repr__(self):
        return 'IPKey({}:{})'.format(self.ip, self.port)

def get_uniq_worker_ips(workers):
    return [k.key for k in set(IPKey(k) for k in workers)]

# Test

workers = {
    'ip1:port1' : "val", 
    'ip1:port2' : "val", 
    'ip2:port1' : "val", 
    'ip2:port2' : "val", 
}

print(get_uniq_worker_ips(workers))

Output

['ip2:port1', 'ip1:port1']

If you are running Python 2.7 or later, the function can use a set comprehension instead of the generator expression inside the set() constructor call.

def get_uniq_worker_ips(workers):
    return [k.key for k in {IPKey(k) for k in workers}]

The IPKey.__repr__ method isn't strictly necessary, but I like to give all my classes a __repr__ since it can be handy during development.

Here's a much more succinct solution which is very efficient, courtesy of Jon Clements. It builds the desired list via a dictionary comprehension.

def get_uniq_worker_ips(workers):
    return list({k.partition(':')[0]:k for k in workers}.values())
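In the dictionary comprehension above, each later key with the same IP overwrites the earlier one, so the key that survives for an IP is the last one iterated. The sketch below contrasts that with a first-key-wins variant using dict.setdefault. The function names last_key_per_ip and first_key_per_ip are made up for illustration, and the orders noted in the comments assume a Python version where dicts preserve insertion order (3.7+).

```python
def last_key_per_ip(workers):
    # Later keys overwrite earlier ones that share the same IP.
    return list({k.partition(':')[0]: k for k in workers}.values())


def first_key_per_ip(workers):
    chosen = {}
    for k in workers:
        # setdefault stores k only if this IP has no entry yet.
        chosen.setdefault(k.partition(':')[0], k)
    return list(chosen.values())


workers = {'ip1:port1': 'a', 'ip1:port2': 'b', 'ip2:port1': 'c'}
print(last_key_per_ip(workers))   # on Python 3.7+: ['ip1:port2', 'ip2:port1']
print(first_key_per_ip(workers))  # on Python 3.7+: ['ip1:port1', 'ip2:port1']
```

Either result satisfies the question, since any port is acceptable for a given IP.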

Answer 3

I've changed a few characters in my solution and am now satisfied with it.

def get_uniq_worker_ips(workers):
    wip = set(w.split(':')[0] for w in workers.iterkeys())
    return [next(worker for worker in workers.iterkeys() if worker.startswith(w)) for w in wip]
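One caveat with matching on worker.startswith(w): an IP that is a textual prefix of another IP (for example '10.0.0.1' and '10.0.0.10') can match the wrong key. A hedged variant that appends the ':' separator before matching is sketched below. It iterates the dict directly instead of calling iterkeys(), so it also runs on Python 3, and the addresses are invented sample data.

```python
def get_uniq_worker_ips(workers):
    ips = set(w.split(':')[0] for w in workers)
    # Match on ip + ':' so that '10.0.0.1' does not match '10.0.0.10:80'.
    return [next(w for w in workers if w.startswith(ip + ':')) for ip in ips]


workers = {'10.0.0.10:80': 'a', '10.0.0.1:80': 'b', '10.0.0.1:81': 'c'}
print(sorted(get_uniq_worker_ips(workers)))
```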

Thanks to @Ignacio Vazquez-Abrams and @MT for the explanations.

Answer 1: 7, accepted, 2016-07-25 10:59:44
Answer 2: 4, 2016-07-25 11:21:03
Answer 3: 0, 2016-07-25 11:19:28
