Subset of dictionary keys
I've got a Python dictionary of the form `{'ip1:port1': <value>, 'ip1:port2': <value>, 'ip2:port1': <value>, ...}`. Dictionary keys are strings consisting of ip:port pairs. Values are not important for this task.
I need a list of ip:port combinations with unique IP addresses; the port can be any of those that appear among the original keys. For the example above, two variants are acceptable: `['ip1:port1', 'ip2:port1']` and `['ip1:port2', 'ip2:port1']`.
What is the most Pythonic way of doing it?
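Currently my solution is:

```python
def get_uniq_worker_ips(workers):
    # Distinct IPs (Python 2: dict.iterkeys())
    wip = set(w.split(':')[0] for w in workers.iterkeys())
    # For each IP, build a list of all matching keys and keep only the first one;
    # matching on w + ':' stops e.g. '10.0.0.1' from matching '10.0.0.12:80'
    return [[worker for worker in workers.iterkeys() if worker.startswith(w + ':')][0] for w in wip]
```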
I don't like it because it builds an extra list for every IP and then discards everything but its first element.
You can use `itertools.groupby` to group the keys by IP address:
```python
import itertools

data = {'ip1:port1' : "value1", 'ip1:port2' : "value2", 'ip2:port1' : "value3", 'ip2:port2': "value4"}
# Sort first so that keys with the same IP are adjacent for groupby
by_ip = {k: list(g) for k, g in itertools.groupby(sorted(data), key=lambda s: s.split(":")[0])}
by_ip
# {'ip1': ['ip1:port1', 'ip1:port2'], 'ip2': ['ip2:port1', 'ip2:port2']}
```
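The `sorted(data)` call matters because `itertools.groupby` only merges consecutive items that share a key. With unsorted input the same IP can show up as two separate groups, and a dict comprehension like the one above would then silently keep only the last of them. A small sketch, using a hypothetical key order and a hypothetical helper `ip_of`:

```python
import itertools

def ip_of(key):
    # The IP is everything before the first ':'
    return key.split(":")[0]

# Hypothetical order in which the two 'ip1' keys are not adjacent
keys = ['ip1:port1', 'ip2:port1', 'ip1:port2']

unsorted_groups = [ip for ip, _ in itertools.groupby(keys, key=ip_of)]
sorted_groups = [ip for ip, _ in itertools.groupby(sorted(keys), key=ip_of)]

print(unsorted_groups)  # ['ip1', 'ip2', 'ip1']
print(sorted_groups)    # ['ip1', 'ip2']
```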
Then just pick any one key from each IP group:
```python
{v[0]: data[v[0]] for v in by_ip.values()}
# {'ip1:port1': 'value1', 'ip2:port1': 'value3'}
```
Or, shorter, use a generator expression that takes just the first key from each group:
```python
one_by_ip = (next(g) for k, g in itertools.groupby(sorted(data), key=lambda s: s.split(":")[0]))
{key: data[key] for key in one_by_ip}
# {'ip1:port1': 'value1', 'ip2:port1': 'value3'}
```
However, note that `groupby` only groups consecutive items with the same key, so the input has to be sorted by IP first. If you want to avoid sorting all the keys in the dict, you can instead keep a `set` of the IPs you have already seen:
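```python
seen = set()
# set.add() returns None, so `x in seen or seen.add(x)` is falsy exactly
# when x is new, and adding it to `seen` happens as a side effect
not_seen = lambda x: not (x in seen or seen.add(x))
{key: data[key] for key in data if not_seen(key.split(":")[0])}
# {'ip1:port1': 'value1', 'ip2:port1': 'value3'}
```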
This is similar to your solution, but instead of looping over the unique IPs and searching the dict for a matching key for each one, you loop over the keys once and check whether you've already seen the IP.
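If the side-effecting `not_seen` lambda feels too clever, the same first-seen logic can be written as an explicit loop. This is a sketch; `first_key_per_ip` is a hypothetical helper name, and the first-key-wins order assumes Python 3.7+ insertion-ordered dicts:

```python
def first_key_per_ip(data):
    """Return a dict with the first key (and its value) seen for each IP."""
    seen = set()
    result = {}
    for key in data:  # dicts iterate in insertion order on Python 3.7+
        ip = key.split(":")[0]
        if ip not in seen:
            seen.add(ip)
            result[key] = data[key]
    return result

data = {'ip1:port1': "value1", 'ip1:port2': "value2",
        'ip2:port1': "value3", 'ip2:port2': "value4"}
print(first_key_per_ip(data))  # {'ip1:port1': 'value1', 'ip2:port1': 'value3'}
```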
One way to do this is to wrap your keys in a custom class that only looks at the IP part of the string when doing an equality test. The class also needs to supply a matching `__hash__` method, so that keys that compare equal also hash equally.
The logic here is that the `set` constructor will treat keys with the same IP as identical, ignoring the port in the comparison, so it won't add a key to the set if a key with that IP is already present.
Here's some code that runs on Python 2 or Python 3:
```python
class IPKey(object):
    """Wraps an 'ip:port' string; equality and hashing use only the IP part."""
    def __init__(self, s):
        self.key = s
        self.ip, self.port = s.split(':', 1)

    def __eq__(self, other):
        # Keys are "equal" if they share the same IP, whatever the port
        return self.ip == other.ip

    def __hash__(self):
        # Consistent with __eq__: equal objects must have equal hashes
        return hash(self.ip)

    def __repr__(self):
        return 'IPKey({}:{})'.format(self.ip, self.port)

def get_uniq_worker_ips(workers):
    return [k.key for k in set(IPKey(k) for k in workers)]

# Test
workers = {
    'ip1:port1' : "val",
    'ip1:port2' : "val",
    'ip2:port1' : "val",
    'ip2:port2' : "val",
}
print(get_uniq_worker_ips(workers))
```
Output (the order may vary, since sets are unordered):

```
['ip2:port1', 'ip1:port1']
```
If you are running Python 2.7 or later, the function can use a set comprehension instead of the generator expression inside the `set()` constructor call:
```python
def get_uniq_worker_ips(workers):
    return [k.key for k in {IPKey(k) for k in workers}]
```
The `IPKey.__repr__` method isn't strictly necessary, but I like to give all my classes a `__repr__` since it can be handy during development.
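Which port survives for each IP? In CPython, adding an element that compares equal to one already in the set leaves the existing element in place, so the `IPKey` approach keeps the first key seen for each IP; only the order of the returned list is arbitrary. A minimal sketch of that behaviour, redefining a stripped-down `IPKey` so the snippet is self-contained:

```python
class IPKey(object):
    # Same idea as above: compare and hash only the IP part of 'ip:port'
    def __init__(self, s):
        self.key = s
        self.ip, self.port = s.split(':', 1)

    def __eq__(self, other):
        return self.ip == other.ip

    def __hash__(self):
        return hash(self.ip)

first = IPKey('ip1:port1')
second = IPKey('ip1:port2')

s = {first}
s.add(second)           # equal to `first`, so the set is left unchanged
kept = next(iter(s))

print(len(s), kept.key)  # 1 ip1:port1
```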
Here's a much more succinct and very efficient solution, courtesy of Jon Clements. It builds the desired list via a dictionary comprehension keyed by IP.
```python
def get_uniq_worker_ips(workers):
    return list({k.partition(':')[0]:k for k in workers}.values())
```
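In this dict comprehension a later key with the same IP overwrites the earlier entry, so it keeps the last key seen for each IP. If you'd rather keep the first one, `dict.setdefault` only stores a value the first time a key appears. A sketch comparing the two; `keep_last` and `keep_first` are hypothetical names, and the shown order assumes Python 3.7+ insertion-ordered dicts:

```python
def keep_last(workers):
    # Later keys overwrite earlier ones: the last key per IP wins
    return list({k.partition(':')[0]: k for k in workers}.values())

def keep_first(workers):
    # setdefault stores a key only the first time its IP appears
    by_ip = {}
    for k in workers:
        by_ip.setdefault(k.partition(':')[0], k)
    return list(by_ip.values())

workers = {'ip1:port1': "val", 'ip1:port2': "val",
           'ip2:port1': "val", 'ip2:port2': "val"}
print(keep_last(workers))   # ['ip1:port2', 'ip2:port2']
print(keep_first(workers))  # ['ip1:port1', 'ip2:port1']
```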
I've changed a few characters in my solution and am now satisfied with it:
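```python
def get_uniq_worker_ips(workers):
    # Distinct IPs (Python 2: dict.iterkeys())
    wip = set(w.split(':')[0] for w in workers.iterkeys())
    # next() stops at the first matching key, so no intermediate list is built;
    # matching on w + ':' stops e.g. '10.0.0.1' from matching '10.0.0.12:80'
    return [next(worker for worker in workers.iterkeys() if worker.startswith(w + ':')) for w in wip]
```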
Thanks to @Ignacio Vazquez-Abrams and @MT for the explanations.
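One caveat with the `startswith` approach: with real addresses one IP can be a string prefix of another (for example `10.0.0.1` and `10.0.0.12`), so a bare `startswith(ip)` can pick a key that belongs to a different IP. Including the `:` separator in the prefix avoids this. A Python 3 sketch with hypothetical addresses (the shown order assumes Python 3.7+ insertion-ordered dicts):

```python
# Hypothetical workers where one IP is a string prefix of another
workers = {'10.0.0.12:80': "val", '10.0.0.1:80': "val", '10.0.0.1:81': "val"}
ips = sorted(set(w.split(':')[0] for w in workers))  # ['10.0.0.1', '10.0.0.12']

# Bare prefix match: '10.0.0.12:80' also starts with '10.0.0.1'
naive = [next(w for w in workers if w.startswith(ip)) for ip in ips]

# Including the separator makes the match exact on the IP part
fixed = [next(w for w in workers if w.startswith(ip + ':')) for ip in ips]

print(naive)  # ['10.0.0.12:80', '10.0.0.12:80']
print(fixed)  # ['10.0.0.1:80', '10.0.0.12:80']
```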