
Subset of dictionary keys

I've got a Python dictionary of the form {'ip1:port1' : <value>, 'ip1:port2' : <value>, 'ip2:port1' : <value>, ...}. The keys are strings consisting of ip:port pairs. The values are not important for this task.

I need a list of ip:port combinations with unique IP addresses; the port can be any of those that appear with that IP among the original keys. For the example above, two variants are acceptable: ['ip1:port1', 'ip2:port1'] and ['ip1:port2', 'ip2:port1'].

What is the most pythonic way of doing it?

Currently my solution is

def get_uniq_worker_ips(workers):
    wip = set(w.split(':')[0] for w in workers.iterkeys())
    return [[worker for worker in workers.iterkeys() if worker.startswith(w + ':')][0] for w in wip]

I don't like it, because it creates additional lists and then discards them.

You can use itertools.groupby to group by same IP addresses:

import itertools

data = {'ip1:port1' : "value1", 'ip1:port2' : "value2", 'ip2:port1' : "value3", 'ip2:port2': "value4"}
by_ip = {k: list(g) for k, g in itertools.groupby(sorted(data), key=lambda s: s.split(":")[0])}
by_ip
# {'ip1': ['ip1:port1', 'ip1:port2'], 'ip2': ['ip2:port1', 'ip2:port2']}

Then just pick any one from the different groups of IPs.

{v[0]: data[v[0]] for v in by_ip.values()}
# {'ip1:port1': 'value1', 'ip2:port1': 'value3'}

Or shorter, making a generator expression for just the first key from the groups:

one_by_ip = (next(g) for k, g in itertools.groupby(sorted(data), key=lambda s: s.split(":")[0]))
{key: data[key] for key in one_by_ip}
# {'ip1:port1': 'value1', 'ip2:port1': 'value3'}

However, note that groupby only groups consecutive items, so the input must be sorted by the grouping key. If you want to avoid sorting all the keys in the dict, you should instead just use a set of already seen IPs.

seen = set()
not_seen = lambda x: not(x in seen or seen.add(x))
{key: data[key] for key in data if not_seen(key.split(":")[0])}
# {'ip1:port1': 'value1', 'ip2:port1': 'value3'}

This is similar to your solution, but instead of looping the unique keys and finding a matching key in the dict for each, you loop the keys and check whether you've already seen the IP.
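
The same idea can also be written as an explicit loop, which avoids the side-effecting lambda. This is only a minimal sketch, assuming every key contains a colon; the function name unique_by_ip is made up for illustration.

```python
def unique_by_ip(workers):
    """Return one 'ip:port' key per distinct IP, keeping the first key seen."""
    seen = set()     # IPs that already have a key in the result
    result = []
    for key in workers:
        ip = key.partition(':')[0]   # text before the first colon
        if ip not in seen:
            seen.add(ip)
            result.append(key)
    return result


data = {'ip1:port1': "value1", 'ip1:port2': "value2",
        'ip2:port1': "value3", 'ip2:port2': "value4"}
print(unique_by_ip(data))
```

Which port ends up in the result depends on the iteration order of the dict, just as with the comprehension above.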

One way to do this is to transform your keys into a custom class that only looks at the IP part of the string when doing an equality test. It also needs to supply an appropriate __hash__ method.

The logic here is that the set constructor will "see" keys with the same IP as identical, ignoring the port part in the comparison, so it will avoid adding a key to the set if a key with that IP is already present in the set.

Here's some code that runs on Python 2 or Python 3.

class IPKey(object):
    def __init__(self, s):
        self.key = s
        self.ip, self.port = s.split(':', 1)

    def __eq__(self, other):
        return self.ip == other.ip

    def __hash__(self):
        return hash(self.ip)

    def __repr__(self):
        return 'IPKey({}:{})'.format(self.ip, self.port)

def get_uniq_worker_ips(workers):
    return [k.key for k in set(IPKey(k) for k in workers)]

# Test

workers = {
    'ip1:port1' : "val", 
    'ip1:port2' : "val", 
    'ip2:port1' : "val", 
    'ip2:port2' : "val", 
}

print(get_uniq_worker_ips(workers))    

Output (the order and the chosen ports may vary, since sets are unordered):

['ip2:port1', 'ip1:port1']

If you are running Python 2.7 or later, the function can use a set comprehension instead of that generator expression inside the set() constructor call.

def get_uniq_worker_ips(workers):
    return [k.key for k in {IPKey(k) for k in workers}]

The IPKey.__repr__ method isn't strictly necessary, but I like to give all my classes a __repr__ since it can be handy during development.


Here's a much more succinct solution which is very efficient, courtesy of Jon Clements. It builds the desired list via a dictionary comprehension.

def get_uniq_worker_ips(workers):
    return list({k.partition(':')[0]:k for k in workers}.values())
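
One thing to be aware of with the dictionary comprehension: a later key with the same IP overwrites an earlier one, so the last key iterated for each IP is the one that survives. That is fine for the question as asked, since any port is acceptable. If the first key per IP is preferred, a sketch along these lines, using dict.setdefault, might do (the function name first_key_per_ip is hypothetical):

```python
def first_key_per_ip(workers):
    """Return one 'ip:port' key per IP, keeping the first key encountered."""
    chosen = {}
    for key in workers:
        # setdefault stores the key only if this IP has no entry yet
        chosen.setdefault(key.partition(':')[0], key)
    return list(chosen.values())


keys = ['ip1:port1', 'ip1:port2', 'ip2:port1', 'ip2:port2']
print(first_key_per_ip(keys))
```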

I've changed a few characters in my solution and am now satisfied with it.

def get_uniq_worker_ips(workers):
    wip = set(w.split(':')[0] for w in workers.iterkeys())
    return [next(worker for worker in workers.iterkeys() if worker.startswith(w + ':')) for w in wip]

Thanks to @Ignacio Vazquez-Abrams and @MT for explanations.
