
Value-sorted dict for Python?

I am interested in a dict implementation for Python that provides an interface for iterating over its values in sorted order, i.e., a dict with a "sortedvalues()" function.

Naively, one can do sorted(dict.values()), but that's not what I want: every time items are inserted or deleted, one has to run a full sort again, which isn't efficient.

Note that I am not asking about a key-sorted dict either (for that question, there are excellent answers in "Key-ordered dict in Python" and "Python 2.6 TreeMap/SortedDictionary?").

One solution is to write a class that inherits from dict but also maintains a list of keys sorted by their value (sorted_keys), along with the list of corresponding (sorted) values (sorted_values).

You can then define a __setitem__() method that uses the bisect module to find quickly the position k where the new (key, value) pair should be inserted in the two lists. You then store the new pair in the dictionary itself and insert it into the two lists that you maintain, with sorted_values[k:k] = [new_value] and sorted_keys[k:k] = [new_key]. Unfortunately, although the search is O(log n), the list insertion itself is O(n), so building the whole dictionary is O(n^2) in the worst case.
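A minimal sketch of that class, under a few assumptions: the class name ValueSortedDict and the _remove() helper are made up for illustration, and the handling of updates and deletions is one possible choice that the description above does not spell out.

```python
import bisect


class ValueSortedDict(dict):
    """Hypothetical sketch: a dict plus two parallel lists ordered by value."""

    def __init__(self):
        super().__init__()
        self.sorted_values = []  # values in ascending order
        self.sorted_keys = []    # sorted_keys[i] is the key of sorted_values[i]

    def _remove(self, key):
        # Find the first slot with this value, then walk the run of equal
        # values until we reach the slot that belongs to this key.
        k = bisect.bisect_left(self.sorted_values, self[key])
        while self.sorted_keys[k] != key:
            k += 1
        del self.sorted_values[k]
        del self.sorted_keys[k]

    def __setitem__(self, key, value):
        if key in self:
            self._remove(key)  # an update is a removal followed by an insert
        k = bisect.bisect_left(self.sorted_values, value)  # O(log n) search
        self.sorted_values[k:k] = [value]                  # O(n) shift
        self.sorted_keys[k:k] = [key]
        super().__setitem__(key, value)

    def __delitem__(self, key):
        self._remove(key)  # self[key] raises KeyError if the key is missing
        super().__delitem__(key)

    def __iter__(self):
        # Iterate over keys in order of their values.
        return iter(self.sorted_keys)
```

Only item assignment and deletion are kept in sync here; other mutating methods such as update() or pop() would need the same treatment.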

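Another approach to ordered insertion would be to use the heapq module and push (value, key) pairs onto a heap. Each insertion then takes O(log n), instead of the O(n) of the list-based approach in the previous paragraph. Note that a heap only guarantees that its smallest element comes first, so reading all the values in order means popping them one by one.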
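As a rough illustration of the heapq variant (the names insert() and iter_sorted() are made up here, and updating or deleting an existing key is deliberately left out):

```python
import heapq

heap = []  # binary heap of (value, key) pairs; the smallest value is heap[0]


def insert(key, value):
    heapq.heappush(heap, (value, key))  # O(log n)


def iter_sorted():
    # Pop from a copy so the original heap is left untouched.
    pending = list(heap)  # a shallow copy of a heap is still a valid heap
    while pending:
        value, key = heapq.heappop(pending)
        yield key, value
```

A full in-order traversal done this way costs O(n log n), so this variant pays off mainly when insertions dominate or when only the smallest few items are needed.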

Iterating over the dictionary can then simply be done by iterating over the list of keys (sorted_keys) that you maintain.

This method saves you the time it would take to sort the keys each time you want to iterate over the dictionary (with sorted values), by basically shifting (and increasing, unfortunately) this time cost to the construction of the sorted lists of keys and values.

The problem is that you need to sort or hash the entries by key to get reasonable insert and lookup performance. A naive way of implementing this would be a value-sorted tree structure of entries, plus a dict to look up the tree position for a key. You need to get deep into updating the tree, though, as this lookup dictionary needs to be kept correct. Essentially, you proceed as you would for an updatable heap.
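To make the "updatable heap" comparison concrete, here is a small sketch. It swaps the tree for a heapq heap and, instead of tracking positions, keeps a dict from each key to its entry and marks stale entries as dead (lazy invalidation). The class name, method names, and entry layout are assumptions for illustration only.

```python
import heapq
import itertools


class UpdatableHeap:
    """Hypothetical sketch: heap of entries plus a key -> entry lookup dict."""

    def __init__(self):
        self._heap = []                # entries: [value, seq, key, alive]
        self._entry = {}               # key -> its live entry
        self._seq = itertools.count()  # unique tie-breaker, so keys are never compared

    def set(self, key, value):
        if key in self._entry:
            self._entry[key][3] = False  # invalidate the stale entry in place
        entry = [value, next(self._seq), key, True]
        self._entry[key] = entry
        heapq.heappush(self._heap, entry)

    def delete(self, key):
        # Raises KeyError if the key is missing.
        self._entry.pop(key)[3] = False

    def smallest(self):
        # Discard invalidated entries that have reached the top.
        while self._heap and not self._heap[0][3]:
            heapq.heappop(self._heap)
        if not self._heap:
            raise KeyError("empty")
        value, _, key, _ = self._heap[0]
        return key, value
```

The lookup dict is what makes updates and deletions cheap; the price is that dead entries linger in the heap until they surface.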

I figure there are too many design options to make a reasonable standard library class out of such a structure, and it is too rarely needed.

Update: a trick that might work for you is to use a dual structure:

  1. a regular dict storing the key-value pairs as usual

  2. any kind of sorted list, for example using bisect

Then you have to implement the common operations on both: a new value is inserted into both structures. The tricky parts are the update and delete operations. You use the first structure to look up the old value, delete the old value from the second structure, then (when updating) reinsert the new value as before.

If you need to know the keys too, store (value, key) pairs in your sorted list.
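The dual structure with (value, key) pairs might look roughly like this. The names data, pairs, set_item() and remove_item() are made up, and the sketch assumes that keys are mutually comparable, since ties on value are broken by comparing keys.

```python
import bisect

data = {}   # structure 1: regular key -> value dict
pairs = []  # structure 2: sorted list of (value, key) pairs


def remove_item(key):
    old = data.pop(key)                        # look up the old value in the dict
    i = bisect.bisect_left(pairs, (old, key))  # the exact pair gives the exact index
    del pairs[i]


def set_item(key, value):
    if key in data:
        remove_item(key)                       # update = delete, then reinsert
    data[key] = value
    bisect.insort(pairs, (value, key))
```

Storing the key alongside the value also removes any ambiguity when several keys share the same value.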

Update 2: Try this class:

import bisect

class dictvs(dict):
    def __init__(self):
        dict.__init__(self)
        self._list = []  # every value, kept in ascending order

    def __setitem__(self, key, value):
        if key not in self:
            bisect.insort(self._list, value)
        else:
            # Move the old value's slot to the new position by shifting
            # the elements in between, then write the new value there.
            oldpos = bisect.bisect_left(self._list, self[key])
            newpos = bisect.bisect_left(self._list, value)
            if newpos > oldpos:
                newpos -= 1
                for i in range(oldpos, newpos):
                    self._list[i] = self._list[i + 1]
            else:
                for i in range(oldpos, newpos, -1):
                    self._list[i] = self._list[i - 1]
            self._list[newpos] = value
        dict.__setitem__(self, key, value)

    def __delitem__(self, key):
        if key in self:
            # bisect_left gives the index of the old value itself;
            # bisect.bisect (bisect_right) would point one past it.
            oldpos = bisect.bisect_left(self._list, self[key])
            del self._list[oldpos]
        dict.__delitem__(self, key)

    def values(self):
        return list(self._list)

It's not a complete dict yet, I guess. I haven't tested deletions, and I have only run a tiny set of updates. You should write a larger unit test for it, and compare the return value of values() with that of sorted(dict.values(instance)) there. This is just to show how to update the sorted list with bisect.

Here is another, simpler idea:

  • You create a class that inherits from dict.
  • You use a cache: you only sort the keys when iterating over the dictionary, and you mark the dictionary as being sorted; insertions should simply append to the list of keys.

kindall mentions in a comment that sorting lists that are almost sorted is fast, so this approach should be quite fast.
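A sketch of this cached approach, with the class name LazySortedDict and the attributes _keys and _sorted chosen here purely for illustration:

```python
class LazySortedDict(dict):
    """Hypothetical sketch: sort the keys by value only when iterating."""

    def __init__(self):
        super().__init__()
        self._keys = []       # all keys; in value order only while _sorted is True
        self._sorted = True

    def __setitem__(self, key, value):
        if key not in self:
            self._keys.append(key)  # amortized O(1): no search, no shifting
        super().__setitem__(key, value)
        self._sorted = False        # any new or changed value may break the order

    def __delitem__(self, key):
        super().__delitem__(key)    # raises KeyError before the list is touched
        self._keys.remove(key)      # O(n); the remaining order is preserved

    def __iter__(self):
        if not self._sorted:
            self._keys.sort(key=self.__getitem__)  # re-sort lazily, on demand
            self._sorted = True
        return iter(self._keys)
```

Python's list.sort() is adaptive on partially ordered input, which is why re-sorting a list that is already nearly in order tends to be cheap.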

You can use a skip dict. It is a Python dictionary that is permanently sorted by value.

Insertion is slightly more expensive than a regular dictionary, but it is well worth the cost if you frequently need to iterate in order, or perform value-based queries such as:

  1. What's the highest / lowest item?
  2. Which items have a value between X and Y?
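To show what these two queries amount to, here is a sketch using only the standard bisect module on a sorted list of (value, key) pairs. It is not the skip dict API, and the sample data and the function name between() are invented for the example.

```python
import bisect

# Hypothetical data: (value, key) pairs kept in ascending order by value.
pairs = sorted([(7, "g"), (2, "b"), (5, "e"), (9, "i")])
values = [v for v, _ in pairs]  # parallel list of values, used for bisecting

lowest = pairs[0]    # item with the smallest value
highest = pairs[-1]  # item with the largest value


def between(x, y):
    # All pairs whose value v satisfies x <= v <= y.
    lo = bisect.bisect_left(values, x)
    hi = bisect.bisect_right(values, y)
    return pairs[lo:hi]
```

Both endpoints are found by binary search, so a range query costs O(log n) plus the size of the result.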
