Value-sorted dict for Python?

I am interested in a dict implementation for Python that provides an iterating interface to sorted values, i.e., a dict with a sortedvalues() function.

Naively, one can do sorted(dict.values()), but that's not what I want: every time items are inserted or deleted, a full sort has to be run again, which is inefficient.

Note that I am not asking about a key-sorted dict either (for that question, there are excellent answers in "Key-ordered dict in Python" and "Python 2.6 TreeMap/SortedDictionary?").

One solution is to write a class that inherits from dict but also maintains a list of keys sorted by their values (sorted_keys), along with the list of corresponding sorted values (sorted_values).

You can then define a __setitem__() method that uses the bisect module to quickly find the position k at which the new (key, value) pair should be inserted in the two lists. You then insert the new key and value both into the dictionary itself and into the two lists you maintain, with sorted_values[k:k] = [new_value] and sorted_keys[k:k] = [new_key]. Unfortunately, such an insertion takes O(n) time, because every element after position k has to be shifted (so O(n^2) to build the whole dictionary).
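A minimal sketch of this two-list approach, assuming Python 3 and mutually comparable values. The class name ValueSortedDict is hypothetical, and sortedvalues() is simply the method name the question asks for, not an existing API. Updating an existing key first removes its old entry from both lists, which is also O(n).

```python
import bisect


class ValueSortedDict(dict):
    """Hypothetical sketch: a dict that also keeps its keys ordered by value."""

    def __init__(self):
        super().__init__()
        self.sorted_keys = []    # keys, ordered by their values
        self.sorted_values = []  # the matching values, in ascending order

    def __setitem__(self, key, value):
        if key in self:
            # Existing key: drop its old entry from both lists first.
            i = bisect.bisect_left(self.sorted_values, self[key])
            # Equal values are contiguous; scan them for this exact key.
            while self.sorted_keys[i] != key:
                i += 1
            del self.sorted_values[i]
            del self.sorted_keys[i]
        # O(log n): find where the value belongs in the sorted list.
        k = bisect.bisect_right(self.sorted_values, value)
        # O(n): slice assignment shifts every later element by one slot.
        self.sorted_values[k:k] = [value]
        self.sorted_keys[k:k] = [key]
        super().__setitem__(key, value)

    def sortedvalues(self):
        return iter(self.sorted_values)

    def __iter__(self):
        # Iterating over the dict yields keys in value order.
        return iter(self.sorted_keys)
```

For example, after d["a"] = 3, d["b"] = 1, d["c"] = 2, list(d) gives the keys in value order, b, c, a. Deletion is left out to keep the sketch short; it would mirror the removal step at the top of __setitem__.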

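Another approach to ordered insertion would be to use the heapq module and push (value, key) pairs onto a heap. This makes each insertion O(log n) instead of the O(n) of the list-based approach in the previous paragraph. Note, however, that a heap only guarantees that its first element is the smallest: getting all values in sorted order still costs O(n log n) (by popping or sorting), and removing an arbitrary key is O(n).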
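A small illustration of the heapq variant, assuming keys are comparable (they break ties between equal values inside the tuples). The helper sorted_pairs is hypothetical; it shows that the heap gives O(log n) pushes and O(1) access to the minimum, but a full ordered traversal still means popping every element.

```python
import heapq

heap = []  # (value, key) pairs maintained as a binary heap
for key, value in {"a": 3, "b": 1, "c": 2, "d": 0}.items():
    heapq.heappush(heap, (value, key))  # O(log n) per insertion

# Only heap[0] is guaranteed to be the smallest pair; the underlying
# list as a whole is not necessarily sorted.
smallest = heap[0]


def sorted_pairs(h):
    """Return all pairs in ascending order in O(n log n), leaving h intact."""
    h = list(h)  # pop from a copy so the original heap is not consumed
    return [heapq.heappop(h) for _ in range(len(h))]
```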

Iterating over the dictionary can then simply be done by iterating over the list of keys (sorted_keys) that you maintain.

This method saves you the time it would take to sort the keys each time you want to iterate over the dictionary by value, essentially by shifting this cost (and, unfortunately, increasing it) to the construction of the sorted lists of keys and values.

The problem is that you need to sort or hash it by key to get reasonable insert and lookup performance. A naive way of implementing it would be a value-sorted tree structure of entries, plus a dict to look up the tree position for a key. You need to get deep into updating the tree, though, because this lookup dict must be kept correct whenever entries move. Essentially, this is what you would do for an updatable heap.
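The position bookkeeping described here can be sketched with an indexed binary min-heap, i.e. the "updatable heap" the answer mentions. This is a hypothetical illustration of keeping a key-to-position dict in sync, not a full value-sorted dict: a heap gives cheap access to the minimum but not sorted iteration, which is what a balanced tree would add. The class and method names are made up for the example.

```python
class IndexedMinHeap:
    """Hypothetical sketch: a binary min-heap of [value, key] entries plus a
    dict mapping each key to the current index of its entry."""

    def __init__(self):
        self._heap = []  # list of [value, key] entries
        self._pos = {}   # key -> index of its entry in self._heap

    def _swap(self, i, j):
        h = self._heap
        h[i], h[j] = h[j], h[i]
        # Every move must be mirrored in the lookup dict.
        self._pos[h[i][1]] = i
        self._pos[h[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self._heap[i][0] < self._heap[parent][0]:
                self._swap(i, parent)
                i = parent
            else:
                break

    def _sift_down(self, i):
        n = len(self._heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self._heap[child][0] < self._heap[smallest][0]:
                    smallest = child
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest

    def __setitem__(self, key, value):
        if key in self._pos:
            # O(1) lookup of the entry, then O(log n) to restore heap order;
            # the new value may be smaller or larger than the old one.
            i = self._pos[key]
            self._heap[i][0] = value
            self._sift_up(i)
            self._sift_down(self._pos[key])
        else:
            self._heap.append([value, key])
            self._pos[key] = len(self._heap) - 1
            self._sift_up(len(self._heap) - 1)

    def __delitem__(self, key):
        i = self._pos.pop(key)
        last = self._heap.pop()
        if i < len(self._heap):
            # Move the former last entry into the hole and restore order.
            self._heap[i] = last
            self._pos[last[1]] = i
            self._sift_up(i)
            self._sift_down(self._pos[last[1]])

    def peek_min(self):
        value, key = self._heap[0]
        return key, value
```

Only values are compared, so keys need to be hashable but not orderable.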

I figure there are too many design choices to make a reasonable standard library option out of such a structure, while it is rarely needed.

Update: a trick that might work for you is to use a dual structure:

  1. a regular dict storing the key-value pairs as usual

  2. any kind of sorted list, for example using bisect

Then you have to implement the common operations on both: a new value is inserted into both structures. The tricky parts are the update and delete operations: you use the first structure to look up the old value, delete the old value from the second structure, and then (when updating) reinsert the new value as before.

If you need to know the keys too, store (value, key) pairs in the second (sorted) list.
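A minimal sketch of the dual structure with (value, key) pairs, assuming Python 3 and mutually comparable keys (so pairs with equal values can still be ordered and located exactly). The class name DualValueSortedDict is hypothetical. Lookups stay O(1); inserts, updates and deletes are O(log n) to search plus O(n) to shift the list.

```python
import bisect


class DualValueSortedDict:
    """Hypothetical sketch: a plain dict plus a bisect-maintained list of
    (value, key) pairs."""

    def __init__(self):
        self._dict = {}   # key -> value, used to find the old value
        self._pairs = []  # (value, key) pairs in ascending order

    def __getitem__(self, key):
        return self._dict[key]

    def __setitem__(self, key, value):
        if key in self._dict:
            # Update: remove the old pair, then reinsert as for a new key.
            self._remove_pair(key)
        self._dict[key] = value
        bisect.insort(self._pairs, (value, key))

    def __delitem__(self, key):
        self._remove_pair(key)  # raises KeyError for a missing key
        del self._dict[key]

    def _remove_pair(self, key):
        # Look up the old value in the dict, then bisect to its exact pair.
        pair = (self._dict[key], key)
        i = bisect.bisect_left(self._pairs, pair)
        del self._pairs[i]  # O(n) list shift

    def __len__(self):
        return len(self._dict)

    def sorted_items(self):
        return [(key, value) for value, key in self._pairs]
```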

Update 2: Try this class:

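import bisect

class dictvs(dict):
    # dict subclass that also keeps all of its values in a sorted list
    def __init__(self):
        dict.__init__(self)
        self._list = []  # values in ascending order

    def __setitem__(self, key, value):
        # Test membership instead of get() is None: None can be a stored value.
        if key not in self:
            bisect.insort(self._list, value)
        else:
            old = dict.__getitem__(self, key)
            oldpos = bisect.bisect_left(self._list, old)
            newpos = bisect.bisect_left(self._list, value)
            # Move the old value's slot to its new position by shifting the
            # elements in between one step (O(n) in the worst case).
            if newpos > oldpos:
                newpos -= 1
                for i in range(oldpos, newpos):
                    self._list[i] = self._list[i + 1]
            else:
                for i in range(oldpos, newpos, -1):
                    self._list[i] = self._list[i - 1]
            self._list[newpos] = value
        dict.__setitem__(self, key, value)

    def __delitem__(self, key):
        old = dict.__getitem__(self, key)  # raises KeyError for a missing key
        # bisect_left (not bisect/bisect_right) lands on an element equal to old
        oldpos = bisect.bisect_left(self._list, old)
        del self._list[oldpos]
        dict.__delitem__(self, key)

    def values(self):
        return list(self._list)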

I guess it's not a complete dict yet. I haven't tested deletions, and only a tiny set of updates. You should write a larger unit test for it and compare the return value of values() with that of sorted(dict.values(instance)). This is just to show how to update the sorted list with bisect.
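A sketch of the suggested randomized unit test, comparing values() with sorted(dict.values(instance)) after every operation. To keep the block self-contained it defines a compact variant of dictvs (an assumption, not the answer's exact code) that replaces the manual shifting loop with a list deletion followed by insort, which has the same O(n) worst case; the class from the answer can be swapped in under the same name. The helper randomized_check is hypothetical.

```python
import bisect
import random


class dictvs(dict):
    """Compact variant: delete the old value, then insort the new one."""

    def __init__(self):
        super().__init__()
        self._list = []  # all values in ascending order

    def __setitem__(self, key, value):
        if key in self:
            del self._list[bisect.bisect_left(self._list, self[key])]
        bisect.insort(self._list, value)
        super().__setitem__(key, value)

    def __delitem__(self, key):
        del self._list[bisect.bisect_left(self._list, self[key])]
        super().__delitem__(key)

    def values(self):
        return list(self._list)


def randomized_check(steps=2000, seed=0):
    """Apply random inserts, updates and deletes; after each step compare
    values() with sorted(dict.values(instance))."""
    rng = random.Random(seed)
    d = dictvs()
    for _ in range(steps):
        key = rng.randrange(50)
        if key in d and rng.random() < 0.3:
            del d[key]
        else:
            d[key] = rng.randrange(20)  # small range forces duplicate values
        assert d.values() == sorted(dict.values(d)), (d.values(), dict.values(d))
    return len(d)
```

The small value range is deliberate: duplicate values are where an off-by-one in the bisect position would show up first.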

Here is another, simpler idea:

  • You create a class that inherits from dict.
  • You use a cache: you only sort the keys when iterating over the dictionary, and then mark the dictionary as sorted; insertions simply append to the list of keys (and mark it as unsorted).

kindall mentioned in a comment that sorting lists that are almost sorted is fast (Python's Timsort takes advantage of already-sorted runs), so this approach should be quite fast.
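A minimal sketch of the caching idea, assuming Python 3 and comparable values. The class name LazySortedDict and the _dirty flag are hypothetical. New keys are appended in O(1), and the key list is re-sorted by value only when the dict is iterated after a change.

```python
class LazySortedDict(dict):
    """Hypothetical sketch: keys are appended on insert and sorted by value
    lazily, the next time the dict is iterated."""

    def __init__(self):
        super().__init__()
        self._keys = []      # keys; in value order only when not dirty
        self._dirty = False  # True when _keys may be out of value order

    def __setitem__(self, key, value):
        if key not in self:
            self._keys.append(key)  # O(1): just append, no sorting yet
        super().__setitem__(key, value)
        self._dirty = True  # a new key or a changed value may break the order

    def __delitem__(self, key):
        super().__delitem__(key)
        # O(n); the remaining keys keep their relative order.
        self._keys.remove(key)

    def __iter__(self):
        if self._dirty:
            # Timsort is fast on a list that is already mostly sorted,
            # e.g. a sorted list with a few new keys appended at the end.
            self._keys.sort(key=super().__getitem__)
            self._dirty = False
        return iter(list(self._keys))  # copy, so later edits don't disturb it
```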

You can use a skip dict. It is a Python dictionary that is permanently sorted by value.

Insertion is slightly more expensive than with a regular dictionary, but it is well worth the cost if you frequently need to iterate in order or perform value-based queries such as:

  1. What's the highest / lowest item?
  2. Which items have a value between X and Y?
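This does not use the skip dict's own API (not shown here); it is a hedged illustration, using plain bisect on hypothetical value-ordered parallel lists like the sorted_keys / sorted_values approach above, of why both queries become cheap once the values are kept in order. The data and helper names are made up for the example.

```python
import bisect

# Hypothetical data: parallel lists kept in ascending value order.
sorted_values = [1, 4, 5, 7, 9]
sorted_keys = ["b", "d", "a", "e", "c"]


def lowest_and_highest(keys, values):
    """Query 1: (key, value) of the smallest and largest value, in O(1)."""
    return (keys[0], values[0]), (keys[-1], values[-1])


def keys_between(keys, values, lo, hi):
    """Query 2: keys whose value v satisfies lo <= v <= hi.

    Two binary searches find the boundaries in O(log n); slicing out the
    m matching keys then costs O(m).
    """
    start = bisect.bisect_left(values, lo)
    end = bisect.bisect_right(values, hi)
    return keys[start:end]
```

For instance, keys_between(sorted_keys, sorted_values, 4, 7) returns the keys of the values 4, 5 and 7 in value order.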
