Use OrderedDict or ordered list? (novice)

(Using Python 3.4.3) Here's what I want to do: I have a dictionary where the keys are strings and the values are the number of times each string occurs in a file. I need to output the string(s) that occur with the greatest frequency, along with their frequencies (if there's a tie for the most frequent, output all of the tied strings).

I tried using an OrderedDict. I can create it fine, but I struggle to get it to output only the most frequently occurring strings. I can keep trying, but I'm not sure an OrderedDict is really what I should be using, since I'll never need the OrderedDict itself once I've determined and output the most frequent strings and their frequencies. A fellow student recommended an ordered list, but I don't see how I'd preserve the link between the keys and values that I currently have.

Is OrderedDict the best tool for what I'm trying to do, or is there something else? If it is, is there a way to filter or slice (or do something equivalent to) the OrderedDict?

You can simply use sorted with a proper key function. In this case you can use operator.itemgetter(1), which sorts your items based on their values.

from operator import itemgetter

# items() yields (key, value) pairs; sort them by value, highest first
print(sorted(my_dict.items(), key=itemgetter(1), reverse=True))
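
The sorted list still contains every entry, not just the most frequent ones. As a minimal sketch of narrowing it down, assuming a string-to-count dictionary like the one described in the question (the sample contents of my_dict and the names ranked, top_count and most_frequent are illustrative), one could read the highest count from the first sorted item and keep every item that matches it:

```python
from operator import itemgetter

# Illustrative sample data: keys are strings, values are occurrence counts.
my_dict = {"a": 8, "d": 3, "c": 8, "b": 2, "e": 2}

# Sort the (key, value) pairs by value, highest count first.
ranked = sorted(my_dict.items(), key=itemgetter(1), reverse=True)

# The first item holds the highest count (None if the dict is empty).
top_count = ranked[0][1] if ranked else None

# Keep every pair whose count equals the highest count, so ties survive.
most_frequent = [(word, count) for word, count in ranked if count == top_count]

print(most_frequent)
```

On Python versions before 3.7, dictionaries do not preserve insertion order, so tied entries may come out in a different order from one run to another.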

This can be solved in two steps. First, sort your dictionary entries by frequency so that the highest frequency comes first.

Second, use the groupby function from itertools to group consecutive entries in the sorted list that share the same frequency. As you are only interested in the highest frequency, you stop after one iteration. For example:

from itertools import groupby
from operator import itemgetter

my_dict = {"a" : 8, "d" : 3, "c" : 8, "b" : 2, "e" : 2}

for k, g in groupby(sorted(my_dict.items(), key=itemgetter(1), reverse=True), key=itemgetter(1)):
    print(list(g))
    break

This would display:

[('a', 8), ('c', 8)]

This is because a and c are tied for the highest frequency.

If you remove the break statement, you would get the full list, one group per frequency:

[('a', 8), ('c', 8)]
[('d', 3)]
[('b', 2), ('e', 2)]
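
If only the top group is ever needed, the loop-and-break pattern could be wrapped in a small helper. The following is a sketch under the same assumptions as the example above; the function name most_frequent_entries is hypothetical, and next() with a default is used here in place of the for/break so that an empty dictionary is handled as well:

```python
from itertools import groupby
from operator import itemgetter


def most_frequent_entries(counts):
    """Return all (key, count) pairs tied for the highest count."""
    # Sort by count, highest first, so the top ties are adjacent at the front.
    ranked = sorted(counts.items(), key=itemgetter(1), reverse=True)
    # groupby yields (count, group) pairs; take only the first one, if any.
    first = next(groupby(ranked, key=itemgetter(1)), None)
    if first is None:  # empty dictionary: nothing to report
        return []
    _, group = first
    return list(group)


my_dict = {"a": 8, "d": 3, "c": 8, "b": 2, "e": 2}
print(most_frequent_entries(my_dict))
```

Note that groupby only groups adjacent items, which is why the entries have to be sorted by frequency first.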
