
Python: find most frequent bytes?

I'm looking for a (preferably simple) way to find and order the most common bytes in a Python stream element.

For example:

>>> freq_bytes(b'hello world')
b'lohe wrd'

or even:

>>> freq_bytes(b'hello world')
[108,111,104,101,32,119,114,100]

I currently have a function that returns a list in the form list[97] == occurrences of "a". I need that to be sorted.

I figure I basically need to flip the list, so that list[a] = b becomes list[b] = a, while removing the repeats.

Try the Counter class in the collections module.

from collections import Counter

string = "hello world"
# most_common() returns (value, count) pairs, most frequent first; keep only the values.
print(''.join(char for char, count in Counter(string).most_common()))

Note that you need Python 2.7 or later.

Edit: I forgot that the most_common() method returns a list of value/count tuples, so I added a comprehension to extract just the values.
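The snippet above works on a str. For the bytes input in the question, a minimal sketch might look like the following. It assumes Python 3, where iterating over a bytes object yields integers, and it borrows the name freq_bytes from the question purely for illustration. The relative order of bytes with equal counts depends on the Python version; on 3.7 and later they come out in first-seen order.

```python
from collections import Counter

def freq_bytes(data):
    """Return the distinct bytes of `data`, most frequent first (hypothetical helper)."""
    # In Python 3, iterating over bytes yields ints in range(256).
    counts = Counter(data)
    # most_common() gives (value, count) pairs sorted by descending count.
    ordered = [value for value, _count in counts.most_common()]
    # bytes() accepts an iterable of ints; use list(...) on the result for the integer form.
    return bytes(ordered)

result = freq_bytes(b'hello world')
print(result)        # a bytes object starting with b'lo'
print(list(result))  # the same values as a list of integers
```

Returning bytes covers the first form asked for; wrapping the result in list() gives the second.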

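def frequent_bytes(aStr):
    # Count how often each character occurs.
    d = {}
    for char in aStr:
        d[char] = d.get(char, 0) + 1

    # Build (frequency, char) pairs so that sorting orders by frequency first.
    myList = []
    for char, frequency in d.items():
        myList.append((frequency, char))
    myList.sort(reverse=True)

    # Join only the characters; joining the tuples themselves raises a TypeError.
    return ''.join(char for frequency, char in myList)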

>>> frequent_bytes('hello world')
'lowrhed '

I just tried something obvious. @kindall's answer rocks, though. :)
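If you already have the count list described in the question (list[97] == occurrences of "a"), you may not need to rebuild the counts at all. The sketch below assumes a 256-entry list indexed by byte value and Python 3 for the demo loop; order_by_count is a made-up name. It sorts the byte values themselves, using the counts as the sort key, which is one way to "flip" the list.

```python
def order_by_count(counts):
    """Given counts[b] == occurrences of byte b, return byte values, most frequent first."""
    # Keep only the byte values that actually occur.
    present = [b for b, n in enumerate(counts) if n > 0]
    # Sort by descending count; ties fall back to ascending byte value.
    present.sort(key=lambda b: (-counts[b], b))
    return present

# Build the kind of count list the question describes (Python 3: bytes iterate as ints).
counts = [0] * 256
for b in b'hello world':
    counts[b] += 1

print(order_by_count(counts))  # [108, 111, 32, 100, 101, 104, 114, 119]
```

Ties are broken by byte value here, so the order of the single-occurrence bytes differs from the example in the question; change the sort key if you need a different tie-break.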
