
Using collections.Counter on large pickle data

I have a pickle file with over a million words in it.我有一个包含超过一百万字的泡菜文件。 The pickle file can be downloaded from here .可以从这里下载 pickle 文件。

I want to use Counter on these words to sort them. Here's my code:

import pickle
from collections import Counter

with open('data/words.pkl', 'rb') as f:
    data = list(pickle.load(f))

print(Counter(data).most_common(3))

The printed result changes every time, but it's usually something like this:

[('', 1), ('fraksiyonal', 1), ('editado', 1)]

So, it seems not to be counting the words, and every word's occurrence is 1. What am I doing wrong?

Edit: As an example of how the data list looks:

print(data[0:10])

Result:

['', 'hillview', 'dipnota', 'дол', 'censusi', 'quathie', 'kalacağının', 'stralauerstrasse', 'sbaglio', 'keny']

The problem is with your data. In a comment you said:

I changed it to a list because the pickle load data is a set object

Sets can't contain duplicates, which is why the counts are always 1.
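
As a minimal sketch (using made-up words rather than the pickle file from the question), this is why counting a set can only ever produce counts of 1, and why the arbitrary iteration order of a set makes the printed result change between runs:

from collections import Counter

# Hypothetical data for illustration, not the asker's actual word list.
words = ['apple', 'banana', 'apple', 'apple', 'banana', 'cherry']

# Counting the list preserves duplicates:
print(Counter(words).most_common(3))
# [('apple', 3), ('banana', 2), ('cherry', 1)]

# Counting a set does not: every element appears exactly once,
# and the order is arbitrary, so the output varies between runs.
print(Counter(set(words)).most_common(3))
# e.g. [('cherry', 1), ('apple', 1), ('banana', 1)]

So if real frequency counts are needed, the duplicates have to be preserved upstream, before the words ever end up in a set.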


Due credit to jasonharper for posting the comment that figured it out.

