Using collections.Counter on large pickle data
I have a pickle file with over a million words in it. The pickle file can be downloaded from here .
I want to use Counter on these words to sort them. Here's my code:
import pickle
from collections import Counter

with open('data/words.pkl', 'rb') as f:
    data = list(pickle.load(f))
print(Counter(data).most_common(3))
The printed result changes every time, but it's usually something like this:
[('', 1), ('fraksiyonal', 1), ('editado', 1)]
So it seems the words aren't being counted: every word's occurrence is 1. What am I doing wrong?
Edit: As an example of what the data list looks like:
print(data[0:10])
Result:
['', 'hillview', 'dipnota', 'дол', 'censusi', 'quathie', 'kalacağının', 'stralauerstrasse', 'sbaglio', 'keny']
The problem is with your data. In a comment you said:
I changed it to list because pickle load data is a set object
Sets can't contain duplicates, which is why every count is 1. The duplicates were already discarded when the words were stored as a set, so by the time Counter sees the data, each word appears exactly once and most_common simply returns three arbitrary words.
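A small self-contained sketch (using made-up sample words, not your pickle file) shows the effect: counting the original list preserves frequencies, while counting a set of the same words collapses every count to 1.

```python
from collections import Counter

# Hypothetical sample data standing in for the words in the pickle file.
words = ['apple', 'banana', 'apple', 'apple', 'banana', 'cherry']

# Counting the list keeps the real frequencies.
print(Counter(words).most_common(2))   # [('apple', 3), ('banana', 2)]

# Converting to a set first discards the duplicates,
# so every word's count becomes 1 (and the order is arbitrary).
print(Counter(set(words)).most_common(2))
```

So the fix is to pickle (or re-generate) the words as a list rather than a set; once the data is a set, the frequency information is gone and cannot be recovered.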
Due credit to jasonharper for posting the comment that figured it out.