Plotting From Counter While Maintaining Order

I'm attempting to plot the word frequencies of the top 50 words from an article I copied off Wikipedia. I've had a look at "How to plot the number of times each element is in a list", "Python: Frequency of occurrences" and "Using Counter() in Python to build histogram?", which seemed promising until I realized that the solutions do not maintain the order from Counter(). Is there a way to retain the descending order from Counter() while plotting?
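A likely source of the confusion, shown as a minimal sketch with made-up words (assuming Python 3.7+, where dicts, and therefore Counter, keep insertion order): printing a Counter lists it by count because its repr is built from most_common(), but items() iterates in the order the words were first seen. Unzipping items() therefore gives labels and values that are not sorted by frequency.

```python
from collections import Counter

words = "a cat and the dog and the dog the end".split()
counts = Counter(words)

# The repr is built from most_common(), so printing looks sorted by count
print(counts)
# Counter({'the': 3, 'and': 2, 'dog': 2, 'a': 1, 'cat': 1, 'end': 1})

# items() follows first-seen order instead, which is what the plot receives
labels, values = zip(*counts.items())
print(labels)  # ('a', 'cat', 'and', 'the', 'dog', 'end')
print(values)  # (1, 1, 2, 3, 2, 1)
```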

The code that I'm using to play with the data:

# Standard Library
from collections import Counter
import re

# Third Party Library
import matplotlib.pyplot as plt
import numpy as np

file = '...\\NLP\\Word_Embedding\\Basketball.txt'
with open(file, 'r') as f:
    text = f.read()
# Insert a space between a quote or period and a following ')', '[', ',', '.' or ';'
text = re.sub(r'([\"\'.])([\)\[,.;])', r'\1 \2', text)

vocab = text.split()
vocab = [words.lower() for words in vocab]
print('There are a total of {} words in the corpus'.format(len(vocab)))
tokens = list(set(vocab))
print('There are {} unique words in the corpus'.format(len(tokens)))

vocab_labels, vocab_values = zip(*Counter(vocab).items())
vocab_freq = Counter(vocab)

indexes = np.arange(len(vocab_labels[:10]))
width = 1

# plt.bar(indexes, vocab_values[:10], width)  # First 10 words in Counter order, not sorted by count
# plt.xticks(indexes, vocab_labels[:10])
# plt.show()
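For reference, the re.sub() call in the code above inserts a space between a quote or period and a directly following ')', '[', ',', '.' or ';', so that split() separates such clusters. A small sketch of its effect on a made-up sentence (the sample text is hypothetical, not from Basketball.txt):

```python
import re

# Same pattern as in the question: group 1 is a quote or period,
# group 2 is a directly following ')', '[', ',', '.' or ';'
PATTERN = r'([\"\'.])([\)\[,.;])'

sample = 'The ball went in.) They said "score".'
spaced = re.sub(PATTERN, r'\1 \2', sample)
print(spaced)          # The ball went in. ) They said "score" .
print(spaced.split())  # ['The', 'ball', 'went', 'in.', ')', 'They', 'said', '"score"', '.']
```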

Link to Basketball.txt file

You can sort vocab_values by count (keeping each entry of vocab_labels paired with its count) and then reverse the result with [::-1] to get descending order:

from collections import Counter
import re

# Third Party Library
import matplotlib.pyplot as plt
import numpy as np

file = r'.\Basketball.txt'  # raw string so the backslash is not read as an escape
with open(file, 'r') as f:
    text = f.read()
text = re.sub(r'([\"\'.])([\)\[,.;])', r'\1 \2', text)

vocab = text.split()
vocab = [words.lower() for words in vocab]
print('There are a total of {} words in the corpus'.format(len(vocab)))
tokens = list(set(vocab))
print('There are {} unique words in the corpus'.format(len(tokens)))

vocab_labels, vocab_values = zip(*Counter(vocab).items())
vocab_freq = Counter(vocab)

# Sort the counts ascending, then reverse with [::-1] for descending order
sorted_values = sorted(vocab_values)[::-1]
# Sort (count, label) pairs the same way so every label stays next to its count
sorted_labels = [x for (y, x) in sorted(zip(vocab_values, vocab_labels))][::-1]
indexes = np.arange(len(sorted_labels[:10]))

plt.bar(indexes, sorted_values[:10])  # 10 most frequent words
# Bars are centered on their x positions by default, so put the labels there too
plt.xticks(indexes, sorted_labels[:10])
plt.show()
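An alternative worth considering, offered as a sketch rather than as the method above: Counter.most_common() already returns (word, count) pairs sorted by count, highest first, so the manual sort can be skipped. The helper name top_words and the sample words are hypothetical.

```python
from collections import Counter


def top_words(vocab, n=10):
    """Return (labels, values) for the n most frequent words, highest count first.

    Hypothetical helper: relies on Counter.most_common(), which sorts by count
    and keeps tied words in the order they were first encountered.
    """
    pairs = Counter(vocab).most_common(n)
    if not pairs:
        return (), ()
    labels, values = zip(*pairs)  # unzip [(word, count), ...] into two tuples
    return labels, values


vocab = "the ball the hoop the court a ball a net".split()
labels, values = top_words(vocab, n=3)
print(labels, values)  # ('the', 'ball', 'a') (3, 2, 2)
```

The returned tuples can be plotted the same way as above, e.g. plt.bar(range(len(labels)), values) followed by plt.xticks(range(len(labels)), labels). One difference: most_common() orders tied words by first appearance, whereas sorted(zip(vocab_values, vocab_labels))[::-1] orders them in reverse alphabetical order.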

Result:

[Image: word frequencies plotted in descending order]
