
How to plot column words sorted by count value

This code counts the words in a column:

df['businesstype'].value_counts()    #value count

My question: how can I now plot the 10 or 5 most frequent words in the businesstype column? I tried this:

df.head(10)['businesstype'].value_counts().plot.bar()

That works, but it counts only the first 10 rows in the order my CSV data is sorted, rather than picking the 10 highest value counts.

The question is probably easy, but I am learning and I haven't found anything on SO that answers it.

The columns of the dataframe look like this:

Index(['Rang 2014', 'Unnamed: 1', 'Rang 2013','unternehmen' , 'Sitz',
       'Umsatz (Mrd. €)', 'Gewinn/Verlust (Mio. €)', 'Mitarbeiter weltweit',
       'businestype'],
      dtype='object')

I also checked the pandas max rows option, but nothing changed: if I set max rows, it just plotted the top and bottom halves.

You could simply plot entries 1-5 of your value_counts series, but this would distort the output if there are ties with the entries that follow. A better strategy would be:

import pandas as pd
from matplotlib import pyplot as plt

#number of top entries
nmax = 5

#fake data generation
import numpy as np
np.random.seed(1234)
n = 30
df = pd.DataFrame({"A": np.random.choice(list("XYZUVWKLM"), n), "B": np.random.randint(1, 10, n)})

#create value count series from A
plot_df = df["A"].value_counts()

#plot the two strategies into different panels for better comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))

#strategy 1: simply plot the first nmax rows
plot_df[:nmax].plot.bar(ax=ax1, rot=0)
ax1.set_title("First nmax entries")

#better approach with strategy 2:
#find value for top nmax entry in case there is a tie with the following entries
val_for_nmax = plot_df.iloc[nmax-1]  #positional access; plot_df[nmax-1] is deprecated on a label index
#plot columns that have no less than this value
plot_df[plot_df>=val_for_nmax].plot.bar(ax=ax2, rot=45)
ax2.set_title("Take care of tie values")

plt.show()

Sample output: [image not preserved: two bar-chart panels titled "First nmax entries" and "Take care of tie values"]
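As a shorter alternative, the same tie handling can probably be had from `Series.nlargest` with `keep="all"`. This is only a minimal sketch: the sample data below is made up for illustration, and only the column name `businesstype` is taken from the question.

```python
import pandas as pd

# Hypothetical sample data (an assumption for illustration, not the asker's CSV)
df = pd.DataFrame({"businesstype": list("AAAABBBCCCDDEEFG")})

# number of top entries
nmax = 4

# value_counts() returns the counts sorted in descending order
counts = df["businesstype"].value_counts()

# simple approach: the first nmax entries, which cuts through ties
top_simple = counts.head(nmax)

# tie-aware approach: keep every entry whose count equals the nmax-th largest
top_with_ties = counts.nlargest(nmax, keep="all")

print(top_simple)
print(top_with_ties)
```

Either result can then be plotted as before, e.g. `top_with_ties.plot.bar(rot=45)`. Note that with `keep="all"` the result may contain more than `nmax` bars when there is a tie at the cutoff.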
