How to plot column words sorted by count values

Question

This code counts the words in a column.

df['businesstype'].value_counts()    #value count

My question is: how can I plot only the 10 or 5 words with the highest counts in the businesstype column?

df.head(10)['businesstype'].value_counts().plot.bar()

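This runs, but it counts only the first 10 rows of the frame. My CSV data is sorted by value, not by value count.

The question may be simple, but I am still learning and I have not found anything on SO that answers it.

The dataframe looks like this: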

Index(['Rang 2014', 'Unnamed: 1', 'Rang 2013','unternehmen' , 'Sitz',
       'Umsatz (Mrd. €)', 'Gewinn/Verlust (Mio. €)', 'Mitarbeiter weltweit',
       'businestype'],
      dtype='object')

I also checked the pd option max rows. Setting max rows changes nothing; it just plots the upper and lower parts.

Answer 1

You can simply plot entries 1-5 of the value_counts series, but this distorts the output if there is a tie with the following entries. A better strategy is:

import pandas as pd
from matplotlib import pyplot as plt

#number of top entries
nmax = 5

#fake data generation
import numpy as np
np.random.seed(1234)
n = 30
df = pd.DataFrame({"A": np.random.choice(list("XYZUVWKLM"), n), "B": np.random.randint(1, 10, n)})

#create value count series from A
plot_df = df["A"].value_counts()

#plot the two strategies into different panels for better comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))

#strategy 1: simply plot the first nmax rows
plot_df[:nmax].plot.bar(ax=ax1, rot=0)
ax1.set_title("First nmax entries")

#better approach with strategy 2:
#find value for top nmax entry in case there is a tie with the following entries
val_for_nmax = plot_df.iloc[nmax - 1]
#plot columns that have no less than this value
plot_df[plot_df>=val_for_nmax].plot.bar(ax=ax2, rot=45)
ax2.set_title("Take care of tie values")

plt.show()

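Sample output: [figure: two bar charts side by side, titled "First nmax entries" and "Take care of tie values"]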
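Applied to a column like the one in the question, the same idea might look like the sketch below. The data is made up for illustration, and `nmax = 2` is an arbitrary choice. It also uses `Series.nlargest` with `keep="all"` as a shorter alternative to the boolean mask above; that is a substitution, not what the answer's code does, although it should select the same entries.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; the values are invented.
df = pd.DataFrame({
    "businesstype": ["Auto", "Bank", "Auto", "Energie", "Bank",
                     "Auto", "Handel", "Energie", "Chemie", "Handel"]
})

nmax = 2  # number of top entries to show (arbitrary for this sketch)

# Count every category once; value_counts sorts by count, descending.
counts = df["businesstype"].value_counts()

# Count first, then slice: the nmax most frequent categories.
top_counts = counts.head(nmax)

# Slice first, then count (the question's approach): only the first
# nmax rows of the frame are counted, which is a different result.
first_rows_counts = df.head(nmax)["businesstype"].value_counts()

# keep="all" retains every entry tied with the nmax-th largest count,
# so the result may hold more than nmax rows.
top_with_ties = counts.nlargest(nmax, keep="all")

print(top_with_ties)
```

With matplotlib installed, `top_with_ties.plot.bar(rot=0)` would then draw the bars. In this made-up data three categories share second place, so the tie-aware selection returns four bars rather than two.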
