
How to plot column words sorted by count value

This code counts the words in a column.

df['businesstype'].value_counts()    #value count

My question: how can I now plot the 5 or 10 most frequent words in the businesstype column?


That works, but it follows the order in which my CSV data is sorted, not the value count.

The question is probably easy but I am learning and I haven't found anything on SO that answers my question.

The dataframe looks like this:

Index(['Rang 2014', 'Unnamed: 1', 'Rang 2013','unternehmen' , 'Sitz',
       'Umsatz (Mrd. €)', 'Gewinn/Verlust (Mio. €)', 'Mitarbeiter weltweit',

I also checked the pandas max rows option, but nothing changed; when I set max rows, it just plotted the top and bottom half.

You could simply plot the first five entries of your value_counts series, but this would distort the output if there are ties with the entries that follow. A better strategy would be:

import pandas as pd
from matplotlib import pyplot as plt

#number of top entries
nmax = 5

#fake data generation
import numpy as np
n = 30
df = pd.DataFrame({"A": np.random.choice(list("XYZUVWKLM"), n), "B": np.random.randint(1, 10, n)})

#create value count series from A
plot_df = df["A"].value_counts()

#plot the two strategies into different panels for better comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))

#strategy 1: simply plot the first nmax rows
plot_df[:nmax].plot.bar(ax=ax1, rot=0)
ax1.set_title("First nmax entries")

#better approach with strategy 2:
#find value for top nmax entry in case there is a tie with the following entries
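#use .iloc for positional access; plain [] with an integer on a label-indexed Series is deprecated
val_for_nmax = plot_df.iloc[nmax - 1]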
#plot columns that have no less than this value
plot_df[plot_df>=val_for_nmax].plot.bar(ax=ax2, rot=45)
ax2.set_title("Take care of tie values")

plt.tight_layout()
plt.show()


Sample output: (image not included)
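As a possible shortcut for the tie handling above, pandas' `Series.nlargest` accepts `keep="all"`, which keeps every entry tied with the last selected one. The sketch below applies it to a `businesstype` column; the sample values, the `nmax` setting, and the output filename are made up for illustration and are not taken from the asker's CSV.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical stand-in data; only the column name "businesstype" comes from the question.
df = pd.DataFrame({
    "businesstype": ["Auto", "Bank", "Auto", "Energie", "Bank",
                     "Auto", "Handel", "Energie", "Chemie", "Pharma"]
})

# Number of top entries to show (assumed value for this example).
nmax = 2

# value_counts() returns counts sorted in descending order.
counts = df["businesstype"].value_counts()

# keep="all" retains every entry tied with the nmax-th one,
# so the result may contain more than nmax rows.
top = counts.nlargest(nmax, keep="all")

fig, ax = plt.subplots(figsize=(6, 4))
top.plot.bar(ax=ax, rot=45)
ax.set_ylabel("count")
ax.set_title(f"Top {nmax} business types (ties included)")
fig.tight_layout()

# Save to a file; in an interactive session plt.show() would work as well.
fig.savefig("top_businesstype.png")
```

With this sample data, "Bank" and "Energie" both occur twice, so asking for the top 2 yields three bars rather than arbitrarily dropping one of the tied entries.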
