
Plotting Bokeh bar chart using sum of grouped Pandas column

I'm trying to create a bar chart to see which stores had the biggest revenue in my dataset. Using the default Pandas plot I can do that in one line:

df.groupby('store_name')['sale_value'].sum().sort_values(ascending=False).head(20).plot(kind='bar')

[image: Pandas plot]

But this chart is not interactive and I can't see the exact values, so I want to create it with Bokeh and be able to mouse over a bar to see, for example, the exact amount.

I tried doing the following but just got a blank page:

source = ColumnDataSource(df.groupby('store_name')['sale_value'])
plot = Plot()
glyph = VBar(x='store_name', top='sale_value')
plot.add_glyph(source, glyph)
show(plot)

If I change the source to ColumnDataSource(df.groupby('store_name')['sale_value'].sum()), I get 'ValueError: expected a dict or pandas.DataFrame, got store_name'

How can I create this chart with mouseover using Bokeh?

Let's assume this is our DataFrame:

import pandas as pd

df = pd.DataFrame({'store_name':['a', 'b', 'a', 'c'], 'sale_value':[4, 5, 2, 4]})
df
>>>
   store_name  sale_value
0          a           4
1          b           5
2          a           2
3          c           4

With this data, it is possible to create the bar chart using your approach.

First we have to do some imports and preprocessing:

from bokeh.models import ColumnDataSource, Grid, LinearAxis, Plot, VBar, Title
from bokeh.plotting import show

# Sum the sales per store and turn the resulting Series back into a DataFrame
source = ColumnDataSource(df.groupby('store_name')['sale_value'].sum().to_frame().reset_index())
# One tick per bar, labelled with the store name instead of the integer position
my_ticks = list(range(len(source.data['store_name'])))
my_tick_labels = {i: source.data['store_name'][i] for i in my_ticks}

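The groupby expression differs from yours in two ways. A .sum() is added, so each store is reduced to a single total, and .to_frame().reset_index() turns the resulting Series into a DataFrame with a default ascending integer index, which ColumnDataSource accepts. That integer index ends up in the source as a column named 'index' and is used for the x positions of the bars.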
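To make the difference concrete, here is a small sketch that uses only pandas and the example DataFrame from above. The names totals and table are just illustrative. It shows that the grouped sum is a Series, while the reset version is a DataFrame with one row per store.

```python
import pandas as pd

df = pd.DataFrame({'store_name': ['a', 'b', 'a', 'c'], 'sale_value': [4, 5, 2, 4]})

# Grouped sum: a Series whose index holds the store names
totals = df.groupby('store_name')['sale_value'].sum()

# Back to a DataFrame: store_name becomes a regular column again
table = totals.to_frame().reset_index()

print(type(totals).__name__)          # Series
print(type(table).__name__)           # DataFrame
print(list(table.columns))            # ['store_name', 'sale_value']
print(table['sale_value'].tolist())   # [6, 5, 4]
```

The Series is what triggered the ValueError in the question; the DataFrame is what gets passed to ColumnDataSource.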

Then you can create a plot.

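plot = Plot(title=Title(text='Plot'),
            width=300,   # called plot_width in older Bokeh releases (removed in 3.0)
            height=300,  # called plot_height in older Bokeh releases (removed in 3.0)
            min_border=0,
            toolbar_location=None
           )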

glyph = VBar(x='index', 
             top='sale_value', 
             bottom=0, 
             width=0.5, 
             fill_color="#b3de69"
            )

plot.add_glyph(source, glyph)

xaxis = LinearAxis(ticker=my_ticks,
                   major_label_overrides=my_tick_labels
                  )
plot.add_layout(xaxis, 'below')

yaxis = LinearAxis()
plot.add_layout(yaxis, 'left')

plot.add_layout(Grid(dimension=0, ticker=xaxis.ticker))
plot.add_layout(Grid(dimension=1, ticker=yaxis.ticker))

show(plot)

I also want to show you a second approach, which I prefer.

from bokeh.plotting import figure, show

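plot = figure(title='Plot',
              width=300,   # called plot_width in older Bokeh releases (removed in 3.0)
              height=300,  # called plot_height in older Bokeh releases (removed in 3.0)
              min_border=0,
              toolbar_location=None
             )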
plot.vbar(x='index', 
          top='sale_value', 
          source=source, 
          bottom=0, 
          width=0.5, 
          fill_color="#b3de69"
         )
plot.xaxis.ticker = my_ticks
plot.xaxis.major_label_overrides = my_tick_labels
show(plot)

I like the second one more, because it is a bit shorter.

The resulting figure is the same in both cases.
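Since the question also asks for a mouseover, here is a possible variation on the second approach, not a definitive solution. It assumes a recent Bokeh release (3.x). Instead of integer positions with label overrides, it uses a categorical x_range, and it relies on the tooltips argument of figure to show the store name and the summed value on hover. The name top, the title text, and the tooltip labels are my own choices, and .sort_values(...).head(20) is taken from the one-liner in the question.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show

df = pd.DataFrame({'store_name': ['a', 'b', 'a', 'c'], 'sale_value': [4, 5, 2, 4]})

# Sum per store, keep the 20 largest, and return to a DataFrame
top = (df.groupby('store_name')['sale_value'].sum()
         .sort_values(ascending=False)
         .head(20)
         .reset_index())
source = ColumnDataSource(top)

# A categorical x_range puts the store names directly on the axis;
# tooltips adds a hover tool that reads columns from the source via @column
plot = figure(x_range=top['store_name'].tolist(),
              title='Revenue by store',
              width=300,
              height=300,
              toolbar_location=None,
              tooltips=[('store', '@store_name'), ('revenue', '@sale_value')])
plot.vbar(x='store_name',
          top='sale_value',
          source=source,
          bottom=0,
          width=0.5,
          fill_color="#b3de69")

if __name__ == '__main__':
    show(plot)
```

With the categorical range, my_ticks and my_tick_labels are no longer needed, and the bars appear in the sorted order of top.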

