Pandas: multiple bar plot from aggregated columns

Question

In Python pandas I have created a DataFrame with one value per year and two subclasses, i.e. one metric for each (year, Tag_1, Tag_2) triplet:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df

       Metric    Tag_1  Tag_2  year
0     5770832  FOOBAR1  name1  2008
1     7526436  FOOBAR1    xyz  2008
2    33972652  FOOBAR1  name1  2009
3    17491416  FOOBAR1    xyz  2009
...
16    6602920  baznar2  name1  2008
17       6608  baznar2    xyz  2008
...
30  142102944  baznar2  name1  2015
31          0  baznar2    xyz  2015

I would like to produce a bar plot with Metric as the y-values over x=(year, Tag_1, Tag_2), sorted primarily by year and secondarily by Tag_1, with the bars colored according to Tag_1. Something like:

(2008,FOOBAR1,name1)  --> 5770832  *RED*
(2008,baznar2,name1)  --> 6602920  *BLUE*
(2008,FOOBAR1,xyz)    --> 7526436  *RED*
(2008,baznar2,xyz)    --> ...      *BLUE*
(2009,FOOBAR1,name1)  --> ...      *RED*

I tried starting with a grouping of columns like

df.plot.bar(x=['year','Tag_1','Tag_2'])

but I have not found a way to separate the selections into two sets of bars next to each other.

Answer 1

This should get you on your way:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('path_to_file.csv')

# Group by the desired columns
new_df = df.groupby(['year', 'Tag_1', 'Tag_2']).sum()
# Sort ascending by Metric (DataFrame.sort was removed from pandas; use sort_values)
new_df.sort_values('Metric', inplace=True)


# Helper function generating an alternating sequence of 'r' and 'b' colors
def get_color(i):
    if i%2 == 0:
        return 'r'
    else:
        return 'b'

colors = [get_color(j) for j in range(new_df.shape[0])]

# Make the plot
fig, ax = plt.subplots()
ind = np.arange(new_df.shape[0])
width = 0.65
a = ax.barh(ind, new_df.Metric, width, color = colors) # plot a vals
ax.set_yticks(ind)  # bars are centered on ind by default, so put the ticks there
ax.set_yticklabels(new_df.index.values)  # set them to the names
fig.tight_layout()
plt.show()
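
The alternating helper above colors bars by row position, so after sorting the colors no longer follow Tag_1. A minimal sketch of coloring by Tag_1 instead, assuming a small sample built from the rows shown in the question and a hypothetical `tag_colors` mapping (neither is from the original answer):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Sample rows copied from the question; the real data has more years
df = pd.DataFrame({
    "Metric": [5770832, 7526436, 33972652, 17491416, 6602920, 6608],
    "Tag_1": ["FOOBAR1", "FOOBAR1", "FOOBAR1", "FOOBAR1", "baznar2", "baznar2"],
    "Tag_2": ["name1", "xyz", "name1", "xyz", "name1", "xyz"],
    "year": [2008, 2008, 2009, 2009, 2008, 2008],
})

# Hypothetical mapping from Tag_1 value to bar color (an assumption for illustration)
tag_colors = {"FOOBAR1": "red", "baznar2": "blue"}

# Sum Metric per (year, Tag_1, Tag_2); groupby sorts by the keys in that order
totals = df.groupby(["year", "Tag_1", "Tag_2"])["Metric"].sum()

# One color per bar, looked up from the Tag_1 level of the index
colors = [tag_colors[t] for t in totals.index.get_level_values("Tag_1")]
labels = [f"({y}, {t1}, {t2})" for y, t1, t2 in totals.index]

fig, ax = plt.subplots()
pos = np.arange(len(totals))
ax.bar(pos, totals.to_numpy(), color=colors)
ax.set_xticks(pos)
ax.set_xticklabels(labels, rotation=90)
ax.set_ylabel("Metric")
fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to see the result. Because the color comes from the index level rather than the row position, the mapping stays correct whatever order the rows end up in.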

Answer 2

You can also do it this way:

fig, ax = plt.subplots()
# Select Metric as a Series so the color list is applied per bar rather than per column
df.groupby(['year', 'Tag_1', 'Tag_2'])['Metric'].sum().plot.barh(color=['r','b'], ax=ax)
fig.tight_layout()
plt.show()

PS: if you don't like scientific notation, you can get rid of it:

ax.get_xaxis().get_major_formatter().set_scientific(False)
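
To get the two bar sets next to each other, as the question asks, one option is to move Tag_1 into the columns with `unstack` so that pandas draws one bar per Tag_1 value within each (year, Tag_2) group. This is only a sketch, assuming the same sample rows as in the question and a hypothetical `tag_colors` mapping:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question; the real data has more years
df = pd.DataFrame({
    "Metric": [5770832, 7526436, 33972652, 17491416, 6602920, 6608],
    "Tag_1": ["FOOBAR1", "FOOBAR1", "FOOBAR1", "FOOBAR1", "baznar2", "baznar2"],
    "Tag_2": ["name1", "xyz", "name1", "xyz", "name1", "xyz"],
    "year": [2008, 2008, 2009, 2009, 2008, 2008],
})

# Hypothetical mapping from Tag_1 value to bar color (an assumption for illustration)
tag_colors = {"FOOBAR1": "red", "baznar2": "blue"}

# Rows become (year, Tag_2); each Tag_1 value becomes its own column.
# Combinations missing from the data are filled with 0.
wide = (
    df.groupby(["year", "Tag_2", "Tag_1"])["Metric"]
    .sum()
    .unstack("Tag_1", fill_value=0)
)

fig, ax = plt.subplots()
# One color per column, so every Tag_1 keeps its color in every group
wide.plot.bar(ax=ax, color=[tag_colors[c] for c in wide.columns])
ax.set_ylabel("Metric")
ax.ticklabel_format(style="plain", axis="y")  # plain numbers instead of scientific notation
fig.tight_layout()
```

Each x position is now a (year, Tag_2) pair with a red and a blue bar side by side, and the legend is keyed by Tag_1.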

2 answers

solution1
1 2016-05-27 16:56:51

solution2
0 2016-05-27 20:10:06