Plot a pandas dataframe grouped by column

Question

I have the following pandas dataframe 'df':

---------------------------------------------------
             C1      C2      C3      C4    Type
---------------------------------------------------
    Name
---------------------------------------------------
     x1      a1      b1      c1      d1     'A'
     x2      a2      b2      c2      d2     'A'
     x3      a3      b3      c3      d3     'B'
     x4      a4      b4      c4      d4     'B'
     x5      a5      b5      c5      d5     'A'
     x6      a6      b6      c6      d6     'B'
     x7      a7      b7      c7      d7     'B'
---------------------------------------------------

There are 6 columns in this dataframe: Name, C1, C2, C3, C4, and Type. I would like to generate two line plots (separate plots, not two lines on the same plot) from this dataframe, grouped by the 'Type' column. Basically, I want to plot the values of C1 against Name, grouped by Type. So I want (x1, a1), (x2, a2), (x5, a5) on one plot, and (x3, a3), (x4, a4), (x6, a6), and (x7, a7) on the other.

Please note that the Name header is on a different row from the other column headers.

I found a similar question on SO about plotting a boxplot, so I tried adapting it for a line plot. I tried using df.plot(column='C1', by='Type'), but it seems plot() has no 'column' argument.

Any ideas on how I can achieve my objective?

Answer 1

You can add the "Type" column to the index and unstack it, so that the values of C1 are split into one column per value of Type, and then plot them. For example, with 'Values' standing in for C1 and 'Categories' for Type:

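import pandas
from numpy.random import randn

df = pandas.DataFrame({'Values': randn(10), 'Categories': list('AABABBABAB')}, index=range(10))
# Move 'Categories' into the index, then unstack it into one column per category
df.set_index('Categories', append=True).unstack().interpolate().plot(subplots=True)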

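Notice that for a line plot you need the 'interpolate()': after unstacking, each column holds NaN in the rows that belong to the other category, and a line plot leaves gaps at NaN values.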
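To see what the unstack step produces, here is a minimal sketch. It assumes fixed values (np.arange) in place of randn so the result is easy to read, and the names `wide` and `filled` are only illustrative:

```python
import numpy as np
import pandas as pd

# Same shape as the example above, but with fixed values instead of randn
df = pd.DataFrame({'Values': np.arange(10, dtype=float),
                   'Categories': list('AABABBABAB')},
                  index=range(10))

# One column per category; rows that belong to the other category become NaN
wide = df.set_index('Categories', append=True)['Values'].unstack()
print(wide)

# Default linear interpolation fills the gaps between known points.
# NaN values before a column's first known point are left as they are.
filled = wide.interpolate()
print(filled)
```

Interpolating means each line is drawn through points estimated from its neighbours, so the second method below may be preferable if only the observed points should appear.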

Alternatively, you can select the data according to the value of "Type" ('Categories' in these examples) and plot each subset separately, e.g.:

import matplotlib.pyplot as plt

# One subplot per category, side by side
fig, axes = plt.subplots(ncols=2)
df[df.Categories=='A'].Values.plot(ax=axes[0])
df[df.Categories=='B'].Values.plot(ax=axes[1])
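Applied to a frame shaped like the one in the question, the same selection approach might look like the sketch below. The numeric values are made up to stand in for a1..a7, and the Agg backend line is only an assumption so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the question's frame: numbers replace a1..a7
df = pd.DataFrame(
    {'C1': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
     'Type': ['A', 'A', 'B', 'B', 'A', 'B', 'B']},
    index=pd.Index(['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7'], name='Name'))

fig, axes = plt.subplots(ncols=2)
# Boolean mask on 'Type', then plot C1 against the Name index
df[df['Type'] == 'A']['C1'].plot(ax=axes[0], title='Type A')
df[df['Type'] == 'B']['C1'].plot(ax=axes[1], title='Type B')
# Call plt.show() or fig.savefig(...) to view the result
```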

Answer 2

The following answer is based on faltarell's second method, but generalised for any number of categories.

Setup:

import pandas
import matplotlib.pyplot as plt
from numpy.random import randn
df = pandas.DataFrame({'Values': randn(10),
                       'Categories': list('AABABBABAB')},
                      index=range(10))

Draw plots:

categories = df['Categories'].unique()

fig, axes = plt.subplots(ncols=len(categories))

for i, category in enumerate(categories):
    df[df['Categories'] == category]['Values'].plot.line(ax=axes[i])
    axes[i].set_title(category)
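One caveat with the loop above: when there is only one category, plt.subplots returns a single Axes rather than an array, so axes[i] fails. A possible variant, sketched here with groupby and squeeze=False (the Agg backend line is an assumption for headless runs), avoids that:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import randn

df = pd.DataFrame({'Values': randn(10),
                   'Categories': list('AABABBABAB')},
                  index=range(10))

# Grouping yields (category, Series) pairs, so no boolean mask is needed
groups = df.groupby('Categories')['Values']

# squeeze=False keeps `axes` a 2-D array even when there is only one category
fig, axes = plt.subplots(ncols=groups.ngroups, squeeze=False)

for ax, (category, values) in zip(axes[0], groups):
    values.plot.line(ax=ax)
    ax.set_title(category)
```

Note that groupby orders the categories by sorted key, whereas unique() keeps the order of first appearance.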

You can make a similar single-figure plot with labelled lines as:

fig, ax = plt.subplots()

for category in df['Categories'].unique():
    df[df['Categories'] == category]['Values'].plot.line(ax=ax, label=category)

plt.legend()
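The single-figure version can also be sketched with plain matplotlib calls, attaching the legend to the specific Axes instead of relying on plt.legend() picking up the current one. The axis labels and legend title here are illustrative choices, and the Agg backend line is an assumption for headless runs:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import randn

df = pd.DataFrame({'Values': randn(10),
                   'Categories': list('AABABBABAB')},
                  index=range(10))

fig, ax = plt.subplots()

# Draw one labelled line per category on the same Axes
for category, values in df.groupby('Categories')['Values']:
    ax.plot(values.index, values.to_numpy(), label=category)

ax.set_xlabel('index')
ax.set_ylabel('Values')
ax.legend(title='Categories')
```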

2 answers

solution1
5 ACCEPTED 2015-11-27 09:08:48

solution2
1 2019-05-07 22:12:35
