
Plot a pandas dataframe grouped by column

I have the following pandas dataframe 'df':

        C1   C2   C3   C4   Type
Name
x1      a1   b1   c1   d1   'A'
x2      a2   b2   c2   d2   'A'
x3      a3   b3   c3   d3   'B'
x4      a4   b4   c4   d4   'B'
x5      a5   b5   c5   d5   'A'
x6      a6   b6   c6   d6   'B'
x7      a7   b7   c7   d7   'B'

The dataframe has five columns (C1, C2, C3, C4 and Type) plus Name. Note that Name sits on a different header row from the other columns: it is the index, not a regular column. I would like to generate two line plots (separate plots, not two lines on the same plot) from this dataframe, grouped by the 'Type' column. Basically, I want to plot the values of C1 against Name, grouped by Type: one plot should show (x1, a1), (x2, a2), (x5, a5), and the other (x3, a3), (x4, a4), (x6, a6), (x7, a7).
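A minimal sketch of the data described above, to make the goal concrete. The numbers in C1 are hypothetical stand-ins for the a1..a7 placeholders, and only C1 and Type are built because the other columns play no part in the plot. Grouping by Type gives one (Name, C1) series per line plot:

```python
import pandas as pd

# Hypothetical numeric values in place of the question's a1..a7 placeholders.
df = pd.DataFrame(
    {"C1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
     "Type": ["A", "A", "B", "B", "A", "B", "B"]},
    index=pd.Index(["x1", "x2", "x3", "x4", "x5", "x6", "x7"], name="Name"),
)

# One C1 series (indexed by Name) per Type value: these are the two plots wanted.
series_by_type = {t: g["C1"] for t, g in df.groupby("Type")}
for t, s in series_by_type.items():
    print(t, list(s.index))
# A ['x1', 'x2', 'x5']
# B ['x3', 'x4', 'x6', 'x7']
```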

I found a similar question on SO about plotting a boxplot here, so I tried adapting it to a line plot with df.plot(column='C1', by='Type'), but plot() does not seem to accept a column argument (column and by are parameters of DataFrame.boxplot).

Any ideas on how I can achieve my objective?

You can add the "Type" column to the index and unstack it, so that the values of C1 are split into two columns according to the value of Type, and then plot them. For example, with Values and Categories standing in for C1 and Type:

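import pandas
from numpy.random import randn

df = pandas.DataFrame({'Values': randn(10), 'Categories': list('AABABBABAB')}, index=range(10))
# Move Categories into the index, then unstack it into one column per category
df.set_index('Categories', append=True).unstack().interpolate().plot(subplots=True)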

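Note that interpolate() is needed for a line plot: after unstacking, each column is NaN on every row that belongs to the other category, and matplotlib breaks a line at each NaN, so isolated points would not be drawn at all. interpolate() fills those gaps with linearly interpolated values, which are not real data points.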
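To see where the NaNs come from, here is a small sketch of the unstack step on its own. It assumes fixed values instead of randn so the result is reproducible. It also shows dropping NaNs per column, which is an alternative to interpolate() that avoids drawing invented points:

```python
import pandas as pd

# Fixed values instead of randn (assumption, for reproducibility).
df = pd.DataFrame({"Values": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
                   "Categories": list("AABABBABAB")}, index=range(10))

# Columns become ("Values", "A") and ("Values", "B"); each row fills only one of them.
wide = df.set_index("Categories", append=True).unstack()
nan_counts = wide.isna().sum()  # 5 NaNs in each column for this A/B pattern

# Linear interpolation fills interior gaps; leading NaNs (rows 0-1 of B) remain.
filled = wide.interpolate()

# Alternative: keep only the real points of one category.
only_a = wide[("Values", "A")].dropna()
```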

Alternatively, you can select the rows for each value of "Type" ("Categories" in these examples) and plot them separately, e.g.:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(ncols=2)
df[df.Categories == 'A'].Values.plot(ax=axes[0])
df[df.Categories == 'B'].Values.plot(ax=axes[1])
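Applied to the dataframe in the question, the same idea can be sketched with groupby, which yields each Type value together with its rows. The C1 numbers are hypothetical, and the Agg backend line is only there so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical numeric values in place of the question's a1..a7 placeholders.
df = pd.DataFrame(
    {"C1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
     "Type": ["A", "A", "B", "B", "A", "B", "B"]},
    index=pd.Index(["x1", "x2", "x3", "x4", "x5", "x6", "x7"], name="Name"),
)

groups = df.groupby("Type")
# squeeze=False keeps axes 2-D, so axes[0] works even with a single group.
fig, axes = plt.subplots(ncols=groups.ngroups, squeeze=False)
for ax, (type_value, group) in zip(axes[0], groups):
    # Name (the index) on the x-axis, C1 on the y-axis, one subplot per Type.
    group["C1"].plot(ax=ax, marker="o", title=f"Type {type_value}")
```

Because the index holds strings, pandas places the points at evenly spaced positions and uses the Name values as tick labels.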

The following answer is based on faltarell's second method, but generalised for any number of categories.

Setup:

import pandas
import matplotlib.pyplot as plt
from numpy.random import randn
df = pandas.DataFrame({'Values': randn(10), 
                       'Categories': list('AABABBABAB')},
                       index=range(10))

Draw plots:

categories = df['Categories'].unique()

# squeeze=False keeps axes 2-D even when there is only one category
fig, axes = plt.subplots(ncols=len(categories), squeeze=False)

for i, category in enumerate(categories):
    df[df['Categories'] == category]['Values'].plot.line(ax=axes[0, i])
    axes[0, i].set_title(category)

You can make a similar single-figure plot with labelled lines as:

fig, ax = plt.subplots()

for category in df['Categories'].unique():
    df[df['Categories'] == category]['Values'].plot.line(ax=ax, label=category)

plt.legend()
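A variant of the single-figure loop that iterates over groupby instead of filtering the dataframe once per category. This is a sketch under the assumption of fixed values in place of randn, plus the Agg backend so it runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Fixed values instead of randn (assumption, for reproducibility).
df = pd.DataFrame({"Values": [0.3, -1.2, 0.8, 1.5, -0.4, 0.1, 2.0, -0.7, 0.9, 0.2],
                   "Categories": list("AABABBABAB")}, index=range(10))

fig, ax = plt.subplots()
# groupby yields each category with its rows in one pass over the data.
for category, group in df.groupby("Categories"):
    group["Values"].plot.line(ax=ax, label=category)
ax.legend()
```

One difference from the unique()-based loop: groupby visits the categories in sorted order, while unique() keeps the order in which they first appear.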
