Stacked plot of multirow data

Question

I am fairly new to pandas and matplotlib, and I am not sure what the proper way is to achieve the following:

I have the following (example) data, so_df:

IN:

    import pandas as pd

    so_df = pd.DataFrame({
        "CATEGORY" : ["A", "B", "A", "B"],
        "CONTEXT"  : [ 1 ,  1 ,  0 ,  0],
        "COUNT"    : [100, 111, 50 , 55]
    })
    so_df

OUT:

      CATEGORY  CONTEXT  COUNT
    0        A        1    100
    1        B        1    111
    2        A        0     50
    3        B        0     55

Now I want to create a stacked bar plot with y="COUNT", stacked by CATEGORY, and x="CONTEXT". The only way I know to achieve this is by slicing and merging, like so:

IN:

    cat_a_df = so_df[so_df["CATEGORY"] == "A"] \
        .rename(columns={"COUNT" : "COUNT A"}) \
        .loc[:,["CONTEXT", "COUNT A"]]

    cat_b_df = so_df[so_df["CATEGORY"] == "B"] \
        .rename(columns={"COUNT" : "COUNT B"}) \
        .loc[:,["CONTEXT", "COUNT B"]]

    stacked_df = cat_a_df.merge(cat_b_df, on="CONTEXT")
    stacked_df

OUT:

       CONTEXT  COUNT A  COUNT B
    0        1      100      111
    1        0       50       55

And then plot the new dataframe as usual:

    stacked_df.plot(kind='bar', stacked=True, x="CONTEXT")

(Image: the resulting stacked bar plot.)

But this seems way too complicated for what looks like a rather simple task. Is there a better way to do this?

Answer 1

You can do it all in one line:

    so_df.groupby(['CONTEXT', 'CATEGORY']).sum()['COUNT'].unstack().plot.bar(stacked=True)

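We group by 'CONTEXT' and 'CATEGORY', then apply .sum() to get from a groupby object back to a dataframe. In your case each (CONTEXT, CATEGORY) pair occurs only once, so the sum leaves the values unchanged. Finally, we unstack to get one column for A and one column for B. Plotting this gives one stacked bar per CONTEXT value, with the A and B counts stacked on top of each other.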
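To make the intermediate step visible, here is a minimal sketch that builds the reshaped table without plotting it. The names `wide` and `alt` are only illustrative and do not appear in the answer above; the `pivot_table` call is an alternative I am adding for comparison, not part of the original one-liner.

```python
import pandas as pd

so_df = pd.DataFrame({
    "CATEGORY": ["A", "B", "A", "B"],
    "CONTEXT":  [1, 1, 0, 0],
    "COUNT":    [100, 111, 50, 55],
})

# Group by both keys, sum COUNT, then move CATEGORY from the row index
# into the columns: one row per CONTEXT, one column per CATEGORY.
wide = so_df.groupby(["CONTEXT", "CATEGORY"])["COUNT"].sum().unstack()
print(wide)

# Alternative (not from the answer): pivot_table with aggfunc="sum"
# should produce the same CONTEXT x CATEGORY layout in one call.
alt = so_df.pivot_table(index="CONTEXT", columns="CATEGORY",
                        values="COUNT", aggfunc="sum")
print(alt)
```

Calling `wide.plot.bar(stacked=True)` on this table (with matplotlib installed) draws the stacked chart, because pandas plots each column as one layer of the stack and uses the index for the x-axis.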

1 answers

solution1
0 ACCPTED 2018-09-12 16:22:32
