
Stacked plot of multirow data

I am fairly new to pandas and matplotlib, and I am not sure what the proper way is to achieve the following:


I have the following example data, so_df:

IN:

    import pandas as pd

    so_df = pd.DataFrame({
        "CATEGORY" : ["A", "B", "A", "B"],
        "CONTEXT"  : [ 1 ,  1 ,  0 ,  0],
        "COUNT"    : [100, 111, 50 , 55]
    })
    so_df

OUT:

      CATEGORY  CONTEXT  COUNT
    0        A        1    100
    1        B        1    111
    2        A        0     50
    3        B        0     55

Now I want to create a stacked bar plot with "CONTEXT" on the x-axis and "COUNT" on the y-axis, stacked by "CATEGORY". The only way I know how to achieve this is by slicing and merging, like so:

IN:

    cat_a_df = so_df[so_df["CATEGORY"] == "A"] \
        .rename(columns={"COUNT" : "COUNT A"}) \
        .loc[:,["CONTEXT", "COUNT A"]]

    cat_b_df = so_df[so_df["CATEGORY"] == "B"] \
        .rename(columns={"COUNT" : "COUNT B"}) \
        .loc[:,["CONTEXT", "COUNT B"]]

    stacked_df = cat_a_df.merge(cat_b_df, on="CONTEXT")
    stacked_df

OUT:

       CONTEXT  COUNT A  COUNT B
    0        1      100      111
    1        0       50       55

And then plot the new dataframe as usual:

    stacked_df.plot(kind='bar', stacked=True, x="CONTEXT")

[image: the resulting stacked bar chart]


But this seems way too complicated for what looks like a rather simple task. Is there a better way to do this?

You can do it all in one line:

    so_df.groupby(['CONTEXT', 'CATEGORY']).sum()['COUNT'].unstack().plot.bar(stacked=True)

We group by 'CONTEXT' and 'CATEGORY', then apply .sum() to turn the groupby object back into a dataframe. In your case the sum doesn't change any values, because each (CONTEXT, CATEGORY) pair occurs only once. Selecting 'COUNT' and calling unstack then gives one column for A and one column for B. Plotting this gives:

[image: the resulting stacked bar chart]
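To make the reshaping easier to follow, here is a sketch that splits the one-liner into its intermediate steps, using the example data from the question. The variable names counts, wide and wide_pivot are only illustrative, and the pivot_table call is an alternative I'm adding for comparison, not part of the answer above. The plot call is left commented out so the snippet runs without matplotlib installed.

```python
import pandas as pd

so_df = pd.DataFrame({
    "CATEGORY": ["A", "B", "A", "B"],
    "CONTEXT":  [1, 1, 0, 0],
    "COUNT":    [100, 111, 50, 55],
})

# Step 1: one summed COUNT per (CONTEXT, CATEGORY) pair.
# The result is a Series indexed by both grouping keys.
counts = so_df.groupby(["CONTEXT", "CATEGORY"])["COUNT"].sum()

# Step 2: move the CATEGORY index level into the columns,
# leaving CONTEXT as the row index and one column per category.
wide = counts.unstack()
print(wide)

# Alternative (an assumption on my side, not from the answer):
# pivot_table groups, aggregates and reshapes in a single call.
wide_pivot = so_df.pivot_table(
    index="CONTEXT", columns="CATEGORY", values="COUNT", aggfunc="sum"
)

# Step 3: each column becomes one stacked segment of the bar.
# wide.plot.bar(stacked=True)  # needs matplotlib
```

Note that groupby sorts the keys, so the rows come out in CONTEXT order 0, 1 rather than 1, 0 as in the merged dataframe from the question.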

