I am fairly new to pandas and matplotlib, and I am not sure what the proper way is to achieve the following:
I have the following (example) data, so_df:
IN:
import pandas as pd
so_df = pd.DataFrame({
    "CATEGORY": ["A", "B", "A", "B"],
    "CONTEXT": [1, 1, 0, 0],
    "COUNT": [100, 111, 50, 55]
})
so_df
OUT:
  CATEGORY  CONTEXT  COUNT
0        A        1    100
1        B        1    111
2        A        0     50
3        B        0     55
Now I want to create a stacked bar plot with y="COUNT" by CATEGORY and x="CONTEXT". The only way I know how to achieve this is by slicing and merging like so:
IN:
cat_a_df = so_df[so_df["CATEGORY"] == "A"] \
    .rename(columns={"COUNT": "COUNT A"}) \
    .loc[:, ["CONTEXT", "COUNT A"]]
cat_b_df = so_df[so_df["CATEGORY"] == "B"] \
    .rename(columns={"COUNT": "COUNT B"}) \
    .loc[:, ["CONTEXT", "COUNT B"]]
stacked_df = cat_a_df.merge(cat_b_df, on="CONTEXT")
stacked_df
OUT:
   CONTEXT  COUNT A  COUNT B
0        1      100      111
1        0       50       55
And then plot the new dataframe as usual:
stacked_df.plot(kind='bar', stacked=True, x="CONTEXT")
But this seems way too complicated for what seems like a rather simple task. Is there a better way to do this?
You can do it all in one line:
so_df.groupby(['CONTEXT', 'CATEGORY']).sum()['COUNT'].unstack().plot.bar(stacked=True)
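We group by 'CONTEXT' and 'CATEGORY', then apply .sum() to get from a groupby object to a dataframe. In your case the sum doesn't change anything, because each (CONTEXT, CATEGORY) pair occurs only once. Finally we unstack to get one column for A and one column for B. Plotting this gives a stacked bar chart with one bar per CONTEXT value, with the A and B counts stacked on top of each other.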
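To see what the chained call produces before it reaches the plotting step, the one-liner can be split into its intermediate results. This is only a sketch of the same idea: the variable names grouped and wide are made up for illustration, and COUNT is selected before summing rather than after, which gives the same values for this data.

```python
import pandas as pd

so_df = pd.DataFrame({
    "CATEGORY": ["A", "B", "A", "B"],
    "CONTEXT": [1, 1, 0, 0],
    "COUNT": [100, 111, 50, 55],
})

# Step 1: one entry per (CONTEXT, CATEGORY) pair, with COUNT summed per pair.
# The result is a Series indexed by a two-level MultiIndex.
grouped = so_df.groupby(["CONTEXT", "CATEGORY"])["COUNT"].sum()

# Step 2: move the inner index level (CATEGORY) into the columns, so that
# each category becomes its own column and each CONTEXT value its own row.
wide = grouped.unstack()

print(wide)
```

Here wide has the same shape as stacked_df from the question, except that CONTEXT is the index rather than a column and, because groupby sorts its keys by default, the CONTEXT 0 row comes first. Calling wide.plot.bar(stacked=True) on it should then draw the stacked bars.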