
How to plot stacked bars within grouped bars within further grouped bars in a bar-chart using Python (or R)

I have the following Pandas df I would like to plot:

```
    Segment length     Parameter  Parameter value  Train score  Test score
0               16  n_estimators              5.0     0.975414    0.807823
1               16  n_estimators             10.0     0.982342    0.756803
2               16  n_estimators             15.0     1.000000    0.801020
3               16     max_depth              2.0     0.580884    0.284014
4               16     max_depth              6.0     1.000000    0.824830
5               16     max_depth             10.0     1.000000    0.824830
6               16  max_features              0.1     1.000000    0.845238
7               16  max_features              0.3     1.000000    0.845238
8               16  max_features              0.5     1.000000    0.845238
9               32  n_estimators              5.0     0.961905    0.714286
10              32  n_estimators             10.0     0.988095    0.857143
11              32  n_estimators             15.0     1.000000    0.857143
12              32     max_depth              2.0     0.785714    0.571429
13              32     max_depth              6.0     1.000000    0.857143
14              32     max_depth             10.0     1.000000    0.857143
15              32  max_features              0.1     1.000000    0.904762
16              32  max_features              0.3     1.000000    0.904762
17              32  max_features              0.5     1.000000    0.857143
```

The plot I imagine is a grouped bar chart with groups by 'Segment length', each containing further groups by 'Parameter', each containing further groups by 'Parameter value', each containing two bars for 'Train score' and 'Test score' (either side-by-side or stacked)... Now that's a handful, but it works on paper. I've been trying to get this to work in Matplotlib (or R) all day without success. Does anybody have a suggestion on how to get this to work?

(NB: in the above DataFrame I have two 'Segment length' groups and only three 'Parameter value' groups per parameter; eventually there will be 6 and 10 or so groups, respectively.)
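A minimal Matplotlib sketch of the side-by-side variant, assuming the rows are ordered so that each (Segment length, Parameter) group is contiguous, as in the DataFrame above. The helper `nested_positions` and the gap sizes are hypothetical choices, not part of any library: it gives each parameter value one x slot, adds a small gap whenever the parameter changes and a larger gap whenever the segment length changes, and the two outer grouping levels are then written as text below the axis using `ax.get_xaxis_transform()` (x in data units, y in axes units).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The DataFrame from the question, rebuilt row by row.
params = ["n_estimators"] * 3 + ["max_depth"] * 3 + ["max_features"] * 3
values = [5.0, 10.0, 15.0, 2.0, 6.0, 10.0, 0.1, 0.3, 0.5]
df = pd.DataFrame({
    "Segment length": [16] * 9 + [32] * 9,
    "Parameter": params * 2,
    "Parameter value": values * 2,
    "Train score": [0.975414, 0.982342, 1.0, 0.580884, 1.0, 1.0, 1.0, 1.0, 1.0,
                    0.961905, 0.988095, 1.0, 0.785714, 1.0, 1.0, 1.0, 1.0, 1.0],
    "Test score": [0.807823, 0.756803, 0.801020, 0.284014, 0.824830, 0.824830,
                   0.845238, 0.845238, 0.845238, 0.714286, 0.857143, 0.857143,
                   0.571429, 0.857143, 0.857143, 0.904762, 0.904762, 0.857143],
})


def nested_positions(frame, outer, inner, inner_gap=0.5, outer_gap=1.5):
    """Hypothetical helper: one x slot per row, plus a gap when `inner` changes
    and a larger gap when `outer` changes (rows must already be grouped)."""
    xs, x, prev = [], 0.0, None
    for key in zip(frame[outer], frame[inner]):
        if prev is not None:
            x += 1.0
            if key[0] != prev[0]:
                x += outer_gap
            elif key[1] != prev[1]:
                x += inner_gap
        xs.append(x)
        prev = key
    return np.array(xs)


x = nested_positions(df, "Segment length", "Parameter")
width = 0.4
fig, ax = plt.subplots(figsize=(12, 4.5))
# Innermost level: train and test bars side by side around each x slot.
ax.bar(x - width / 2, df["Train score"], width, label="Train score")
ax.bar(x + width / 2, df["Test score"], width, label="Test score")

# Third level: parameter values as ordinary tick labels.
ax.set_xticks(x)
ax.set_xticklabels(df["Parameter value"].map("{:g}".format))

# Second and first levels: text centred under each group, below the tick labels.
trans = ax.get_xaxis_transform()
placed = df.assign(x=x)
groups = placed.groupby(["Segment length", "Parameter"], sort=False)["x"].mean()
for (_, param), center in groups.items():
    ax.text(center, -0.12, param, ha="center", va="top", transform=trans)
for seg, center in placed.groupby("Segment length")["x"].mean().items():
    ax.text(center, -0.22, f"Segment length {seg}", ha="center", va="top",
            transform=trans, fontweight="bold")

ax.set_ylim(0, 1.15)
ax.set_ylabel("Score")
ax.legend(ncol=2, loc="upper left")
fig.subplots_adjust(bottom=0.3)
fig.savefig("nested_bars.png")
```

Because the positions come from the row order alone, the same code should extend to 6 segment lengths and about 10 values per parameter, although the figure width will need to grow. Stacking is possible by drawing both series at `x` and passing `bottom=df["Train score"]` to the second call, but then each bar's height is the sum of the two scores, so side-by-side bars are usually easier to read.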

Here is a suggestion using R: we can move some of the grouping levels onto other dimensions of the plot, e.g. fill color and faceting.

What we do here:

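  1. Bring the score columns into long format
  2. Group by score type, segment length and parameter, and calculate the mean and standard deviation (this averages over the parameter values)
  3. Plot with ggplot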
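```r
library(tidyverse)
library(ggsci)  # provides scale_fill_nejm()

# Assumes df holds the data above with the first column named Segment_length
# (e.g. rename(Segment_length = `Segment length`) if it was read with readr);
# the score columns only need to end in "score".
df %>%
  # 1. long format: one row per score type, in columns `name` and `value`
  pivot_longer(ends_with("score")) %>%
  # 2. mean and sd per score type, segment length and parameter
  group_by(name, Segment_length, Parameter) %>%
  summarise(mean_value = mean(value), sd_value = sd(value), .groups = "drop") %>%
  # 3. score type on x, segment length as fill, one panel per parameter
  ggplot(aes(x = name, y = mean_value, fill = factor(Segment_length))) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_errorbar(aes(ymin = mean_value - sd_value, ymax = mean_value + sd_value),
                width = 0.2, position = position_dodge(width = 0.9)) +
  facet_wrap(~ Parameter) +
  theme_classic() +
  scale_fill_nejm() +
  labs(x = "Test/Train", y = "Score", fill = "Segment Length")
```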
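For readers who want to stay in Python, the three steps can be sketched with pandas and Matplotlib. This is an approximate translation, not a feature-for-feature port: `melt` plays the role of `pivot_longer`, `groupby(...).agg` the role of `group_by` + `summarise`, one subplot per parameter stands in for `facet_wrap`, and the styling (`theme_classic`, `scale_fill_nejm`) is not reproduced. Like the R version, it averages over 'Parameter value'.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The DataFrame from the question, rebuilt row by row.
params = ["n_estimators"] * 3 + ["max_depth"] * 3 + ["max_features"] * 3
values = [5.0, 10.0, 15.0, 2.0, 6.0, 10.0, 0.1, 0.3, 0.5]
df = pd.DataFrame({
    "Segment length": [16] * 9 + [32] * 9,
    "Parameter": params * 2,
    "Parameter value": values * 2,
    "Train score": [0.975414, 0.982342, 1.0, 0.580884, 1.0, 1.0, 1.0, 1.0, 1.0,
                    0.961905, 0.988095, 1.0, 0.785714, 1.0, 1.0, 1.0, 1.0, 1.0],
    "Test score": [0.807823, 0.756803, 0.801020, 0.284014, 0.824830, 0.824830,
                   0.845238, 0.845238, 0.845238, 0.714286, 0.857143, 0.857143,
                   0.571429, 0.857143, 0.857143, 0.904762, 0.904762, 0.857143],
})

# 1. Long format: one row per (original row, score type).
long = df.melt(
    id_vars=["Segment length", "Parameter", "Parameter value"],
    value_vars=["Train score", "Test score"],
    var_name="name",
    value_name="value",
)

# 2. Mean and sample SD per score type, segment length and parameter
#    (pandas' std defaults to ddof=1, matching R's sd()).
summary = (
    long.groupby(["name", "Segment length", "Parameter"])["value"]
    .agg(mean_value="mean", sd_value="std")
    .reset_index()
)

# 3. One panel per parameter, bars dodged by segment length, with SD error bars.
names = ["Train score", "Test score"]
parameters = sorted(summary["Parameter"].unique())
segments = sorted(summary["Segment length"].unique())
width = 0.8 / len(segments)
fig, axes = plt.subplots(1, len(parameters), sharey=True, figsize=(10, 3.5))
for ax, param in zip(axes, parameters):
    sub = summary[summary["Parameter"] == param]
    for i, seg in enumerate(segments):
        rows = sub[sub["Segment length"] == seg].set_index("name").loc[names]
        offset = (i - (len(segments) - 1) / 2) * width
        ax.bar(np.arange(len(names)) + offset, rows["mean_value"], width,
               yerr=rows["sd_value"], capsize=3, label=str(seg))
    ax.set_xticks(np.arange(len(names)))
    ax.set_xticklabels(names)
    ax.set_title(param)
axes[0].set_ylabel("Score")
axes[-1].legend(title="Segment length")
fig.tight_layout()
fig.savefig("faceted_bars.png")
```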
