
R ggplot2 stacked barplot by percentage with several categorical variables

This is a simple question, but I'm having trouble understanding the data format that ggplot2 requires.

I have the following data.table in R:

print(dt)
    ID category  A  B C totalABC
1:  10   group1  1  3 0        4
2:  11   group1  1 11 1       13
3:  12   group2 15 20 2       37
4:  13   group2  6 12 2       20
5:  14   group2 17 83 6      106
...

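My goal is to create a proportional stacked bar chart, as in this example: https://rpubs.com/escott8908/RGC_Ch3_Gar_Graphs

Each bar segment should show X / totalABC as a percentage, where X is the category_type: one of A, B, or C. I also want to do this per category, i.e. the x-axis values should be group1, group2, etc.

As a concrete example, in the case of group1 there are 4 + 13 = 17 elements in total.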

The percentages would be percent_A = 2/17 ≈ 11.8%, percent_B = 14/17 ≈ 82.4%, percent_C = 1/17 ≈ 5.9%
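As a quick check of that arithmetic, here is a minimal base R sketch. It assumes the five sample rows shown above, rebuilt as a plain data.frame rather than a data.table; the names `sums` and `pct` are only illustrative.

```r
# Sample rows from the question, rebuilt as a plain data.frame (an assumption;
# the original object is a data.table)
dt <- data.frame(
  ID       = 10:14,
  category = c("group1", "group1", "group2", "group2", "group2"),
  A        = c(1, 1, 15, 6, 17),
  B        = c(3, 11, 20, 12, 83),
  C        = c(0, 1, 2, 2, 6)
)

# Sum A, B and C within each category (one row per category)
sums <- rowsum(as.matrix(dt[, c("A", "B", "C")]), group = dt$category)

# Divide each row by its own total to get percentages
pct <- 100 * sums / rowSums(sums)

# Percentages per category, rounded to one decimal place
print(round(pct, 1))
```

Each row of `pct` sums to 100, which is the property a proportional stacked bar relies on.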

The correct ggplot2 solution seems to be:

library(ggplot2)
pp = ggplot(dt, aes(x=category, y=percentage, fill=category_type)) +
          geom_bar(position="dodge", stat="identity")

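My confusion: how do I create a single percentage column that corresponds to the three categorical values?

If the above is incorrect, how should I format the data.table to create the stacked barplot?

You can use the following code: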

library(dplyr)    # %>%, group_by, summarise, across (dplyr >= 1.0)
library(reshape2) # melt for data.frames
library(ggplot2)

melt(data.frame( # melt so that each variable (A, B, C) gets its own row
     dt[,-1] %>% # drop the ID column
            group_by(category) %>% # group by category
                  summarise(across(everything(), sum))), # sum each variable within each category
                  id.vars=c("category", "totalABC")) %>%
ggplot(aes(x=category,y=value/totalABC,fill=variable))+ # define x, y and the fill variable
       geom_bar(stat = "identity",position="fill") + # stacked bars, each rescaled to a total of 1
                scale_y_continuous(labels = scales::percent) # format the y axis as percentages

This will plot:

[Figure: proportional stacked bar chart, one bar per category, filled by A, B and C]

Data:

# The .internal.selfref pointer printed by dput() is omitted here: it is not valid R input
dt <- structure(list(ID = 10:14, category = structure(c(1L, 1L, 2L, 
    2L, 2L), .Label = c("group1", "group2"), class = "factor"), A = c(1L, 
    1L, 15L, 6L, 17L), B = c(3L, 11L, 20L, 12L, 83L), C = c(0L, 1L, 
    2L, 2L, 6L), totalABC = c(4L, 13L, 37L, 20L, 106L)), .Names = c("ID", 
    "category", "A", "B", "C", "totalABC"), row.names = c(NA, -5L
    ), class = c("data.table", "data.frame"))

What if you want to stick with the plotting code you already have?

In that case, you can compute the percentages like this:

df <- melt(data.frame( # melt so that each variable (A, B, C) gets its own row
        dt[,-1] %>% # drop the ID column
          group_by(category) %>% # group by category
            summarise(across(everything(), sum))), # sum each variable within each category
              id.vars=c("category", "totalABC")) %>%
                mutate(percentage = value*100/totalABC) # percentage of the category total

However, your ggplot call needs to be modified to get proper stacked bars:

# variable is the column carrying category_type
# position = "dodge" plots the bars next to each other,
# while position = "fill" stacks them and rescales each bar to the same total height
ggplot(df, aes(x=category, y=percentage, fill=variable)) +
       geom_bar(position="fill", stat="identity")
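To make the difference between the position settings concrete, the sketch below builds a small long-format table by hand and draws it three ways. The table name `long` and the plot names are hypothetical, and the percentages are the per-category values of the sample data rounded to one decimal, so treat it as an illustration rather than part of the answer's pipeline.

```r
library(ggplot2)

# Hypothetical long-format table: one row per (category, variable) pair
long <- data.frame(
  category   = rep(c("group1", "group2"), each = 3),
  variable   = rep(c("A", "B", "C"), times = 2),
  percentage = c(11.8, 82.4, 5.9, 23.3, 70.6, 6.1)
)

# "stack" piles the segments on top of each other in the units of `percentage`
p_stack <- ggplot(long, aes(x = category, y = percentage, fill = variable)) +
  geom_bar(stat = "identity", position = "stack")

# "fill" also stacks, but rescales every bar to a total height of 1
p_fill <- ggplot(long, aes(x = category, y = percentage, fill = variable)) +
  geom_bar(stat = "identity", position = "fill")

# "dodge" draws the segments side by side instead of stacking them
p_dodge <- ggplot(long, aes(x = category, y = percentage, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge")
```

With a precomputed percentage column, `position = "stack"` keeps the y axis on the 0 to 100 scale, while `position = "fill"` shows the same proportions on a 0 to 1 scale.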

Here is a solution:

require(data.table)
require(ggplot2)
require(dplyr)

melt(dt,measure.vars = c("A","B","C"),
     variable.name = "groups",value.name = "nobs") %>%
 ggplot(aes(x=category,y=nobs,fill=groups)) + 
  geom_bar(stat = "identity",position="fill")
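If an explicit percentage column is wanted, as in the question's `aes(y = percentage)`, the same long format can be aggregated with data.table before plotting. This is a sketch under assumptions: the sample rows are rebuilt with `data.table()`, and the names `long` and `p` are illustrative.

```r
library(data.table)
library(ggplot2)

# Sample rows from the question (rebuilt here so the block runs on its own)
dt <- data.table(
  ID       = 10:14,
  category = c("group1", "group1", "group2", "group2", "group2"),
  A        = c(1L, 1L, 15L, 6L, 17L),
  B        = c(3L, 11L, 20L, 12L, 83L),
  C        = c(0L, 1L, 2L, 2L, 6L)
)

# Long format: one row per (ID, groups) pair, counts in `nobs`
long <- melt(dt, id.vars = "category", measure.vars = c("A", "B", "C"),
             variable.name = "groups", value.name = "nobs")

# Sum the counts within each category and group
long <- long[, .(nobs = sum(nobs)), by = .(category, groups)]

# Single percentage column: share of each group within its category
long[, percentage := 100 * nobs / sum(nobs), by = category]

# Stacked bars on a 0-100 scale, since the percentages already sum to 100
p <- ggplot(long, aes(x = category, y = percentage, fill = groups)) +
  geom_bar(stat = "identity", position = "stack")
```

Because the percentages are computed per category, each bar reaches 100 without relying on `position = "fill"`.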
