
R: How to aggregate data into percentages without missing data for stacked-bar plot in ggplot2?

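I want to summarize my 'karyotype' molecular data by location and substrate (see the sample data below) as percentages, so that I can create a stacked-bar plot in ggplot2.

I have figured out how to get the total count for each karyotype using 'dcast', but I cannot work out how to get the percentage for each of the three karyotypes (i.e. 'BB', 'BD', 'DD').

The data need to be in a format suitable for a stacked-bar plot in 'ggplot2'.

Sample data: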

library(reshape2)
Karotype.Data <- structure(list(Location = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L), .Label = c("Kampinge", "Kaseberga", "Molle", "Steninge"
), class = "factor"), Substrate = structure(c(1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 
2L, 2L, 2L, 2L, 2L), .Label = c("Kampinge", "Kaseberga", "Molle", 
"Steninge"), class = "factor"), Karyotype = structure(c(1L, 3L, 
4L, 4L, 3L, 3L, 4L, 4L, 4L, 3L, 1L, 4L, 3L, 4L, 4L, 3L, 1L, 4L, 
3L, 3L, 4L, 3L, 4L, 3L, 3L), .Label = c("", "BB", "BD", "DD"), class = "factor")), .Names = c("Location", 
"Substrate", "Karyotype"), row.names = c(135L, 136L, 137L, 138L, 
139L, 165L, 166L, 167L, 168L, 169L, 236L, 237L, 238L, 239L, 240L, 
326L, 327L, 328L, 329L, 330L, 426L, 427L, 428L, 429L, 430L), class = "data.frame")

## Summary count for each karyotype ##
Karyotype.Summary <- dcast(Karotype.Data , Location + Substrate ~ Karyotype, value.var="Karyotype", length)
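For comparison, here is a base-R sketch that goes straight from the raw rows to percentages. It is a swapped-in technique, not the 'dcast' route from the question. It assumes the question's sample data (rebuilt compactly under the hypothetical name 'kary') and assumes the blank "" karyotype is missing data that should be left out of the denominator. 'table()' counts every Location x Substrate x Karyotype cell, and 'prop.table()' with 'margin = c(1, 2)' divides each cell by its Location/Substrate total. The names 'Karyotype3', 'pct' and 'pct_long' are illustrative only.

```r
# The question's sample data, rebuilt compactly (hypothetical name `kary`):
# same 25 rows and the same factor levels as Karotype.Data.
lv <- c("Kampinge", "Kaseberga", "Molle", "Steninge")
kary <- data.frame(
  Location  = factor(rep(c("Kampinge", "Kaseberga"), times = c(20, 5)), levels = lv),
  Substrate = factor(rep(c(lv, "Kaseberga"), each = 5), levels = lv),
  Karyotype = factor(c("", "BD", "DD", "DD", "BD",  "BD", "DD", "DD", "DD", "BD",
                       "", "DD", "BD", "DD", "DD",  "BD", "", "DD", "BD", "BD",
                       "DD", "BD", "DD", "BD", "BD"),
                     levels = c("", "BB", "BD", "DD"))
)

# Assumption: blanks are missing data. Re-levelling to the three karyotypes
# turns "" into NA, and table() leaves NA out of the counts by default.
kary$Karyotype3 <- factor(as.character(kary$Karyotype), levels = c("BB", "BD", "DD"))

# 3-D count table: Location x Substrate x Karyotype
counts <- with(kary, table(Location, Substrate, Karyotype = Karyotype3))

# Divide each cell by its Location/Substrate total; combinations with no
# observations give 0/0 = NaN
pct <- 100 * prop.table(counts, margin = c(1, 2))

# Long format for ggplot2, keeping only the combinations that were sampled
pct_long <- as.data.frame(as.table(pct), responseName = "Percent")
pct_long <- pct_long[!is.nan(pct_long$Percent), ]
pct_long
```

Combinations that never occur (for example Location 'Kaseberga' on substrate 'Kampinge') come out as NaN and are dropped, leaving 15 rows whose percentages add up to 100 within each Location/Substrate pair. Because 'pct_long' is already in long format, it should plug into the same kind of ggplot call used later in this thread, e.g. ggplot(pct_long, aes(Substrate, Percent, fill = Karyotype)) + geom_col() + facet_wrap(~ Location).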

You can use the dplyr package:

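library(dplyr)

# count the rows in each Location/Substrate/Karyotype combination
z.counts <- Karotype.Data %>% 
  group_by(Location,Substrate,Karyotype) %>% 
  summarize(freq=n()) 

# turn the counts into percentages within each Location/Substrate group
z.freq <- z.counts %>% 
  group_by(Location,Substrate) %>% 
  mutate(freq=freq/sum(freq)*100)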
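Note that this answer counts the blank "" karyotype like any other level, so it takes a share of every percentage. A base-R sketch of the same two steps makes that visible. Here 'aggregate()' plays the role of summarize(freq = n()) and 'ave()' the role of group_by() followed by mutate(). It is an illustration, not the answer's own code, and it again assumes the question's sample rows rebuilt under the hypothetical name 'kary'.

```r
# The question's sample data, rebuilt compactly (hypothetical name `kary`)
lv <- c("Kampinge", "Kaseberga", "Molle", "Steninge")
kary <- data.frame(
  Location  = factor(rep(c("Kampinge", "Kaseberga"), times = c(20, 5)), levels = lv),
  Substrate = factor(rep(c(lv, "Kaseberga"), each = 5), levels = lv),
  Karyotype = factor(c("", "BD", "DD", "DD", "BD",  "BD", "DD", "DD", "DD", "BD",
                       "", "DD", "BD", "DD", "DD",  "BD", "", "DD", "BD", "BD",
                       "DD", "BD", "DD", "BD", "BD"),
                     levels = c("", "BB", "BD", "DD"))
)

# Step 1: one row per Location/Substrate/Karyotype combination that occurs,
# with its count (like group_by(...) %>% summarize(freq = n()))
z.counts <- aggregate(list(freq = rep(1, nrow(kary))),
                      by = kary[c("Location", "Substrate", "Karyotype")],
                      FUN = sum)

# Step 2: percentage within each Location/Substrate group
# (like group_by(Location, Substrate) %>% mutate(freq = freq / sum(freq) * 100));
# ave() returns each row's group total, aligned with the rows
z.counts$pct <- with(z.counts, 100 * freq / ave(freq, Location, Substrate, FUN = sum))
z.counts
```

In this sample the blank level accounts for 20% of the Kampinge/Kampinge stack (1 of 5 rows), which is the missing data that the asker's later version filters out before normalising.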

Here the data stay in long format, so building the bar plot with ggplot is straightforward:

library(ggplot2)
ggplot(z.freq) + 
  aes(x=Karyotype,y=freq) + 
  facet_grid(Location~Substrate) + 
  geom_bar(stat='identity')

[Image: the resulting bar plot, one panel per Location/Substrate combination]

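With the help of 'Marat Talipov' and many other answers to Stack Overflow questions, I found that it is important to load 'plyr' before 'dplyr' (if both are needed) and to use 'summarise' rather than 'summarize'. (Within dplyr itself, summarise() and summarize() are the same function; the real problem is that plyr's summarise() masks dplyr's when plyr is attached after dplyr, and plyr's version does not respect group_by() groups.) The final step was then to remove the missing data with 'filter'.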

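library(dplyr)

# count the rows in each Location/Substrate/Karyotype combination
z.counts <- Karotype.Data %>% 
  group_by(Location,Substrate,Karyotype) %>% 
  summarise(freq=n()) 

# drop the missing karyotypes (stored as the empty string ''), then turn the
# counts into proportions (0 to 1) within each Location/Substrate group
z.freq <- z.counts %>% filter(Karyotype != '') %>% 
  group_by(Location,Substrate) %>% 
  mutate(freq=freq/sum(freq))
z.freq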
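One detail of the filter step: 'filter()' removes the rows, but Karyotype is a factor, so the empty "" level (and "BB", which never occurs in the sample) stays in levels(). That matters for 'table()'- or 'dcast()'-style summaries, which report every level. The base-R sketch below uses 'subset()' in place of 'filter()' and again assumes the question's sample rows rebuilt under the hypothetical name 'kary'; 'kept', 'lv_before', 'lv_dropped' and 'tab3' are illustrative names.

```r
# The question's sample data, rebuilt compactly (hypothetical name `kary`)
lv <- c("Kampinge", "Kaseberga", "Molle", "Steninge")
kary <- data.frame(
  Location  = factor(rep(c("Kampinge", "Kaseberga"), times = c(20, 5)), levels = lv),
  Substrate = factor(rep(c(lv, "Kaseberga"), each = 5), levels = lv),
  Karyotype = factor(c("", "BD", "DD", "DD", "BD",  "BD", "DD", "DD", "DD", "BD",
                       "", "DD", "BD", "DD", "DD",  "BD", "", "DD", "BD", "BD",
                       "DD", "BD", "DD", "BD", "BD"),
                     levels = c("", "BB", "BD", "DD"))
)

# Drop the blank rows (base-R stand-in for filter(Karyotype != ''))
kept <- subset(kary, Karyotype != "")
lv_before <- levels(kept$Karyotype)   # "" "BB" "BD" "DD": unused levels survive
print(table(kept$Karyotype))          # "" and "BB" appear with a count of 0

# Option 1: drop every unused level, which removes "BB" as well as ""
lv_dropped <- levels(droplevels(kept$Karyotype))

# Option 2: fix the levels to the three karyotypes, so "BB" stays as an empty category
kept$Karyotype <- factor(as.character(kept$Karyotype), levels = c("BB", "BD", "DD"))
tab3 <- table(kept$Karyotype)
print(tab3)                           # BB 0, BD 11, DD 11
```

Which option fits depends on whether BB should appear in the legend. As far as I know, ggplot2's discrete fill scale omits unused levels by default; passing drop = FALSE to the scale (e.g. scale_fill_discrete(drop = FALSE)) should keep them.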

library(ggplot2)
ggplot(z.freq, aes(x=Substrate, y=freq, fill=Karyotype)) +
  geom_bar(stat="identity") +
  facet_wrap(~ Location)

Now I have created the plot I was looking for:

[Image: the resulting stacked-bar plot, faceted by Location]


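Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.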

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM