
R stacked bar charts including “other” (using ggplot2)

I want to make a stacked bar chart that describes abundances of taxa at two locations in three different seasons. I'm using ggplot2. Making the plot is OK, but I have 48 taxa, so I end up with a lot of different colours in each bar. Only eight taxa occur frequently and abundantly, so I'd like to group the others into "Other" for the plot.

My data looks like this:

SampleID     TransectID     SampleYear     Season     Location    Taxa1     Taxa2     Taxa3 .... Taxa48
BW15001              1            2015     fall        SiteA         25         0         0           0
BW15001              2            2015     fall        SiteA         32         0         0           2
BW15001              2            2015     fall        SiteA          6         0        45           0
BW15001              3            2015     fall        SiteA         78         1         2           0   

This is what I have tried (modified from here):

y <- rowSums(invert[6:54])
x<-invert[6:54]/y
x<-x[,order(-colSums(x))]  # sort the proportion columns by total abundance

#Extract list of top N Taxa
N<-8
taxa_list<-colnames(x)[1:N]

#remove "__Unknown__" and add it to others
taxa_list<-taxa_list[!grepl("Unknown",taxa_list)]
N<-length(taxa_list)

#Generate a new table with everything added to Others
new_x<-data.frame(x[,colnames(x) %in% taxa_list],
              Others=rowSums(x[,!colnames(x) %in% taxa_list]))
df<-NULL
for (i in 1:dim(new_x)[2]){
  tmp<-data.frame(row.names=NULL,Sample=rownames(new_x),
                  Taxa=rep(colnames(new_x)[i],dim(new_x)[1]),
                  Value=new_x[,i],Type=grouping_info[,1])
  if(i==1){df<-tmp} else {df<-rbind(df,tmp)}
}

To plot the graph:

colours <- c("#F0A3FF", "#0075DC", "#993F00","#4C005C","#2BCE48","#FFCC99","#808080","#94FFB5","#8F7C00","#9DCC00","#C20088","#003380","#FFA405","#FFA8BB","#426600","#FF0010","#5EF1F2","#00998F","#740AFF","#990000","#FFFF00");

library(ggplot2)
p<-ggplot(df,aes(Sample,Value,fill=Taxa))+
   geom_bar(stat="identity")+
   facet_grid(. ~ Type, drop=TRUE,scale="free",space="free_x")
p<-p+scale_fill_manual(values=colours[1:(N+1)])
p<-p+theme_bw()+ylab("Proportions")
p<-p+ scale_y_continuous(expand = c(0,0))+
  theme(strip.background = element_rect(fill="gray85"))+
  theme(panel.spacing = unit(0.3, "lines"))
p<-p+theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))
p

The main problem I would like help with today is pulling out the main taxa and lumping the rest as "Other". I think I can figure out how to group the graph by Season and Location using facet_grid() later...

Thanks!

One way of doing it:

library(plyr)      # ddply()
library(reshape2)  # melt()
library(ggplot2)   # ggplot()
d=data.frame(SampleID=rep('BW15001',4),
             TransectID=c(1,2,2,3),
             SampleYear=rep(2015,4),
             Taxa1=c(25,32,6,78),
             Taxa2=c(0,0,0,1),
             Taxa3=c(0,0,45,3))
#Reshape the df so that all taxa columns are melted into two
d=melt(d,id=colnames(d[,1:3]))
d$variable=as.character(d$variable)

# rename all uninteresting taxa as 'Other'
# (here the taxa to keep are listed; listing the ones to drop works just as well)
`%ni%` <- Negate(`%in%`)
# a function could determine automatically which taxa to keep, as in the question
d[d$variable %ni% c('Taxa1','Taxa2'),'variable']='Other'

# aggregate all data for 'other'
d=ddply(d,colnames(d[,1:4]),summarise,value=sum(value)) 

# make your plot; this one is just a rough example
# (the toy data has no Type column, so the question's facet_grid(. ~ Type) is left out)
ggplot(d,aes(SampleID,value,fill=variable))+
  geom_bar(stat="identity")
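The comment in the code above suggests picking the taxa to keep automatically. A minimal base-R sketch of that idea follows, assuming "most abundant" means the largest column totals. The toy data frame and the helper name `top_taxa` are made up for illustration and are not part of the original answer.

```r
# Hypothetical toy data in the same wide layout as the question
d <- data.frame(SampleID   = rep("BW15001", 4),
                TransectID = c(1, 2, 2, 3),
                Taxa1 = c(25, 32, 6, 78),
                Taxa2 = c(0, 0, 0, 1),
                Taxa3 = c(0, 0, 45, 2),
                Taxa4 = c(0, 2, 0, 0))

# Hypothetical helper: names of the n taxa columns with the largest totals
top_taxa <- function(data, taxa_cols, n) {
  totals <- colSums(data[, taxa_cols, drop = FALSE])
  names(sort(totals, decreasing = TRUE))[seq_len(min(n, length(totals)))]
}

taxa_cols <- grep("^Taxa", names(d), value = TRUE)
keep <- top_taxa(d, taxa_cols, n = 2)   # taxa that keep their own colour
rest <- setdiff(taxa_cols, keep)        # taxa that get lumped together

# Keep the metadata and the top taxa, and sum the remaining columns into "Other"
lumped <- cbind(d[, c("SampleID", "TransectID", keep)],
                Other = rowSums(d[, rest, drop = FALSE]))
```

The resulting `lumped` data frame can then be melted and plotted as above, and `keep` could replace the hard-coded `c('Taxa1','Taxa2')`.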

Expanding on my comment: take a look at the forcats package. Without a full example it's hard to say for sure, but the following should work:

library(tidyverse)
library(forcats)

temp <- df %>%
  gather(taxa, amount, -c(1:5))

# Reshape the data so that there is one record per unit of amount
# (this assumes the amounts are non-negative whole numbers)
tidy_df <- temp[rep(seq_len(nrow(temp)), times = temp$amount), ]

tidy_df %>%
  select(-amount) %>%
  mutate(taxa = fct_lump(taxa, n = 2)) %>%       # Check out this line
  ggplot(., aes(x = SampleID, fill = taxa)) +
    geom_bar()

You can change fct_lump(taxa, n = 2) to fct_lump(taxa, n = 8) to keep the top 8 categories and lump the rest. Alternatively, you can use fct_lump(taxa, prop = 0.9) to lump by proportion instead. Note that prop is the minimum share of records a level needs in order to stay separate, so 0.9 keeps only taxa that make up at least 90% of the records; a much smaller value is usually more useful.
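As a possible alternative to repeating rows, fct_lump() also accepts a weights argument `w`, so the amounts can be used directly when ranking the taxa. The sketch below assumes that approach; the data frame `wide` and its values are invented for illustration.

```r
library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)

# Hypothetical toy data: one row per transect, one column per taxon
wide <- data.frame(SampleID = c("S1", "S1", "S2", "S2"),
                   Taxa1 = c(25, 32, 6, 78),
                   Taxa2 = c(0, 0, 0, 1),
                   Taxa3 = c(0, 0, 45, 2),
                   Taxa4 = c(0, 2, 0, 0))

long <- wide %>%
  gather(taxa, amount, -SampleID) %>%
  # rank taxa by their summed amount rather than by their number of rows
  mutate(taxa = fct_lump(taxa, n = 2, w = amount))

# geom_col() stacks the amounts, so no row replication is needed
ggplot(long, aes(x = SampleID, y = amount, fill = taxa)) +
  geom_col()
```

Because the amounts are used as weights, this also works when the values are not whole numbers, for example proportions.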

If you are only interested in the "presence" of the taxa in a sample (and not the value or amount), things are a bit simpler and can likely be handled in one pipe:

df %>%
  gather(taxa, amount, -c(1:5)) %>%
  mutate(amount = na_if(amount, 0)) %>%
  na.omit() %>%
  mutate(taxa = fct_lump(taxa, n = 2)) %>%
  ggplot(., aes(x = SampleID, fill = taxa)) +
   geom_bar()
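To come back to the question's plan of grouping by Season and Location and showing proportions, here is one way it might look once the data is in long format and already lumped. The data frame below is made up, and the column names `Season`, `Location`, `taxa` and `amount` are assumptions based on the question's table.

```r
library(ggplot2)

# Hypothetical long-format data, already lumped into two taxa plus "Other"
long <- expand.grid(Season   = c("spring", "summer", "fall"),
                    Location = c("SiteA", "SiteB"),
                    taxa     = c("Taxa1", "Taxa3", "Other"),
                    stringsAsFactors = FALSE)
long$amount <- seq_len(nrow(long))  # placeholder abundances

p <- ggplot(long, aes(x = Location, y = amount, fill = taxa)) +
  geom_col(position = "fill") +     # rescale each bar to proportions
  facet_grid(. ~ Season) +          # one panel per season
  ylab("Proportions")
p
```

With position = "fill" each bar is rescaled to sum to 1, so the proportions do not have to be computed by hand.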
