
Stacked histogram plot in R

I want to plot a histogram of a count variable with ggplot2, but I want each bar to show the relative fractions of a second (categorical) variable.

For example, the four variables always sum to 1. I want to plot a histogram based on the count variable.

library(reshape)
library(ggplot2)

values = replicate(4, diff(c(0, sort(runif(92)), 1)))
colnames(values) = c("A","B","C","D")
counts = sample(1:100, 93, replace=TRUE)
df = data.frame(cbind(values,"count"=counts))
mdf = melt(df,id="count")



ggplot(mdf, aes(count,fill=variable)) +
  geom_histogram(alpha=0.3, 
   position="identity",lwd=0.2,binwidth=5,boundary=0)

I want each bar of the histogram to be coloured according to the relative fractions of the columns (A, B, C, D), so each bin should show all four categories.

I think this is what you want (I used dplyr package as well):

library(reshape2)
library(ggplot2)
library(dplyr)

set.seed(2)
values= replicate(4, diff(c(0, sort(runif(92)), 1)))
colnames(values) = c("A","B","C","D")
counts = sample(1:100, 93, replace=T)
df = data.frame(cbind(values,"count"=counts))
mdf = melt(df,id="count")

mdf = mdf %>%
  mutate(binCounts = cut(count, breaks = seq(0, 100, by = 5))) %>%
  group_by(binCounts) %>%
  mutate(sumVal = sum(value)) %>%
  ungroup() %>%
  group_by(binCounts, variable) %>%
  summarise(prct = sum(value)/mean(sumVal))

plot = ggplot(mdf) +
  geom_bar(aes(x=binCounts, y=prct, fill=variable), stat="identity") +
  theme(axis.text.x=element_text(angle = 90, hjust=1))

print(plot)
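
As a quick sanity check on the idea above, here is a minimal sketch that recomputes the per-bin shares and confirms that they add up to 1 in every non-empty bin. The names `shares`, `total` and `check` are only illustrative and not part of the answer's code, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(reshape2)
library(dplyr)

# same simulated data as in the answer above
set.seed(2)
values <- replicate(4, diff(c(0, sort(runif(92)), 1)))
colnames(values) <- c("A", "B", "C", "D")
counts <- sample(1:100, 93, replace = TRUE)
df <- data.frame(values, count = counts)
mdf <- melt(df, id.vars = "count")

# share of each variable within each 5-wide bin of count
# (`shares` and `total` are hypothetical names for this sketch)
shares <- mdf %>%
  mutate(binCounts = cut(count, breaks = seq(0, 100, by = 5))) %>%
  group_by(binCounts, variable) %>%
  summarise(total = sum(value), .groups = "drop") %>%
  group_by(binCounts) %>%
  mutate(prct = total / sum(total)) %>%
  ungroup()

# the four shares of every non-empty bin should sum to 1
check <- shares %>%
  group_by(binCounts) %>%
  summarise(binTotal = sum(prct), .groups = "drop")
print(check)
```

If `binTotal` is 1 for every row, the stacked bars all reach the same height, which is what a "relative fraction" plot should look like.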


I found an answer with the help of others in this post. I wanted each bar of the plot to be split according to the fractions of the variables (A, B, C, D). The code is not elegant, but it might be helpful for someone!

library(reshape2)
library(ggplot2)
library(dplyr)

##set the seed first so that both the values and the counts are reproducible
set.seed(2)

##generate random values, normalized so that each row (A,B,C,D) sums to 1
values <- matrix(runif(100*4),nrow=100)
S <- apply(values,1,sum); values = values/S
colnames(values) = c("A","B","C","D")
counts = sample(1:100, 100, replace=TRUE)

##frequency of the data in binwidth of 5
table = hist(counts,breaks=seq(0, 100, by = 5),plot=F)$counts

##create a dataframe
df = data.frame(cbind(values,"count"=counts))


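##upper edge of each bin
breaks = seq(5, 100, by = 5)
##for each upper edge x, sum A-D over all rows with count < x
##(note: this selection is cumulative, it is not restricted to the bin itself)
newdf = do.call("rbind",lapply(as.numeric(breaks), function(x) apply(df[which(df$count < x),][,1:4],2,sum)))
##turn each row into fractions, scale by the bin frequency, and reshape to long format
newdf = melt(sweep(newdf, 1, rowSums(newdf), FUN="/") * table)
colnames(newdf) = c("bins","variable","value")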
ggplot(newdf) +
  geom_bar(aes(x=bins, y=value, fill=variable), stat="identity") +
  theme(axis.text.x=element_text(angle = 90, hjust=1))
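
A possibly simpler variant, offered only as a sketch: because each row of A, B, C, D sums to 1, adding up the values per bin and per variable gives stacked bars whose total height equals the number of observations in that bin. This version works per bin rather than cumulatively, so it will not reproduce the plot above exactly. The names `perBin`, `total` and `heights` are illustrative, and `.groups` assumes dplyr 1.0.0 or later.

```r
library(reshape2)
library(ggplot2)
library(dplyr)

set.seed(2)
values <- matrix(runif(100 * 4), nrow = 100)
values <- values / rowSums(values)   # each row now sums to 1
colnames(values) <- c("A", "B", "C", "D")
counts <- sample(1:100, 100, replace = TRUE)
df <- data.frame(values, count = counts)

# sum of each variable inside each 5-wide bin of count
# (`perBin` and `total` are hypothetical names for this sketch)
perBin <- melt(df, id.vars = "count") %>%
  mutate(bins = cut(count, breaks = seq(0, 100, by = 5))) %>%
  group_by(bins, variable) %>%
  summarise(total = sum(value), .groups = "drop")

# stacked bar height per bin; it should match the bin frequency
heights <- perBin %>%
  group_by(bins) %>%
  summarise(h = sum(total), .groups = "drop")

p <- ggplot(perBin) +
  geom_bar(aes(x = bins, y = total, fill = variable), stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p)
```

Comparing `heights$h` with `table(cut(counts, breaks = seq(0, 100, by = 5)))` is an easy way to confirm that the bars still behave like an ordinary histogram.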
