Merge data.frames for grouped boxplot in R
I have two data frames, z (1 million observations) and b (500k observations).
z =
Tracer  time  treatment
15      0     S
20      0     S
25      0     X
04      0     X
55      15    S
16      15    S
15      15    X
20      15    X
b =
Tracer  time  treatment
2       0     S
35      0     S
10      0     X
04      0     X
20      15    S
11      15    S
12      15    X
25      15    X
I'd like to create grouped boxplots using time as a factor and treatment as colour. Essentially, I need to bind the two data frames together and then differentiate between them, but I'm not sure how. One way I tried was:
zz <- factor(rep("Z", nrow(z)))
bb <- factor(rep("B", nrow(b)))
dumZ <- merge(z, zz)  # this won't work because it says it's too big
dumB <- merge(b, bb)
total <- rbind(dumB, dumZ)
But merging z and zz won't work because it says the result is 10G in size (which can't be right).
The end plot might be similar to this example: Boxplot with two levels and multiple data.frames
Any thoughts?
Cheers,
I would approach it as follows:
# create a list of your data.frames
l <- list(z,b)
# assign names to the dataframes in the list
names(l) <- c("z","b")
# bind the dataframes together with rbindlist from data.table
# the idcol parameter will create a variable with the names of the dataframes
# you could also use 'bind_rows(l, .id="id")' from 'dplyr' for this
library(data.table)
library(ggplot2)
zb <- rbindlist(l, idcol = "id")
# create the plot
ggplot(zb, aes(x = factor(time), y = Tracer, color = treatment)) +
  geom_boxplot() +
  facet_wrap(~id) +
  theme_bw()
which gives one panel per data frame (z and b), with time on the x-axis and boxes coloured by treatment.
Other alternatives for creating your plot:
# facet by 'time'
ggplot(zb, aes(x = id, y = Tracer, color = treatment)) +
  geom_boxplot() +
  facet_wrap(~time) +
  theme_bw()
# facet by 'time' & color by 'id' instead of 'treatment'
ggplot(zb, aes(x = treatment, y = Tracer, color = id)) +
  geom_boxplot() +
  facet_wrap(~time) +
  theme_bw()
In response to your last comment: to get everything in one plot, you can use interaction() to distinguish between the different groupings, as follows:
ggplot(zb, aes(x = treatment, y = Tracer, color = interaction(id, time))) +
  geom_boxplot(width = 0.7, position = position_dodge(width = 0.7)) +
  theme_bw()
which gives a single panel with treatment on the x-axis and one box per combination of data frame and time.
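To make the grouping concrete, here is a minimal base-R sketch of what interaction(id, time) produces. It uses only the eight rows of z and b shown in the question; the cbind()/rbind() step is a stand-in for rbindlist(l, idcol = "id") so that the example runs without data.table, and the name grp is introduced here purely for illustration.

```r
# The sample rows of z and b from the question
z <- data.frame(Tracer = c(15, 20, 25, 4, 55, 16, 15, 20),
                time = c(0, 0, 0, 0, 15, 15, 15, 15),
                treatment = c("S", "S", "X", "X", "S", "S", "X", "X"))
b <- data.frame(Tracer = c(2, 35, 10, 4, 20, 11, 12, 25),
                time = c(0, 0, 0, 0, 15, 15, 15, 15),
                treatment = c("S", "S", "X", "X", "S", "S", "X", "X"))

# Base-R stand-in for rbindlist(l, idcol = "id"):
# tag each data frame with its name, then stack the rows
zb <- rbind(cbind(id = "z", z), cbind(id = "b", b))

# interaction() creates one factor level per id/time combination,
# which is what ggplot uses to assign a colour to each box
grp <- interaction(zb$id, zb$time)
levels(grp)

# Number of observations per treatment within each id/time group
table(zb$treatment, grp)
```

Each of the four id/time levels gets its own colour, and position_dodge() places the four boxes side by side within each treatment.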
The key is that you do not need to perform a merge, which is computationally expensive on large tables. Instead, add a new variable (source, with value "b" or "z" in my code below) to each data frame and then rbind them. After that it becomes straightforward; my solution is very similar to @Jaap's, just with different faceting.
library(ggplot2)
# Create some mock data
# rep(..., each = 5) ensures both treatments occur at each time point
vals <- seq(1, 55, by = 2)
z <- data.frame(tracer = sample(vals, size = 10, replace = TRUE), time = rep(c(0, 15), each = 5), treatment = c("S", "X"))
b <- data.frame(tracer = sample(vals, size = 10, replace = TRUE), time = rep(c(0, 15), each = 5), treatment = c("S", "X"))
# Add a variable to each table to identify its source
b$source <- "b"
z$source <- "z"
# Concatenate the tables
combined <- rbind(b, z)
ggplot(combined, aes(source, tracer, group = interaction(treatment, source), fill = treatment)) +
  geom_boxplot() +
  facet_grid(~time)
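As for why the merge(z, zz) call in the question reports such a large size: when the two inputs share no column names, merge() returns their Cartesian product, so every row of z is paired with every element of zz. The sketch below shows this on a three-row stand-in; z_small and crossed are hypothetical names used only here. With 1 million rows in z and 1 million elements in zz, the same call would need on the order of 10^12 rows, which is probably what triggered the size error.

```r
# Small stand-in for z (three of the rows from the question)
z_small <- data.frame(Tracer = c(15, 20, 25),
                      time = c(0, 0, 0),
                      treatment = c("S", "S", "X"))
zz <- factor(rep("Z", nrow(z_small)))

# No shared column names, so merge() pairs every row of z_small
# with every element of zz: 3 * 3 = 9 rows
crossed <- merge(z_small, zz)
nrow(crossed)

# Adding the label as a new column leaves the row count unchanged
z_small$source <- "z"
nrow(z_small)
```

Assigning the label as a column costs one extra value per row, whereas the merge grows with the product of the two lengths.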