
Merge data.frames for a grouped boxplot in R

I have two data frames, z (1 million observations) and b (500k observations).

z =
Tracer  time  treatment
    15     0  S
    20     0  S
    25     0  X
    04     0  X
    55    15  S
    16    15  S
    15    15  X
    20    15  X

b =
Tracer  time  treatment
     2     0  S
    35     0  S
    10     0  X
    04     0  X
    20    15  S
    11    15  S
    12    15  X
    25    15  X

I'd like to create grouped boxplots using time as a factor and treatment as colour. Essentially I need to bind the two data frames together and then differentiate between them, but I'm not sure how. One way I tried was:

zz <- factor(rep("Z", nrow(z)))
bb <- factor(rep("B", nrow(b)))
dumZ <- merge(z, zz)  # this won't work because it says it's too big
dumB <- merge(b, bb)
total <- rbind(dumB, dumZ)

But merging z and zz won't work because it says the result is 10G in size (which can't be right).

The end plot might be similar to this example: Boxplot with two levels and multiple data.frames

Any thoughts?

Cheers,

EDIT: Added boxplot.

I would approach it as follows:

# create a list of your data.frames
l <- list(z, b)
# assign names to the data frames in the list
names(l) <- c("z", "b")

# bind the data frames together with rbindlist from data.table
# the idcol argument creates a column holding the names of the data frames
# you could also use 'bind_rows(l, .id="id")' from 'dplyr' for this
library(data.table)
zb <- rbindlist(l, idcol = "id")

# create the plot
library(ggplot2)
ggplot(zb, aes(x = factor(time), y = Tracer, color = treatment)) +
  geom_boxplot() +
  facet_wrap(~id) +
  theme_bw()

which gives:

[Image: boxplots of Tracer by time, coloured by treatment, one panel per data frame]
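If neither data.table nor dplyr is available, the same "bind with an id column" step can be sketched in base R. This is only an illustration: the small z and b below are built from the sample rows in the question, and tag_source is a hypothetical helper name, not something from the answer above.

```r
# Sample rows taken from the question (the real data frames are much larger)
z <- data.frame(
  Tracer    = c(15, 20, 25, 4, 55, 16, 15, 20),
  time      = rep(c(0, 15), each = 4),
  treatment = rep(c("S", "S", "X", "X"), times = 2)
)
b <- data.frame(
  Tracer    = c(2, 35, 10, 4, 20, 11, 12, 25),
  time      = rep(c(0, 15), each = 4),
  treatment = rep(c("S", "S", "X", "X"), times = 2)
)

l <- list(z = z, b = b)

# tag_source is a hypothetical helper: it adds an 'id' column holding
# the name of the data frame the rows came from
tag_source <- function(df, name) {
  df$id <- name
  df
}

# tag every data frame with its name, then stack them row-wise
zb <- do.call(rbind, Map(tag_source, l, names(l)))
rownames(zb) <- NULL

# rows per data frame and time point
table(zb$id, zb$time)
```

The resulting zb has the same columns as the rbindlist version (Tracer, time, treatment, id), so the ggplot calls in this answer should work on it unchanged.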

Other alternatives for creating your plot:

# facet by 'time'
ggplot(zb, aes(x=id, y=Tracer, color=treatment)) + 
  geom_boxplot() + 
  facet_wrap(~time) + 
  theme_bw()

# facet by 'time' & color by 'id' instead of 'treatment'
ggplot(zb, aes(x=treatment, y=Tracer, color=id)) + 
  geom_boxplot() + 
  facet_wrap(~time) + 
  theme_bw()

In response to your last comment: to get everything in one plot, you can use interaction to distinguish between the different groupings as follows:

ggplot(zb, aes(x=treatment, y=Tracer, color=interaction(id, time))) + 
  geom_boxplot(width = 0.7, position = position_dodge(width = 0.7)) + 
  theme_bw()

which gives:

[Image: boxplots of Tracer by treatment, coloured by the interaction of id and time]
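To see what interaction does to the grouping, here is a small sketch on made-up vectors (the values of id and time below are illustrative only). It builds one factor whose levels are the combinations of the inputs, and that factor is what ggplot uses to colour and dodge the boxes.

```r
# illustrative values only: two data frame ids crossed with two time points
id   <- c("z", "z", "b", "b")
time <- c(0, 15, 0, 15)

# one factor level per (id, time) combination
grp <- interaction(id, time)

levels(grp)   # the combined labels, joined with "."
nlevels(grp)  # number of distinct groups
```

With two data frames and two time points there are four groups, so each treatment on the x-axis gets four dodged boxes.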

The key is that you do not need to perform a merge, which is computationally expensive on large tables. Instead, add a new variable (source, with value "b" or "z", in my code below) to each data frame and then rbind them. After that it is straightforward; my solution is very similar to @Jaap's, just with different faceting.

library(ggplot2)
# Create some mock data
vals <- seq(1, 55, by = 2)
# time is blocked with rep(each = ) so that both treatments occur at each time
z <- data.frame(tracer = sample(vals, size = 10, replace = TRUE),
                time = rep(c(0, 15), each = 5), treatment = c("S", "X"))
b <- data.frame(tracer = sample(vals, size = 10, replace = TRUE),
                time = rep(c(0, 15), each = 5), treatment = c("S", "X"))
# Add a variable to each table to identify its source
b$source <- "b"
z$source <- "z"
# Concatenate the tables
combined <- rbind(b, z)

ggplot(combined, aes(source, tracer, group = interaction(treatment, source), fill = treatment)) +
  geom_boxplot() + facet_grid(~time)

[Image: boxplot grouped by time, treatment and data frame]
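As a rough illustration of why the merge in the question runs out of memory: when the two inputs share no column names, merge returns their Cartesian product, so the row count is the product of the two row counts. The three-row x below is made up for the demonstration.

```r
# tiny stand-in for the question's data frame
x <- data.frame(tracer = c(15, 20, 25))
label <- factor(rep("Z", nrow(x)))

# no shared column names, so merge() pairs every row of x with every label
crossed <- merge(x, label)
nrow(crossed)  # 3 * 3 = 9 rows

# adding the label as a column keeps one row per observation
tagged <- x
tagged$source <- "Z"
nrow(tagged)   # still 3 rows
```

At the scale in the question, the same pairing would ask for 1 million times 1 million rows, which would explain the size error. Assigning a column and then using rbind only ever holds the original rows.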
