
Merge data.frames for a grouped boxplot in R

I have two data frames z (1 million observations) and b (500k observations).

z =
  Tracer time treatment
      15    0 S
      20    0 S
      25    0 X
      04    0 X
      55   15 S
      16   15 S
      15   15 X
      20   15 X

b =
  Tracer time treatment
       2    0 S
      35    0 S
      10    0 X
      04    0 X
      20   15 S
      11   15 S
      12   15 X
      25   15 X

I'd like to create grouped boxplots using time as a factor and treatment as colour. Essentially I need to bind the two data frames together and then differentiate between them, but I'm not sure how. One way I tried was:

zz <- factor(rep("Z", nrow(z)))
bb <- factor(rep("B", nrow(b)))
dumZ <- merge(z, zz) # this won't work because it says it's too big
dumB <- merge(b, bb)
total <- rbind(dumB, dumZ)

But the merge of z and zz won't work because R says the result is 10G in size (which can't be right).

The end plot might be similar to this example: Boxplot with two levels and multiple data.frames

Any thoughts?

Cheers,

EDIT: Added boxplot [image]

I would approach it as follows:

# create a list of your data.frames
l <- list(z,b)
# assign names to the dataframes in the list
names(l) <- c("z","b")

# bind the dataframes together with rbindlist from data.table
# the idcol parameter will create a variable with the names of the dataframes
# you could also use 'bind_rows(l, .id="id")' from 'dplyr' for this
library(data.table)
library(ggplot2)
zb <- rbindlist(l, idcol="id")

# create the plot
ggplot(zb, aes(x=factor(time), y=Tracer, color=treatment)) +
  geom_boxplot() +
  facet_wrap(~id) +
  theme_bw()

which gives:

[image]
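The bind_rows alternative mentioned in the code comment can be sketched as follows. This is only an illustration on toy data: the two small data frames here are made-up stand-ins for the real z and b, and it assumes the dplyr package is installed.

```r
library(dplyr)

# Toy stand-ins for the real z and b (values are illustrative only)
z <- data.frame(Tracer = c(15, 20), time = c(0, 15), treatment = c("S", "X"))
b <- data.frame(Tracer = c(2, 35), time = c(0, 15), treatment = c("S", "X"))

# A named list: the names become the values of the new 'id' column
zb <- bind_rows(list(z = z, b = b), .id = "id")

# Each row now records which data frame it came from
table(zb$id)
```

The result has the same shape as the rbindlist output, so the ggplot calls in this answer should work on it unchanged.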

Other alternatives for creating your plot:

# facet by 'time'
ggplot(zb, aes(x=id, y=Tracer, color=treatment)) + 
  geom_boxplot() + 
  facet_wrap(~time) + 
  theme_bw()

# facet by 'time' & color by 'id' instead of 'treatment'
ggplot(zb, aes(x=treatment, y=Tracer, color=id)) + 
  geom_boxplot() + 
  facet_wrap(~time) + 
  theme_bw()

In response to your last comment: to get everything in one plot, you can use interaction to distinguish between the different groupings as follows:

ggplot(zb, aes(x=treatment, y=Tracer, color=interaction(id, time))) + 
  geom_boxplot(width = 0.7, position = position_dodge(width = 0.7)) + 
  theme_bw()

which gives:

[image]
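To see what interaction does in that call, here is a minimal sketch on toy vectors (the vectors are made up for illustration and are not the real data). It builds one factor whose levels are the combinations of its arguments, which is why each id/time pair gets its own colour.

```r
# Toy vectors standing in for the 'id' and 'time' columns of zb
id   <- c("z", "z", "b", "b")
time <- c(0, 15, 0, 15)

# One factor level per combination of id and time
grp <- interaction(id, time)

as.character(grp)
nlevels(grp)
```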

The key is that you do not need to perform a merge, which is computationally expensive on large tables. Instead, add a new variable to each data frame that identifies it (source, with the values "b" and "z", in my code below) and then rbind them. After that it becomes straightforward. My solution is very similar to @Jaap's, just with different faceting.

library(ggplot2)
# Create some mock data
vals <- seq(1, 55, by = 2)
# 'each = 5' so that both treatments occur at both time points
z <- data.frame(tracer = sample(vals, size = 10, replace = TRUE), time = rep(c(0, 15), each = 5), treatment = c("S", "X"))
b <- data.frame(tracer = sample(vals, size = 10, replace = TRUE), time = rep(c(0, 15), each = 5), treatment = c("S", "X"))
# Add a variable to each table to identify it
b$source <- "b"
z$source <- "z"
# Concatenate the tables
combined <- rbind(b, z)

ggplot(combined, aes(source, tracer, group = interaction(treatment, source), fill = treatment)) +
  geom_boxplot() + facet_grid(~time)

[image: boxplot grouped by time, treatment and data frame]
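As a rough illustration of why the merge in the question grows so large: when the two inputs share no column names, merge returns their Cartesian product, so the row count is the product of the two row counts. The sketch below uses a made-up four-row stand-in for z. With a million rows on each side, the product is presumably what triggered the size error.

```r
# Toy stand-in for z (values are illustrative only)
z <- data.frame(Tracer = c(15, 20, 25, 4), time = 0,
                treatment = c("S", "S", "X", "X"))
zz <- factor(rep("Z", nrow(z)))

# No shared column names, so every row of z is paired with every element of zz
crossed <- merge(z, zz)
nrow(crossed)  # nrow(z) * length(zz)

# Adding a column instead keeps one label per row
z$source <- "z"
nrow(z)
```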

