Arranging multiple graphs using a for loop in ggplot2

Question

I want to produce a PDF that shows multiple graphs, one for each NetworkTrackingPixelId. I have a data frame similar to this:

> head(data)
  NetworkTrackingPixelId                           Name       Date Impressions
1                   2421                    Rubicon RTB 2014-02-16      168801
2                   2615                     Google RTB 2014-02-16     1215235
3                   3366                      OpenX RTB 2014-02-16      104419
4                   3606                   AppNexus RTB 2014-02-16      170757
5                   3947                   Pubmatic RTB 2014-02-16       68690
6                   4299            Improve Digital RTB 2014-02-16         701

I was thinking of using a script similar to the one below:

# create a vector which stores the NetworkTrackingPixelIds
tp <- data %.%
        group_by(NetworkTrackingPixelId) %.%
        select(NetworkTrackingPixelId)

# create a for loop to print the line graphs
for (i in tp) {
      print(ggplot(data[which(data$NetworkTrackingPixelId == i), ], aes(x = Date, y = Impressions)) + geom_point() + geom_line())
    }

I expected this command to produce many graphs, one for each NetworkTrackingPixelId. Instead, the result is a single graph that aggregates all the NetworkTrackingPixelIds.

Another thing I've noticed is that the variable tp is not a real vector.

> is.vector(tp)
[1] FALSE

Even if I force it:

tp <- as.vector(data %.%
        group_by(NetworkTrackingPixelId) %.%
        select(NetworkTrackingPixelId))
> is.vector(tp)
[1] FALSE
> str(tp)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1397 obs. of  1 variable:
 $ NetworkTrackingPixelId: int  2421 2615 3366 3606 3947 4299 4429 4786 6046 6286 ...
 - attr(*, "vars")=List of 1
  ..$ : symbol NetworkTrackingPixelId
 - attr(*, "drop")= logi TRUE
 - attr(*, "indices")=List of 63
  ..$ : int  24 69 116 162 205 253 302 351 402 454 ...
  ..$ : int  1 48 94 140 184 232 281 330 380 432 ...

[I've cut part of this output]

 - attr(*, "group_sizes")= int  29 29 2 16 29 1 29 29 29 29 ...
 - attr(*, "biggest_group_size")= int 29
 - attr(*, "labels")='data.frame':  63 obs. of  1 variable:
  ..$ NetworkTrackingPixelId: int  8799 2615 8854 8869 4786 7007 3947 9109 9126 9137 ...
  ..- attr(*, "vars")=List of 1
  .. ..$ : symbol NetworkTrackingPixelId

Answer 1

Since I don't have your dataset, I will use the mtcars dataset to illustrate how to do this using dplyr and data.table. Both packages are fine examples of the split-apply-combine paradigm in R. Let me explain:

Step 1: Split data by gear

dplyr uses the function group_by.
data.table uses the argument by.

Step 2: Apply a function

dplyr uses do, to which you can pass a function that operates on each piece x.
data.table evaluates the variables passed to the function in the context of each piece.

Step 3: Combine

There is no combine step here, since we are saving the charts to file as they are created.

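library(dplyr)
library(ggplot2)  # provides qplot() and ggsave()
# %.% is the chaining operator from early dplyr releases (later replaced by %>%)
mtcars %.%
  group_by(gear) %.%
  do(function(x){ggsave(
    filename = sprintf("gear_%s.pdf", unique(x$gear)), qplot(wt, mpg, data = x)
  )})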

library(data.table)
mtcars_dt = data.table(mtcars)
mtcars_dt[,ggsave(
  filename = sprintf("gear_%s.pdf", unique(gear)), qplot(wt, mpg)),
  by = gear
]
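
The `%.%` operator and the function form of `do()` come from early dplyr releases and are not available in current ones. As a rough sketch of the same split-and-save idea with a newer dplyr, assuming `group_walk()` is available (dplyr 0.8 or later) and writing to a temporary directory (the `out_dir` name is only an illustration):

```r
library(dplyr)
library(ggplot2)

# Assumption: files go to a temporary directory; change out_dir as needed
out_dir <- tempdir()

mtcars %>%
  group_by(gear) %>%
  group_walk(function(piece, key) {
    # piece holds the rows of one group; key is a one-row tibble with its gear value
    p <- ggplot(piece, aes(x = wt, y = mpg)) + geom_point()
    ggsave(file.path(out_dir, sprintf("gear_%s.pdf", key$gear)),
           plot = p, width = 7, height = 7)
  })
```

`group_walk()` is called for its side effects only, which matches the "no combine step" situation described above.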

UPDATE: To save all the plots into one PDF, here is a quick solution.

plots = mtcars %.%
  group_by(gear) %.%
  do(function(x) {
    qplot(wt, mpg, data = x)
  })

pdf('all.pdf')
invisible(lapply(plots, print))
dev.off()
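
If you would rather keep a plain for loop, as in the question, something along these lines should also give one page per group in a single PDF. This is only a sketch: the output path is a temporary file chosen for illustration, and the title text is my own addition.

```r
library(ggplot2)

# Assumption: write to a temporary file; replace with your own path
out_file <- file.path(tempdir(), "all.pdf")

pdf(out_file)
for (g in sort(unique(mtcars$gear))) {
  piece <- mtcars[mtcars$gear == g, ]
  # print() is needed here: ggplot objects are not drawn automatically inside a loop
  print(ggplot(piece, aes(x = wt, y = mpg)) +
          geom_point() +
          ggtitle(paste("gear:", g)))
}
invisible(dev.off())
```

Each `print()` call starts a new page while the pdf device is open.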

Answer 2

I recently had a project that required producing a lot of individual PNGs, one for each record. I found I got a huge speed-up from some pretty simple parallelization. I am not sure whether this is more performant than the dplyr or data.table technique, but it may be worth trying. I saw a huge speed boost:

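require(foreach)
require(doParallel)
workers <- makeCluster(4)
registerDoParallel(workers)
# the output directory must exist before the workers write to it
dir.create(file.path(getwd(), 'images'), showWarnings = FALSE)
# one plot per record (row) of mtcars
foreach(i = seq_len(nrow(mtcars)), .packages = c('ggplot2')) %dopar% {
  j <- qplot(wt, mpg, data = mtcars[i, ])
  # include the row index so records that share a gear do not overwrite each other
  png(file = paste(getwd(), '/images/', mtcars[i, c('gear')], '_', i, '.png', sep = ''))
  print(j)
  dev.off()
}
stopCluster(workers)  # release the worker processes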

Answer 3

I think you would be better off writing a function for plotting, then using lapply to call it for every NetworkTrackingPixelId.

For example, your function might look like:

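    plot.function <- function(ntpid){
      # keep only the rows for this tracking pixel
      sub = subset(dataset, dataset$networktrackingpixelid == ntpid)
      ggobj = ggplot(data=sub, aes(...)) + geom...
      # pass the plot explicitly instead of relying on the last plot drawn
      ggsave(filename=sprintf("%s.pdf", ntpid), plot=ggobj)
    }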
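
A filled-in version of that outline could look roughly like the following. Treat it as a sketch under assumptions: the small `dataset` below is made up and only borrows the column names from the question, the aesthetics and geoms are copied from the question's loop, and `plot_one_id` and `out_dir` are names I picked for the example.

```r
library(ggplot2)

# Hypothetical sample data using the question's column names (values are made up)
dataset <- data.frame(
  NetworkTrackingPixelId = rep(c(1, 2), each = 3),
  Date = rep(as.Date("2014-02-16") + 0:2, times = 2),
  Impressions = c(100, 120, 90, 500, 450, 520)
)

plot_one_id <- function(ntpid, out_dir = tempdir()) {
  # keep only the rows for this id
  sub <- dataset[dataset$NetworkTrackingPixelId == ntpid, ]
  p <- ggplot(sub, aes(x = Date, y = Impressions)) + geom_point() + geom_line()
  out_file <- file.path(out_dir, sprintf("%s.pdf", ntpid))
  ggsave(out_file, plot = p, width = 7, height = 5)
  out_file  # return the path so the caller can see what was written
}

# one call per distinct id, not one per row
saved <- lapply(unique(dataset$NetworkTrackingPixelId), plot_one_id)
```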

It would help if you posted a reproducible example, but I hope this works! Not sure about the vector issue, though.
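
On the vector issue, one likely explanation is that tp is still a data frame, and a for loop over a data frame steps through its columns, not its rows. With a single column the loop body runs once, with i holding every id at once, so the comparison matches all rows. A small sketch with made-up data (the names `tp_df`, `ids`, and the counters are only for illustration):

```r
# Hypothetical stand-in for the question's data
data <- data.frame(
  NetworkTrackingPixelId = rep(c(2421L, 2615L, 3366L), each = 2),
  Impressions = c(10, 20, 30, 40, 50, 60)
)

# A one-column data frame, comparable to tp in the question
tp_df <- data["NetworkTrackingPixelId"]

n_iter_df <- 0
for (i in tp_df) n_iter_df <- n_iter_df + 1    # loops over columns: a single pass

# Extracting the column with $ and taking unique() gives a plain vector of ids
ids <- unique(data$NetworkTrackingPixelId)

n_iter_vec <- 0
for (i in ids) n_iter_vec <- n_iter_vec + 1    # one pass per distinct id
```

Looping over `ids` instead of the data frame should give one iteration, and so one plot, per NetworkTrackingPixelId.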

Cheers!

Answer 4

Unless I'm missing something, generating plots by a subsetting variable is very simple. You can use split(...) to split the original data into a list of data frames by NetworkTrackingPixelId, and then pass those to ggplot using lapply(...). Most of the code below is just to create a sample dataset.

library(ggplot2)

# create sample data
set.seed(1)
names <- c("Rubicon","Google","OpenX","AppNexus","Pubmatic")
dates <- as.Date("2014-02-16")+1:10
df <- data.frame(NetworkTrackingPixelId=rep(1:5,each=10),
                 Name=sample(names,50,replace=TRUE),
                 Date=dates,
                 Impressions=sample(1000:10000,50))
# end create sample data

pdf("plots.pdf")
# print() each plot explicitly so it is drawn even when the script is sourced
invisible(lapply(split(df,df$NetworkTrackingPixelId),
       function(gg) print(ggplot(gg,aes(x = Date, y = Impressions)) +
          geom_point() + geom_line()+
          ggtitle(paste("NetworkTrackingPixelId:",unique(gg$NetworkTrackingPixelId))))))
dev.off()

This generates a PDF containing 5 plots, one for each NetworkTrackingPixelId.


Answer 1: score 12, accepted, 2014-03-17 20:41:55

Answer 2: score 1, 2014-03-17 21:17:03

Answer 3: score 0, 2014-03-17 20:38:34

Answer 4: score 0, 2014-03-17 22:53:55
