Arrange multiple graphs using a for loop in ggplot2
I want to produce a PDF that shows multiple graphs, one for each NetworkTrackingPixelId. I have a data frame similar to this:
> head(data)
NetworkTrackingPixelId Name Date Impressions
1 2421 Rubicon RTB 2014-02-16 168801
2 2615 Google RTB 2014-02-16 1215235
3 3366 OpenX RTB 2014-02-16 104419
4 3606 AppNexus RTB 2014-02-16 170757
5 3947 Pubmatic RTB 2014-02-16 68690
6 4299 Improve Digital RTB 2014-02-16 701
I was thinking of using a script similar to the one below:
# create a vector which stores the NetworkTrackingPixelIds
tp <- data %.%
group_by(NetworkTrackingPixelId) %.%
select(NetworkTrackingPixelId)
# create a for loop to print the line graphs
for (i in tp) {
print(ggplot(data[which(data$NetworkTrackingPixelId == i), ], aes(x = Date, y = Impressions)) + geom_point() + geom_line())
}
I was expecting this command to produce many graphs, one for each NetworkTrackingPixelId. Instead, the result is a single graph that aggregates all the NetworkTrackingPixelIds.
Another thing I've noticed is that the variable tp is not a real vector.
> is.vector(tp)
[1] FALSE
Even if I try to force it:
tp <- as.vector(data %.%
group_by(NetworkTrackingPixelId) %.%
select(NetworkTrackingPixelId))
> is.vector(tp)
[1] FALSE
> str(tp)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1397 obs. of 1 variable:
$ NetworkTrackingPixelId: int 2421 2615 3366 3606 3947 4299 4429 4786 6046 6286 ...
- attr(*, "vars")=List of 1
..$ : symbol NetworkTrackingPixelId
- attr(*, "drop")= logi TRUE
- attr(*, "indices")=List of 63
..$ : int 24 69 116 162 205 253 302 351 402 454 ...
..$ : int 1 48 94 140 184 232 281 330 380 432 ...
[I've trimmed this output a bit]
- attr(*, "group_sizes")= int 29 29 2 16 29 1 29 29 29 29 ...
- attr(*, "biggest_group_size")= int 29
- attr(*, "labels")='data.frame': 63 obs. of 1 variable:
..$ NetworkTrackingPixelId: int 8799 2615 8854 8869 4786 7007 3947 9109 9126 9137 ...
..- attr(*, "vars")=List of 1
.. ..$ : symbol NetworkTrackingPixelId
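A small sketch of what appears to be going on, using a made-up five-row data frame rather than the real data (the names df, id and value are illustrative only). select() returns a one-column data frame, and a for loop over a data frame visits its columns, not its values, so the loop body runs once with i holding the whole column. Comparing the column with itself is TRUE everywhere, which would explain the single aggregated graph. Pulling the distinct ids out with unique() gives a plain vector to loop over.

```r
# Made-up stand-in for the real data (names are illustrative only)
df <- data.frame(id = c(1L, 1L, 2L, 2L, 3L),
                 value = c(10, 20, 30, 40, 50))

# A one-column data frame, comparable to what select() returns
tp <- df["id"]

# Looping over a data frame iterates over its columns
n_iter <- 0
rows_matched <- NA
for (i in tp) {
  n_iter <- n_iter + 1
  rows_matched <- sum(df$id == i)  # i is the whole id column here
}

# A plain vector of distinct ids can be looped over one id at a time
ids <- unique(df$id)
rows_per_id <- vapply(ids, function(k) sum(df$id == k), integer(1))
```

Here the loop over tp runs a single time and matches every row, while ids is an ordinary integer vector with one entry per distinct id.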
Since I don't have your dataset, I will use the mtcars dataset to illustrate how to do this using dplyr and data.table. Both packages are fine examples of the split-apply-combine paradigm in R. Let me explain:
Step 1: Split data by gear
dplyr uses the function group_by.
data.table uses the argument by.
Step 2: Apply a function
dplyr uses do, which runs your plotting code once for each piece.
data.table evaluates the expression in the context of each piece, so bare column names refer to that piece's columns.
Step 3: Combine
There is no combine step here, since we are saving the charts we create to file.
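library(dplyr)
library(ggplot2)
# dplyr: write one PDF per value of gear.
# Early dplyr releases used %.% and do(function(x) ...); current versions
# use %>% and refer to each piece as `.` inside do().
mtcars %>%
  group_by(gear) %>%
  do(saved = ggsave(
    filename = sprintf("gear_%s.pdf", unique(.$gear)),
    plot = ggplot(., aes(wt, mpg)) + geom_point()
  ))
library(data.table)
# data.table: .SD holds the rows of the current group
mtcars_dt <- data.table(mtcars)
mtcars_dt[, ggsave(
  filename = sprintf("gear_%s.pdf", unique(gear)),
  plot = ggplot(.SD, aes(wt, mpg)) + geom_point()),
  by = gear
]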
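For contrast, here is a minimal base R sketch of the same three steps in a case where the combine step does exist: it computes the mean mpg per gear instead of saving plots. The helper name summarise_piece is made up for illustration.

```r
# Step 1: split mtcars into a list of data frames, one per gear value
pieces <- split(mtcars, mtcars$gear)

# Step 2: apply a function to each piece (hypothetical helper)
summarise_piece <- function(x) {
  data.frame(gear = x$gear[1], mean_mpg = mean(x$mpg))
}
summaries <- lapply(pieces, summarise_piece)

# Step 3: combine the per-piece results into one data frame
result <- do.call(rbind, summaries)
```

When the applied function only has a side effect, such as writing a file, step 3 simply drops out.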
UPDATE: To save all plots into one PDF, here is a quick solution.
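library(dplyr)
library(ggplot2)
# build one plot per gear; do() stores them in a list column named plot
plots <- mtcars %>%
  group_by(gear) %>%
  do(plot = ggplot(., aes(wt, mpg)) + geom_point())
# each print() adds a page to the open PDF device
pdf('all.pdf')
invisible(lapply(plots$plot, print))
dev.off()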
I recently had a project that required producing a lot of individual PNGs, one for each record. I got a huge speed-up from some pretty simple parallelization. I am not sure whether this is more performant than the dplyr or data.table technique, but it may be worth trying. This is what gave me the speed boost:
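library(foreach)
library(doParallel)
# start 4 worker processes and register them as the parallel backend
workers <- makeCluster(4)
registerDoParallel(workers)
# the output directory must exist before the workers write to it
dir.create('images', showWarnings = FALSE)
# one PNG per row; the row index keeps the file names unique
foreach(i = seq_len(nrow(mtcars)), .packages = c('ggplot2')) %dopar% {
  j <- ggplot(mtcars[i, ], aes(wt, mpg)) + geom_point()
  png(file = file.path(getwd(), 'images', paste0(i, '_gear', mtcars[i, 'gear'], '.png')))
  print(j)
  dev.off()
}
# release the workers when done
stopCluster(workers)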
I think you would be better off writing a function for plotting, then using lapply to call it for every NetworkTrackingPixelId.
For example, your function might look like:
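# named plot_function because plot.function would mask an existing S3 method
plot_function <- function(ntpid) {
  # keep only the rows for this id (column name as in the question)
  sub <- subset(dataset, dataset$NetworkTrackingPixelId == ntpid)
  ggobj <- ggplot(data = sub, aes(...)) + geom...
  # pass the plot explicitly rather than relying on the last plot created
  ggsave(filename = sprintf("%s.pdf", ntpid), plot = ggobj)
}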
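A fuller sketch of this idea, with the elided parts filled in from the question (Date on x, Impressions on y, points plus lines). The sample data, the function name plot_one_id and the temporary output directory are assumptions made for illustration.

```r
library(ggplot2)

# Made-up sample data with the same column names as in the question
dataset <- data.frame(
  NetworkTrackingPixelId = rep(c(2421L, 2615L), each = 3),
  Date = rep(as.Date("2014-02-16") + 0:2, times = 2),
  Impressions = c(168801, 170200, 165000, 1215235, 1190000, 1230100)
)

out_dir <- tempdir()  # assumption: write to a temporary directory

# Hypothetical helper: plot one id and save it as its own PDF
plot_one_id <- function(ntpid) {
  sub <- dataset[dataset$NetworkTrackingPixelId == ntpid, ]
  ggobj <- ggplot(sub, aes(x = Date, y = Impressions)) +
    geom_point() + geom_line()
  path <- file.path(out_dir, sprintf("%s.pdf", ntpid))
  ggsave(filename = path, plot = ggobj, width = 7, height = 5)
  path
}

# One call per distinct id; unique() yields a plain vector to iterate over
files <- unlist(lapply(unique(dataset$NetworkTrackingPixelId), plot_one_id))
```

Returning the path from the helper makes it easy to check afterwards which files were written.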
It would be helpful if you provided a reproducible example, but I hope this works! Not sure about the vector issue, though.
Cheers!
Unless I'm missing something, generating plots by a subsetting variable is very simple. You can use split(...) to split the original data into a list of data frames by NetworkTrackingPixelId, and then pass those to ggplot using lapply(...). Most of the code below is just to create a sample dataset.
library(ggplot2)
# create sample data
set.seed(1)
names <- c("Rubicon","Google","OpenX","AppNexus","Pubmatic")
dates <- as.Date("2014-02-16")+1:10
df <- data.frame(NetworkTrackingPixelId=rep(1:5,each=10),
                 Name=sample(names,50,replace=TRUE),
                 Date=dates,
                 Impressions=sample(1000:10000,50))
# end create sample data
pdf("plots.pdf")
# print() each plot explicitly so a page is written even when this is run
# through source() or inside a function; use a single id for the title
invisible(lapply(split(df,df$NetworkTrackingPixelId),
  function(gg) print(ggplot(gg,aes(x = Date, y = Impressions)) +
    geom_point() + geom_line()+
    ggtitle(paste("NetworkTrackingPixelId:",gg$NetworkTrackingPixelId[1])))))
dev.off()
This generates a PDF containing 5 plots, one for each NetworkTrackingPixelId.
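The same kind of multi-page PDF can also be written with an explicit for loop, which is closer to the original attempt. A sketch, assuming a small made-up data frame and a temporary output file (out_file and pages are illustrative names); the key points are looping over the unique() ids and calling print() on each plot, since ggplot objects are not drawn automatically inside a loop.

```r
library(ggplot2)

# Made-up sample data (column names follow the question)
df <- data.frame(
  NetworkTrackingPixelId = rep(1:3, each = 4),
  Date = rep(as.Date("2014-02-16") + 1:4, times = 3),
  Impressions = c(120, 150, 90, 200, 300, 280, 310, 305, 50, 65, 40, 70)
)

out_file <- file.path(tempdir(), "plots_loop.pdf")  # assumed output path
ids <- unique(df$NetworkTrackingPixelId)
pages <- 0

pdf(out_file)
for (id in ids) {
  # subset to the rows of the current id only
  piece <- df[df$NetworkTrackingPixelId == id, ]
  p <- ggplot(piece, aes(x = Date, y = Impressions)) +
    geom_point() + geom_line() +
    ggtitle(paste("NetworkTrackingPixelId:", id))
  print(p)  # each print() starts a new page in the open PDF device
  pages <- pages + 1
}
dev.off()
```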