How to use lapply with ggplot2 while indexing variables
I would like to generate several hundred boxplots of continuous data from a large data frame, stratified by the factor "year". I started by creating a list from the original data frame that contains each dependent variable and the year.
Here is an example data set that looks like mine:
library(ggplot2)

# Three data frames, each pairing "year" with one dependent variable
l <- list(data.frame(year = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
                     var1 = sample(1:100, 30, replace = TRUE)),
          data.frame(year = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
                     var2 = sample(100:200, 30, replace = TRUE)),
          data.frame(year = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
                     var3 = sample(25:50, 30, replace = TRUE)))
The next step was to apply a ggplot2 function over the list. Neither of these calls produces a plot:
lapply(l, function (j) ggplot(j, aes(x=year, y=j[,2], fill=year)) +
geom_boxplot() + ylab(names(j[2])) )
lapply(l, function (j) ggplot(j, aes(x=year, y=j[[1]][2], fill=year)) +
geom_boxplot() + ylab(names(j[2])) )
The same error message is generated by both calls:
Error: No layers in plot
In actuality, my data frame is much larger: 2800 observations and over 250 different variables with unique descriptive names (e.g. "M2_loss", "SSC"). Each variable is on a different scale, so using facets is not a good solution. What makes this question different from other examples on Stack Overflow is that I am trying to index the data rather than explicitly name it. It is important that I capture the unique name of each variable and use it to label the y-axis.
Any ideas on how to proceed?
If you want the lapply function to actually create output on the console screen device, wrap each plot in a print call and make sure a +geom_boxplot layer is added:
plist <- lapply(l, function (j) print( ggplot(j, aes(x=year, y=j[,2], fill=year)) +
ylab(names(j[2])) + geom_boxplot() ) )
If you want to store the plots in a list and draw them later, leave out the print call:
plist <- lapply(l, function (j) ggplot(j, aes(x=year, y=j[,2], fill=year)) +
ylab(names(j[2])) +geom_boxplot() )
# To print ...
plist[[1]]
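With several hundred plots, drawing the stored list one element at a time is impractical. One possible approach, sketched here on the assumption that a single multi-page PDF is acceptable output, is to open a pdf() device and print every element of the list into it. The example list is rebuilt in compact form so the block runs on its own, and the file name boxplots.pdf is hypothetical.

```r
library(ggplot2)

# Rebuild a small version of the example list so this block is self-contained.
years <- rep(c("2010", "2011", "2012"), each = 10)
l <- list(
  data.frame(year = years, var1 = sample(1:100, 30, replace = TRUE)),
  data.frame(year = years, var2 = sample(100:200, 30, replace = TRUE)),
  data.frame(year = years, var3 = sample(25:50, 30, replace = TRUE))
)

# Store the plots without printing them.
plist <- lapply(l, function(j) {
  ggplot(j, aes(x = year, y = j[, 2], fill = year)) +
    geom_boxplot() +
    ylab(names(j)[2])
})

# Write one plot per page; "boxplots.pdf" is a hypothetical file name,
# placed in the session's temporary directory for this sketch.
out_file <- file.path(tempdir(), "boxplots.pdf")
pdf(out_file)
invisible(lapply(plist, print))
invisible(dev.off())
```

Each call to print inside the open device starts a new page, so the PDF ends up with as many pages as the list has elements.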
If I understand what you want, I think you can make things much simpler by using aes_string instead of aes. This allows you to specify the variables of interest as strings rather than as names. Here is a simple example using the well-worn iris data set:
lapply( names(iris)[1:4], function(n) ggplot(data = iris, aes_string(y = n, x = "Species")) + geom_boxplot() )
This generates side-by-side boxplots (by species) for each of the four quantitative variables in the iris data set and should be easy to adjust for your data frame.
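In ggplot2 3.0.0 and later, aes_string is soft-deprecated in favor of tidy evaluation. A comparable sketch using the .data pronoun, applied to a single data frame shaped like the one in the question, might look like the following. The data frame df and its values are made up for illustration (only the column names M2_loss and SSC come from the question), and the y-axis label is taken from each column name.

```r
library(ggplot2)

# Hypothetical stand-in for the real data: one "year" factor plus several
# continuous variables, each on its own scale.
df <- data.frame(
  year    = rep(c("2010", "2011", "2012"), each = 10),
  M2_loss = sample(1:100, 30, replace = TRUE),
  SSC     = sample(100:200, 30, replace = TRUE)
)

# Every column except "year" gets its own boxplot.
vars <- setdiff(names(df), "year")

plots <- lapply(vars, function(v) {
  ggplot(df, aes(x = year, y = .data[[v]], fill = year)) +
    geom_boxplot() +
    ylab(v)  # label the y-axis with the variable's own name
})
names(plots) <- vars

plots[["SSC"]]  # draw a single plot by name
```

Because the loop runs over column names rather than positions, the same string serves both to select the column and to label the axis, and no intermediate list of data frames is needed.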
The issue turned out to be an old version of R (3.2.2) that was confusing RStudio. Once I deleted the old version, the problem was solved: my original lapply() function (the first example) works fine.