How to use lapply with ggplot2 while indexing variables

I would like to generate several hundred boxplots of continuous data from a large data frame, stratified by the factor "year". I started by creating a list from the original data frame in which each element contains one dependent variable and the year.

Here is an example data set that looks like mine:

l <- list(
  data.frame(year = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
             var1 = sample(1:100, 30, replace = TRUE)),
  data.frame(year = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
             var2 = sample(100:200, 30, replace = TRUE)),
  data.frame(year = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
             var3 = sample(25:50, 30, replace = TRUE)))

The next step was to apply a ggplot2 function over the list. Neither of these calls produces a plot:

lapply(l, function (j) ggplot(j, aes(x=year, y=j[,2], fill=year)) +    
 geom_boxplot() + ylab(names(j[2])) )

lapply(l, function (j) ggplot(j, aes(x=year, y=j[[1]][2], fill=year)) +  
 geom_boxplot() + ylab(names(j[2])) )

Both calls generate the same error message:

Error: No layers in plot

In actuality, my data frame is much larger: 2800 observations and over 250 different variables with unique descriptive names (e.g. "M2_loss", "SSC"). Each variable is on a different scale, so using facets is not a good solution. What makes this question different from other examples on Stack Overflow is that I am trying to index the data rather than name it explicitly. It is important that I capture the unique name of each variable and use it to label the y-axis.

Any ideas on how to proceed?

If you want the lapply function to actually draw the plots on the screen device, it is a matter of wrapping each plot in a print call:

 plist <- lapply(l, function (j) print( ggplot(j, aes(x=year, y=j[,2], fill=year)) +
  ylab(names(j[2])) + geom_boxplot() ) )

If you want to store the plots in a list and draw them later, leave out the print call:

 plist <- lapply(l, function (j)  ggplot(j, aes(x=year, y=j[,2], fill=year)) +
                                      ylab(names(j[2])) +geom_boxplot() ) 
# To print ...
plist[[1]]
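
With several hundred plots, drawing them one at a time at the console is impractical. A minimal sketch of writing the stored list to a multi-page PDF instead is shown here. The toy data, the `set.seed` call, and the output path `out_file` are assumptions made for illustration and are not part of the original answer.

```r
library(ggplot2)

# Toy list shaped like the question's data: one data frame per variable,
# with `year` in the first column and the measured variable in the second.
set.seed(1)
years <- c(rep("2010", 10), rep("2011", 10), rep("2012", 10))
l <- list(
  data.frame(year = years, var1 = sample(1:100, 30, replace = TRUE)),
  data.frame(year = years, var2 = sample(100:200, 30, replace = TRUE))
)

# Build the plots without printing them, as in the answer above.
plist <- lapply(l, function(j)
  ggplot(j, aes(x = year, y = j[, 2], fill = year)) +
    ylab(names(j)[2]) + geom_boxplot())

# Hypothetical output location; each print() call adds one page.
out_file <- file.path(tempdir(), "boxplots.pdf")
pdf(out_file)
invisible(lapply(plist, print))
dev.off()
```

Wrapping the `lapply` call in `invisible()` keeps the returned list from being echoed to the console.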

If I understand what you want, I think you can make things much simpler by using aes_string instead of aes. This allows you to specify the variables of interest as strings rather than as names. Here is a simple example using the well-worn iris data set:

lapply(names(iris)[1:4], function(n)
  ggplot(data = iris, aes_string(y = n, x = "Species")) + geom_boxplot())

This generates side-by-side boxplots (by species) for each of the four quantitative variables in the iris data set and should be easy to adjust for your data frame.
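
Note that aes_string has been soft-deprecated since ggplot2 3.0.0; on recent versions the same idea can be written with the `.data` pronoun inside aes. The sketch below applies it to a wide data frame like the one described in the question and labels each y-axis with the variable's name. The data frame `df` and its values are made up for illustration; only the column names "M2_loss" and "SSC" are borrowed from the question.

```r
library(ggplot2)

# Hypothetical wide data frame: one grouping column plus several
# measured variables on different scales.
set.seed(42)
df <- data.frame(
  year    = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
  M2_loss = sample(1:100, 30, replace = TRUE),
  SSC     = sample(100:200, 30, replace = TRUE)
)

# Every column except the grouping factor gets its own plot.
vars <- setdiff(names(df), "year")

plots <- lapply(vars, function(v) {
  ggplot(df, aes(x = year, y = .data[[v]], fill = year)) +
    geom_boxplot() +
    ylab(v)  # label the y-axis with the variable's own name
})
names(plots) <- vars

# Look up a plot by variable name; at the console this draws it.
plots[["SSC"]]
```

Naming the list elements means a single variable's plot can be retrieved by name rather than by position, which helps when there are a few hundred of them.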

The issue turned out to be an old version of R (3.2.2) that was confusing RStudio. Deleting the old version solved the problem: my original lapply() call (the first example) works fine.
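
For anyone who suspects a similar mix-up, a quick and non-exhaustive way to see which R installation the current session is using is sketched below. The last line assumes ggplot2 is installed.

```r
# Which R is this session running, and where is it installed?
R.version.string
R.home()

# Which ggplot2 version would be loaded (assumes ggplot2 is installed)?
packageVersion("ggplot2")
```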
