Using apply family and multiple functions on lists in R

I have a follow-up question to my answer on this question: Matching vertex attributes across a list of edgelists R.

My solution was to use for loops, but we should always try to optimize (vectorize) when we can.

What I'm trying to understand is how I would vectorize the solution I made in the post.

My solution was

for(i in 1:length(graph_list)){
  graph_list[[i]]=set_vertex_attr(graph_list[[i]],"gender", value=attribute_df$gender[match(V(graph_list[[i]])$name, attribute_df$names)])
}

Ideally we could vectorize this with lapply, but I'm having some trouble conceiving how to do that. Here's what I've got:

graph_lists_new=lapply(graph_list, set_vertex_attr, value=attribute_df$gender[match(V(??????????)$name, attribute_df$names)]))

What I'm unclear about is what to put in place of the ??????????. The argument to V() should be each item in the list, but I don't see how to refer to that item when I'm using lapply.

All data can be found in the link I posted, but here's the data anyway:

attribute_df<- structure(list(names = structure(c(6L, 7L, 5L, 2L, 1L, 8L, 3L, 
4L), .Label = c("Andy", "Angela", "Eric", "Jamie", "Jeff", "Jim", 
"Pam", "Tim"), class = "factor"), gender = structure(c(3L, 2L, 
3L, 2L, 3L, 1L, 1L, 2L), .Label = c("", "F", "M"), class = "factor"), 
    happiness = c(8, 9, 4.5, 5.7, 5, 6, 7, 8)), class = "data.frame", row.names = c(NA, 
-8L))



edgelist<-list(structure(list(nominator1 = structure(c(3L, 4L, 1L, 2L), .Label = c("Angela", 
"Jeff", "Jim", "Pam"), class = "factor"), nominee1 = structure(c(1L, 
2L, 3L, 2L), .Label = c("Andy", "Angela", "Jeff"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L)), structure(list(nominator2 = structure(c(4L, 1L, 2L, 3L
), .Label = c("Eric", "Jamie", "Oscar", "Tim"), class = "factor"), 
    nominee2 = structure(c(1L, 3L, 2L, 3L), .Label = c("Eric", 
    "Oscar", "Tim"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L)))

library(igraph)  # provides graph_from_data_frame, V and set_vertex_attr
graph_list <- lapply(edgelist, graph_from_data_frame)

Since you need to use graph_list[[i]] multiple times in your call, to use lapply you need to write a custom function, such as this anonymous function. (It's the same code as your loop; I just wrapped it in function(x) and replaced all instances of graph_list[[i]] with x.)

graph_list = lapply(graph_list, function(x)
  set_vertex_attr(x, "gender", value = attribute_df$gender[match(V(x)$name, attribute_df$names)])
)

(Note that I didn't test this, but it should work unless I made a typo.)
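The same wrapping pattern can be sketched in base R, without igraph. The names lookup, df_list and df_list_new below are made up for illustration and are not from the question; the point is only that x stands in for "the current list element" and can be used as many times as needed inside the function body.

```r
# Hypothetical data: a lookup table and a list of data frames to annotate
lookup = data.frame(name = c("Jim", "Pam", "Eric"),
                    gender = c("M", "F", ""),
                    stringsAsFactors = FALSE)
df_list = list(
  data.frame(name = c("Pam", "Jim"), stringsAsFactors = FALSE),
  data.frame(name = c("Eric", "Oscar"), stringsAsFactors = FALSE)
)

# x plays the role of df_list[[i]]; it appears twice in the body,
# which is why a custom (anonymous) function is needed
df_list_new = lapply(df_list, function(x) {
  x$gender = lookup$gender[match(x$name, lookup$name)]
  x  # return the modified element
})
```

Names that are missing from the lookup table (here "Oscar") get NA, because match returns NA for them.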

lapply isn't vectorization; it's just "loop hiding". In this case, I think your for loop is a nicer way to do things than lapply. Especially since you are modifying existing objects, your simple for loop will probably be more efficient than an lapply solution, as well as more readable.

When we talk about vectorization for efficiency, we almost always mean atomic vectors, not lists. (It's vectorization, after all, not listization.) The reason to use lapply and related functions (sapply, vapply, Map, most of the purrr package) isn't computer efficiency; it's readability and human efficiency in writing the code.
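To make the distinction concrete, here is a small base-R sketch. The vector x and the result names are made up for illustration. Only the first version is vectorized in the efficiency sense; the other two both run an R-level loop, one hidden and one explicit.

```r
# Hypothetical data for illustration only
x = c(1.5, 2, 3.25, 4)

# Vectorized: a single call, the loop over elements runs in compiled code
squared_vec = x^2

# "Loop hiding": sapply calls the R function once per element
squared_apply = sapply(x, function(v) v^2)

# Explicit loop: the same work as sapply, just spelled out
squared_for = numeric(length(x))
for (i in seq_along(x)) squared_for[i] = x[i]^2

# All three give the same values; only the first avoids an R-level loop
all.equal(squared_vec, squared_apply)
all.equal(squared_vec, squared_for)
```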

Let's say you have a list of data frames, my_list = list(iris, mtcars, CO2). If you want to get the number of rows for each of the data frames in the list and store it in a variable, you could use sapply or a for loop:

# easy to write, easy to read
rows_apply = sapply(my_list, nrow)

# annoying to read and write
rows_for = integer(length(my_list))
for (i in seq_along(my_list)) rows_for[i] = nrow(my_list[[i]])

But the more complex your task gets, the more readable a for loop becomes compared to alternatives like these. In your case, I'd prefer the for loop.


For more reading on this, see the old question Is apply more than syntactic sugar?. Since those answers were written, R has been upgraded to include a just-in-time compiler, which further speeds up for loops relative to *apply. In the nearly 10-year-old answers there, you'll see that sometimes *apply is slightly faster than a for loop. Since the JIT compiler, I think you'll find the opposite: most of the time a for loop is slightly faster than *apply.

But in both of those cases, unless you're doing something absolutely trivial inside the for/apply, whatever you do inside for/apply will dominate the timings.
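If you want to check this on your own machine, a rough sketch along these lines should do. The data (my_vectors) and the per-element work (sum(sort(v))) are arbitrary choices for illustration, and the timings will vary between machines and R versions, so treat the numbers as indicative only.

```r
# Hypothetical setup: a list of numeric vectors and some work to do on each
set.seed(1)
my_vectors = replicate(1000, rnorm(100), simplify = FALSE)

# *apply version (note <- rather than = inside the system.time() call)
time_apply = system.time(
  res_apply <- vapply(my_vectors, function(v) sum(sort(v)), numeric(1))
)

# for-loop version with a preallocated result vector
time_for = system.time({
  res_for = numeric(length(my_vectors))
  for (i in seq_along(my_vectors)) res_for[i] = sum(sort(my_vectors[[i]]))
})

# Same answers either way
all.equal(res_apply, res_for)

# Elapsed seconds for each version; compare them on your own machine
c(apply_version = time_apply[["elapsed"]], for_version = time_for[["elapsed"]])
```

Swapping sum(sort(v)) for something heavier (or lighter) is a quick way to see how much of the total time is the loop machinery and how much is the work inside it.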
