
R: Apply functions to lists with lapply

I have three lists: List1 contains identifiers, List2 contains comma-separated strings whose items may appear in List1, and List3 contains numbers (measured scores).

List1=c("Object1","Object2",......,"Objectn")
List2=c("Object1","Object2,Object3","Object4","Object5","Object6", .... )
List3=c("0.90","0.80",....)

All lists have the same length.

For each item in List1, I want to go through every item in List2, check whether the intersection is non-empty, and if so add the corresponding List3 value to a running score.

I can do this iteratively, but since my lists are very long, I wanted to do it with lapply and failed. Any help would be appreciated.

FinalScoreList=c()

for(i in 1:length(List1)){

  score=0

  for(j in 1:length(List2)){

    # Split the j-th entry of List2 on commas and test whether List1[i] is among the pieces
    if(length(intersect(List1[[i]],
                        unlist(strsplit(as.character(List2[j]),','))))>0) {

      score=score+as.double(List3[j])

    }

  }

  FinalScoreList=c(FinalScoreList,score)

}

Here is something that I think is along the lines of what you're after:

List1=c("Object1","Object2", "0.70")
List2=c("Object1","Object2", "Object3")
List3=c("0.90","0,80", "0.70")

# Make a list of lists
All_Lists = list(
  "List1" = List1,
  "List2" = List2,
  "List3" = List3
)

# Create a dataframe listing all pairwise combinations of the lists
intersect_df <- data.frame(t(combn(names(All_Lists), 2)))

# Add a new column to this dataframe indicating the length of the intersection
# between each pair of lists
intersect_df$count <- apply(intersect_df, 1, function(r) length(intersect(All_Lists[[r[1]]], All_Lists[[r[2]]])))

Output:

> intersect_df
     X1    X2 count
1 List1 List2     2
2 List1 List3     1
3 List2 List3     0

So each row in the output specifies a combination of two lists (X1 and X2), and the column count indicates the length of the intersection between those two lists.

First, I would not recommend giving the name "List" (List1, List2, List3, ...) to objects that are not lists. Second, since you want the elements of "List3" to be numeric, make them numeric from the beginning. I created the following example:

library(dplyr)
List1=c("Object1","Object2","Object3","Object4","Object5","Object6","Object7","Object8")
List2=c("Object3","Object4","Object5","Object6","Object7","Object8","Object9","Object10")
List3=c("0.90","0.80","0.70","0.60","0.50","0.40","0.30","0.20")%>%as.numeric

Now, with a few alterations to your code, we get the FinalScoreList:

FinalScoreList=c()

for(i in 1:length(List1)){

  score=0

  for(j in 1:length(List2)){

    if(length(intersect(List1[[i]], as.list(unlist(strsplit(as.character(List2[j]),',')))))>0) {
      score=score+List3[j]
    }
  }
  FinalScoreList=c(FinalScoreList,score)
}
> FinalScoreList
[1] 0.0 0.0 0.9 0.8 0.7 0.6 0.5 0.4

We can get the same result without looping, using the code below:

df=data.frame(List1,List2,List3)
df$Matches<-0
matches0<-grep(List1,pattern=paste(intersect(List2,List1),collapse="|"))
matches1<-grep(List2,pattern=paste(intersect(List2,List1),collapse="|"))
df$Matches[matches0]<-List3[matches1]
> df$Matches
[1] 0.0 0.0 0.9 0.8 0.7 0.6 0.5 0.4
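
The grep-based version above appears to rely on each List2 entry holding a single identifier and on the matches pairing up in order; because the pattern is a regular expression, an identifier such as "Object1" would also match "Object10". As a hedged alternative that is not part of the original answers, the sketch below reshapes List2 into one row per identifier, sums the scores per identifier, and looks the totals up for List1. The names parts, long, and totals are illustrative only.

```r
# Same example data as in the answer above
List1 <- paste0("Object", 1:8)
List2 <- paste0("Object", 3:10)
List3 <- c(0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20)

# Split each List2 entry on commas; unique() counts an identifier
# at most once per entry, as the intersect() test in the loop does
parts <- lapply(strsplit(List2, ",", fixed = TRUE), unique)

# One row per (identifier, score) pair
long <- data.frame(
  object = unlist(parts),
  score  = rep(List3, times = lengths(parts))
)

# Total score per identifier, as a named numeric vector
totals <- vapply(split(long$score, long$object), sum, numeric(1))

# Look up each List1 identifier by exact name; unmatched ones score 0
FinalScoreList <- unname(totals[List1])
FinalScoreList[is.na(FinalScoreList)] <- 0

print(FinalScoreList)
```

Because the lookup is by exact name rather than by regular expression, partial matches between identifiers do not contribute to the score.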

You can split List2 once before your loops, which already speeds things up. Also, because you start with an empty vector FinalScoreList, R has to grow it at each step, which makes the code slower still.
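
As a minimal sketch of those two points, assuming a small made-up data set (the values below and the name split2 are illustrative and not from the question), the loop can split List2 once up front and write into a preallocated numeric vector:

```r
# Illustrative data (assumption): three identifiers, three scored entries
List1 <- c("Object1", "Object2", "Object3")
List2 <- c("Object1", "Object2,Object3", "Object3")
List3 <- c(0.9, 0.8, 0.7)

# Split every List2 entry once, outside the loops
split2 <- strsplit(List2, ",", fixed = TRUE)

# Preallocate the result instead of growing it with c()
FinalScoreList <- numeric(length(List1))

for (i in seq_along(List1)) {
  for (j in seq_along(split2)) {
    # Add the j-th score when List1[i] occurs in the j-th split entry
    if (List1[i] %in% split2[[j]]) {
      FinalScoreList[i] <- FinalScoreList[i] + List3[j]
    }
  }
}

print(FinalScoreList)
```

seq_along() is used instead of 1:length(...) so that the loops are skipped cleanly when a vector is empty.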

This is a solution with nested lapply/sapply calls:

List2 <- lapply(List2, function(x) unlist(strsplit(x, split = ",")))

FinalScoreList <- lapply(List1, function(x) {
  indicator <- sapply(List2, function(y) x %in% y) 
  sum(List3[indicator])
})

unlist(FinalScoreList)

As @Antonis already said, you should store List3 as a numeric vector from the start.

Data

List1 <- paste0("Object", 1:10)
List2 <- c("Object1", "Object6,Object5", "Object2,Object1", "Object7", 
           "Object6,Object8", "Object5,Object9", "Object4,Object2", 
           "Object3,Object8", "Object2,Object6", "Object10,Object3")
List3 <- runif(10)

Thank you guys.

Now suppose that List1 is of the same nature as List2, i.e., its items can be concatenated strings, and that it can also have a different length.

I applied strsplit to List1 with lapply, but I still obtain NA in FinalScoreList.

List1 <- c("Object1", "Object7,Object5", "Object2,Object1")


List2 <- c("Object1", "Object6,Object5", "Object0,Object1", "Object7", 
           "Object6,Object8", "Object5,Object9", "Object4,Object2", 
           "Object3,Object8", "Object2,Object3", "Object10,Object3")


List3 <- runif(10)

List2 <- lapply(List2, function(x) unlist(strsplit(x, split = ",")))


List1 <- lapply(List1, function(x) unlist(strsplit(x, split = ",")))

FinalScoreList <- lapply(List1, function(x) {
  indicator <- sapply(List2, function(y) {x %in% y}) 
  sum(List3[indicator])
})

unlist(FinalScoreList)

[1] 1.595639 NA NA
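
A likely explanation for the NA values, offered as a sketch rather than a confirmed diagnosis: once an element of List1 holds several identifiers, x %in% y returns one logical value per identifier, so sapply builds a logical matrix instead of a vector with one entry per List2 element, and List3[indicator] then indexes past the end of List3. Wrapping the test in any() collapses it back to a single TRUE/FALSE per List2 element. In the code below, the fixed List3 values are an assumption that replaces runif(10) so the result is reproducible.

```r
List1 <- c("Object1", "Object7,Object5", "Object2,Object1")
List2 <- c("Object1", "Object6,Object5", "Object0,Object1", "Object7",
           "Object6,Object8", "Object5,Object9", "Object4,Object2",
           "Object3,Object8", "Object2,Object3", "Object10,Object3")

# Assumption: fixed scores 0.1, 0.2, ..., 1.0 instead of runif(10)
List3 <- (1:10) / 10

# strsplit() on a character vector already returns a list of pieces
List1 <- strsplit(List1, ",", fixed = TRUE)
List2 <- strsplit(List2, ",", fixed = TRUE)

FinalScoreList <- vapply(List1, function(x) {
  # TRUE when at least one identifier in x occurs in the List2 entry y
  indicator <- vapply(List2, function(y) any(x %in% y), logical(1))
  sum(List3[indicator])
}, numeric(1))

print(FinalScoreList)
```

vapply is used here in place of sapply/lapply so that an unexpected return shape raises an error immediately instead of silently producing a matrix; List1 and List2 may have different lengths, as long as List3 has one score per List2 entry.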
