I have three lists: List1 contains identifiers, List2 contains comma-separated strings whose items may appear in List1, and List3 contains numbers (some measured scores).
List1=c("Object1","Object2",......,"Objectn")
List2=c("Object1","Object2,Object3","Object4","Object5","Object6", .... )
List3=c("0.90","0.80",....)
All lists have the same length.
What I want to do: for each item in List1, go through each item in List2, check whether the intersection is non-empty, and if so add the corresponding List3 value to a score.
I can do this iteratively, but since my lists are very long, I wanted to do it with lapply and failed. Any help would be appreciated.
FinalScoreList=c()
for(i in 1:length(List1)){
  score=0
  for(j in 1:length(List2)){
    # does List1[i] occur among the comma-separated items of List2[j]?
    if(length(intersect(List1[[i]],
        as.list(unlist(strsplit(as.character(List2[j]),',')))))>0) {
      score=score+as.double(List3[j])
    }
  }
  FinalScoreList=c(FinalScoreList,score)
}
Here is something that I think is along the lines of what you're after:
List1=c("Object1","Object2", "0.70")
List2=c("Object1","Object2", "Object3")
List3=c("0.90","0.80", "0.70")
# Make a list of lists
All_Lists = list(
"List1" = List1,
"List2" = List2,
"List3" = List3
)
# Create a dataframe listing all pairwise combinations of the lists
intersect_df <- data.frame(t(combn(names(All_Lists), 2)))
# Add a new column to this dataframe indicating the length of the intersection
# between each pair of lists
intersect_df$count <- apply(intersect_df, 1, function(r) length(intersect(All_Lists[[r[1]]], All_Lists[[r[2]]])))
Output:
> intersect_df
     X1    X2 count
1 List1 List2     2
2 List1 List3     1
3 List2 List3     0
So each row in the output specifies a combination of two lists (X1 and X2), and the column count indicates the length of the intersection between those two lists.
First, I would not recommend giving the name "List" (List1, List2, List3, ...) to objects that are not lists. Second, since you want the "List3" elements to be numeric, make them numeric from the beginning. I created the following example:
library(dplyr)
List1=c("Object1","Object2","Object3","Object4","Object5","Object6","Object7","Object8")
List2=c("Object3","Object4","Object5","Object6","Object7","Object8","Object9","Object10")
List3=c("0.90","0.80","0.70","0.60","0.50","0.40","0.30","0.20")%>%as.numeric
Now, with a few alterations to your code, we get the FinalScoreList:
FinalScoreList=c()
for(i in 1:length(List1)){
score=0
for(j in 1:length(List2)){
if(length(intersect(List1[[i]], as.list(unlist(strsplit(as.character(List2[j]),',')))))>0) {
score=score+List3[j]
}
}
FinalScoreList=c(FinalScoreList,score)
}
> FinalScoreList
[1] 0.0 0.0 0.9 0.8 0.7 0.6 0.5 0.4
We can get the same result without looping, using the code below:
df=data.frame(List1,List2,List3)
df$Matches<-0
matches0<-grep(List1,pattern=paste(intersect(List2,List1),collapse="|"))
matches1<-grep(List2,pattern=paste(intersect(List2,List1),collapse="|"))
df$Matches[matches0]<-List3[matches1]
> df$Matches
[1] 0.0 0.0 0.9 0.8 0.7 0.6 0.5 0.4
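Note that the grep version relies on the matched elements appearing in the same order in both vectors, and a pattern such as Object1 would also match Object10. A minimal sketch of an exact-match alternative follows, assuming (as in this example) that no element of List2 contains a comma. The name ExactScores is made up for illustration.

```r
# Example data from the answer above
List1 <- paste0("Object", 1:8)
List2 <- paste0("Object", 3:10)
List3 <- c(0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20)

# For each identifier, sum the scores of the List2 entries that equal it exactly
# (identifiers with no match get sum(numeric(0)), which is 0)
ExactScores <- vapply(List1, function(x) sum(List3[List2 == x]), numeric(1))
unname(ExactScores)
```

On this data it should reproduce the vector shown above.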
You can perform the split of List2 before your loops; this already speeds things up. Also, as you start with an empty vector FinalScoreList, R has to grow it in each step, which makes it slower as well.
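As a rough sketch of both points, the loop version can split once and write into a preallocated vector. The data here are made up for illustration, and the names SplitList2 and Scores are hypothetical.

```r
# Small illustrative data (not from the question)
List1 <- c("Object1", "Object2", "Object3")
List2 <- c("Object1", "Object2,Object3", "Object1,Object3")
List3 <- c(0.9, 0.8, 0.7)

# Split the comma-separated strings once, outside the loops
SplitList2 <- strsplit(List2, ",", fixed = TRUE)

# Preallocate the result instead of growing it with c()
Scores <- numeric(length(List1))
for (i in seq_along(List1)) {
  for (j in seq_along(SplitList2)) {
    # add the score when List1[i] is one of the items in List2[j]
    if (List1[i] %in% SplitList2[[j]]) {
      Scores[i] <- Scores[i] + List3[j]
    }
  }
}
Scores
```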
This is a solution with nested lapply/sapply calls:
List2 <- lapply(List2, function(x) unlist(strsplit(x, split = ",")))
FinalScoreList <- lapply(List1, function(x) {
indicator <- sapply(List2, function(y) x %in% y)
sum(List3[indicator])
})
unlist(FinalScoreList)
As @Antonis already said, you should store your List3 vector as a numeric vector from the start.
Data
List1 <- paste0("Object", 1:10)
List2 <- c("Object1", "Object6,Object5", "Object2,Object1", "Object7",
"Object6,Object8", "Object5,Object9", "Object4,Object2",
"Object3,Object8", "Object2,Object6", "Object10,Object3")
List3 <- runif(10)
Thank you guys.
Now suppose that List1 is of the same nature as List2, i.e., its items can be comma-separated strings, and it can also have a different length.
I applied lapply with strsplit to List1 as well, but I still get NA in FinalScoreList.
List1 <- c("Object1", "Object7,Object5", "Object2,Object1")
List2 <- c("Object1", "Object6,Object5", "Object0,Object1", "Object7",
"Object6,Object8", "Object5,Object9", "Object4,Object2",
"Object3,Object8", "Object2,Object3", "Object10,Object3")
List3 <- runif(10)
List2 <- lapply(List2, function(x) unlist(strsplit(x, split = ",")))
List1 <- lapply(List1, function(x) unlist(strsplit(x, split = ",")))
FinalScoreList <- lapply(List1, function(x) {
indicator <- sapply(List2, function(y) {x %in% y})
sum(List3[indicator])
})
unlist(FinalScoreList)
[1] 1.595639 NA NA
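One likely cause of the NA values: when x holds more than one string, x %in% y returns one logical per element of x, so sapply builds a matrix rather than a vector of length 10, and indexing List3 with it runs past the end of the vector. A minimal sketch that collapses each comparison with any() is below. It assumes an element of List1 should count as a match when at least one of its parts occurs in the List2 element, and it uses fixed scores instead of runif so the result is reproducible.

```r
List1 <- c("Object1", "Object7,Object5", "Object2,Object1")
List2 <- c("Object1", "Object6,Object5", "Object0,Object1", "Object7",
           "Object6,Object8", "Object5,Object9", "Object4,Object2",
           "Object3,Object8", "Object2,Object3", "Object10,Object3")
List3 <- seq(0.1, 1, by = 0.1)  # illustrative scores, not the original runif values

List1 <- strsplit(List1, ",", fixed = TRUE)
List2 <- strsplit(List2, ",", fixed = TRUE)

FinalScoreList <- vapply(List1, function(x) {
  # any() yields exactly one TRUE/FALSE per List2 element
  indicator <- vapply(List2, function(y) any(x %in% y), logical(1))
  sum(List3[indicator])
}, numeric(1))
FinalScoreList
```

If a match should instead require every part of the List1 element to be present, all() can be used in place of any().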