Faster alternative to nested for loops in R
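Here is the scenario: I have an example in which subjects are divided into three groups. Next, subjects from each group are grouped together, forming several "triplets" that each contain one subject from every group. I want to count the number of times a subject from a given group (1, 2 or 3) is grouped with a subject i from a different original group.
Here is a simple code example: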
data <- cbind(c(1:9), c(rep("Group 1", 3), rep("Group 2", 3), rep("Group 3", 3)))
data <- data.frame(data)
names(data) <- c("ID", "Group")
groups.of.3 <- data.frame(rbind(c(1,4,7),c(2,4,7),c(2,5,7),c(3,6,8),c(3,6,9)))
N <- nrow(data)
n1 <- nrow(data[data$Group == "Group 1", ])
n2 <- nrow(data[data$Group == "Group 2", ])
n3 <- nrow(data[data$Group == "Group 3", ])
# Check the number of times a subject from a group is grouped with a subject i
# from another group
M1 <- matrix(0, nrow = N, ncol = n1)
M2 <- matrix(0, nrow = N, ncol = n2)
M3 <- matrix(0, nrow = N, ncol = n3)
for (i in 1:N){
  if (data$Group[i] != "Group 1"){
    for (j in 1:n1){
      M1[i,j] <- nrow(groups.of.3[groups.of.3[,1] == j &
                                  (groups.of.3[,2] == i |
                                   groups.of.3[,3] == i), ])
    }
  }
  if (data$Group[i] != "Group 2"){
    for (j in 1:n2){
      M2[i,j] <- nrow(groups.of.3[groups.of.3[,2] == (n1 + j) &
                                  (groups.of.3[,1] == i |
                                   groups.of.3[,3] == i), ])
    }
  }
  if (data$Group[i] != "Group 3"){
    for (j in 1:n3){
      M3[i,j] <- nrow(groups.of.3[groups.of.3[,3] == (n1 + n2 + j) &
                                  (groups.of.3[,1] == i |
                                   groups.of.3[,2] == i), ])
    }
  }
}
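So I have 9 subjects, three in each group. Subjects from each group are then grouped together into triplets (a subject may appear in more than one triplet). With more subjects this takes much longer, and I would like to know whether there is a faster alternative that avoids the for loops.
For example, the matrix M1 holds the number of times each subject from Group 1 is subsequently grouped with each subject from any other group: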
M1
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
[3,] 0 0 0
[4,] 1 1 0
[5,] 0 1 0
[6,] 0 0 2
[7,] 1 2 0
[8,] 0 0 1
[9,] 0 0 1
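So the 3 columns represent the three subjects of Group 1 and the rows represent all subjects. Each entry is the number of times a subject from Group 1 is grouped with one of the other subjects (for example, according to groups.of.3, subject 3 appears in the same triplet as subject 6 twice, while subject 1 appears with subject 7 once).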
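To make the meaning of a single entry concrete, here is a minimal sketch that counts the triplets shared by two given subjects. The helper `count_together` is a hypothetical name introduced only for this illustration; it is not part of the code above.

```r
groups.of.3 <- data.frame(rbind(c(1,4,7), c(2,4,7), c(2,5,7), c(3,6,8), c(3,6,9)))

# Hypothetical helper (illustration only): number of triplets
# that contain both subject a and subject b.
count_together <- function(a, b) {
  sum(rowSums(groups.of.3 == a) > 0 & rowSums(groups.of.3 == b) > 0)
}

count_together(3, 6)  # 2, the entry M1[6, 3]
count_together(1, 7)  # 1, the entry M1[7, 1]
```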
Thanks for your help!
Something like this?
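library(tidyr)
library(dplyr)
# convert ID to numeric; going through as.character() works whether ID
# is a factor (R < 4.0 default) or a character vector (R >= 4.0)
data <- data %>%
  mutate(ID = as.numeric(as.character(ID)))
# one row per (triplet, person), with each person's group attached
# (add_rownames() is deprecated in current dplyr in favour of
# tibble::rownames_to_column())
tmp <- groups.of.3 %>%
  add_rownames() %>%
  gather("X", "Person", -rowname) %>%
  inner_join(data, by = c("Person" = "ID"))
# pair up the members of each triplet and count pairs from different groups
tmp %>%
  inner_join(tmp, by = c("rowname")) %>%
  filter(Group.x != Group.y) %>%
  group_by(Person.x, Group.x, Group.y) %>%
  summarise(N = n()) %>%
  spread(key = Group.y, value = N, fill = 0)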
Person.x Group.x Group 1 Group 2 Group 3
(dbl) (fctr) (dbl) (dbl) (dbl)
1 1 Group 1 0 1 1
2 2 Group 1 0 2 2
3 3 Group 1 0 2 2
4 4 Group 2 2 0 2
5 5 Group 2 1 0 1
6 6 Group 2 2 0 2
7 7 Group 3 3 3 0
8 8 Group 3 1 1 0
9 9 Group 3 1 1 0
for loops are not inherently slow:
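# coerce the fields in groups.of.3 to factors sharing one set of levels
# (factor(), not as.factor(), accepts a levels argument)
for(i in 1:3)
  groups.of.3[,i] <- factor(groups.of.3[,i], levels = data$ID)
M <- matrix(0, N, N)
# add up the co-occurrence tables of every pair of columns
for(i in 1:(3-1))
  for(j in (i+1):3)
    M <- M + table(groups.of.3[,i], groups.of.3[,j])
# table() only fills the (earlier column, later column) cells, so symmetrise
M <- M + t(M)
M1 <- M[, data$Group == "Group 1"]
M2 <- M[, data$Group == "Group 2"]
M3 <- M[, data$Group == "Group 3"]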
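The same co-occurrence counts can probably also be obtained with no explicit loop at all, via a triplet-by-subject incidence matrix and `crossprod()`. This is only a sketch under the assumptions of the example (subjects are numbered 1 to N and each triplet holds distinct subjects); the names `triplets`, `group`, `inc` and `co` are introduced here for illustration and do not come from the question.

```r
# Example data from the question, kept as a plain numeric matrix
triplets <- rbind(c(1,4,7), c(2,4,7), c(2,5,7), c(3,6,8), c(3,6,9))
N <- 9
group <- rep(1:3, each = 3)   # group membership of subjects 1..9

# Incidence matrix: inc[t, s] is 1 if subject s belongs to triplet t
inc <- matrix(0, nrow = nrow(triplets), ncol = N)
inc[cbind(rep(seq_len(nrow(triplets)), times = ncol(triplets)),
          as.vector(triplets))] <- 1

# co[a, b] = number of triplets that contain both a and b
co <- crossprod(inc)
diag(co) <- 0                 # drop each subject's count with itself

M1 <- co[, group == 1]
M2 <- co[, group == 2]
M3 <- co[, group == 3]
```

For this example, `M1` comes out equal to the matrix printed in the question. Whether it is faster than the loops for a large number of subjects would need to be measured, for example with `system.time()`.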
I will make a very small modification to Thierry's answer in order to answer my own question:
library(tidyr)
library(dplyr)
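# as.character() makes the conversion safe for factor and character IDs,
# and harmless if ID has already been converted to numeric
data <- data %>%
  mutate(ID = as.numeric(as.character(ID)))
tmp <- groups.of.3 %>%
  add_rownames() %>%
  gather("X", "Person", -rowname) %>%
  inner_join(data, by = c("Person" = "ID"))
# group by the partner (Person.y) instead of the partner's group
tmp %>%
  inner_join(tmp, by = c("rowname")) %>%
  filter(Group.x != Group.y) %>%
  group_by(Person.x, Group.x, Person.y) %>%
  summarise(N = n()) %>%
  spread(key = Person.y, value = N, fill = 0)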
This gives the following output, which contains M1, M2 and M3 from the earlier for loop, joined together side by side.
Source: local data frame [9 x 11]
Person.x Group.x 1 2 3 4 5 6 7 8 9
(dbl) (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 1 Group 1 0 0 0 1 0 0 1 0 0
2 2 Group 1 0 0 0 1 1 0 2 0 0
3 3 Group 1 0 0 0 0 0 2 0 1 1
4 4 Group 2 1 1 0 0 0 0 2 0 0
5 5 Group 2 0 1 0 0 0 0 1 0 0
6 6 Group 2 0 0 2 0 0 0 0 1 1
7 7 Group 3 1 2 0 2 1 0 0 0 0
8 8 Group 3 0 0 1 0 0 1 0 0 0
9 9 Group 3 0 0 1 0 0 1 0 0 0
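For readers on current package versions, here is a hedged sketch of the same pipeline using functions that have since replaced `add_rownames()`, `gather()` and `spread()`, followed by slicing the wide result back into `M1`. It assumes reasonably recent dplyr, tidyr and tibble; the column name `triplet` and the object `wide` are illustrative choices, not part of the answers above.

```r
library(dplyr)
library(tidyr)
library(tibble)

data <- data.frame(ID = as.numeric(1:9),
                   Group = rep(paste("Group", 1:3), each = 3))
groups.of.3 <- data.frame(rbind(c(1,4,7), c(2,4,7), c(2,5,7), c(3,6,8), c(3,6,9)))

# One row per (triplet, person), with each person's group attached
tmp <- groups.of.3 %>%
  rownames_to_column("triplet") %>%
  pivot_longer(-triplet, names_to = "X", values_to = "Person") %>%
  inner_join(data, by = c("Person" = "ID"))

# Self-join on the triplet id; recent dplyr versions may warn about a
# many-to-many join here, which is expected for this step
wide <- tmp %>%
  inner_join(tmp, by = "triplet") %>%
  filter(Group.x != Group.y) %>%
  count(Person.x, Group.x, Person.y, name = "N") %>%
  pivot_wider(names_from = Person.y, values_from = N,
              values_fill = 0, names_sort = TRUE)

# Columns "1", "2", "3" are the Group 1 subjects, i.e. M1
M1 <- as.matrix(wide[, as.character(1:3)])
```

`M2` and `M3` could be taken the same way from columns `as.character(4:6)` and `as.character(7:9)`.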