Calculating pairwise euclidean distances for observations from different groups in R?
I have a data frame containing observations of three variables (V1 to V3), divided into 3 groups:
V1 V2 V3 group
0.59 0.78 0.91 1
0.72 0.91 0.73 2
1.31 1.21 0.90 3
4.32 1.53 3.20 2
....
I want to calculate the Euclidean distances between observations. Computing pairwise distances between all observations is easy:
df %>%
  select(-group) %>%
  dist()
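For reference, `dist()` uses `method = "euclidean"` by default, so each stored value is the square root of the summed squared differences over V1 to V3. A minimal base-R check, assuming only the four rows shown above (the `....` rows are left out):

```r
# The four rows shown in the question (the "...." rows are omitted)
x <- as.matrix(data.frame(V1 = c(0.59, 0.72, 1.31, 4.32),
                          V2 = c(0.78, 0.91, 1.21, 1.53),
                          V3 = c(0.91, 0.73, 0.90, 3.20)))

# dist() defaults to method = "euclidean"
d <- dist(x)

# The same distance between rows 1 and 2, written out by hand:
# the square root of the summed squared differences over V1..V3
manual_12 <- sqrt(sum((x[1, ] - x[2, ])^2))

all.equal(as.matrix(d)[2, 1], manual_12)  # TRUE
length(d)  # 6 pairs, i.e. n * (n - 1) / 2 with n = 4
```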
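But I'm also interested in computing pairwise distances (a) only between observations in the same group, and (b) only between observations that do not belong to the same group (e.g., between each observation in group 1 and all observations in groups 2 and 3).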
For (a), I can do this:
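within_dists <- list()
for (x in unique(df$group)) {
  # distances among the observations of group x only
  within_dists[[as.character(x)]] <- df %>%
    filter(group == x) %>%
    select(-group) %>%
    dist()
}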
and then combine the results; but I'm not quite sure how to accomplish (b). What is the best way to do this?
Thanks!
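For (a), the loop-and-combine step can also be sketched in base R with `split()` and `lapply()`. This assumes only the four example rows from the question; `vars`, `within_by_group` and `within` are illustrative names, not part of the original code:

```r
# The four rows shown in the question (the "...." rows are omitted)
df <- data.frame(V1 = c(0.59, 0.72, 1.31, 4.32),
                 V2 = c(0.78, 0.91, 1.21, 1.53),
                 V3 = c(0.91, 0.73, 0.90, 3.20),
                 group = c(1, 2, 3, 2))
vars <- c("V1", "V2", "V3")

# Split the numeric columns by group and run dist() inside each group
within_by_group <- lapply(split(df[vars], df$group), dist)

# Combine into one data frame: one row per within-group pair
within <- data.frame(
  group = rep(names(within_by_group), lengths(within_by_group)),
  dist  = unlist(lapply(within_by_group, as.vector), use.names = FALSE)
)
within
```

A group with a single observation (groups 1 and 3 here) yields an empty `dist` object and contributes no rows.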
How about applying a function over a matrix of group combinations, along these lines:
library(dplyr)
## define the data frame
df = as.data.frame(cbind(c(.59, .72, 1.31, 4.32),
                         c(.78, .91, 1.21, 1.53),
                         c(.91, .73, .9, 3.2),
                         c(1, 2, 3, 2)), stringsAsFactors = FALSE)
names(df) = c("V1", "V2", "V3", "group")
## generate a matrix with the unique combinations of groups
combinations = combn(x = unique(df$group), m = 2)
## apply a function over the matrix of group combinations to determine
## the distances between the observations of the two groups
distlist = lapply(seq_len(ncol(combinations)), function(i){
  g1 = combinations[1, i]
  g2 = combinations[2, i]
  x1 = df %>% filter(group == g1) %>% select(-group)
  x2 = df %>% filter(group == g2) %>% select(-group)
  ## dist() on the stacked rows also returns within-group pairs,
  ## so keep only the block of rows from g1 against rows from g2
  tmpdist = as.matrix(dist(rbind(x1, x2)))[seq_len(nrow(x1)),
                                           nrow(x1) + seq_len(nrow(x2)),
                                           drop = FALSE]
  return(cbind(g1, g2, as.vector(tmpdist)))
})
## combine the list into a data frame (names() on a matrix would not set column names)
dists = as.data.frame(do.call(rbind, distlist))
names(dists) = c("group1", "group2", "dist")
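Running `dist()` on the rows of both groups computes the within-group pairs as well, which are then discarded. A base-R sketch that computes only the cross-group block, using a hypothetical helper `cross_dist()` (not from the answer) and the four example rows:

```r
# Hypothetical helper: Euclidean distances between every row of `a`
# and every row of `b`, returned as an nrow(a) x nrow(b) matrix
cross_dist <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  sq <- matrix(0, nrow(a), nrow(b))
  for (k in seq_len(ncol(a))) {
    # add the squared difference in variable k for every (row of a, row of b) pair
    sq <- sq + outer(a[, k], b[, k], "-")^2
  }
  sqrt(sq)
}

# The four rows shown in the question (the "...." rows are omitted)
df <- data.frame(V1 = c(0.59, 0.72, 1.31, 4.32),
                 V2 = c(0.78, 0.91, 1.21, 1.53),
                 V3 = c(0.91, 0.73, 0.90, 3.20),
                 group = c(1, 2, 3, 2))
vars <- c("V1", "V2", "V3")
pairs <- combn(unique(df$group), 2)

# One data frame row per (observation in g1, observation in g2) pair
between <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(i) {
  g1 <- pairs[1, i]
  g2 <- pairs[2, i]
  d <- cross_dist(df[df$group == g1, vars], df[df$group == g2, vars])
  data.frame(group1 = g1, group2 = g2, dist = as.vector(d))
}))
between
```

With these four rows, the group pairs (1, 2), (1, 3) and (2, 3) contribute 2, 1 and 2 distances respectively, so `between` has 5 rows.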
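Here is an approach that computes the full distance matrix once and then extracts the distances that satisfy a given condition.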
library(dplyr)
## distance as a matrix
d_m <- df %>%
  select(-group) %>%
  dist() %>%
  as.matrix()
## combination of groups
cb_g <- combn(df$group, m = 2)
## combination of indices (same column order as cb_g)
cb_i <- combn(seq_len(nrow(df)), m = 2)
## extract the values that fit the given conditions
corr_same_grp <- apply(cb_g, 2, function(x) x[1] == x[2]) %>% # same groups
  { cb_i[, ., drop = FALSE] } %>%                          # get indices
  apply(2, function(x) d_m[x[2], x[1]])
corr_diff_grp <- apply(cb_g, 2, function(x) x[1] != x[2]) %>% # different groups
  { cb_i[, ., drop = FALSE] } %>%                           # get indices
  apply(2, function(x) d_m[x[2], x[1]])
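The same extraction can be sketched without enumerating index pairs: build a logical same-group matrix with `outer()` and keep each unordered pair once via `lower.tri()`. This is an alternative to the answer's `combn()` indexing, assuming the four example rows:

```r
# The four rows shown in the question (the "...." rows are omitted)
df <- data.frame(V1 = c(0.59, 0.72, 1.31, 4.32),
                 V2 = c(0.78, 0.91, 1.21, 1.53),
                 V3 = c(0.91, 0.73, 0.90, 3.20),
                 group = c(1, 2, 3, 2))
vars <- c("V1", "V2", "V3")

d_m <- as.matrix(dist(df[vars]))

# TRUE where row i and column j belong to the same group
same <- outer(df$group, df$group, "==")
# keep each unordered pair once (strictly below the diagonal)
low <- lower.tri(d_m)

within_dists  <- d_m[same & low]
between_dists <- d_m[!same & low]
```

For these rows, `within_dists` holds the single group-2 pair (rows 2 and 4) and `between_dists` the remaining 5 distances. `which(same & low, arr.ind = TRUE)` gives the row and column indices of the within-group pairs if they are needed.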