R - Apply dist function to groups
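I'm trying to apply the dist() function to each group in R, but the result looks as if no grouping happened at all: dist() is simply applied to the whole data frame.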
df2 %>% dplyr::group_by(X1) %>% dist()
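Here df2 is my data frame, which I've kept simple for now. Essentially, each group contains coordinates (A, B), and I'm trying to get the distance between every pair of points within a group.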
Here is my data frame:
X1 A B
1 1 12 0.0
2 1 18 0.0
3 1 18 1.0
4 1 13 0.0
5 1 18 4.0
6 1 18 0.0
7 1 18 5.0
8 1 18 0.0
9 1 18 0.0
10 2 73 -2.0
11 2 73 -0.5
12 2 74 -0.5
13 2 73 0.0
14 2 71 -1.0
15 2 75 0.0
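For anyone reproducing the problem: dist() is not a dplyr verb, so it knows nothing about the groups and just converts whatever it receives to a matrix. A minimal sketch of that behaviour, using a small made-up subset (`toy` is a hypothetical name, not part of the real data):

```r
library(dplyr)

# Hypothetical four-row subset: two points per group
toy <- data.frame(X1 = c(1, 1, 2, 2),
                  A  = c(12, 18, 73, 74),
                  B  = c(0, 0, -2, -0.5))

ungrouped <- dist(toy)                        # all four rows at once
piped     <- toy %>% group_by(X1) %>% dist()  # grouping is ignored here

# Both hold choose(4, 2) pairwise distances computed over all rows
length(piped)
all.equal(as.vector(ungrouped), as.vector(piped))
```

The grouped pipeline gives the same numbers as the plain call, which matches what I'm seeing on the full data.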
Here is an example that builds one distance matrix per species for the iris dataset:
results = list()
for(spec in unique(iris$Species)){
temp = iris[iris$Species==spec, 1:4]
results[[length(results)+1]] = dist(temp)
}
names(results) = unique(iris$Species)
You'll have to figure out what to do with the results afterwards.
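As one possible follow-up (only a sketch, and the summary chosen here is an assumption about what might be useful), the same list can be built with split() and lapply(), then reduced to a mean pairwise distance per species or expanded into a full square matrix:

```r
# Per-species distance objects, equivalent to the loop above
results <- lapply(split(iris[, 1:4], iris$Species), dist)

# Hypothetical follow-up: mean pairwise distance within each species
mean_dist <- sapply(results, mean)

# A dist object can be expanded to a full symmetric matrix when needed
m <- as.matrix(results[["setosa"]])
dim(m)
```

The list keeps the species names, so individual matrices can be pulled out by name.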
We can use purrr::map:
library(purrr)
df %>%
split(.$X1) %>%
map(~{
dist(.x)
}) -> distList
distList
#> $`1`
#>          1        2        3        4        5        6        7        8
#> 2 6.000000
#> 3 6.082763 1.000000
#> 4 1.000000 5.000000 5.099020
#> 5 7.211103 4.000000 3.000000 6.403124
#> 6 6.000000 0.000000 1.000000 5.000000 4.000000
#> 7 7.810250 5.000000 4.000000 7.071068 1.000000 5.000000
#> 8 6.000000 0.000000 1.000000 5.000000 4.000000 0.000000 5.000000
#> 9 6.000000 0.000000 1.000000 5.000000 4.000000 0.000000 5.000000 0.000000
#>
#> $`2`
#>          10       11       12       13       14
#> 11 1.500000
#> 12 1.802776 1.000000
#> 13 2.000000 0.500000 1.118034
#> 14 2.236068 2.061553 3.041381 2.236068
#> 15 2.828427 2.061553 1.118034 2.000000 4.123106
df <- read.table(text = 'X1 A B
1 1 12 0.0
2 1 18 0.0
3 1 18 1.0
4 1 13 0.0
5 1 18 4.0
6 1 18 0.0
7 1 18 5.0
8 1 18 0.0
9 1 18 0.0
10 2 73 -2.0
11 2 73 -0.5
12 2 74 -0.5
13 2 73 0.0
14 2 71 -1.0
15 2 75 0.0', header = TRUE)
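Note that dist(.x) above also receives the X1 column. With the default Euclidean distance that is harmless, because X1 is constant within a group and contributes zero to every difference. A small sketch on made-up rows (`toy` and the helper names are hypothetical) that checks this by comparing against a version restricted to A and B:

```r
# Hypothetical five-row subset with two groups
toy <- data.frame(X1 = c(1, 1, 1, 2, 2),
                  A  = c(12, 18, 18, 73, 74),
                  B  = c(0, 0, 1, -2, -0.5))

groups <- split(toy, toy$X1)

with_id     <- lapply(groups, dist)                               # X1 included
coords_only <- lapply(groups, function(g) dist(g[c("A", "B")]))   # A and B only

# Compare the raw distance values group by group
same <- mapply(function(a, b) isTRUE(all.equal(as.vector(a), as.vector(b))),
               with_id, coords_only)
same
```

Selecting the coordinate columns explicitly is still the safer habit, since it keeps working if other non-coordinate columns are added later.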
Here is my code and solution:
library(dplyr)
df2 <- structure(list(X1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L), A = c(12L, 18L, 18L, 13L, 18L, 18L, 18L,
18L, 18L, 73L, 73L, 74L, 73L, 71L, 75L), B = c(0, 0, 1, 0, 4,
0, 5, 0, 0, -2, -0.5, -0.5, 0, -1, 0)), .Names = c("X1", "A",
"B"), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"))
mydf <- df2 %>% group_by(X1) %>% summarise(distmatrix=list(dist(cbind(A,B))))
mydf
# # A tibble: 2 × 2
# X1 distmatrix
# <int> <list>
# 1 1 <S3: dist>
# 2 2 <S3: dist>
mydf$distmatrix
# [[1]]
#          1        2        3        4        5        6        7        8
# 2 6.000000
# 3 6.082763 1.000000
# 4 1.000000 5.000000 5.099020
# 5 7.211103 4.000000 3.000000 6.403124
# 6 6.000000 0.000000 1.000000 5.000000 4.000000
# 7 7.810250 5.000000 4.000000 7.071068 1.000000 5.000000
# 8 6.000000 0.000000 1.000000 5.000000 4.000000 0.000000 5.000000
# 9 6.000000 0.000000 1.000000 5.000000 4.000000 0.000000 5.000000 0.000000
#
# [[2]]
#          1        2        3        4        5
# 2 1.500000
# 3 1.802776 1.000000
# 4 2.000000 0.500000 1.118034
# 5 2.236068 2.061553 3.041381 2.236068
# 6 2.828427 2.061553 1.118034 2.000000 4.123106
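If a plain list is enough and the list-column isn't needed, dplyr's group_map() might be a shorter route. This is only a sketch and assumes a reasonably recent dplyr (0.8.1 or later, as far as I know); `toy` is a made-up subset, not the real df2:

```r
library(dplyr)

# Hypothetical five-row subset with two groups
toy <- data.frame(X1 = c(1L, 1L, 1L, 2L, 2L),
                  A  = c(12L, 18L, 18L, 73L, 74L),
                  B  = c(0, 0, 1, -2, -0.5))

# group_map() calls the function once per group; .x holds that group's
# rows without the grouping column, so only A and B reach dist()
dist_list <- toy %>%
  group_by(X1) %>%
  group_map(~ dist(.x))

length(dist_list)             # one dist object per group
attr(dist_list[[1]], "Size")  # number of points in the first group
```

Unlike the summarise() version, the result here is an unnamed list, so the group keys have to be tracked separately if they matter.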