R - Apply dist function to groups
I am trying to apply the dist() function by group in R, but the result looks as if no grouping happened at all: dist() is simply applied to my whole data frame.
df2 %>% dplyr::group_by(X1) %>% dist()
Here df2 is my data frame; for simplicity, I am only working with its head for now. Essentially, each group contains coordinates (A, B), and I am trying to get the distance between every pair of points within a group.
Here is my data frame:
X1 A B
1 1 12 0.0
2 1 18 0.0
3 1 18 1.0
4 1 13 0.0
5 1 18 4.0
6 1 18 0.0
7 1 18 5.0
8 1 18 0.0
9 1 18 0.0
10 2 73 -2.0
11 2 73 -0.5
12 2 74 -0.5
13 2 73 0.0
14 2 71 -1.0
15 2 75 0.0
My desired output is the lower-triangular distance matrix for each group.
Here's an example of creating distance matrices for the iris data set, one per species:
results = list()
for(spec in unique(iris$Species)){
temp = iris[iris$Species==spec, 1:4]
results[[length(results)+1]] = dist(temp)
}
names(results) = unique(iris$Species)
You'll have to figure out what to do with the results afterwards.
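The same split-then-dist idea can be applied to the data from the question without an explicit loop. The following is only a minimal base R sketch: it rebuilds the question's data frame under the name df2, and the names coords_by_group and dist_by_group are chosen here for illustration. Only the coordinate columns A and B are passed to dist(), so the grouping column X1 never enters the distance calculation.

```r
# Data from the question: grouping column X1 and coordinates (A, B).
df2 <- data.frame(
  X1 = rep(c(1L, 2L), times = c(9L, 6L)),
  A  = c(12, 18, 18, 13, 18, 18, 18, 18, 18, 73, 73, 74, 73, 71, 75),
  B  = c(0, 0, 1, 0, 4, 0, 5, 0, 0, -2, -0.5, -0.5, 0, -1, 0)
)

# Split only the coordinate columns by group, then run dist() on each piece.
coords_by_group <- split(df2[, c("A", "B")], df2$X1)
dist_by_group <- lapply(coords_by_group, dist)

# The result is a named list with one "dist" object per value of X1.
dist_by_group[["1"]]
```

Because split() names the list elements after the group values, an individual matrix can be retrieved with dist_by_group[["1"]] or dist_by_group[["2"]].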
We can use purrr::map:
library(purrr)
df %>%
split(.$X1) %>%
map(~{
dist(.x)
}) -> distList
distList
#> $`1`
#> 1 2 3 4 5 6 7 8
#> 2 6.000000
#> 3 6.082763 1.000000
#> 4 1.000000 5.000000 5.099020
#> 5 7.211103 4.000000 3.000000 6.403124
#> 6 6.000000 0.000000 1.000000 5.000000 4.000000
#> 7 7.810250 5.000000 4.000000 7.071068 1.000000 5.000000
#> 8 6.000000 0.000000 1.000000 5.000000 4.000000 0.000000 5.000000
#> 9 6.000000 0.000000 1.000000 5.000000 4.000000 0.000000 5.000000 0.000000
#>
#> $`2`
#> 10 11 12 13 14
#> 11 1.500000
#> 12 1.802776 1.000000
#> 13 2.000000 0.500000 1.118034
#> 14 2.236068 2.061553 3.041381 2.236068
#> 15 2.828427 2.061553 1.118034 2.000000 4.123106
The data frame df used above was created with:
df <- read.table(text = 'X1 A B
1 1 12 0.0
2 1 18 0.0
3 1 18 1.0
4 1 13 0.0
5 1 18 4.0
6 1 18 0.0
7 1 18 5.0
8 1 18 0.0
9 1 18 0.0
10 2 73 -2.0
11 2 73 -0.5
12 2 74 -0.5
13 2 73 0.0
14 2 71 -1.0
15 2 75 0.0', header = TRUE)
Here's my code and the solution:
library(dplyr)
df2 <- structure(list(X1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L), A = c(12L, 18L, 18L, 13L, 18L, 18L, 18L,
18L, 18L, 73L, 73L, 74L, 73L, 71L, 75L), B = c(0, 0, 1, 0, 4,
0, 5, 0, 0, -2, -0.5, -0.5, 0, -1, 0)), .Names = c("X1", "A",
"B"), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"))
mydf <- df2 %>% group_by(X1) %>% summarise(distmatrix=list(dist(cbind(A,B))))
mydf
# # A tibble: 2 × 2
# X1 distmatrix
# <int> <list>
# 1 1 <S3: dist>
# 2 2 <S3: dist>
mydf$distmatrix
# [[1]]
# 1 2 3 4 5 6 7 8
# 2 6.000000
# 3 6.082763 1.000000
# 4 1.000000 5.000000 5.099020
# 5 7.211103 4.000000 3.000000 6.403124
# 6 6.000000 0.000000 1.000000 5.000000 4.000000
# 7 7.810250 5.000000 4.000000 7.071068 1.000000 5.000000
# 8 6.000000 0.000000 1.000000 5.000000 4.000000 0.000000 5.000000
# 9 6.000000 0.000000 1.000000 5.000000 4.000000 0.000000 5.000000 0.000000
#
# [[2]]
# 1 2 3 4 5
# 2 1.500000
# 3 1.802776 1.000000
# 4 2.000000 0.500000 1.118034
# 5 2.236068 2.061553 3.041381 2.236068
# 6 2.828427 2.061553 1.118034 2.000000 4.123106
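Each element of the list is a compact "dist" object rather than an ordinary matrix. If a regular matrix or a table of point pairs is needed, one possible approach is sketched below in base R. To keep the block self-contained it rebuilds the coordinates of group 2 from the question; the names pts, d, m, idx and pairs are chosen here for illustration only.

```r
# Coordinates of group 2 from the question.
pts <- cbind(A = c(73, 73, 74, 73, 71, 75),
             B = c(-2, -0.5, -0.5, 0, -1, 0))

d <- dist(pts)     # compact lower-triangular "dist" object
m <- as.matrix(d)  # full symmetric 6 x 6 matrix with a zero diagonal

m[2, 1]            # distance between the first and second point

# One row per pair of points, taken from the lower triangle of m.
idx <- which(lower.tri(m), arr.ind = TRUE)
pairs <- data.frame(i = idx[, "row"], j = idx[, "col"], distance = m[idx])
head(pairs)
```

The same conversion could be applied to every group at once with lapply(mydf$distmatrix, as.matrix).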