
R - Apply dist function to groups

I am trying to apply the dist() function to each group in R, but the result looks as if no grouping happened at all: dist() is simply applied to my whole data frame.

df2 %>% dplyr::group_by(X1) %>% dist()

Here df2 is my data frame; for simplicity I am only applying this to its head for now. Essentially, each group contains points with coordinates (A, B), and I am trying to get the distance between every pair of points within the group.
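Why the grouping seems to be ignored: dist() comes from the stats package and is not a dplyr verb. It converts its input with as.matrix() and computes distances between all rows using all numeric columns, so the group_by() attribute has no effect and X1 itself is treated as a coordinate. A small sketch of that behaviour, using a made-up four-row data frame (the name pts and its values are illustrative only):

```r
library(dplyr)

# Hypothetical toy data: two groups of 2-D points
pts <- data.frame(X1 = c(1, 1, 2, 2),
                  A  = c(0, 3, 10, 10),
                  B  = c(0, 4, 0, 1))

# dist() coerces its input with as.matrix(), so the grouping is dropped
d_grouped <- pts %>% group_by(X1) %>% dist()
d_plain   <- dist(pts)

# Same result either way: all 6 pairs across both groups,
# with the grouping column X1 counted as a coordinate
print(all.equal(as.vector(d_grouped), as.vector(d_plain)))
```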

Here is my data frame:

   X1  A              B
1   1  12             0.0
2   1  18             0.0
3   1  18             1.0
4   1  13             0.0
5   1  18             4.0
6   1  18             0.0
7   1  18             5.0
8   1  18             0.0
9   1  18             0.0
10  2  73            -2.0
11  2  73            -0.5
12  2  74            -0.5
13  2  73             0.0
14  2  71            -1.0
15  2  75             0.0

My desired output is the lower triangular distance matrix for each group, for example:

[Image: example of the desired output]
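A "dist" object already is this lower triangle: it stores only the n(n-1)/2 pairwise distances below the diagonal, column by column, and prints them as a lower triangular matrix. as.matrix() expands it to the full symmetric matrix if that is more convenient. A minimal sketch using the first four points of group 1 (the variable names m, d and full are illustrative):

```r
# First four points of group 1 from the question (columns A and B)
m <- cbind(A = c(12, 18, 18, 13), B = c(0, 0, 1, 0))

d <- dist(m)          # Euclidean by default; prints as a lower triangle
full <- as.matrix(d)  # symmetric n x n matrix with zeros on the diagonal

# The "dist" object holds exactly the lower triangle, in column-major order
stopifnot(isTRUE(all.equal(as.vector(d), full[lower.tri(full)])))
d
```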

Here's an example of creating distance matrices for the iris data set, one per species:

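results = list()

# Compute one distance matrix per species (columns 1:4 are the measurements)
for(spec in unique(iris$Species)){
  temp = iris[iris$Species==spec, 1:4]
  results[[length(results)+1]] = dist(temp)
}
# Name each list element after its species
names(results) = unique(iris$Species)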

You'll have to decide what to do with the results afterwards.
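The same per-species list can be built without an explicit loop. This sketch is an alternative to the for loop, not the answerer's code: split() turns the data frame into a named list of data frames (one per factor level), and lapply() applies dist() to each element, keeping the names.

```r
# One data frame of measurements per species, named by factor level
by_species <- split(iris[, 1:4], iris$Species)

# Apply dist() to each group; the list names carry over
results <- lapply(by_species, dist)

names(results)           # "setosa" "versicolor" "virginica"
length(results$setosa)   # 1225 = choose(50, 2) pairwise distances
```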

We can use purrr::map():

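library(purrr)   # purrr also re-exports the %>% pipe

df %>% 
  split(.$X1) %>%                               # one data frame per group
  map(~ dist(.x[, c("A", "B")])) -> distList    # distances on A and B only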

distList
#> $`1`
#>          1        2        3        4        5        6        7        8
#> 2 6.000000                                                               
#> 3 6.082763 1.000000                                                      
#> 4 1.000000 5.000000 5.099020                                             
#> 5 7.211103 4.000000 3.000000 6.403124                                    
#> 6 6.000000 0.000000 1.000000 5.000000 4.000000                           
#> 7 7.810250 5.000000 4.000000 7.071068 1.000000 5.000000                  
#> 8 6.000000 0.000000 1.000000 5.000000 4.000000 0.000000 5.000000         
#> 9 6.000000 0.000000 1.000000 5.000000 4.000000 0.000000 5.000000 0.000000
#> 
#> $`2`
#>          10       11       12       13       14
#> 11 1.500000                                    
#> 12 1.802776 1.000000                           
#> 13 2.000000 0.500000 1.118034                  
#> 14 2.236068 2.061553 3.041381 2.236068         
#> 15 2.828427 2.061553 1.118034 2.000000 4.123106
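One possible next step, if a single table is easier to work with than a list of matrices, is to flatten each "dist" object into one row per pair of points. The helper dist_to_pairs() is hypothetical (it is not part of base R or purrr), and the six-row df is a cut-down version of the question's data:

```r
# Hypothetical helper: flatten one "dist" object into a data frame of pairs
dist_to_pairs <- function(d) {
  m <- as.matrix(d)                            # full symmetric matrix with labels
  idx <- which(lower.tri(m), arr.ind = TRUE)   # positions below the diagonal
  data.frame(from     = rownames(m)[idx[, "col"]],
             to       = rownames(m)[idx[, "row"]],
             distance = m[idx])
}

# Cut-down version of the question's data (first three rows of each group)
df <- data.frame(X1 = c(1, 1, 1, 2, 2, 2),
                 A  = c(12, 18, 18, 73, 73, 74),
                 B  = c(0, 0, 1, -2, -0.5, -0.5))

# One dist object per group, computed on A and B only
distList <- lapply(split(df[c("A", "B")], df$X1), dist)

# Stack the per-group pairs into one long table, tagging each with its group
pairs <- do.call(rbind, Map(function(g, d) data.frame(X1 = g, dist_to_pairs(d)),
                            names(distList), distList))
rownames(pairs) <- NULL
pairs
```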

Data:

df <- read.table(text = 'X1  A              B
1   1  12             0.0
2   1  18             0.0
3   1  18             1.0
4   1  13             0.0
5   1  18             4.0
6   1  18             0.0
7   1  18             5.0
8   1  18             0.0
9   1  18             0.0
10  2  73            -2.0
11  2  73            -0.5
12  2  74            -0.5
13  2  73             0.0
14  2  71            -1.0
15  2  75             0.0', header = TRUE)

Here's my code and the solution:

library(dplyr)
df2 <- structure(list(X1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L), A = c(12L, 18L, 18L, 13L, 18L, 18L, 18L, 
18L, 18L, 73L, 73L, 74L, 73L, 71L, 75L), B = c(0, 0, 1, 0, 4, 
0, 5, 0, 0, -2, -0.5, -0.5, 0, -1, 0)), .Names = c("X1", "A", 
"B"), class = "data.frame", row.names = c("1", "2", "3", "4", 
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"))
mydf <- df2 %>% group_by(X1) %>% summarise(distmatrix=list(dist(cbind(A,B))))
mydf
# # A tibble: 2 × 2
# X1 distmatrix
# <int>     <list>
#   1     1 <S3: dist>
#   2     2 <S3: dist>
mydf$distmatrix
# [[1]]
# 1        2        3        4        5        6        7        8
# 2 6.000000                                                               
# 3 6.082763 1.000000                                                      
# 4 1.000000 5.000000 5.099020                                             
# 5 7.211103 4.000000 3.000000 6.403124                                    
# 6 6.000000 0.000000 1.000000 5.000000 4.000000                           
# 7 7.810250 5.000000 4.000000 7.071068 1.000000 5.000000                  
# 8 6.000000 0.000000 1.000000 5.000000 4.000000 0.000000 5.000000         
# 9 6.000000 0.000000 1.000000 5.000000 4.000000 0.000000 5.000000 0.000000
# 
# [[2]]
# 1        2        3        4        5
# 2 1.500000                                    
# 3 1.802776 1.000000                           
# 4 2.000000 0.500000 1.118034                  
# 5 2.236068 2.061553 3.041381 2.236068         
# 6 2.828427 2.061553 1.118034 2.000000 4.123106
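In this output the labels of the second group restart at 1, because cbind(A, B) builds a new matrix without row names and dist() takes its labels from the row names of its input. If the original row numbers matter, one option is to attach them before calling dist(). A sketch, assuming a helper column row_id that is added here for illustration and is not in the original data (df2 is again a cut-down copy of the question's data):

```r
library(dplyr)

# Cut-down version of the question's data (first three rows of each group)
df2 <- data.frame(X1 = c(1, 1, 1, 2, 2, 2),
                  A  = c(12, 18, 18, 73, 73, 74),
                  B  = c(0, 0, 1, -2, -0.5, -0.5))

mydf <- df2 %>%
  mutate(row_id = row_number()) %>%   # hypothetical helper column: original row number
  group_by(X1) %>%
  summarise(distmatrix = list(
    # row names of the matrix become the labels of the dist object
    dist(matrix(c(A, B), ncol = 2, dimnames = list(row_id, c("A", "B"))))
  ))

mydf$distmatrix[[2]]   # labelled 4, 5, 6 rather than 1, 2, 3
```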

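Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM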