Calculating the minimum distance between observations within a group

Question

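In the dataset below, how can I create a new column min.diff that reports the minimum distance between a given observation x and any other observation y within its group (identified by the group column)? I want to measure the distance between x and y as abs(x - y).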

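# With R >= 3.6.0 this gives value = 9, 4, 7, 1, 2, 7, 2, 3 (older versions used a different default sampler)
set.seed(1)

df <- data.frame(
  group = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'),
  value = sample(1:10, 8, replace = TRUE)
)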

Expected output:

  group value min.diff
1     A     9   2
2     A     4   3
3     A     7   2
4     B     1   1
5     B     2   1
6     C     7   4
7     C     2   1
8     C     3   1

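I would prefer a dplyr solution. The only approach I can think of is to expand the data frame by adding rows for every possible pair within a group, compute the distances, and then filter for the minimum in each group. Is there a more compact way?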
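For reference, the pair-expansion approach described above can be sketched in dplyr as a self-join on group. This is a minimal sketch assuming dplyr >= 1.1.0 (for the relationship argument of inner_join()). The id column is a hypothetical helper used only to tell an observation apart from its partners, and value is hard-coded to the numbers shown in the expected output so the result does not depend on the sampler.

```r
library(dplyr)

# Values copied from the expected output in the question
df <- data.frame(
  group = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'),
  value = c(9, 4, 7, 1, 2, 7, 2, 3)
)

# Hypothetical row id so each observation can be distinguished from its partners
df_id <- df %>% mutate(id = row_number())

pairs_min <- df_id %>%
  # Self-join on group: one row per (observation, partner) pair within a group
  inner_join(df_id, by = "group", suffix = c("", ".other"),
             relationship = "many-to-many") %>%
  # Drop each observation's pairing with itself
  filter(id != id.other) %>%
  # Keep the smallest absolute difference for each original observation
  group_by(id, group, value) %>%
  summarise(min.diff = min(abs(value - value.other)), .groups = "drop") %>%
  select(-id)

pairs_min
```

The join creates n^2 rows for a group of size n before filtering, which is why the answers below look for something more compact.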

Answer 1

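We can use combn to compute the pairwise differences between the values and take the min of their abs. Grouped by group, this gives the smallest distance between any two observations in each group (one number per group, recycled to every row), rather than a separate distance for each observation:

library(dplyr)
df1 <- df %>%
  group_by(group) %>%
  # combn() enumerates every pair of values in the group; min(abs(...)) keeps the closest pair
  mutate(new = min(abs(combn(value, 2, FUN = function(x) x[1] - x[2])))) %>%
  ungroup()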

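If instead we want the min distance between one given element, i.e. the first element of each group, and the rest:

df1 <- df %>%
  group_by(group) %>%
  mutate(new = min(abs(value[-1] - first(value)))) %>%
  ungroup()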
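The combn() approach reduces all pairs to a single minimum. Keeping the full pairwise distance matrix instead yields one value per observation, which is what the question asks for. Below is a minimal base-R sketch of that variant, assuming value holds the numbers from the expected output; nearest_other is a hypothetical helper name. The matrix makes it O(n^2) in memory per group, so it suits small groups.

```r
df <- data.frame(
  group = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'),
  value = c(9, 4, 7, 1, 2, 7, 2, 3)
)

# Hypothetical helper: distance from each element to its closest *other* element
nearest_other <- function(v) {
  d <- abs(outer(v, v, "-"))  # full pairwise distance matrix
  diag(d) <- Inf              # ignore each element's distance to itself
  apply(d, 1, min)            # row-wise minimum = nearest other observation
}

# ave() applies the helper within each group and returns results in row order
df$min.diff <- ave(df$value, df$group, FUN = nearest_other)
df$min.diff
```

A group with a single observation gets Inf, since its only matrix entry is the diagonal.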

Answer 2

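We can use map_dbl to loop over each row position, subtract the current value from all other values in its group, and take the minimum of the absolute differences.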

library(dplyr)
library(purrr)

df %>%
  group_by(group) %>%
  # For each row position .x, drop value[.x] and take the smallest absolute difference to the rest
  mutate(min.diff = map_dbl(row_number(), ~min(abs(value[-.x] - value[.x]))))

#  group value min.diff
#  <chr> <int>    <dbl>
#1 A         9        2
#2 A         4        3
#3 A         7        2
#4 B         1        1
#5 B         2        1
#6 C         7        4
#7 C         2        1
#8 C         3        1
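The map_dbl() call above returns Inf (with a warning from min()) for a group that contains only one observation, because value[-.x] is then empty. Here is a sketch of the same leave-one-out idea without purrr, using vapply() inside a hypothetical helper leave_one_out_min() that returns NA for such groups; value is hard-coded to the numbers from the expected output.

```r
library(dplyr)

df <- data.frame(
  group = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'),
  value = c(9, 4, 7, 1, 2, 7, 2, 3)
)

# Hypothetical helper: for each position i, the smallest absolute difference
# between v[i] and every other element of v. Returns NA instead of Inf when
# there is no other element to compare against.
leave_one_out_min <- function(v) {
  if (length(v) < 2) return(rep(NA_real_, length(v)))
  vapply(seq_along(v), function(i) min(abs(v[-i] - v[i])), numeric(1))
}

res <- df %>%
  group_by(group) %>%
  mutate(min.diff = leave_one_out_min(value)) %>%
  ungroup()

res$min.diff
```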

Answer 3

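If the row order doesn't matter, sort the values within each group first. Once they are sorted, the value closest to any observation must be its immediate predecessor or successor, so only the differences to the previous and next rows need to be compared: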

library(dplyr)

df %>% 
  arrange(group, value) %>% # Sort ascending by value within each group
  group_by(group) %>% 
  mutate(min.diff = case_when(
    # Previous and next rows are in the same group: take the smaller of the two differences
    # (pmin() works element-wise; min() would collapse the whole group to one number)
    lag(group) == group & lead(group) == group ~ pmin(abs(value - lag(value)), abs(value - lead(value)), na.rm = TRUE),
    # Only the previous row is in the same group: use the difference from it
    lag(group) == group ~ abs(value - lag(value)),
    # Only the next row is in the same group: use the difference from it
    lead(group) == group ~ abs(value - lead(value))
  )) %>%
  ungroup()

  #    group value min.diff
  #    <chr> <int>    <int>
  #  1 A         4        3
  #  2 A         7        2
  #  3 A         9        2
  #  4 B         1        1
  #  5 B         2        1
  #  6 C         2        1
  #  7 C         3        1
  #  8 C         7        4
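Because the values are sorted within each group, value - lag(value) and lead(value) - value are never negative, and pmin(..., na.rm = TRUE) already falls back to whichever neighbour exists. Under that reasoning, the case_when() can be collapsed into a single pmin() call. A sketch, with value hard-coded to the numbers from the expected output:

```r
library(dplyr)

df <- data.frame(
  group = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'),
  value = c(9, 4, 7, 1, 2, 7, 2, 3)
)

sorted_min <- df %>%
  arrange(group, value) %>%
  group_by(group) %>%
  # Both gaps are >= 0 after sorting, so abs() is unnecessary; na.rm = TRUE
  # handles the first and last row of each group, which lack one neighbour
  mutate(min.diff = pmin(value - lag(value), lead(value) - value, na.rm = TRUE)) %>%
  ungroup()

sorted_min
```

The rows come back in the same sorted order as the output above; a group with a single observation would get NA.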

If the original row order matters, you can add an index and sort by it again afterwards, as follows:

library(dplyr)

df %>% 
  group_by(group) %>%
  mutate(index = row_number()) %>% # Remember each row's original position
  arrange(group, value) %>%
  mutate(min.diff = case_when(
    lag(group) == group & lead(group) == group ~ pmin(abs(value - lag(value)), abs(value - lead(value)), na.rm = TRUE),
    lag(group) == group ~ abs(value - lag(value)),
    lead(group) == group ~ abs(value - lead(value))
  )) %>%
  ungroup() %>%
  arrange(group, index) %>% # Restore the original order
  select(-index)            # Drop the helper column


#   group value min.diff
#   <chr> <int>    <int>
# 1 A         9        2
# 2 A         4        3
# 3 A         7        2
# 4 B         1        1
# 5 B         2        1
# 6 C         7        4
# 7 C         2        1
# 8 C         3        1
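An alternative that keeps the original row order without a helper index column is to sort inside each group, compute the gaps, and undo the permutation with order(). This is a sketch, not part of the answer above; nearest_gap() is a hypothetical helper name. Like the sorted-neighbour approach, it needs only one sort per group (O(n log n)) instead of comparing every pair.

```r
library(dplyr)

df <- data.frame(
  group = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'),
  value = c(9, 4, 7, 1, 2, 7, 2, 3)
)

# Hypothetical helper: sort once, take the smaller of the left and right gaps
# for every sorted element, then map the results back to the input positions.
# A single-element group gets Inf (it has no neighbour on either side).
nearest_gap <- function(v) {
  o <- order(v)                             # indices that sort v ascending
  gaps <- diff(v[o])                        # gaps between consecutive sorted values
  best <- pmin(c(Inf, gaps), c(gaps, Inf))  # smaller of left/right gap per sorted element
  best[order(o)]                            # inverse permutation: back to input order
}

res_ordered <- df %>%
  group_by(group) %>%
  mutate(min.diff = nearest_gap(value)) %>%
  ungroup()

res_ordered
```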

3 answers

Answer 1: 1 vote, posted 2020-07-10 00:10:33

Answer 2: 0 votes, accepted, posted 2020-07-10 01:50:03

Answer 3: 0 votes, posted 2020-07-10 03:03:49
