Computing minimum distance between observations within groups
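In the dataset below, how can I create a new column min.diff that reports the smallest distance between a given observation x and any other observation y within its group (identified by the group column)? I want to measure the distance between x and y as abs(x - y).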
set.seed(1)
df <- data.frame(
  group = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'),
  value = sample(1:10, 8, replace = TRUE)
)
Expected output:
  group value min.diff
1     A     9        2
2     A     4        3
3     A     7        2
4     B     1        1
5     B     2        1
6     C     7        4
7     C     2        1
8     C     3        1
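I would prefer a dplyr solution. The only approach I can think of is to expand the data frame by adding rows for every possible pair within a group, compute the distances, and then filter for the smallest value in each group. Is there a more compact way?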
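We can use combn to take the pairwise differences of 'value' within each group, then take the min of their abs values. Note that this yields a single value per group (the smallest pairwise distance in that group), repeated on every row of the group, rather than a separate minimum for each observation.
library(dplyr)
df1 <- df %>%
  group_by(group) %>% # compute the pairs within each group, not across the whole data frame
  mutate(new = min(abs(combn(value, 2, FUN = function(x) x[1] - x[2]))))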
If we want the min distance between one given element, i.e. the first, and the rest of its group:
df1 <- df %>%
  group_by(group) %>%
  mutate(new = min(abs(value[-1] - first(value))))
We can use map_dbl to subtract the current value from all other values in its group and select the minimum absolute difference.
library(dplyr)
library(purrr)
df %>%
  group_by(group) %>%
  mutate(min.diff = map_dbl(row_number(), ~min(abs(value[-.x] - value[.x]))))
# group value min.diff
# <chr> <int> <dbl>
#1 A 9 2
#2 A 4 3
#3 A 7 2
#4 B 1 1
#5 B 2 1
#6 C 7 4
#7 C 2 1
#8 C 3 1
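For comparison, the same per-observation minimum can be sketched in base R with a pairwise distance matrix. This is only an illustration: the helper name min_dist_outer is made up, and the values are hardcoded to match the expected output in the question instead of being drawn with sample().

```r
# Hypothetical helper: smallest absolute distance from each element of v
# to any other element of v.
min_dist_outer <- function(v) {
  # A group with a single observation has nothing to compare against.
  if (length(v) < 2) return(rep(NA_real_, length(v)))
  d <- abs(outer(v, v, "-"))  # full pairwise distance matrix
  diag(d) <- NA               # ignore each element's distance to itself
  apply(d, 1, min, na.rm = TRUE)
}

df <- data.frame(
  group = c("A", "A", "A", "B", "B", "C", "C", "C"),
  value = c(9, 4, 7, 1, 2, 7, 2, 3)  # hardcoded, assumed to match the question
)

# ave() applies the helper within each level of group and keeps the row order.
df$min.diff <- ave(df$value, df$group, FUN = min_dist_outer)
df
```

Inside a dplyr pipeline the same helper could be called as group_by(group) %>% mutate(min.diff = min_dist_outer(value)). The matrix grows quadratically with group size, so this is probably only reasonable for small groups.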
If the order doesn't matter...
library(dplyr)
df %>%
  arrange(group, value) %>% # order ascending by value within each group
  group_by(group) %>%
  mutate(min.diff = case_when(
    # Previous and next rows both belong to the current group: take the smaller of the two
    # neighbouring differences. pmin() is element-wise; min() would return one value for the whole group.
    lag(group) == group & lead(group) == group ~ pmin(abs(value - lag(value)), abs(value - lead(value)), na.rm = TRUE),
    # Otherwise, if only the previous row belongs to the current group, take the difference from the previous value
    lag(group) == group ~ abs(value - lag(value)),
    # Otherwise, if only the next row belongs to the current group, take the difference from the next value
    lead(group) == group ~ abs(value - lead(value))
  )) %>%
  ungroup()
# group value min.diff
# <chr> <int> <int>
# 1 A 4 3
# 2 A 7 2
# 3 A 9 2
# 4 B 1 1
# 5 B 2 1
# 6 C 2 1
# 7 C 3 1
# 8 C 7 4
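When a group has four or more observations, the difference between min() and pmin() matters for the neighbour comparison: min() returns one number for the whole vector, whereas pmin() compares position by position. A small sketch with made-up values (the vector v is hypothetical, not taken from the question):

```r
# Sorted values of one hypothetical group with four observations.
v <- c(1, 2, 10, 20)
prev_gap <- c(NA, diff(v))  # distance to the previous value (none for the first)
next_gap <- c(diff(v), NA)  # distance to the next value (none for the last)

# min() collapses both vectors into a single number for the whole group.
scalar_min <- min(c(prev_gap, next_gap), na.rm = TRUE)

# pmin() takes the smaller gap at each position, skipping the NA at either end.
elementwise <- pmin(prev_gap, next_gap, na.rm = TRUE)

scalar_min
elementwise
```

Here the scalar result would assign the same distance to every row, while the element-wise result gives each observation the distance to its own nearest neighbour.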
If the order matters, you can add an index and rearrange by it afterwards, like so:
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(index = row_number()) %>% # create the index
  arrange(group, value) %>%
  mutate(min.diff = case_when(
    # pmin() is element-wise; min() would return one value for the whole group
    lag(group) == group & lead(group) == group ~ pmin(abs(value - lag(value)), abs(value - lead(value)), na.rm = TRUE),
    lag(group) == group ~ abs(value - lag(value)),
    lead(group) == group ~ abs(value - lead(value))
  )) %>%
  ungroup() %>%
  arrange(group, index) %>% # rearrange by the index
  select(-index) # remove the index
# group value min.diff
# <chr> <int> <int>
# 1 A 9 2
# 2 A 4 3
# 3 A 7 2
# 4 B 1 1
# 5 B 2 1
# 6 C 7 4
# 7 C 2 1
# 8 C 3 1
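The sort, compare-neighbours, and restore-order steps above can also be folded into one small helper, so no index column is needed. This is a sketch under assumptions: the name nearest_gap is invented for illustration, and the values are hardcoded to match the expected output in the question.

```r
# Hypothetical helper: for each element of v, the distance to its nearest
# neighbour in sorted order, returned in the original order of v.
nearest_gap <- function(v) {
  # A group with a single observation has nothing to compare against.
  if (length(v) < 2) return(rep(NA_real_, length(v)))
  o <- order(v)         # positions of v in ascending order
  gaps <- diff(v[o])    # gaps between consecutive sorted values
  # Each sorted element sees the gap before it and the gap after it;
  # Inf pads the missing side for the smallest and largest element.
  d <- pmin(c(Inf, gaps), c(gaps, Inf))
  out <- numeric(length(v))
  out[o] <- d           # write results back to the original positions
  out
}

df <- data.frame(
  group = c("A", "A", "A", "B", "B", "C", "C", "C"),
  value = c(9, 4, 7, 1, 2, 7, 2, 3)  # hardcoded, assumed to match the question
)

# ave() applies the helper within each level of group and keeps the row order.
df$min.diff <- ave(df$value, df$group, FUN = nearest_gap)
df
```

With dplyr, the equivalent call would be group_by(group) %>% mutate(min.diff = nearest_gap(value)). Sorting works here because the closest value to any observation is always one of its immediate neighbours in sorted order.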