
Calculate minimum distance between groups of points in data frame

My data frame looks like this:

Time, Value, Group
0, 1.0, A
1, 2.0, A
2, 3.0, A
0, 4.0, B
1, 6.0, B
2, 6.0, B
0, 7.0, C
1, 7.0, C
2, 9.0, C

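For each combination (A, B), (A, C), (B, C) I need to find the Time point at which the two groups' Values differ the most, and the size of that difference.

So, comparing A and B, the maximum distance is at t=1: 6 (B) - 2 (A) = 4.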
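The same arithmetic for A and B at every time point, as a small base R sketch. The vectors `time`, `a` and `b` are illustrative names, copied by hand from the table above:

```r
# Values of A and B at Time 0, 1 and 2, copied from the table above
time <- c(0, 1, 2)
a <- c(1, 2, 3)
b <- c(4, 6, 6)

d <- abs(b - a)          # distance at each Time: 3, 4, 3
best <- which.max(d)     # position of the first largest distance
time[best]               # 1
d[best]                  # 4
```

which.max() returns the first position of the maximum, so if two time points tie, the earlier one is reported.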

The full output should look like this:

combination,time,distance
AB, 1, 4
AC, 0, 6
BC, 0, 3
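This expected output assumes each Group has exactly one Value per Time. If a Group had several rows at the same Time, a join on Time would pair each of them with every matching row of the other group and could silently inflate the distances. A minimal check, using a hypothetical helper `one_row_per_time()` (not part of the question):

```r
# Hypothetical helper: TRUE when every Group has exactly one row per Time
one_row_per_time <- function(data) {
  counts <- table(data$Time, data$Group)   # rows = Time, columns = Group
  all(counts == 1)
}

# The table from the question
df <- data.frame(
  Time  = rep(0:2, times = 3),
  Value = c(1, 2, 3, 4, 6, 6, 7, 7, 9),
  Group = rep(c("A", "B", "C"), each = 3)
)
one_row_per_time(df)           # TRUE

# If C's three rows were all recorded at Time 0, the check fails
bad <- df
bad$Time[bad$Group == "C"] <- 0L
one_row_per_time(bad)          # FALSE
```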

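One approach in base R, using combn:

do.call(rbind, combn(unique(df$Group), 2, function(x) {
  # rows of the two groups being compared
  df1 <- subset(df, Group == x[1])
  df2 <- subset(df, Group == x[2])
  # align their values by Time (columns Value.x and Value.y)
  df3 <- merge(df1, df2, by = 'Time')
  value <- abs(df3$Value.x - df3$Value.y)
  # report the largest difference and the (first) Time at which it occurs
  data.frame(combn = paste(x, collapse = ''), 
             time = df3$Time[which.max(value)],
             max_difference = max(value))
}, simplify = FALSE))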

#  combn time max_difference
#1    AB    1              4
#2    AC    0              6
#3    BC    0              3

We create all combinations of the unique Group values, subset the data for each pair and merge the two subsets on Time. We then take the absolute difference of the corresponding Value columns and return the maximum difference together with the Time at which it first occurs.

Data

df <- structure(list(Time = c(0L, 1L, 2L, 0L, 1L, 2L, 0L, 1L, 2L), 
    Value = c(1, 2, 3, 4, 6, 6, 7, 7, 9), Group = c("A", "A", 
    "A", "B", "B", "B", "C", "C", "C")), 
    class = "data.frame", row.names = c(NA, -9L))

A dplyr option could be:

library(dplyr)

df %>%
 # join rows sharing a Time; `relationship` requires dplyr >= 1.1.0
 inner_join(df, by = "Time", relationship = "many-to-many") %>%
 filter(Group.x < Group.y) %>%  # keep each unordered pair once (A-B, not B-A)
 mutate(Group = paste(Group.x, Group.y, sep = "-"),
        Distance = abs(Value.x - Value.y)) %>%
 group_by(Group) %>%
 summarise(Time = Time[which.max(Distance)],  # first Time with the max
           Max_Distance = max(Distance))

  Group  Time Max_Distance
  <chr> <int>        <dbl>
1 A-B       1            4
2 A-C       0            6
3 B-C       0            3
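Because each Group has exactly one Value per Time in this data, another base R route (a sketch, not taken from either answer) is to reshape to wide format with stats::reshape(), giving one column per Group, and then compare columns pairwise with combn. The data is rebuilt inline so the block runs on its own; `wide` and `res` are illustrative names:

```r
# The data from the question
df <- data.frame(
  Time  = rep(0:2, times = 3),
  Value = c(1, 2, 3, 4, 6, 6, 7, 7, 9),
  Group = rep(c("A", "B", "C"), each = 3)
)

# One row per Time, one column per Group (assumes one Value per Group and Time)
wide <- reshape(df, idvar = "Time", timevar = "Group", direction = "wide")
names(wide) <- sub("^Value\\.", "", names(wide))   # Value.A -> A, etc.

groups <- sort(unique(df$Group))
res <- do.call(rbind, combn(groups, 2, function(p) {
  d <- abs(wide[[p[1]]] - wide[[p[2]]])   # distance at every Time
  i <- which.max(d)                       # first Time with the largest distance
  data.frame(combination = paste(p, collapse = ""),
             time = wide$Time[i],
             distance = d[i])
}, simplify = FALSE))
res
```

This avoids the join entirely, at the cost of relying on the one-row-per-Group-and-Time assumption; with duplicate rows, reshape() keeps only one of them, so it is worth checking that assumption first.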


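Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.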

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM