How to calculate distance between different IDs over time by group?

My (example) data is structured as follows, where the X and Y coordinates of participants, recorded under varying conditions, are collected over time:

    Individ <- data.frame(Participant = c("Bill", "Bill", "Bill", "Bill", "Bill", "Harry", "Harry", "Harry", "Harry","Harry", "Paul", "Paul", "Paul", "Paul", "Paul"),
                          Time = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.01, 0.02, 0.03, 0.04, 0.05, 0.01, 0.02, 0.03, 0.04, 0.05),
                          Condition = c("Expr", "Expr", "Expr", "Expr", "Expr", "Con", "Con", "Con", "Con", "Con", "Nor", "Nor", "Nor", "Nor", "Nor"),
                          X = c(26.07, 26.06, 26.05, 26.09, 26.04, 26.65, 26.64, 26.62, 26.63, 26.62, 27.99, 28.01, 28.01, 28.02, 28.02),
                          Y = c(-5.01, -5.12, -5.14, -5.18, -5.2065, -12.37, 12.36, -12.35, -12.34, 12.33, -5.52, -5.514, -5.51, -5.50, -5.4962))

The X and Y coordinates are captured from the same location. I can calculate the distance covered by each Participant using the following:

library(dplyr)

DistanceOutput <- Individ %>%
     arrange(Participant, Condition, Time) %>%
     group_by(Participant, Condition) %>%
     # Previous position of the same participant (NA at the first time point)
     mutate(lagX = lag(X, order_by = Time), lagY = lag(Y, order_by = Time),
            # Euclidean step distance computed on whole columns, no rowwise() needed
            Distance = sqrt((X - lagX)^2 + (Y - lagY)^2)) %>%
     ungroup() %>%
     select(-lagX, -lagY)
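Since speed matters here, the same per-participant step distances can also be computed in base R without `rowwise()`: sort by time, take `diff()` of `X` and `Y` within each `Participant`/`Condition` group via `ave()`, and combine them with the Euclidean formula. This is a sketch of a swapped-in technique, not the code above; the helper `step_distance()` is a hypothetical name, and the data are the question's example, written compactly with `rep()`.

```r
# The question's example data, written compactly with rep()
Individ <- data.frame(
  Participant = rep(c("Bill", "Harry", "Paul"), each = 5),
  Time = rep(c(0.01, 0.02, 0.03, 0.04, 0.05), times = 3),
  Condition = rep(c("Expr", "Con", "Nor"), each = 5),
  X = c(26.07, 26.06, 26.05, 26.09, 26.04, 26.65, 26.64, 26.62, 26.63, 26.62,
        27.99, 28.01, 28.01, 28.02, 28.02),
  Y = c(-5.01, -5.12, -5.14, -5.18, -5.2065, -12.37, 12.36, -12.35, -12.34, 12.33,
        -5.52, -5.514, -5.51, -5.50, -5.4962),
  stringsAsFactors = FALSE
)

# Hypothetical helper: distance from each row to the previous time point of the same
# Participant/Condition group (NA for the first time point of each group)
step_distance <- function(df) {
  df <- df[order(df$Participant, df$Condition, df$Time), ]
  grp <- interaction(df$Participant, df$Condition, drop = TRUE)
  # ave() applies FUN within each group and returns a vector aligned with df's rows
  dx <- ave(df$X, grp, FUN = function(v) c(NA, diff(v)))
  dy <- ave(df$Y, grp, FUN = function(v) c(NA, diff(v)))
  df$Distance <- sqrt(dx^2 + dy^2)
  rownames(df) <- NULL
  df
}

out <- step_distance(Individ)

# Total distance covered per Participant and Condition
# (the formula interface of aggregate() drops the NA first rows by default)
totals <- aggregate(Distance ~ Participant + Condition, data = out, FUN = sum)
totals
```

Because `diff()` and the arithmetic run on whole vectors per group, this avoids one function call per row, which is what makes `rowwise()` slow on large data.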

However, how can I calculate the distance between each pair of Participants over Time, according to their Condition? For example, the distance between Bill and Harry, between Bill and Paul, and between Harry and Paul over Time?

My dataset has 179,800 observations, so ideally a quick solution is preferred. Thank you!

Here's a way to calculate the distance between each pair of participants at each time point. I doubt it's the most efficient way, but maybe someone else will come along with a more elegant solution.

You said that you'd like to calculate the distance between participants within each Condition. In your sample data, however, there's only one participant in each condition. The solution below can easily be extended to group by Condition in addition to Time.
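Concretely, for two participants p and q recorded at the same time t (notation introduced here for illustration), the distance is the ordinary Euclidean distance between their coordinates; when grouping by Condition as well, only pairs sharing the same Condition are compared:

```latex
d_{pq}(t) = \sqrt{\bigl(X_p(t) - X_q(t)\bigr)^2 + \bigl(Y_p(t) - Y_q(t)\bigr)^2}
```

For example, at t = 0.01 Bill is at (26.07, -5.01) and Harry at (26.65, -12.37), so d = sqrt(0.58^2 + 7.36^2) = sqrt(54.506), approximately 7.383, which is the first Bill-Harry distance in the output.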

library(reshape2)
library(dplyr)

# Calculate distance matrix for each Time
res = lapply(unique(Individ$Time), function(i) {

  mat = as.matrix(Individ[Individ$Time==i, c("X","Y")])
  rownames(mat) = Individ$Participant[Individ$Time==i]

  # Distance matrix
  d = as.matrix(dist(mat))

  # Keep only lower triangle
  d[upper.tri(d, diag=TRUE)] = NA

  # Return data frame with distances, time and participants
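  # add_rownames() is deprecated in dplyr; tibble::rownames_to_column() does the same job
  data.frame(Time=i, d) %>% tibble::rownames_to_column("P1")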
})

# Combine all time points into single long data frame of distances
res = bind_rows(res) %>% 
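  melt(id.vars=c("Time","P1"), variable.name="P2", value.name="Distance") %>%
  filter(!is.na(Distance)) %>% 
  # Vectorised alternative to rowwise() + sort(): alphabetically smaller name first
  mutate(Pair = paste(pmin(as.character(P1), as.character(P2)),
                      pmax(as.character(P1), as.character(P2)), sep="-")) %>% 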
  select(Pair, Time, Distance) %>%
  arrange(Pair, Time)

res

         Pair Time  Distance
1  Bill-Harry 0.01  7.382818
2  Bill-Harry 0.02 17.489620
3  Bill-Harry 0.03  7.232496
4  Bill-Harry 0.04  7.180334
5  Bill-Harry 0.05 17.546089
6   Bill-Paul 0.01  1.986580
7   Bill-Paul 0.02  1.989406
8   Bill-Paul 0.03  1.994618
9   Bill-Paul 0.04  1.956349
10  Bill-Paul 0.05  2.001081
11 Harry-Paul 0.01  6.979835
12 Harry-Paul 0.02 17.926427
13 Harry-Paul 0.03  6.979807
14 Harry-Paul 0.04  6.979807
15 Harry-Paul 0.05 17.881091
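With 179,800 rows, the per-time `lapply()` and a `rowwise()` step can become the bottleneck. A fully vectorized alternative (a different technique from the answer above, sketched here in base R) is to join the data to itself on `Time`, so each pair of participants recorded at the same time lands on one row, keep each unordered pair once, and compute the distance on whole columns. The function `pairwise_distance()` and its `by_condition` argument are hypothetical names; setting `by_condition = TRUE` also joins on `Condition`, which is the per-Condition extension mentioned earlier. The join assumes `Time` values match exactly across participants (they are compared as doubles), and it creates up to n^2 rows per time point for n participants before filtering.

```r
# The question's example data, written compactly with rep()
Individ <- data.frame(
  Participant = rep(c("Bill", "Harry", "Paul"), each = 5),
  Time = rep(c(0.01, 0.02, 0.03, 0.04, 0.05), times = 3),
  Condition = rep(c("Expr", "Con", "Nor"), each = 5),
  X = c(26.07, 26.06, 26.05, 26.09, 26.04, 26.65, 26.64, 26.62, 26.63, 26.62,
        27.99, 28.01, 28.01, 28.02, 28.02),
  Y = c(-5.01, -5.12, -5.14, -5.18, -5.2065, -12.37, 12.36, -12.35, -12.34, 12.33,
        -5.52, -5.514, -5.51, -5.50, -5.4962),
  stringsAsFactors = FALSE
)

# Hypothetical helper: distance between every pair of participants at each time point.
# With by_condition = TRUE, only participants sharing a Condition are paired.
pairwise_distance <- function(df, by_condition = FALSE) {
  keys <- if (by_condition) c("Time", "Condition") else "Time"
  a <- df[, c(keys, "Participant", "X", "Y")]
  a$Participant <- as.character(a$Participant)

  # Self-join: one row per combination of two participants sharing the same keys
  m <- merge(a, a, by = keys, suffixes = c(".1", ".2"))

  # Keep each unordered pair once and drop participant-with-itself rows
  m <- m[m$Participant.1 < m$Participant.2, ]

  out <- data.frame(
    Pair = paste(m$Participant.1, m$Participant.2, sep = "-"),
    m[keys],
    # Vectorised Euclidean distance over all pairs at once
    Distance = sqrt((m$X.1 - m$X.2)^2 + (m$Y.1 - m$Y.2)^2),
    stringsAsFactors = FALSE
  )
  out <- out[order(out$Pair, out$Time), ]
  rownames(out) <- NULL
  out
}

res2 <- pairwise_distance(Individ)
res2
```

On the sample data this should give the same 15 Pair/Time/Distance rows as the answer above, while `pairwise_distance(Individ, by_condition = TRUE)` returns no rows because each condition contains only one participant.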

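Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.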