How to calculate distance between different IDs over time by group?

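My (example) data is structured as follows, with each participant's X and Y coordinates recorded under a given condition and collected over time: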

    Individ <- data.frame(Participant = c("Bill", "Bill", "Bill", "Bill", "Bill", "Harry", "Harry", "Harry", "Harry","Harry", "Paul", "Paul", "Paul", "Paul", "Paul"),
                          Time = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.01, 0.02, 0.03, 0.04, 0.05, 0.01, 0.02, 0.03, 0.04, 0.05),
                          Condition = c("Expr", "Expr", "Expr", "Expr", "Expr", "Con", "Con", "Con", "Con", "Con", "Nor", "Nor", "Nor", "Nor", "Nor"),
                          X = c(26.07, 26.06, 26.05, 26.09, 26.04, 26.65, 26.64, 26.62, 26.63, 26.62, 27.99, 28.01, 28.01, 28.02, 28.02),
                          Y = c(-5.01, -5.12, -5.14, -5.18, -5.2065, -12.37, 12.36, -12.35, -12.34, 12.33, -5.52, -5.514, -5.51, -5.50, -5.4962))

The X and Y coordinates are captured from the same location. I can calculate the distance covered by each participant using the following:

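    library(dplyr)

    DistanceOutput <- Individ %>%
      arrange(Participant, Time, Condition) %>%
      group_by(Participant, Condition) %>%
      # Previous position of the same participant within the same condition
      mutate(lagX = lag(X, order_by = Time), lagY = lag(Y, order_by = Time)) %>%
      rowwise() %>%
      # Euclidean distance from the previous point (NA for the first row of each group)
      mutate(Distance = as.numeric(dist(matrix(c(X, Y, lagX, lagY), nrow = 2, byrow = TRUE)))) %>%
      ungroup() %>%
      select(-lagX, -lagY)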

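However, how can I calculate the distance between each pair of `Participant`s over `Time`, according to their `Condition`? For example, the distance between Bill and Harry, Bill and Paul, and Harry and Paul over time?

My dataset has 179,800 observations, so ideally a fast solution is preferred. Thanks!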

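Here is one way to calculate the distance between each pair of participants at each time point. I doubt it is the most efficient approach, but perhaps someone else will come along with a more elegant solution.

You said you want to calculate the distances between participants within each `Condition`. In your sample data there is only one participant per condition. However, the solution below can easily be extended to apply by `Condition` in addition to `Time`.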
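As a rough illustration of what "by `Condition` in addition to `Time`" could look like, here is a minimal sketch that uses a base R self-join with `merge()` instead of the `dist()` loop in this answer. The `demo` data frame and its values are hypothetical (two participants per condition, chosen so the distances are easy to check by hand), and it assumes that `Time` values match exactly across participants.

```r
# Hypothetical data: two participants per condition, two time points each
demo <- data.frame(
  Participant = rep(c("Bill", "Harry", "Paul", "Tom"), each = 2),
  Condition   = rep(c("Expr", "Expr", "Con", "Con"), each = 2),
  Time        = rep(c(0.01, 0.02), times = 4),
  X = c(0, 0, 3, 6, 1, 1, 1, 1),
  Y = c(0, 0, 4, 8, 1, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Self-join: every participant paired with every other participant
# that shares the same Condition and Time
pairs <- merge(demo, demo, by = c("Condition", "Time"), suffixes = c("1", "2"))

# Keep each unordered pair once and drop self-pairs
pairs <- pairs[pairs$Participant1 < pairs$Participant2, ]

# Vectorised Euclidean distance between the two positions
pairs$Pair     <- paste(pairs$Participant1, pairs$Participant2, sep = "-")
pairs$Distance <- sqrt((pairs$X1 - pairs$X2)^2 + (pairs$Y1 - pairs$Y2)^2)

pair_dist <- pairs[order(pairs$Condition, pairs$Pair, pairs$Time),
                   c("Condition", "Pair", "Time", "Distance")]
rownames(pair_dist) <- NULL
pair_dist
```

Because the distance is computed on whole columns rather than row by row, this style may scale better to a few hundred thousand rows, although the self-join grows quadratically with the number of participants in each `Condition`/`Time` group.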

    library(reshape2)
    library(dplyr)

    # Calculate the distance matrix for each Time
    res = lapply(unique(Individ$Time), function(i) {

      # X/Y coordinates of all participants at time i, one row per participant
      mat = as.matrix(Individ[Individ$Time == i, c("X", "Y")])
      rownames(mat) = Individ$Participant[Individ$Time == i]

      # Pairwise Euclidean distance matrix
      d = as.matrix(dist(mat))

      # Keep only the lower triangle so each pair appears once
      d[upper.tri(d, diag = TRUE)] = NA

      # Return a data frame with distances, time and participants
      # (rownames_to_column() replaces the deprecated add_rownames())
      data.frame(Time = i, d) %>% tibble::rownames_to_column("P1")
    })

    # Combine all time points into a single long data frame of distances
    res = bind_rows(res) %>%
      melt(id.vars = c("Time", "P1"), variable.name = "P2", value.name = "Distance") %>%
      filter(!is.na(Distance)) %>%
      rowwise() %>%
      mutate(Pair = paste(sort(c(as.character(P1), as.character(P2))), collapse = "-")) %>%
      ungroup() %>%
      select(Pair, Time, Distance) %>%
      arrange(Pair, Time)

    res
              Pair Time  Distance
    1   Bill-Harry 0.01  7.382818
    2   Bill-Harry 0.02 17.489620
    3   Bill-Harry 0.03  7.232496
    4   Bill-Harry 0.04  7.180334
    5   Bill-Harry 0.05 17.546089
    6    Bill-Paul 0.01  1.986580
    7    Bill-Paul 0.02  1.989406
    8    Bill-Paul 0.03  1.994618
    9    Bill-Paul 0.04  1.956349
    10   Bill-Paul 0.05  2.001081
    11  Harry-Paul 0.01  6.979835
    12  Harry-Paul 0.02 17.926427
    13  Harry-Paul 0.03  6.979807
    14  Harry-Paul 0.04  6.979807
    15  Harry-Paul 0.05 17.881091
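As a quick sanity check, two of these rows can be reproduced by hand from the sample data. The sketch below copies the coordinates from the question and uses a small helper, `euclid()`, which is a hypothetical name introduced only for this check. It also shows where the jumps to roughly 17.5 come from: Harry's `Y` value is positive (`12.36`, `12.33`) at times 0.02 and 0.05 in the sample data, while his other `Y` values are negative.

```r
# Coordinates copied from the question's sample data
bill_001  <- c(X = 26.07, Y = -5.01)   # Bill at Time 0.01
paul_001  <- c(X = 27.99, Y = -5.52)   # Paul at Time 0.01
bill_002  <- c(X = 26.06, Y = -5.12)   # Bill at Time 0.02
harry_002 <- c(X = 26.64, Y = 12.36)   # Harry at Time 0.02 (positive Y in the sample)

# Hypothetical helper: straight-line distance between two points
euclid <- function(a, b) sqrt(sum((a - b)^2))

d_bill_paul  <- euclid(bill_001, paul_001)    # compare with row 6 above
d_bill_harry <- euclid(bill_002, harry_002)   # compare with row 2 above

round(c(d_bill_paul, d_bill_harry), 6)
```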
