data.table: count observations close in distance and time to the current observation

Question

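I want to compute a new column, "congestion", by counting the rows whose sec is within +/- 5 of the current row's sec, whose x is within +/- 5 of its x, and whose y is within +/- 5 of its y. Essentially, I want to find the observations that occur close to the current observation in both distance (x, y) and time (sec); it is just one big count/ifelse statement. All values are numeric.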

Current data.table

data <- data.table(x = c(1,3,10,15,6), 
y = c(5,5,11,14,19), 
sec=c(1,3,5,6,9))

Desired output

data <- data.table(x = c(1,3,10,15,6), 
y = c(5,5,11,14,6), 
sec=c(1,3,5,6,7),
congestion = c(1,2,1,1,2))

A data.table solution is preferred, but I am happy to work in dplyr.

Answer 1

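Given the criteria you specified, I think your "desired output" is incorrect.

That said, if your data is small enough, you can do a full join of data on itself and filter out the invalid combinations: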

library(data.table)

data <- data.table(x = c(1,3,10,15,6), 
                   y = c(5,5,11,14,19), 
                   sec=c(1,3,5,6,9))

data[, join_key := 1L ]     ## specify a key on which to join

data[
  data
  , on = .(join_key)                        ## Full Join to put all possible combinations together
  , allow.cartesian = TRUE
][
  (x >= i.x - 5 & x <= i.x + 5) &           ## Keep pairs within +/- 5 on x, y and sec
    (y >= i.y - 5 & y <= i.y + 5) &
    (sec >= i.sec - 5 & sec <= i.sec + 5)
  , .(
    congestion = .N                         ## Count includes the row itself
  )
  , by = .(x, y, sec)
]

#     x  y sec congestion
# 1:  1  5   1          2
# 2:  3  5   3          2
# 3: 10 11   5          2
# 4: 15 14   6          2
# 5:  6 19   9          1

A slightly more efficient approach might be a by = .EACHI join (borrowing the concept from a related answer):

data[, row_idx := 1L]

data[
  data
  , {
    ## Rows within +/- 5 of the current i row on x, y and sec
    idx = (x >= i.x - 5 & x <= i.x + 5) &
      (y >= i.y - 5 & y <= i.y + 5) & 
      (sec >= i.sec - 5 & sec <= i.sec + 5)
    .(
      x = x[ idx ]
      , y = y[ idx ]
      , sec = sec[ idx ]
    )
  }
  , on = .(row_idx)
  , by = .EACHI
][
  , .(congestion = .N)
  , by = .(x, y, sec)
]

#     x  y sec congestion
# 1:  1  5   1          2
# 2:  3  5   3          2
# 3: 10 11   5          2
# 4: 15 14   6          2
# 5:  6 19   9          1
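
When the data is small, the +/- 5 rule can also be checked row by row without a join, which is a handy way to sanity-check any join-based result. The sketch below is only one way to do it. It assumes "within +/- 5" means an absolute difference of at most 5 on each of x, y and sec, and that the count includes the row itself.

```r
library(data.table)

data <- data.table(x = c(1, 3, 10, 15, 6),
                   y = c(5, 5, 11, 14, 19),
                   sec = c(1, 3, 5, 6, 9))

# For each row i, count the rows (including row i itself) whose
# x, y and sec all differ from row i by at most 5
data[, congestion := sapply(seq_len(.N), function(i) {
  sum(abs(x - x[i]) <= 5 & abs(y - y[i]) <= 5 & abs(sec - sec[i]) <= 5)
})]

data$congestion
# [1] 2 2 2 2 1
```

Subtracting 1 from the column gives the number of other observations nearby. This compares every row with every other row, so like the full join it only suits small tables.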

Answer 2

You can define the limits and join on them:

data[,`:=`(x_high = x +5,
           x_low = x - 5,
           y_high = y + 5,
           y_low = y - 5,
           sec_high = sec +5,
           sec_low = sec - 5)]

data[data,.(x,y,sec,x.x,x.y,x.sec),
          on=.(x>=x_low,
               x<=x_high,
               y>=y_low,
               y<=y_high,
               sec>=sec_low,
               sec<=sec_high)][
      !(x==x.x&y==x.y&sec==x.sec),.(congestion=.N),by=.(x,y,sec)]

       x     y   sec congestion
   <num> <num> <num>      <int>
1:     1     5     1          1
2:     3     5     3          1
3:    10    11     5          1
4:    15    14     6          1

Based on the +/- 5 rule, I find lower congestion than in your expected result. If I understand the constraints correctly, this looks right to me.
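
Note that the fifth row is missing from the result above because it has no neighbours left once the self-match is filtered out. If every row should be kept, with a count of 0 where nothing is nearby, one possible variation is to count matches per row with by = .EACHI and subtract the self-match. This is a sketch under the same +/- 5 reading of the rule; the names bounds and counts are illustrative and not part of the original answer.

```r
library(data.table)

data <- data.table(x = c(1, 3, 10, 15, 6),
                   y = c(5, 5, 11, 14, 19),
                   sec = c(1, 3, 5, 6, 9))

# One +/- 5 window per observation, kept in the original row order
bounds <- data[, .(x_low = x - 5, x_high = x + 5,
                   y_low = y - 5, y_high = y + 5,
                   sec_low = sec - 5, sec_high = sec + 5)]

# Non-equi join: for each window, count the observations that fall inside it
counts <- data[bounds,
               on = .(x >= x_low, x <= x_high,
                      y >= y_low, y <= y_high,
                      sec >= sec_low, sec <= sec_high),
               .N, by = .EACHI]

# Each window always contains its own observation, so subtract 1 to exclude it
data[, congestion := counts$N - 1L]

data$congestion
# [1] 1 1 1 1 0
```

Because by = .EACHI returns one row per row of bounds, the counts line up with data and can be assigned back directly.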
