Extract values from rows that satisfy a condition in R

Question

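Dataset

I have a large data frame with millions of rows and more than 20 columns. Let me first describe the data so that the question is clearer. The original data frame contains the position, speed and acceleration of 2169 vehicles over 15 minutes. Each vehicle has a unique Vehicle.ID, the ID of the time frame in which it was observed (Frame.ID), its speed in that frame (svel), its acceleration in that frame (sacc), and its class (Vehicle.class: 1 = motorcycle, 2 = car, 3 = truck). These variables were recorded every 0.1 seconds, i.e. each frame is 0.1 seconds long. Here are the first 6 rows: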

> dput(head(df))
structure(list(Vehicle.ID = c(2L, 2L, 2L, 2L, 2L, 2L), Frame.ID = 133:138, 
    Vehicle.class = c(2L, 2L, 2L, 2L, 2L, 2L), Lane = c(2L, 2L, 
    2L, 2L, 2L, 2L), svel = c(37.29, 37.11, 36.96, 36.83, 36.73, 
    36.64), sacc = c(0.07, 0.11, 0.15, 0.19, 0.22, 0.25)), .Names = c("Vehicle.ID", 
"Frame.ID", "Vehicle.class", "Lane", "svel", "sacc"), row.names = 7750:7755, class = "data.frame")

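During the 15-minute recording period, some vehicles came to a complete stop, i.e. svel==0. This lasted for some frames, after which the vehicle accelerated again. For reproducibility, I created an example data set as follows: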

x <- data.frame(Vehicle.ID = c(rep(10,5), rep(20,5), rep(30,5), rep(40,5), rep(50,5)),
                    vehicle.class = c(rep(2,10), rep(3,10),rep(1,5)),
                    svel = rep(c(1,0,0,0,3),5),
                    sacc = rep(c(0.3,0.001,0.001,0.002,0.5),5))

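What do I want to find?

As described above, some vehicles stopped and had zero speed for a while, but then accelerated to pick up speed. I want to find the acceleration, sacc, that they applied after a period of zero speed (moving off from a standstill). This means I should be able to see the first row after the last frame in which svel==0. In the example data, the car (vehicle.class==2) with Vehicle.ID==10 has a speed, svel, equal to 1 in its first row. It then stopped for 3 frames (3 consecutive rows) and afterwards accelerated to a speed, svel, of 3. I want the acceleration sacc it applied in those 2 frames (rows 4 and 5 of vehicle 10, i.e. 0.002 and 0.500). For the example data, the output should therefore be the following, by vehicle.class: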

output <- data.frame(Vehicle.ID = c(10,10,20,20,30,30,40,40,50, 50),
                     vehicle.class = c(2,2,2,2,3,3,3,3,1,1),
                     xf = rep(c('l','f'),5),
                     sacc = rep(c(0.002,0.500),5))

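xf identifies the last row, l, in which svel==0, and f is the first row after it. I have tried using plyr and a for loop to split by vehicle.class, but I don't know how to extract sacc.

Notes

xf should be part of the output. It is not in the given data.
The original data frame df has 2169 vehicles; some stopped and some did not, so not every vehicle has rows with svel==0.
The vehicles that stopped did not stop at the same time. Also, the number of rows with svel==0 differs between vehicles.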

Answer 1

There may be a more elegant way to do this, but this works:

require(data.table)
x <- data.table(x)  ## much easier as data.table
x[, xf:='n']        ## create vector with 'n', neither first nor last

# create diff(svel) shifted upwards, 
# padding last observation with 0 to avoid cycling
x[, dsvel:=c(diff(svel, lag=1), 0), by=Vehicle.ID]

# svel is zero and dsvel positive at the last 0 value
x[svel==0 & dsvel > 0, xf:='l']

# there may be a better way to do this part
# get index of observation next to 'l'
# there is no risk of spilling to next Vehicle.ID,  
# because 'l' can only be second to last
i <- which(x$xf=='l') + 1
x[i, xf:='f']

That should give you the xf vector you want.
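The comments in the code above note that there may be a better way to find the row after 'l'. One possible variant, sketched here under the assumption that the installed data.table provides shift() and that speeds are never negative, compares each row with its neighbours inside the same vehicle. The helper columns prev and nxt are names made up for this illustration. Note that this marks every stop-to-move transition of a vehicle, not only the last one.

```r
library(data.table)

# Same example data as in the question
x <- data.table(
  Vehicle.ID    = rep(c(10, 20, 30, 40, 50), each = 5),
  vehicle.class = c(rep(2, 10), rep(3, 10), rep(1, 5)),
  svel          = rep(c(1, 0, 0, 0, 3), 5),
  sacc          = rep(c(0.3, 0.001, 0.001, 0.002, 0.5), 5)
)

# Previous and next speed within each vehicle (NA at the group edges);
# 'prev' and 'nxt' are hypothetical helper columns
x[, prev := shift(svel, type = "lag"),  by = Vehicle.ID]
x[, nxt  := shift(svel, type = "lead"), by = Vehicle.ID]

x[, xf := "n"]
# data.table treats NA in a logical i as FALSE, so edge rows are skipped
x[svel == 0 & nxt > 0,  xf := "l"]  # stopped now, moving in the next row
x[svel > 0 & prev == 0, xf := "f"]  # moving now, stopped in the previous row

out <- x[xf != "n", .(Vehicle.ID, vehicle.class, xf, sacc)]
out
```

Because the neighbours are computed per Vehicle.ID, a 0 in the last row of one vehicle cannot be paired with the first row of the next vehicle.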

Edit from Arun: +1 @ilir, a very nice answer. Here is another way, using data.table's built-in variables .I and .N:

idx = x[, {
            ix = tail(.I[svel==0L], 1);
            iy = (ix+1L)*((ix+1L) <= .I[.N] | NA) 
            list(idx = c(ix, iy))
          }, by = list(Vehicle.ID, vehicle.class)]$idx

You can now subset with idx and add l and f by reference with := as follows:

ans <- x[idx][, xf := c("l", "f")]
    Vehicle.ID vehicle.class svel  sacc xf
 1:         10             2    0 0.002  l
 2:         10             2    3 0.500  f
 3:         20             2    0 0.002  l
 4:         20             2    3 0.500  f
 5:         30             3    0 0.002  l
 6:         30             3    3 0.500  f
 7:         40             3    0 0.002  l
 8:         40             3    3 0.500  f
 9:         50             1    0 0.002  l
10:         50             1    3 0.500  f

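.I holds, for each group, the corresponding row numbers in x. .N holds the number of observations in each group. See ?data.table for more information.

ix gets the last occurrence of 0: for each group, we use tail to pick the row number corresponding to the last 0.

iy should normally be the next entry, ix+1L. But since 0 could be the last entry of some group, we check that it is not by comparing (ix+1L) <= .I[.N]. If the comparison is FALSE, ix is the last entry of the group and we have to output NA; otherwise we output (ix+1L).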
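The `| NA` part of the iy expression can be puzzling at first sight. A minimal sketch of the same trick outside data.table follows; next_or_na and last_row are names invented for this illustration, with last_row standing in for .I[.N].

```r
# Toy version of the iy computation; the function and argument names
# are hypothetical and only serve this illustration.
next_or_na <- function(ix, last_row) {
  candidate <- ix + 1L
  # TRUE | NA is TRUE and FALSE | NA is NA, so the product keeps the
  # candidate when it is still inside the group and becomes NA otherwise.
  candidate * (candidate <= last_row | NA)
}

next_or_na(4L, 5L)  # 5: row 5 still belongs to the group
next_or_na(5L, 5L)  # NA: the last 0 is the group's final row
```

Subsetting a data.table with an NA row index returns a row of NAs, so such a group would still contribute two rows and the c("l", "f") labels stay aligned.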

HTH。

Answer 2

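I think I've come up with a fairly elegant way to express this problem in dplyr. For each vehicle, we are interested in the rows where it is not stopped in that row but was stopped in the previous row: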

library(dplyr)
df <- as_tibble(data.frame(
  id = c(rep(10, 5), rep(20, 5), rep(30, 5), rep(40, 5), rep(50, 5)), 
  class = c(rep(2, 10), rep(3, 10), rep(1, 5)), 
  svel = rep(c(1, 0, 0, 0, 3), 5), 
  sacc = rep(c(0.3, 0.001, 0.001, 0.002, 0.5), 5)
))

df %>% group_by(id) %>% 
  mutate(stopped = svel == 0) %>%
  filter(lag(stopped) == TRUE, stopped == FALSE)

#> Source: local data frame [5 x 5]
#> Groups: id
#> 
#>   id class svel sacc stopped
#> 1 10     2    3  0.5   FALSE
#> 2 20     2    3  0.5   FALSE
#> 3 30     3    3  0.5   FALSE
#> 4 40     3    3  0.5   FALSE
#> 5 50     1    3  0.5   FALSE

You can write it a bit more compactly:

df %>% group_by(id) %>% 
  mutate(stopped = svel == 0) %>%
  filter(lag(stopped), !stopped)

#> Source: local data frame [5 x 5]
#> Groups: id
#> 
#>   id class svel sacc stopped
#> 1 10     2    3  0.5   FALSE
#> 2 20     2    3  0.5   FALSE
#> 3 30     3    3  0.5   FALSE
#> 4 40     3    3  0.5   FALSE
#> 5 50     1    3  0.5   FALSE
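The filter above returns only the first moving row of each vehicle, while the question also asks for the last stopped row and an xf label. A possible extension, sketched with current dplyr verbs (lead(), case_when()) rather than taken from the original answer, could look like this. It labels every stop-to-move transition, so a vehicle that stops twice would yield four rows.

```r
library(dplyr)

df <- data.frame(
  id    = rep(c(10, 20, 30, 40, 50), each = 5),
  class = c(rep(2, 10), rep(3, 10), rep(1, 5)),
  svel  = rep(c(1, 0, 0, 0, 3), 5),
  sacc  = rep(c(0.3, 0.001, 0.001, 0.002, 0.5), 5)
)

res <- df %>%
  group_by(id) %>%
  mutate(
    stopped = svel == 0,
    # Conditions that evaluate to NA (group edges) do not match,
    # and rows matching neither condition get NA
    xf = case_when(
      stopped & !lead(stopped) ~ "l",  # last stopped row
      !stopped & lag(stopped)  ~ "f"   # first moving row after a stop
    )
  ) %>%
  ungroup() %>%
  filter(!is.na(xf)) %>%
  select(id, class, xf, sacc)

res
```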

Answer 3

Not sure I fully understand the question, but I think this is what you're after:

x <- data.frame(Vehicle.ID = c(rep(10,5), rep(20,5), rep(30,5), rep(40,5), rep(50,5)),
                vehicle.class = c(rep(2,10), rep(3,10),rep(1,5)),
                svel = rep(c(1,0,0,0,3),5),
                sacc = rep(c(0.3,0.001,0.001,0.002,0.5),5)
)

# find "l" rows, the last row for a given Vehicle.ID where svel==0
l <- FALSE
l[x$svel==0] <- !duplicated(x$Vehicle.ID[x$svel==0], fromLast=TRUE)
# extract all rows following an l row.
x[which(l) + 1, c(1, 2, 4)]
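The key step here is duplicated() with fromLast = TRUE, which flags everything except the final occurrence of each value. A small stand-alone illustration with made-up IDs:

```r
ids <- c(10, 10, 10, 20, 20)

# Scanning from the end, only the final occurrence of each id
# counts as "not a duplicate"
last_of_each <- !duplicated(ids, fromLast = TRUE)

last_of_each         # FALSE FALSE  TRUE FALSE  TRUE
which(last_of_each)  # 3 5
```

Note that which(l) + 1 in the answer does not check group boundaries: if a vehicle's last stopped row were also its final row, the next index would belong to a different vehicle (or fall past the end of the data).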

Answer 4

library(data.table)
x = data.table(x)
output = x[xf == "f",sacc.after.zero := sacc, by = vehicle.class]
output[!is.na(sacc.after.zero),]

4 solutions

Solution 1
1 Accepted 2014-04-14 20:40:38

Solution 2
1 2014-04-15 13:17:39

Solution 3
0 2014-04-14 16:20:19

Solution 4
0 2014-04-14 16:53:21
