
How to restart a sequence based on values in another column OR reference the previous column's value in R

I am trying to number, in sequence, locations gathered within the same time period. A new series should start at every location whose time since the previous location is more than 60 seconds. I've removed the columns irrelevant to this question, so the example data looks like this:

TimeSincePrev
1
1
1
1
511
1
2
286
1

My desired output looks like this:

TimeSincePrev   NoInSeries
1               1
1               2
1               3
1               4
511             1
1               2
2               3
286             1
1               2

...and so on for another 3500 lines.

I have tried two approaches, without success:

First, I tried an ifelse that sets NoInSeries to 1 if TimeSincePrev is more than a minute, and otherwise to the previous row's value + 1. (For this I first inserted a line-number column, LineNo, to help me reference the previous row, but I suspect there is an easier way to do this?)

df$NoInSeries <- ifelse(df$TimeSincePrev > 60, 1, df[df$LineNo - 1, "NoInSeries"] + 1)

I don't get any errors, but it only gives me the 1s where I want to restart a sequence and does not fill in any of the other values:

TimeSincePrev   NoInSeries
1               NA
1               NA
1               NA
1               NA
511             1
1               NA
2               NA
286             1
1               NA

I assume this has something to do with the column trying to reference itself?

My other approach was to try to generate sequences of numbers (max 15), restarting every time there is a change in the TimeSincePrev value:

df$NoInSeries <- ave(df$TimeSincePrev, df$TimeSincePrev, FUN = function(y) 1:15)

I still get no errors, but exactly the same output as before: NAs in place and no other numbers filled in.

Thanks for any help!

Use ave after creating a grouping variable that marks each series change with cumsum:

dt$NoInSeries <-
      ave(dt$TimeSincePrev,
          cumsum(dt$TimeSincePrev > 60),  # group id: increases at each gap > 60
          FUN = seq_along)                # seq_along also handles one-row groups

The result is:

dt
# TimeSincePrev NoInSeries
# 1             1          1
# 2             1          2
# 3             1          3
# 4             1          4
# 5           511          1
# 6             1          2
# 7             2          3
# 8           286          1
# 9             1          2

Step-by-step explanation:

## detect time changes > 60 seconds;
## cumsum turns them into a group id
(gg <- cumsum(dt$TimeSincePrev > 60))
# [1] 0 0 0 0 1 1 1 2 2

## number the rows within each group
ave(dt$TimeSincePrev, gg, FUN = seq_along)
# [1] 1 2 3 4 1 2 3 1 2
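
The two steps above can be wrapped into a small helper. This is only a sketch: the function name `number_in_series` and its `threshold` argument are hypothetical and not part of the original answer. It uses `seq_along` rather than `seq`, because `seq` applied to a length-one vector such as `511` expands to `1:511`, which would go wrong for a series that contains a single row.

```r
# Hypothetical helper (not from the original answer): number rows within
# series, starting a new series whenever `gap` exceeds `threshold`.
number_in_series <- function(gap, threshold = 60) {
  grp <- cumsum(gap > threshold)   # series id: increases at each long gap
  ave(gap, grp, FUN = seq_along)   # 1, 2, 3, ... within each series
}

x <- c(1, 1, 1, 1, 511, 1, 2, 286, 1)
number_in_series(x)
# [1] 1 2 3 4 1 2 3 1 2

# Two long gaps in a row: each one-row series is simply numbered 1
number_in_series(c(1, 511, 286, 1))
# [1] 1 1 1 2
```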

Using data.table:

library(data.table)
setDT(dt)[,NoInSeries:=seq_len(.N), by=cumsum(TimeSincePrev >60)]
dt
#     TimeSincePrev NoInSeries
#1:             1          1
#2:             1          2
#3:             1          3
#4:             1          4
#5:           511          1
#6:             1          2
#7:             2          3
#8:           286          1
#9:             1          2

Or, in base R with sequence:

  indx <- c(which(dt$TimeSincePrev >60)-1, nrow(dt))
  sequence(c(indx[1], diff(indx)))
   #[1] 1 2 3 4 1 2 3 1 2
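
To make the `sequence` approach easier to follow, here is a sketch that names each intermediate value. The variable names (`gap`, `starts`, `ends`, `runs`, `runs_rle`) are illustrative only; the `rle` line is an alternative way to obtain the same series lengths and is not part of the original answer.

```r
gap <- c(1, 1, 1, 1, 511, 1, 2, 286, 1)

starts <- which(gap > 60)              # rows that open a new series: 5 8
ends   <- c(starts - 1, length(gap))   # last row of each series: 4 7 9
runs   <- c(ends[1], diff(ends))       # series lengths: 4 3 2

sequence(runs)                         # count 1..n within each series
# [1] 1 2 3 4 1 2 3 1 2

# The same lengths can be read off the cumsum grouping with rle()
runs_rle <- rle(cumsum(gap > 60))$lengths
```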

Data:

 dt <- data.frame(TimeSincePrev=c(1,1,1,1,511, 1,2, 286,1))
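
As for why the ifelse attempt in the question leaves the other rows unfilled: ifelse is vectorized, so the whole right-hand side is evaluated before anything is assigned, and the "previous row + 1" branch only ever sees the column as it was before the call. A minimal sketch of the same rule as an explicit loop, where each row can see the value just computed for the row above, might look like this. The names `gap` and `no_in_series` are illustrative, and treating the first row as the start of a series is an assumption.

```r
gap <- c(1, 1, 1, 1, 511, 1, 2, 286, 1)

n <- length(gap)
no_in_series <- integer(n)
for (i in seq_len(n)) {
  if (i == 1 || gap[i] > 60) {
    no_in_series[i] <- 1L                        # start (or restart) a series
  } else {
    no_in_series[i] <- no_in_series[i - 1] + 1L  # continue the current series
  }
}
no_in_series
# [1] 1 2 3 4 1 2 3 1 2
```

For a few thousand rows the loop is fast enough, but the cumsum-based grouping gives the same result without iterating row by row.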
