I have a dataset that looks like this:
threadid unixtime sent ID
123 1000 0 1
123 1050 1 1
123 1070 0 1
123 2000 1 1
123 2500 1 1
123 3000 0 1
123 1000 0 2
123 1500 0 2
123 2500 1 2
But I want it to look like this:
threadid unixtime sent ID change
123 1000 0 1
123 1050 1 1
123 1070 0 1
123 2000 1 1
123 2500 1 1 1430
123 3000 0 1
123 1000 0 2
123 1500 0 2
123 2500 1 2 1000
So, within each ID, I want to find the last occurrence of a 1 in the "sent" column and then calculate the time difference between the unixtime of that row and the unixtime of the most recent earlier row that has a 0 in the "sent" column. I think this may involve a "for" loop, but I've tried a lot of things and just can't quite get it. Any help is greatly appreciated!
This is probably not the most efficient way to do this but you could try:
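library(dplyr)
getDiff <- function(x) {
  # change is a character column so rows without a value print as blank
  x$change <- ''
  # only proceed if this group contains both a 0 and a 1 in sent
  if (all(c(0, 1) %in% x$sent)) {
    # index of the last row where sent == 1
    lastSent <- max(which(x$sent == 1))
    # indexes of the rows where sent == 0 that come before lastSent
    zerosBefore <- which(x$sent == 0 & seq_along(x$sent) < lastSent)
    # skip groups where no 0 precedes the last 1
    if (length(zerosBefore) > 0) {
      lastBeforeSent <- max(zerosBefore)
      x$change[lastSent] <- x$unixtime[lastSent] - x$unixtime[lastBeforeSent]
    }
  }
  return(x)
}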
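The function handles the rows of one ID at a time, so it has to be applied per group. The call itself isn't shown here, so the following is only a sketch of one way to do it with dplyr: the data frame name df is an assumption, the data is copied from the question, and getDiff is restated in compact form only so the snippet runs on its own.

```r
library(dplyr)

# Example data from the question; the name `df` is hypothetical
df <- data.frame(
  threadid = 123,
  unixtime = c(1000, 1050, 1070, 2000, 2500, 3000, 1000, 1500, 2500),
  sent     = c(0, 1, 0, 1, 1, 0, 0, 0, 1),
  ID       = c(1, 1, 1, 1, 1, 1, 2, 2, 2)
)

# Compact restatement of getDiff so this sketch is self-contained
getDiff <- function(x) {
  x$change <- ''
  if (all(c(0, 1) %in% x$sent)) {
    lastSent <- max(which(x$sent == 1))
    zerosBefore <- which(x$sent == 0 & seq_along(x$sent) < lastSent)
    if (length(zerosBefore) > 0) {
      x$change[lastSent] <- x$unixtime[lastSent] - x$unixtime[max(zerosBefore)]
    }
  }
  x
}

# Apply getDiff to each ID's rows and stitch the pieces back together
result <- df %>%
  group_by(ID) %>%
  do(getDiff(.)) %>%
  ungroup() %>%
  as.data.frame()

print(result)
```

do() is superseded in recent dplyr versions but still works; a base R alternative would be do.call(rbind, lapply(split(df, df$ID), getDiff)).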
Run on your data it gives:
threadid unixtime sent ID change
1 123 1000 0 1
2 123 1050 1 1
3 123 1070 0 1
4 123 2000 1 1
5 123 2500 1 1 1430
6 123 3000 0 1
7 123 1000 0 2
8 123 1500 0 2
9 123 2500 1 2 1000