How do I recombine a list split by factor level to the original dataframe?

I have some tracking data, and I want to calculate the time difference between consecutive points within each ID. I can do that with this:

# prep the data
ID = c(rep("A",5), rep("B",5))
DateTime = c("2014-09-25 08:39:45", "2014-09-25 08:39:48", "2014-09-25 08:40:44", "2014-09-25 09:04:00","2014-09-25 09:04:10", "2014-09-25 08:33:32", "2014-09-25 08:34:41", "2014-09-25 08:35:24", "2014-09-25 09:04:00", "2014-09-25 09:04:09")
speed = c(1:10)
df = data.frame(ID,DateTime,speed, stringsAsFactors = FALSE)
df$DateTime<-as.POSIXct(df$DateTime, tz = "UTC")

# function to calculate time differences 
timeCheck<-function(df) {
  sapply(1:(nrow(df) - 1), function(i){
    timeDiff<- difftime(df$DateTime[i+1], df$DateTime[i], units = "sec" )
    return(timeDiff)
  })
}
# preserve order of factor levels 
df$ID <- factor(df$ID, levels=unique(df$ID))

# apply the function by ID
timeDiffData<-sapply(split(df, df$ID), timeCheck)

I want to add the time differences to the original data frame as a new column, but the result has a different length: the function returns one fewer value per ID than there are rows, because the first point in each group has no earlier point to compare against.

I then want to use these time differences in a new function that splits a track wherever the difference exceeds a certain value (say 100 seconds, for the sake of example) and have the ID reflect the split.

So in the end I'd have 4 levels in my ID column, with a split wherever the time difference is > 100 seconds.

The resulting data frame should look something like this:

# what it should look like 
ID = c(rep("A",3),rep("A1",2) , rep("B",3), rep("B1",2))
DateTime = c("2014-09-25 08:39:45", "2014-09-25 08:39:48", "2014-09-25 08:40:44", "2014-09-25 09:04:00","2014-09-25 09:04:10", "2014-09-25 08:33:32", "2014-09-25 08:34:41", "2014-09-25 08:35:24", "2014-09-25 09:04:00", "2014-09-25 09:04:09")
speed = c(1:10)
timeDiff<-c(NA,3,56,1396,10,NA,69,43,1716,9)
newdf = data.frame(ID,DateTime,speed,timeDiff, stringsAsFactors = FALSE)
newdf$DateTime<-as.POSIXct(newdf$DateTime, tz = "UTC")
newdf

Really, your operation has three steps:

  • Group your data by ID
  • Compute the time difference between consecutive timestamps within each group (the first time difference is NA)
  • Create a new ID that counts the number of prior time gaps that are large (e.g. > 100 seconds)

This can be done pretty simply with dplyr, using group_by for the grouping and mutate for computing new variables within each group:

library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(timeDiff = c(NA, difftime(tail(DateTime, -1), head(DateTime, -1), units="sec"))) %>%
  mutate(newID = paste0(ID, cumsum(!is.na(timeDiff) & timeDiff > 100))) %>%
  ungroup()
# A tibble: 10 × 5
#       ID            DateTime speed timeDiff newID
#    <chr>              <dttm> <int>    <dbl> <chr>
# 1      A 2014-09-25 08:39:45     1       NA    A0
# 2      A 2014-09-25 08:39:48     2        3    A0
# 3      A 2014-09-25 08:40:44     3       56    A0
# 4      A 2014-09-25 09:04:00     4     1396    A1
# 5      A 2014-09-25 09:04:10     5       10    A1
# 6      B 2014-09-25 08:33:32     6       NA    B0
# 7      B 2014-09-25 08:34:41     7       69    B0
# 8      B 2014-09-25 08:35:24     8       43    B0
# 9      B 2014-09-25 09:04:00     9     1716    B1
# 10     B 2014-09-25 09:04:09    10        9    B1
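
For readers who want to stay in base R, the same three steps can be sketched with split(), lapply() and unsplit(), which also answers the title question literally: unsplit() puts the pieces back in their original row positions. This is only a sketch under the assumptions of the question's example data; add_gaps and max_gap are hypothetical names introduced for illustration, not part of any package.

```r
# Rebuild the question's example data (base R only).
df <- data.frame(
  ID = rep(c("A", "B"), each = 5),
  DateTime = as.POSIXct(c(
    "2014-09-25 08:39:45", "2014-09-25 08:39:48", "2014-09-25 08:40:44",
    "2014-09-25 09:04:00", "2014-09-25 09:04:10", "2014-09-25 08:33:32",
    "2014-09-25 08:34:41", "2014-09-25 08:35:24", "2014-09-25 09:04:00",
    "2014-09-25 09:04:09"), tz = "UTC"),
  speed = 1:10,
  stringsAsFactors = FALSE
)

# add_gaps() is a hypothetical helper written for this sketch.
add_gaps <- function(g, max_gap = 100) {
  # Seconds since the previous row; NA for the first row of the group.
  g$timeDiff <- c(NA, diff(as.numeric(g$DateTime)))
  # Count the large gaps seen so far and append that count to the ID.
  g$newID <- paste0(g$ID, cumsum(!is.na(g$timeDiff) & g$timeDiff > max_gap))
  g
}

# Step 1: split by ID. Steps 2 and 3: apply the helper to each piece.
pieces <- lapply(split(df, df$ID), add_gaps)

# Recombine: unsplit() restores the original row order.
result <- unsplit(pieces, df$ID)
result
```

Because the helper pads each group with a leading NA, every piece keeps the same number of rows as its input, which is what makes the recombination possible.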

One answer that worked perfectly was deleted by the author. Here it is for posterity:

library(data.table)
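# Convert the gaps to seconds explicitly: diff() on POSIXct chooses its units automatically
setDT(df)[ , ID2 := paste0(ID, cumsum(c(0, as.numeric(diff(DateTime), units = "secs")) > 100)), by = ID]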
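
Both answers label the first segment of each track with a 0 suffix (A0, B0), whereas the question asked for A, A1, B, B1. A minimal base R sketch of one way to get that labelling, assuming the question's example data and a 100-second threshold, is to append the counter only when it is non-zero. The names secs, gap and n_gaps are introduced here for illustration.

```r
# Rebuild the question's example data (base R only).
df <- data.frame(
  ID = rep(c("A", "B"), each = 5),
  DateTime = as.POSIXct(c(
    "2014-09-25 08:39:45", "2014-09-25 08:39:48", "2014-09-25 08:40:44",
    "2014-09-25 09:04:00", "2014-09-25 09:04:10", "2014-09-25 08:33:32",
    "2014-09-25 08:34:41", "2014-09-25 08:35:24", "2014-09-25 09:04:00",
    "2014-09-25 09:04:09"), tz = "UTC"),
  speed = 1:10,
  stringsAsFactors = FALSE
)

# ave() applies a function per group and returns a vector aligned
# with the original rows, so no explicit recombination is needed.
secs <- as.numeric(df$DateTime)
df$timeDiff <- ave(secs, df$ID, FUN = function(x) c(NA, diff(x)))

# TRUE where the gap to the previous point exceeds 100 seconds.
gap <- !is.na(df$timeDiff) & df$timeDiff > 100

# Running count of large gaps within each ID.
n_gaps <- ave(as.integer(gap), df$ID, FUN = cumsum)

# Keep the plain ID for the first segment; add the count afterwards.
# as.character() guards against ID being a factor.
df$newID <- ifelse(n_gaps == 0,
                   as.character(df$ID),
                   paste0(df$ID, n_gaps))
df
```

With more than one large gap in a track, this scheme would continue with A2, A3 and so on.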
