R - Split numeric vector into intervals

I have a question regarding the "splitting" of a vector, although different approaches might be feasible. I have a data.frame (df) which looks like this (simplified version):

   case time
1   1   5
2   2   3
3   3   4

The "time" variable counts units of time (days, weeks, etc.) until an event occurs. I would like to expand the data set by increasing the number of rows and "split" the "time" into intervals of length 1, beginning at 2. The result might then look something like this:

    case    time    begin   end
1   1       5       2       3
2   1       5       3       4
3   1       5       4       5
4   2       3       2       3
5   3       4       2       3
6   3       4       3       4

Obviously, my data set is a bit larger than this example. What would be a feasible method to achieve this result?

My first idea was to begin with

df.exp <- df[rep(row.names(df), df$time - 2), 1:2]

in order to expand the number of rows per case according to the number of time intervals. Based on this, "begin" and "end" columns might be added along the lines of:

df.exp$begin <- 2:(df.exp$time-1)

However, I have not succeeded in creating the respective columns, because this command uses only the first row to calculate (df.exp$time-1) and does not automatically distinguish by "case".

Any ideas would be very much appreciated!

You can try the following (the data frame is called df1 here):

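# Repeat each row once per interval (time - 2 intervals when starting at 2)
df2 <- df1[rep(seq_len(nrow(df1)), df1$time - 2), ]
row.names(df2) <- NULL
# For each case, build a two-column matrix of consecutive (begin, end) pairs
m1 <- do.call(rbind,
              Map(function(x, y) {
                    v1 <- seq(x, y)
                    cbind(v1[-length(v1)], v1[-1L])
                  },
                  2, df1$time))
df2[c('begin', 'end')] <- m1
df2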
#  case time begin end
#1    1    5     2   3
#2    1    5     3   4
#3    1    5     4   5
#4    2    3     2   3
#5    3    4     2   3
#6    3    4     3   4
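
The attempt from the question can also be completed in base R by numbering the rows within each case instead of applying `:` to the whole column. The following is a minimal sketch using `ave()`; it assumes the example data from the question and that every "time" is at least 2.

```r
# Example data from the question
df <- data.frame(case = 1:3, time = c(5, 3, 4))

# Repeat each row once per interval (time - 2 intervals when starting at 2)
df.exp <- df[rep(row.names(df), df$time - 2), 1:2]

# Number the rows within each case (1, 2, 3, ...) and shift so counting starts at 2
df.exp$begin <- ave(df.exp$time, df.exp$case, FUN = seq_along) + 1
df.exp$end <- df.exp$begin + 1
row.names(df.exp) <- NULL
df.exp
```

Because `ave()` applies `seq_along` separately to each "case", the counter restarts for every case, which is the step the `2:(df.exp$time-1)` attempt was missing.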

Or an option with data.table:

library(data.table)
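# setDT() converts df1 to a data.table by reference
setDT(df1)[, {
  tmp <- seq(2, time)                # 2, 3, ..., time for this case
  list(time  = time,
       begin = tmp[-length(tmp)],    # all values except the last
       end   = tmp[-1])              # all values except the first
}, by = case]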
#   case time begin end
#1:    1    5     2   3
#2:    1    5     3   4
#3:    1    5     4   5
#4:    2    3     2   3
#5:    3    4     2   3
#6:    3    4     3   4

Another data.table option, starting from the question's df:

library(data.table)
DT <- as.data.table(df)
DT[, rep(time, time-2), case][, begin := 2:(.N+1), case][, end := begin +1][]
#   case V1 begin end
#1:    1  5     2   3
#2:    1  5     3   4
#3:    1  5     4   5
#4:    2  3     2   3
#5:    3  4     2   3
#6:    3  4     3   4
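
If the starting point should be configurable, the expansion can be wrapped in a small base R helper. This is only a sketch: the function name `split_intervals` and its `start` argument are hypothetical and do not appear in the answers above. It also assumes that cases whose "time" does not exceed `start` should simply produce no rows, which the question does not specify.

```r
# Hypothetical helper: expand each row into unit intervals [begin, end]
# running from `start` up to that row's `time`
split_intervals <- function(df, start = 2) {
  # Number of intervals per row; never negative (assumption: such rows are dropped)
  n <- pmax(df$time - start, 0)
  out <- df[rep(seq_len(nrow(df)), n), , drop = FALSE]
  # sequence(n) counts 1..n[i] for each row in turn; shift so it begins at `start`
  out$begin <- sequence(n) + start - 1
  out$end <- out$begin + 1
  row.names(out) <- NULL
  out
}

df <- data.frame(case = 1:3, time = c(5, 3, 4))
split_intervals(df)             # intervals beginning at 2, as in the question
split_intervals(df, start = 3)  # intervals beginning at 3
```

Clamping with `pmax()` matters because `seq(2, time)` counts downwards when "time" is below 2, which would otherwise yield reversed intervals.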

