Finding the difference between dates in months for each ID in R

Below is my sample data set, which has many unique IDs. For each ID, I need the difference in months between the first and second occurrence in the First_Diff column, the difference between the second and third occurrence in the Second_Diff column, and so on. The full table has many more occurrences; this is only a sample.

Input:

Date    ID
1/3/2006    209
1/3/2006    489
1/3/2006    502
1/3/2006    439
1/3/2006    534
1/3/2006    474
1/3/2006    566
1/3/2006    591
1/4/2006    209
1/4/2007    489
1/5/2007    502
1/7/2006    439
1/3/2008    534
1/3/2007    474
1/3/2008    566
1/7/2009    439
1/3/2009    534
1/3/2009    474
1/3/2010    566

Output:

ID  First_Diff  Second_Diff Third_Diff
209 1   0   0
489 13  0   0
502 14  0   0
439 3   0   0
534 24  0   0
474 12  0   0
566 24  12  0
591 0   0   0

Can anyone please help me with this? It is very complicated for me, and I have not been able to solve it, which is holding up my further analysis.

This could help with your problem.

library(dplyr)
library(tidyr)

# Create fake data frame
dates = c(rep(c("2015-01-09", "2015-02-09", "2015-03-08"), 3))
id  = c(rep(c("A", "B", "C"), each = 3))

df = data.frame(date = as.Date(dates), id = as.factor(id))

You can calculate the differences within each id using the lag() function:

# Calculate difference between times
df = df %>% 
  group_by(id) %>%
  mutate(datediff = difftime(date, lag(date)))

Then transform to wide format using spread():

df_wide = df %>% spread(date, datediff) 

names(df_wide) <-  c("id", "first_diff", "second_diff", "third_diff")

The output is:

# A tibble: 3 x 4
# Groups:   id [3]
      id first_diff second_diff third_diff
* <fctr>     <time>      <time>     <time>
1      A    NA days     31 days    27 days
2      B    NA days     31 days    27 days
3      C    NA days     31 days    27 days
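
Note that spreading on date works here only because every id shares the same three dates; with uneven dates it would produce one column per distinct date. A rough base R sketch of a variant keyed by occurrence number instead, on made-up data (the data and the occ column are assumptions for illustration, not part of the answer above):

```r
# Made-up data with uneven dates per id, for illustration only
df <- data.frame(
  date = as.Date(c("2015-01-09", "2015-02-09", "2015-03-08",
                   "2015-01-15", "2015-04-15")),
  id = c("A", "A", "A", "B", "B")
)

# Occurrence number within each id: 1, 2, 3, ...
df$occ <- ave(seq_along(df$id), df$id, FUN = seq_along)

# Difference in days from the previous date within each id (NA for the first)
df$datediff <- ave(as.numeric(df$date), df$id,
                   FUN = function(d) c(NA, diff(d)))

# One row per id, one column per occurrence; missing cells stay NA
wide <- tapply(df$datediff, list(df$id, df$occ), identity)
wide
```

The columns are then aligned by occurrence rather than by calendar date, so ids with different dates or different numbers of rows still fit one table.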

First use ave to create a seq column that is 0 for the first occurrence of any ID, 1 for the second, and so on.
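
As a small illustration of this step, here is a sketch on a made-up ID vector (not the question's data; the name occ is arbitrary):

```r
# Hypothetical ID vector, for illustration only
ID <- c(209, 489, 209, 566, 566, 566)

# seq_along numbers the rows within each ID group 1, 2, 3, ...;
# subtracting 1 makes the first occurrence 0
occ <- ave(ID, ID, FUN = seq_along) - 1
occ  # 0 0 1 0 1 2
```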

Then use tapply to create a matrix mat in which each ID is a row, each seq value is a column, and each cell holds the number of months since the Epoch. as.yearmon internally represents each date as year + frac, where Jan = 0, Feb = 1/12, etc. tapply converts that to a plain number, and multiplying by 12 gives the number of months.
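
The month arithmetic can be checked by hand. The sketch below uses only base R and a hypothetical helper, month_index (not a zoo function), that computes 12 * year + (month - 1); up to floating-point rounding this should match 12 * as.yearmon(...):

```r
# month_index is a hypothetical helper, not part of zoo.
# It assumes the day/month/year format used in the question.
month_index <- function(x, format = "%d/%m/%Y") {
  lt <- as.POSIXlt(as.Date(x, format = format))
  # lt$year counts from 1900 and lt$mon is 0-based (Jan = 0)
  12 * (lt$year + 1900) + lt$mon
}

# ID 489 in the question: 1/3/2006, then 1/4/2007
month_index("1/4/2007") - month_index("1/3/2006")  # 13
```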

Finally, create the result by using the row names of mat as the ID column and, for the remaining columns, subtracting each column of mat from the one after it. We use NA rather than 0 to mark a value that cannot be calculated, since otherwise we could not distinguish two occurrences that are 0 months apart from a value that cannot be calculated.

library(zoo)

DF2 <- transform(DF, seq = ave(ID, ID, FUN = seq_along) - 1)
mat <- with(DF2, 12 * tapply(Date, list(ID, seq), as.yearmon, format = "%d/%m/%Y"))
cbind(ID = rownames(mat), as.data.frame(mat[, -1] - mat[, -ncol(mat)]))

giving:

     ID  1  2
209 209  1 NA
439 439  4 36
474 474 12 24
489 489 13 NA
502 502 14 NA
534 534 24 12
566 566 24 24
591 591 NA NA
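
If the zero-filled layout from the question is still wanted, the columns can be renamed and NA replaced afterwards. A minimal sketch on a two-row stand-in for the result above (the object name res is an assumption):

```r
# Two-row stand-in for the result above, for illustration only
res <- data.frame(ID = c("209", "566"), `1` = c(1, 24), `2` = c(NA, 24),
                  check.names = FALSE)

# Rename the difference columns to the names used in the question
names(res)[-1] <- c("First_Diff", "Second_Diff")

# Replace NA with 0; this discards the NA-versus-0 distinction noted earlier
res[is.na(res)] <- 0
res
```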

Variation

We could also write the above code as a magrittr pipeline:

library(zoo)
library(magrittr)

DF %>%
   transform(seq = ave(ID, ID, FUN = seq_along) - 1) %>%
   with(12 * tapply(Date, list(ID, seq), as.yearmon, format = "%d/%m/%Y")) %>%
   { cbind(ID = rownames(.), as.data.frame(.[, -1] - .[, -ncol(.)])) }

Note

The input in reproducible form:

Lines <- "Date    ID
1/3/2006    209
1/3/2006    489
1/3/2006    502
1/3/2006    439
1/3/2006    534
1/3/2006    474
1/3/2006    566
1/3/2006    591
1/4/2006    209
1/4/2007    489
1/5/2007    502
1/7/2006    439
1/3/2008    534
1/3/2007    474
1/3/2008    566
1/7/2009    439
1/3/2009    534
1/3/2009    474
1/3/2010    566"
DF <- read.table(text = Lines, header = TRUE)
