Finding the difference in months between dates for each ID in R
Below is my sample data set, which has many unique IDs. For each ID I have to find the difference between the first and second occurrence in the First_Diff column, the difference between the second and third occurrence in the Second_Diff column, and so on. There are many occurrences in the table; this is only a sample.
Input:
Date ID
1/3/2006 209
1/3/2006 489
1/3/2006 502
1/3/2006 439
1/3/2006 534
1/3/2006 474
1/3/2006 566
1/3/2006 591
1/4/2006 209
1/4/2007 489
1/5/2007 502
1/7/2006 439
1/3/2008 534
1/3/2007 474
1/3/2008 566
1/7/2009 439
1/3/2009 534
1/3/2009 474
1/3/2010 566
Output:
ID First_Diff Second_Diff Third_Diff
209 1 0 0
489 13 0 0
502 14 0 0
439 3 0 0
534 24 0 0
474 12 0 0
566 24 12 0
591 0 0 0
Can anyone please help me with this? It is very complicated for me, and I have not been able to solve it, which is holding up my further analysis.
This could help with your problem:
# Packages that provide %>%, group_by(), mutate(), lag() and spread()
library(dplyr)
library(tidyr)
# Create fake data frame
dates = rep(c("2015-01-09", "2015-02-09", "2015-03-08"), 3)
id = rep(c("A", "B", "C"), each = 3)
df = data.frame(date = as.Date(dates), id = as.factor(id))
You can calculate the differences using the lag() function:
# Calculate difference between times
df = df %>%
  group_by(id) %>%
  mutate(datediff = difftime(date, lag(date)))
And then transform to wide format using spread:
df_wide = df %>% spread(date, datediff)
names(df_wide) <- c("id", "first_diff", "second_diff", "third_diff")
The output is:
# A tibble: 3 x 4
# Groups: id [3]
id first_diff second_diff third_diff
* <fctr> <time> <time> <time>
1 A NA days 31 days 27 days
2 B NA days 31 days 27 days
3 C NA days 31 days 27 days
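Note that spread keys the wide columns on date, which lines up here only because every id in the fake data shares the same three dates. Below is a minimal sketch of a variant keyed on occurrence number instead. It assumes dplyr and tidyr (1.0 or later, for pivot_wider) are installed. The toy data and the names gap and diff_ are illustrative and not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical): ids with different dates and different counts
df <- data.frame(
  id = c("A", "A", "A", "B", "B"),
  date = as.Date(c("2015-01-09", "2015-02-09", "2015-03-08",
                   "2015-01-20", "2015-04-20"))
)

df_wide <- df %>%
  group_by(id) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(gap = row_number() - 1,   # 0 for the first occurrence, 1 for the second, ...
         datediff = as.numeric(difftime(date, lag(date), units = "days"))) %>%
  ungroup() %>%
  filter(gap > 0) %>%              # the first occurrence has no previous date
  select(id, gap, datediff) %>%
  pivot_wider(names_from = gap, values_from = datediff, names_prefix = "diff_")

print(df_wide)
```

Here id B has only two dates, so its diff_2 is NA rather than a misleading 0.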
First use ave to create a seq column that is 0 for the first occurrence of any ID, 1 for the second, and so on.
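To see what the ave step produces, here is a small sketch on made-up IDs (the vector below is illustrative, not the question's data):

```r
# Toy IDs (hypothetical values)
ID <- c(209, 489, 209, 566, 489, 209)

# ave() applies seq_along() within each group of equal IDs and returns the
# results in the original row order; subtracting 1 makes the count 0-based.
occ <- ave(ID, ID, FUN = seq_along) - 1
print(occ)
# [1] 0 0 1 0 1 2
```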
Then use tapply to create a matrix mat in which each ID is a row, each seq is a column, and the content is the date expressed as a count of months. as.yearmon internally represents each date as year + fraction, where January = 0, February = 1/12, etc.; tapply converts that to a number, and multiplying by 12 gives the number of months.
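The month count can be illustrated without zoo. The sketch below is a base R stand-in, assuming day/month/year input as in the question. The helper month_index is a hypothetical name, and it computes 12 * year + (month - 1) in integer steps, which is the same quantity as 12 times the yearmon value.

```r
# Hypothetical helper: months counted as 12 * year + zero-based month
month_index <- function(x, format = "%d/%m/%Y") {
  lt <- as.POSIXlt(as.Date(x, format = format))
  12 * (lt$year + 1900) + lt$mon   # lt$year is years since 1900; lt$mon is 0-11
}

first  <- month_index("1/3/2006")  # 1 March 2006
second <- month_index("1/4/2006")  # 1 April 2006
print(first)
# [1] 24074
print(second - first)
# [1] 1
```

Because only whole months are counted, the day of the month is ignored, just as it is with as.yearmon.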
Finally, create the result by using the rownames of mat as the ID column and, for the remaining columns, taking the difference between adjacent columns of mat. We have used NA rather than 0 to indicate a value that cannot be calculated, since otherwise we would not be able to distinguish two occurrences that are 0 months apart from a value that cannot be calculated.
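The column differencing can be seen in isolation on a small hand-built matrix. This is only a sketch: the month counts below are typed in by hand for three of the question's IDs, and drop = FALSE is an addition that keeps the result a matrix even if there are only two occurrence columns.

```r
# Month counts (12 * year + zero-based month); rows are IDs, columns are
# occurrence 0, 1, 2. NA marks an occurrence that does not exist.
mat <- matrix(c(24074, 24075, NA,
                24074, 24098, 24122,
                24074, NA,    NA),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("209", "566", "591"), c("0", "1", "2")))

# Dropping the first column and dropping the last column gives two aligned
# matrices; subtracting them yields the gap between consecutive occurrences.
gaps <- mat[, -1, drop = FALSE] - mat[, -ncol(mat), drop = FALSE]
res <- cbind(ID = rownames(gaps), as.data.frame(gaps))
print(res)
```

ID 591 occurs only once, so both of its gaps are NA, while ID 566 gets 24 and 24. The full solution on the question's data follows the same pattern.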
library(zoo)
DF2 <- transform(DF, seq = ave(ID, ID, FUN = seq_along) - 1)
mat <- with(DF2, 12 * tapply(Date, list(ID, seq), as.yearmon, format = "%d/%m/%Y"))
cbind(ID = rownames(mat), as.data.frame(mat[, -1] - mat[, -ncol(mat)]))
giving:
ID 1 2
209 209 1 NA
439 439 4 36
474 474 12 24
489 489 13 NA
502 502 14 NA
534 534 24 12
566 566 24 24
591 591 NA NA
We could also write the above code in a magrittr pipeline:
library(zoo)
library(magrittr)
DF %>%
  transform(seq = ave(ID, ID, FUN = seq_along) - 1) %>%
  with(12 * tapply(Date, list(ID, seq), as.yearmon, format = "%d/%m/%Y")) %>%
  { cbind(ID = rownames(.), as.data.frame(.[, -1] - .[, -ncol(.)])) }
The input in reproducible form:
Lines <- "Date ID
1/3/2006 209
1/3/2006 489
1/3/2006 502
1/3/2006 439
1/3/2006 534
1/3/2006 474
1/3/2006 566
1/3/2006 591
1/4/2006 209
1/4/2007 489
1/5/2007 502
1/7/2006 439
1/3/2008 534
1/3/2007 474
1/3/2008 566
1/7/2009 439
1/3/2009 534
1/3/2009 474
1/3/2010 566"
DF <- read.table(text = Lines, header = TRUE)