Calculate difference between dates by group in R
I'm using a logistic exposure to calculate hatching success for bird nests. My data set is quite extensive and I have ~2,000 nests, each with a unique ID ("ClutchID"). I need to calculate the number of days a given nest was exposed ("Exposure"), or more simply, the difference between the 1st and last day. I used the following code:
HS_Hatch$Exposure=NA
for(i in 2:nrow(HS_Hatch)){HS_Hatch$Exposure[i]=HS_Hatch$DateVisit[i]- HS_Hatch$DateVisit[i-1]}
where HS_Hatch is my dataset and DateVisit is the actual date. The only problem is that R is calculating an exposure value for the 1st date (which doesn't make sense).
What I really need is to calculate the difference between the 1st and last date for a given clutch. I've also looked into the following:
Exposure=ddply(HS_Hatch, "ClutchID", summarize,
               orderfrequency = as.numeric(diff.Date(DateVisit)))

df %>%
  mutate(Exposure = as.Date(HS_Hatch$DateVisit, "%Y-%m-%d")) %>%
  group_by(ClutchID) %>%
  arrange(Exposure) %>%
  mutate(lag=lag(DateVisit), difference=DateVisit-lag)
I'm still learning R so any help would be greatly appreciated.
Edit: Below is a sample of the data I'm using:
HS_Hatch <- structure(list(ClutchID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L
), DateVisit = c("3/15/2012", "3/18/2012", "3/20/2012", "4/1/2012",
"4/3/2012", "3/18/2012", "3/20/2012", "3/22/2012", "4/3/2012",
"4/4/2012", "3/22/2012", "4/3/2012", "4/4/2012", "3/18/2012",
"3/20/2012", "3/22/2012", "4/2/2012", "4/3/2012", "4/4/2012",
"3/20/2012", "3/22/2012", "3/25/2012", "3/27/2012", "4/4/2012",
"4/5/2012"), Year = c(2012L, 2012L, 2012L, 2012L, 2012L, 2012L,
2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L,
2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L,
2012L), Survive = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -25L), .Names = c("ClutchID",
"DateVisit", "Year", "Survive"), spec = structure(list(cols = structure(list(
ClutchID = structure(list(), class = c("collector_integer",
"collector")), DateVisit = structure(list(), class = c("collector_character",
"collector")), Year = structure(list(), class = c("collector_integer",
"collector")), Survive = structure(list(), class = c("collector_integer",
"collector"))), .Names = c("ClutchID", "DateVisit", "Year",
"Survive")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
Collecting some of the comments...
dplyr
We need only the dplyr package for this problem. If we load other packages, e.g. plyr, it can cause conflicts if both packages have functions with the same name. Let's load only dplyr.
library(dplyr)
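If plyr does have to be loaded in the same session, one way to sidestep the name clash is to call the dplyr version explicitly with the `::` operator. The sketch below uses a small made-up data frame (`nests` and `n_visits` are hypothetical names, not part of the question's data) and assumes dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data: three visits across two clutches
nests <- data.frame(ClutchID = c(1L, 1L, 2L))

# dplyr::summarize() always refers to the dplyr function,
# even if another loaded package also defines summarize()
per_clutch <- nests %>%
  group_by(ClutchID) %>%
  dplyr::summarize(n_visits = n())

print(per_clutch)
```

The prefix is only needed for functions that actually clash; other calls can stay unqualified.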
In the future, you may wish to load tidyverse instead; it includes dplyr and other related packages, for graphics, etc.
Let's convert the DateVisit variable from character strings to something R can interpret as a date. Once we do this, R can calculate differences in days by subtracting one date from another.
HS_Hatch <- HS_Hatch %>%
  mutate(date_visit = as.Date(DateVisit, "%m/%d/%Y"))
The date format %m/%d/%Y is different from your original code. This date format needs to match how dates look in your data. DateVisit has dates as month/day/year, so we use %m/%d/%Y.
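To see why the format string matters, here is a minimal base R sketch using two dates taken from the sample data for clutch 1. The variable names are illustrative only:

```r
# Format matches the month/day/year strings, so parsing succeeds
first_visit <- as.Date("3/15/2012", format = "%m/%d/%Y")
last_visit  <- as.Date("4/3/2012",  format = "%m/%d/%Y")

# Format does not match the string, so as.Date() returns NA
wrong_format <- as.Date("3/15/2012", format = "%Y-%m-%d")

# Subtracting two Date objects gives a difference in days;
# as.numeric() turns it into a plain number
days_exposed <- as.numeric(last_visit - first_visit)

print(first_visit)
print(wrong_format)
print(days_exposed)
```

A column full of NA values after conversion is usually a sign that the format string does not match the data.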
Also, you don't need to specify the dataset for DateVisit inside mutate, as in HS_Hatch$DateVisit, because it's already looking in HS_Hatch. The code HS_Hatch %>% ... says 'use HS_Hatch for the following steps'.
To calculate exposure, we need to find the first date, last date, and then the difference between the two, for each set of rows by ClutchID. We use summarize, which collapses the data to one row per ClutchID.
exposure <- HS_Hatch %>%
  group_by(ClutchID) %>%
  summarize(first_visit = min(date_visit),
            last_visit = max(date_visit),
            exposure = last_visit - first_visit)
first_visit = min(date_visit) will find the minimum date_visit for each ClutchID separately, since we are using group_by(ClutchID).
exposure = last_visit - first_visit takes the newly-calculated first_visit and last_visit and finds the difference in days.
This creates the following result:
  ClutchID first_visit last_visit exposure
     <int> <date>      <date>        <dbl>
1        1 2012-03-15  2012-04-03       19
2        2 2012-03-18  2012-04-04       17
3        3 2012-03-22  2012-04-04       13
4        4 2012-03-18  2012-04-04       17
5        5 2012-03-20  2012-04-05       16
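As a cross-check on these numbers, the same first-to-last difference can be sketched in base R with tapply. This is not the approach used above, just an independent way to confirm the result; it uses only the rows for clutches 1 and 3 from the sample data, and `visits`, `visit_day` and `exposure_check` are illustrative names:

```r
# Rows for clutches 1 and 3, copied from the sample data
visits <- data.frame(
  ClutchID  = c(1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L),
  DateVisit = c("3/15/2012", "3/18/2012", "3/20/2012", "4/1/2012", "4/3/2012",
                "3/22/2012", "4/3/2012", "4/4/2012"),
  stringsAsFactors = FALSE
)

# Dates as day counts, so max - min is a number of days
visit_day <- as.numeric(as.Date(visits$DateVisit, "%m/%d/%Y"))

# Last day minus first day, computed separately per clutch
exposure_check <- tapply(visit_day, visits$ClutchID,
                         function(x) max(x) - min(x))

print(exposure_check)
```

For these two clutches it should agree with the exposure column in the table above (19 and 13 days).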
If you want to keep all the original rows, you can use mutate in place of summarize.
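A minimal sketch of that mutate variant, assuming dplyr is installed and using only the rows for clutches 1 and 3 from the sample data (`visits` and `with_exposure` are illustrative names). Every visit row is kept, and the clutch-level exposure is repeated on each row of its clutch:

```r
library(dplyr)

# Rows for clutches 1 and 3, copied from the sample data
visits <- data.frame(
  ClutchID  = c(1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L),
  DateVisit = c("3/15/2012", "3/18/2012", "3/20/2012", "4/1/2012", "4/3/2012",
                "3/22/2012", "4/3/2012", "4/4/2012"),
  stringsAsFactors = FALSE
)

with_exposure <- visits %>%
  mutate(date_visit = as.Date(DateVisit, "%m/%d/%Y")) %>%
  group_by(ClutchID) %>%
  # mutate() keeps one row per visit; max/min are evaluated within each clutch
  mutate(exposure = as.numeric(max(date_visit) - min(date_visit))) %>%
  ungroup()

print(with_exposure)
```

The as.numeric() call is optional; it just stores exposure as a plain number of days rather than a difftime.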
Here is a similar solution if you are looking for difftime results in days from a date vector named date, with no NA values produced in the new column, and if you expect to group by several conditions/groups.
Make sure that your date vector has been converted to the proper format, as previously explained.
dat2 <- dat %>%
  select(group1, group2, date) %>%
  arrange(group1, group2, date) %>%
  group_by(group1, group2) %>%
  mutate(diff_date = c(0,diff(date)))
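To make the c(0, diff(date)) idea concrete, here is a small base R sketch of the same per-group calculation using ave instead of dplyr. The data frame and its values are made up for illustration, with a single grouping column for brevity, and the rows are assumed to be already sorted by group and date:

```r
# Hypothetical data: two groups, dates already sorted within each group
dat <- data.frame(
  group1 = c("a", "a", "a", "b", "b"),
  date   = as.Date(c("2012-03-15", "2012-03-18", "2012-03-20",
                     "2012-03-22", "2012-04-03"))
)

# Within each group: 0 for the first row, then days since the previous row
dat$diff_date <- ave(as.numeric(dat$date), dat$group1,
                     FUN = function(x) c(0, diff(x)))

print(dat)
```

Padding with 0 keeps the result the same length as the group, which is why the first row of each group gets 0 rather than NA.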