For instance, suppose I have the following dataframe:
ID<-c("A", "A", "B", "B", "B", "C")
StartDate<-as.Date(c("2018-01-01", "2019-02-05", "2016-04-18", "2020-03-03", "2021-12-13", "2014-03-03"), "%Y-%m-%d")
TermDate<-as.Date(c("2018-02-01", NA, "2016-05-18", "2020-04-03", "2021-12-15", "2014-04-03"), "%Y-%m-%d")
df<-data.frame(ID=ID, StartDate=StartDate, TermDate=TermDate)
ID StartDate TermDate
1 A 2018-01-01 2018-02-01
2 A 2019-02-05 <NA>
3 B 2016-04-18 2016-05-18
4 B 2020-03-03 2020-04-03
5 B 2021-12-13 2021-12-15
6 C 2014-03-03 2014-04-03
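What I'm ultimately trying to get is one row per ID, with the earliest StartDate and the latest TermDate (NA for A, whose most recent row has no TermDate):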
ID StartDate TermDate
1 A 2018-01-01 <NA>
2 B 2016-04-18 2021-12-15
3 C 2014-03-03 2014-04-03
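The functions first() and last() in dplyr and data.table could help here. They take the first and last row of each group, so they assume the rows are already sorted by StartDate within each ID.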
library(dplyr)
df %>%
group_by(ID) %>%
summarise(StartDate = first(StartDate),
TermDate = last(TermDate))
#   ID    StartDate  TermDate
# * <chr> <date>     <date>
# 1 A     2018-01-01 NA
# 2 B     2016-04-18 2021-12-15
# 3 C     2014-03-03 2014-04-03
With data.table:
library(data.table)
setDT(df)[, .(StartDate = first(StartDate), TermDate = last(TermDate)), ID]
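data.table's first() and last() also just take the first and last row of each group, so they depend on row order too. Below is a minimal sketch that sorts in place with setorder() before grouping. It uses a hypothetical table, spells, which holds the question's data with the two rows for A deliberately swapped:

```r
library(data.table)

# Question's data, with the rows for "A" out of chronological order
spells <- data.table(
  ID        = c("A", "A", "B", "B", "B", "C"),
  StartDate = as.Date(c("2019-02-05", "2018-01-01", "2016-04-18",
                        "2020-03-03", "2021-12-13", "2014-03-03")),
  TermDate  = as.Date(c(NA, "2018-02-01", "2016-05-18",
                        "2020-04-03", "2021-12-15", "2014-04-03"))
)

# Sort by ID, then StartDate, in place, so first()/last() see each ID's rows in time order
setorder(spells, ID, StartDate)

res <- spells[, .(StartDate = first(StartDate), TermDate = last(TermDate)), by = ID]
print(res)
```

setorder() reorders by reference, so no copy of the table is made. That is the usual reason to prefer it over spells[order(ID, StartDate)] on large data.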
Using min() and max() instead of first() and last() removes the need to sort the data beforehand, if it isn't sorted already:
df %>% group_by(ID) %>%
summarise(StartDate = min(StartDate),
TermDate = max(TermDate))
# A tibble: 3 x 3
ID StartDate TermDate
* <chr> <date> <date>
1 A 2018-01-01 NA
2 B 2016-04-18 2021-12-15
3 C 2014-03-03 2014-04-03
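max() gives NA for A because na.rm defaults to FALSE, so any missing TermDate in a group makes the maximum NA. That matches the desired output if NA means the spell is still open. If NA instead meant "date unknown", na.rm = TRUE would return the latest recorded date. The following base R snippet illustrates this, where term_a is just the two TermDate values for A:

```r
# Term dates for ID "A": the second row has no TermDate
term_a <- as.Date(c("2018-02-01", NA))

# Default na.rm = FALSE: a single NA makes the result NA
open_end <- max(term_a)
print(open_end)

# na.rm = TRUE drops the NA and returns the latest recorded TermDate instead
last_recorded <- max(term_a, na.rm = TRUE)
print(last_recorded)
```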
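Check whether your df is sorted. If it looks like this, with the two rows for A swapped, first() and last() pick the wrong dates for A (StartDate 2019-02-05 instead of 2018-01-01, and TermDate 2018-02-01 instead of NA):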
> df
ID StartDate TermDate
1 A 2019-02-05 <NA>
2 A 2018-01-01 2018-02-01
3 B 2016-04-18 2016-05-18
4 B 2020-03-03 2020-04-03
5 B 2021-12-13 2021-12-15
6 C 2014-03-03 2014-04-03
df %>% group_by(ID) %>%
summarise(StartDate = first(StartDate),
TermDate = last(TermDate))
# A tibble: 3 x 3
ID StartDate TermDate
* <chr> <date> <date>
1 A 2019-02-05 2018-02-01
2 B 2016-04-18 2021-12-15
3 C 2014-03-03 2014-04-03
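One way to keep using first() and last() is to sort before summarising. The sketch below assumes dplyr is installed. df_unsorted is a hypothetical name for the unsorted data above:

```r
library(dplyr)

# The unsorted version of df shown above
df_unsorted <- data.frame(
  ID        = c("A", "A", "B", "B", "B", "C"),
  StartDate = as.Date(c("2019-02-05", "2018-01-01", "2016-04-18",
                        "2020-03-03", "2021-12-13", "2014-03-03")),
  TermDate  = as.Date(c(NA, "2018-02-01", "2016-05-18",
                        "2020-04-03", "2021-12-15", "2014-04-03"))
)

res <- df_unsorted %>%
  arrange(ID, StartDate) %>%   # put each ID's rows in chronological order
  group_by(ID) %>%
  summarise(StartDate = first(StartDate),
            TermDate  = last(TermDate))
print(res)
```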
We can also index each group's vectors directly:
library(dplyr)
df %>%
group_by(ID) %>%
summarise(StartDate = StartDate[1],
TermDate = TermDate[n()])
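Like first() and last(), the [1] and [n()] indexing relies on row order. The variant below does not depend on row order. It picks the earliest StartDate with which.min() and the TermDate of the latest-starting row with which.max(), using base R only. This is a swapped-in technique rather than part of the answer above, and collapse_spells() and spells are hypothetical names used for illustration:

```r
# Question's data, with the rows for "A" deliberately out of order
spells <- data.frame(
  ID        = c("A", "A", "B", "B", "B", "C"),
  StartDate = as.Date(c("2019-02-05", "2018-01-01", "2016-04-18",
                        "2020-03-03", "2021-12-13", "2014-03-03")),
  TermDate  = as.Date(c(NA, "2018-02-01", "2016-05-18",
                        "2020-04-03", "2021-12-15", "2014-04-03"))
)

# Hypothetical helper: collapse one ID's rows into a single row
collapse_spells <- function(g) {
  data.frame(
    ID        = g$ID[1],
    # earliest start date in the group
    StartDate = g$StartDate[which.min(g$StartDate)],
    # TermDate of the row that started last (NA if that row has no TermDate)
    TermDate  = g$TermDate[which.max(g$StartDate)]
  )
}

res <- do.call(rbind, lapply(split(spells, spells$ID), collapse_spells))
rownames(res) <- NULL
print(res)
```

Unlike max(TermDate), this reads the TermDate from the row with the latest StartDate. The two approaches can therefore differ when spells overlap or when an earlier row has a missing TermDate.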
Another data.table option, which picks each group's first StartDate and last TermDate with a (row, column) index matrix:
setDT(df)[
,
as.list(
setNames(
data.frame(.SD)[cbind(c(1, .N), c(1, 2))],
names(.SD)
)
), ID
]
gives
ID StartDate TermDate
1: A 2018-01-01 <NA>
2: B 2016-04-18 2021-12-15
3: C 2014-03-03 2014-04-03
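The data.frame(.SD)[cbind(c(1, .N), c(1, 2))] step uses matrix indexing. Each row of the index matrix is a (row, column) pair, so it takes row 1 of column 1 and row .N of column 2. [.data.frame handles a matrix index by going through as.matrix(), which turns Date columns into character strings. The StartDate and TermDate columns in the result above are therefore probably character rather than Date. The following base R check runs on a small stand-in for one group's .SD (sd is a hypothetical name):

```r
# Stand-in for .SD of group "B": two Date columns
sd <- data.frame(
  StartDate = as.Date(c("2016-04-18", "2020-03-03", "2021-12-13")),
  TermDate  = as.Date(c("2016-05-18", "2020-04-03", "2021-12-15"))
)
n <- nrow(sd)

# (1, 1) -> first StartDate, (n, 2) -> last TermDate
picked <- sd[cbind(c(1, n), c(1, 2))]
print(picked)
print(class(picked))  # character: as.matrix() formats the Date columns as text

# Convert back if Date columns are needed downstream
picked_dates <- as.Date(picked)
```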