Comparing dates among rows in an R data frame

Novice R user here. I'm trying to compare the dates for each id and determine which entry is earlier and which is later. The input data looks something like this:

id    date
101   18-Sep-12
101   21-Aug-12
102   25-Mar-13
102   15-Apr-13

And the output should look something like this:

id    date         Category
101   18-Sep-12    Late
101   21-Aug-12    Early
102   25-Mar-13    Early
102   15-Apr-13    Late

-Justin

If your data frame is df:

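# %b matches abbreviated month names in the current locale (English assumed here)
df$date <- as.Date(df$date, format = "%d-%b-%y")
df <- df[order(df$id, df$date), ]
# Relies on recycling: valid only when every id has exactly two rows
df$Category <- c("Early", "Late")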
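
The recycled c("Early", "Late") only lines up when the rows are sorted and every id has exactly two of them. As a minimal sketch of an alternative that keeps the original row order, each date can be compared with the earliest date for its id using base R's ave(). The rule used here (the earliest date is "Early", everything else is "Late") is an assumption, not something the question specifies.

```r
# Example data from the question, with the dates already parsed
df <- data.frame(
  id   = c(101L, 101L, 102L, 102L),
  date = as.Date(c("2012-09-18", "2012-08-21", "2013-03-25", "2013-04-15"))
)

# Earliest date within each id, aligned with the original rows.
# The dates are passed to ave() as plain day counts.
first_day <- ave(as.numeric(df$date), df$id, FUN = min)

# Assumption: rows equal to the group minimum are "Early", all others "Late"
df$Category <- ifelse(as.numeric(df$date) == first_day, "Early", "Late")
df
```

Because nothing is reordered, the result has the same row order as the input.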

You can use plyr here (the data frame is called dat in this example):

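library(plyr)
# Save the current LC_TIME setting; Sys.setlocale() returns the new locale, not the old one
loc <- Sys.getlocale("LC_TIME")
# Switch to English so %b matches "Sep", "Aug", ... ("ENGLISH" is a Windows locale name)
Sys.setlocale("LC_TIME", "ENGLISH")
dat$date <- as.Date(dat$date, format = "%d-%b-%y")
# Within each id, label the earliest date "EARLY" and every other date "LATE"
ddply(dat, .(id), transform, cat = ifelse(date == min(date), "EARLY", "LATE"))
##    id       date   cat
## 1 101 2012-09-18  LATE
## 2 101 2012-08-21 EARLY
## 3 102 2013-03-25 EARLY
## 4 102 2013-04-15  LATE
Sys.setlocale("LC_TIME", loc)  # restore the original setting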
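
The locale switch matters because %b is matched against the month abbreviations of the current LC_TIME locale. A rough sketch of wrapping that step in a small function is below. The name parse_dmy_abbrev is hypothetical, and it uses the "C" locale (which has English month names and should be available on every platform) instead of "ENGLISH", which appears to be a Windows-style locale name.

```r
# Hypothetical helper: parse dates such as "18-Sep-12" regardless of the
# language settings of the R session.
parse_dmy_abbrev <- function(x) {
  old <- Sys.getlocale("LC_TIME")         # remember the current setting
  on.exit(Sys.setlocale("LC_TIME", old))  # restore it when the function exits
  Sys.setlocale("LC_TIME", "C")           # "C" uses English month abbreviations
  as.Date(x, format = "%d-%b-%y")
}

parsed <- parse_dmy_abbrev(c("18-Sep-12", "21-Aug-12"))
parsed
```

Using on.exit() means the original locale is put back even if the parsing step fails.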

I would probably look into using the "data.table" package.

The general approach I would use is to rank the dates within each id to create your "category" column (with exactly two rows per id, order gives the same result as rank). The nice thing here is that you are not limited to comparing just two dates.

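library(data.table)
DT <- data.table(df)
# rank() gives each row's position by date within its id; order() returns the
# same values only when an id has exactly two rows
DT[, category := rank(date, ties.method = "first"), by = id]
DT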
#     id       date category
# 1: 101 2012-09-18        2
# 2: 101 2012-08-21        1
# 3: 102 2013-03-25        1
# 4: 102 2013-04-15        2

If you want text labels, you can use factor:

DT[, category := factor(category, labels = c("Early", "Late"))]
DT
#     id       date category
# 1: 101 2012-09-18     Late
# 2: 101 2012-08-21    Early
# 3: 102 2013-03-25    Early
# 4: 102 2013-04-15     Late
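
With two distinct dates per id, order and rank return the same values, but with three or more they can differ: order gives the row positions that would sort the data, while rank gives each row's own position. A small base R sketch with made-up dates shows the difference. The third label, "Middle", is an assumption added for illustration.

```r
# Three illustrative dates for a single id (not from the question)
dates <- as.Date(c("2012-09-18", "2012-07-02", "2012-08-21"))

ord <- order(dates)  # 2 3 1: which rows come first, second, third when sorted
rnk <- rank(dates)   # 3 1 2: the position of each row, which a category needs

# Turn the ranks into labels; one label is needed per distinct rank
category <- factor(rnk, labels = c("Early", "Middle", "Late"))
category
```

Here the first row (18 September) is correctly labelled "Late" from its rank of 3, whereas its order value of 2 would have labelled it "Middle".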

For convenience, this is the "df" that I started with:

df <- structure(list(id = c(101L, 101L, 102L, 102L), 
    date = structure(c(15601, 15573, 15789, 15810), class = "Date")), 
    .Names = c("id", "date"), row.names = c(NA, -4L), class = "data.frame")
