Extracting most recent date from rows in a data frame

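I am working with a data frame that has several related date columns, and what I need most is to extract the most recent date in each row. I have seen examples here, but none that do quite what I want. My example data frame is as follows: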

ID    date1    date2    date3
1   01/12/15 02/04/07 07/06/16
2   03/29/12 02/16/16 09/01/10
3   12/01/15 07/07/07 11/13/12

The output I would like to get is:

ID    date1    date2    date3 max
1   01/12/15 02/04/07 07/06/16 07/06/16
2   03/29/12 02/16/16 09/01/10 02/16/16
3   12/01/15 07/07/07 11/13/12 12/01/15

I have seen people use plyr and dplyr, but I am not familiar with those packages. Any help is appreciated!

EDIT: I was able to run what @akrun gave, but I ran into a problem with empty date fields. Here is an example:

ID    date1    date2    date3
1   01/12/15 NA 07/06/16
2   NA 02/16/16 09/01/10
3   12/01/15 07/07/07 NA

For rows with missing dates, I would still like the data frame to be transformed as follows:

ID    date1    date2    date3 max
1   01/12/15 NA 07/06/16 07/06/16
2   NA 02/16/16 09/01/10 02/16/16
3   12/01/15 07/07/07 NA 12/01/15

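We can convert the columns to Date class, use max.col to get the column index of the latest date in each row, cbind that with the row index, and use the resulting matrix to extract the elements from 'df1' and create the 'max' column.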

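# +1 shifts the column index past the ID column; ties.method = "first" makes ties deterministic
df1$max <- df1[cbind(seq_len(nrow(df1)), max.col(sapply(df1[-1], as.Date, format = "%m/%d/%y"), ties.method = "first")+1)]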
df1
#  ID    date1    date2    date3      max
#1  1 01/12/15 02/04/07 07/06/16 07/06/16
#2  2 03/29/12 02/16/16 09/01/10 02/16/16
#3  3 12/01/15 07/07/07 11/13/12 12/01/15
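
The one-liner above packs several steps together. As a rough sketch, it can be unrolled as follows; the intermediate names `date_num`, `latest_col` and `idx` are made up here for illustration and are not part of the original answer.

```r
# Same sample data as in the answer's "data" section
df1 <- data.frame(
  ID    = 1:3,
  date1 = c("01/12/15", "03/29/12", "12/01/15"),
  date2 = c("02/04/07", "02/16/16", "07/07/07"),
  date3 = c("07/06/16", "09/01/10", "11/13/12"),
  stringsAsFactors = FALSE
)

# 1. Parse every date column; sapply() simplifies the result to a matrix
#    with one column per date column
date_num <- sapply(df1[-1], as.Date, format = "%m/%d/%y")

# 2. For each row, the position of the largest (latest) value
latest_col <- max.col(date_num, ties.method = "first")

# 3. Pair row numbers with column numbers; +1 skips the ID column
idx <- cbind(seq_len(nrow(df1)), latest_col + 1)

# 4. Matrix indexing picks one cell per row, as the original strings
df1$max <- df1[idx]
```

Because step 4 indexes the original data frame, the 'max' column keeps the same "mm/dd/yy" text format as the inputs.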

Another option is apply with MARGIN = 1:

df1$max <- apply(df1[-1], 1, function(x) x[which.max(as.Date(x, "%m/%d/%y"))])

data

df1 <- structure(list(ID = 1:3, date1 = c("01/12/15", "03/29/12", "12/01/15"
), date2 = c("02/04/07", "02/16/16", "07/07/07"), date3 = c("07/06/16", 
"09/01/10", "11/13/12")), .Names = c("ID", "date1", "date2", 
"date3"), class = "data.frame", row.names = c("1", "2", "3"))
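
For the missing-date case raised in the question's edit, the apply variant should carry over, since which.max discards NA values. A minimal sketch, assuming the blanks are read in as NA; the data frame name `df_na` is hypothetical:

```r
# Hypothetical data mirroring the NA example from the question's edit
df_na <- data.frame(
  ID    = 1:3,
  date1 = c("01/12/15", NA, "12/01/15"),
  date2 = c(NA, "02/16/16", "07/07/07"),
  date3 = c("07/06/16", "09/01/10", NA),
  stringsAsFactors = FALSE
)

# which.max() skips NA values, so each row still yields its latest date
df_na$max <- apply(df_na[-1], 1, function(x) {
  x[which.max(as.Date(x, format = "%m/%d/%y"))]
})
```

One caveat: a row in which every date is NA gives a zero-length result, so apply would then return a list rather than a vector. Such rows would need separate handling.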

Use pmax after converting to Date objects:

dat[-1] <- lapply(dat[-1], as.Date, format="%m/%d/%y")
dat$max <- do.call(pmax, dat[-1])

#  ID      date1      date2      date3        max
#1  1 2015-01-12 2007-02-04 2016-07-06 2016-07-06
#2  2 2012-03-29 2016-02-16 2010-09-01 2016-02-16
#3  3 2015-12-01 2007-07-07 2012-11-13 2015-12-01

where dat is:

dat <- structure(list(ID = 1:3, date1 = structure(1:3, .Label = c("01/12/15", 
"03/29/12", "12/01/15"), class = "factor"), date2 = structure(1:3, .Label = c("02/04/07", 
"02/16/16", "07/07/07"), class = "factor"), date3 = structure(1:3, .Label = c("07/06/16", 
"09/01/10", "11/13/12"), class = "factor")), .Names = c("ID", 
"date1", "date2", "date3"), class = "data.frame", row.names = c("1", 
"2", "3"))
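
With missing dates, as in the question's edit, pmax returns NA for any row that contains one unless na.rm = TRUE is set. A minimal sketch of passing that argument through do.call; the data frame name `dat_na` is hypothetical:

```r
# Hypothetical data mirroring the NA example from the question's edit,
# already converted to Date
dat_na <- data.frame(
  ID    = 1:3,
  date1 = as.Date(c("01/12/15", NA, "12/01/15"), format = "%m/%d/%y"),
  date2 = as.Date(c(NA, "02/16/16", "07/07/07"), format = "%m/%d/%y"),
  date3 = as.Date(c("07/06/16", "09/01/10", NA), format = "%m/%d/%y")
)

# c() appends na.rm = TRUE to the list of columns, so do.call() hands it
# to pmax() as a named argument alongside the date vectors
dat_na$max <- do.call(pmax, c(dat_na[-1], na.rm = TRUE))
```

Unlike the approaches that index the original strings, this keeps 'max' as a Date column, which is convenient for further date arithmetic.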

If you are more comfortable with SQL, the sqldf library gives you another way to get the latest date:

data1<-data.frame(id=c("1","2","3"),
                  date1=as.Date(c("01/12/15","03/29/12","12/01/15"),"%m/%d/%y"),
                  date2=as.Date(c("02/04/07","02/16/16","07/07/07"),"%m/%d/%y"),
                  date3=as.Date(c("07/06/16","09/01/10","11/13/12"),"%m/%d/%y"))


library(sqldf)
data2 = sqldf("SELECT id,
              max(date1,date2,date3) as 'max__Date'
              FROM data1", method = "name__class")  
