
R sapply vs apply vs lapply + as.data.frame

I'm working with some Date columns and trying to clean out obviously incorrect dates. I've written a function using the safe.ifelse function mentioned here.

Here's my toy data set:

df1 <- data.frame(id = 1:25
    , month1 = seq(as.Date('2012-01-01'), as.Date('2014-01-01'), by = 'month'  )
    , month2 = seq(as.Date('2012-01-01'), as.Date('2014-01-01'), by = 'month'  )
    , month3 = seq(as.Date('2012-01-01'), as.Date('2014-01-01'), by = 'month'  )
    , letter1 = letters[1:25]
    )

This works fine for a single column:

df1$month1 <- safe.ifelse(df1$month1 > as.Date('2013-10-01'), as.Date('2013-10-01'), df1$month1)
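
For context, a wrapper like safe.ifelse is needed because a plain ifelse() does not carry the Date class through to its result. A minimal sketch of that behaviour (the names dates, cap, plain and capped are illustrative, not from the post), with pmin() shown as one base-R call that does keep the class:

```r
dates <- as.Date(c("2013-09-01", "2013-12-01"))
cap   <- as.Date("2013-10-01")

# Plain ifelse() returns the underlying day counts, not Dates.
plain <- ifelse(dates > cap, cap, dates)
class(plain)

# pmin() takes the element-wise minimum and keeps the Date class.
capped <- pmin(dates, cap)
class(capped)
```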

As I have multiple columns, I'd like to use a function together with apply to take care of all the Date columns at once:

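capDate <- function(x){
    today1 <- Sys.Date()
    # ifelse() drops the Date class, so restore the class of `yes` afterwards
    safe.ifelse <- function(cond, yes, no){
        class.y <- class(yes)
        X <- ifelse(cond, yes, no)
        class(X) <- class.y
        return(X)
    }

    # Replace any date later than today with today's date
    safe.ifelse(as.Date(x) > as.Date(today1), as.Date(today1), as.Date(x))
}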

However, when I try to use sapply()

df1[,dateCols1] <- sapply(df1[,dateCols1], capDate)

or apply()

df1[,dateCols1] <- apply(df1[,dateCols1], 2, capDate)

the Date columns lose their Date formatting. The only way I've found to get around this is to use lapply() and then convert the result back with as.data.frame(). Can anyone explain this?

df1[,dateCols1] <- as.data.frame(lapply(df1[,dateCols1], capDate))

Both sapply and apply convert the result to a matrix, and the Date class is lost in that conversion. lapply returns a list with each element's class intact, so as.data.frame(lapply(...)) is a safe way to loop over data frame columns.

as.data.frame(
  lapply(
    df1, 
    function(column) 
    {
      if(inherits(column, "Date")) 
      {
        pmin(column, Sys.Date())
      } else column
    }
  )
)
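
To see the difference concretely, here is a minimal sketch on a two-column data frame (the names d, keep, via_sapply, via_apply and via_lapply are illustrative, not from the question). The function applied returns its input unchanged, so any loss of the Date class comes from the looping function itself:

```r
d <- data.frame(
  a = as.Date(c("2012-01-01", "2012-02-01")),
  b = as.Date(c("2013-01-01", "2013-02-01"))
)
keep <- function(x) x  # returns each column untouched

via_sapply <- sapply(d, keep)                  # simplified to a matrix
via_apply  <- apply(d, 2, keep)                # data frame coerced to a matrix first
via_lapply <- as.data.frame(lapply(d, keep))   # list of columns, rebuilt as a data frame

class(via_sapply)    # a matrix, not Date
class(via_apply)     # a matrix, not Date
class(via_lapply$a)  # still Date
```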

The same column-wise loop is a little cleaner with ddply from plyr.

library(plyr)
ddply(
  df1, 
  .(id), 
  colwise(
    function(column) 
    {
      if(inherits(column, "Date")) 
      { 
        pmin(column, Sys.Date()) 
      } else column
    }
  )
)
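
The question uses dateCols1 without defining it. As a sketch, assuming it is meant to select the Date columns, the selection can be computed with inherits() and the capped columns assigned back as a list, which keeps each column's class without a separate as.data.frame() call. The names toy, cap and is_date are hypothetical, and a fixed cap date stands in for Sys.Date() so the result is reproducible:

```r
toy <- data.frame(
  id     = 1:3,
  month1 = as.Date(c("2013-08-01", "2013-11-01", "2014-01-01")),
  letter = c("a", "b", "c")
)
cap <- as.Date("2013-10-01")  # assumed cap; the question uses Sys.Date()

# Logical vector marking which columns are of class Date
is_date <- vapply(toy, inherits, logical(1), what = "Date")

# Assign the list from lapply() straight back into the selected columns
toy[is_date] <- lapply(toy[is_date], function(column) pmin(column, cap))

str(toy)
```

Only the Date columns are touched here, so the other columns do not need an else branch.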


 