Using dplyr to apply a function of several columns of an R data frame
Using dplyr's “verbs,” how can I apply a (general) function to a column of an R data frame, if that function depends on multiple columns of the data frame?
Here's a concrete example of the type of situation that I face. I have a data frame like this:
```r
df <- data.frame(
  d1 = c('2016-01-30 08:40:00 UTC', '2016-03-06 09:30:00 UTC'),
  d2 = c('2016-01-30 16:20:00 UTC', '2016-03-06 13:20:00 UTC'),
  tz = c('America/Los_Angeles', 'America/Chicago'), stringsAsFactors = FALSE)
```
I want to convert the UTC times to local times, to get a data frame like this:
```
                   d1                  d2                  tz
1 2016-01-30 00:40:00 2016-01-30 08:20:00 America/Los_Angeles
2 2016-03-06 03:30:00 2016-03-06 07:20:00     America/Chicago
```
To do this, I would like to apply the following function, which converts UTC time to local time using the lubridate library, to the date columns:
```r
library(lubridate)

# Parse a UTC date-time string and re-express it in time zone `tz`
getLocTime <- function(d, tz) {
  as.character(with_tz(ymd_hms(d), tz))
}
```
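As a quick sanity check, here is a minimal sketch (assuming lubridate is installed) showing that getLocTime behaves as intended when it receives one date string and one time zone. Because the exact string produced by as.character() for date-times can differ slightly between R versions, the comment only states the expected prefix.

```r
library(lubridate)

getLocTime <- function(d, tz) {
  as.character(with_tz(ymd_hms(d), tz))
}

# Scalar calls: one date string and one time zone each
la  <- getLocTime('2016-01-30 08:40:00 UTC', 'America/Los_Angeles')
chi <- getLocTime('2016-03-06 09:30:00 UTC', 'America/Chicago')
print(c(la, chi))  # expected to start with "2016-01-30 00:40" and "2016-03-06 03:30"
```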
Using dplyr, it seems that the transformation

```r
library(dplyr)

df %>% mutate(d1 = getLocTime(d1, tz), d2 = getLocTime(d2, tz))
```

should do the trick. However, it fails with the error

```
Error in eval(expr, envir, enclos): invalid 'tz' value
```
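The error is consistent with how R stores date-times: a POSIXct vector keeps its time zone in a single "tzone" attribute shared by all elements, so there is no way to attach a different zone to each row of one vector. The base-R sketch below (no extra packages) shows the shared attribute, and that converting element by element, one zone at a time, works; the explicit format string is an illustrative choice, not part of the original code.

```r
# Two instants stored in one POSIXct vector
x <- as.POSIXct(c("2016-01-30 08:40:00", "2016-03-06 09:30:00"), tz = "UTC")

# One "tzone" attribute covers every element of the vector
tzone <- attr(x, "tzone")
print(tzone)

# Per-element conversion: each call formats one instant in one zone
zones <- c("America/Los_Angeles", "America/Chicago")
local <- vapply(
  seq_along(x),
  function(i) format(x[i], "%Y-%m-%d %H:%M:%S", tz = zones[i]),
  character(1)
)
print(local)
```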
The only way I've managed to do the conversion to local time is with the rather ungainly assignment

```r
df[c('d1', 'd2')] <- lapply(c('d1', 'd2'),
  function(x) unlist(Map(getLocTime, df[[x]], df$tz)))
```
Is there in fact a natural way to perform this transformation using dplyr idioms?
As mentioned by lukeA, the problem occurs because getLocTime is not vectorized over tz: inside mutate() it receives the entire tz column, but with_tz() expects a single time zone. So either you vectorize the function as proposed, or you apply your function rowwise:

```r
df %>%
  rowwise() %>%
  mutate(d1 = getLocTime(d1, tz), d2 = getLocTime(d2, tz)) %>%
  ungroup()  # drop the row-wise grouping afterwards
```
which makes sure that getLocTime is called with a single value (one date string and one time zone) rather than with a whole column. I leave it up to you to determine which approach is faster.
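For the first option, here is a minimal sketch of one way to vectorize the function, assuming dplyr and lubridate are installed. The name getLocTimeVec is hypothetical; lukeA's actual proposal is not shown in this thread. The wrapper pairs each date with its own time zone, and the sketch checks that the result matches the rowwise() version.

```r
library(dplyr)
library(lubridate)

df <- data.frame(
  d1 = c('2016-01-30 08:40:00 UTC', '2016-03-06 09:30:00 UTC'),
  d2 = c('2016-01-30 16:20:00 UTC', '2016-03-06 13:20:00 UTC'),
  tz = c('America/Los_Angeles', 'America/Chicago'), stringsAsFactors = FALSE)

# Scalar function from the question: one date string, one time zone
getLocTime <- function(d, tz) {
  as.character(with_tz(ymd_hms(d), tz))
}

# Hypothetical vectorized wrapper: calls getLocTime once per element,
# pairing d[i] with tz[i], and returns a plain character vector
getLocTimeVec <- function(d, tz) {
  vapply(seq_along(d), function(i) getLocTime(d[i], tz[i]), character(1))
}

# Option 1: vectorized function inside an ordinary mutate()
res_vec <- df %>%
  mutate(d1 = getLocTimeVec(d1, tz), d2 = getLocTimeVec(d2, tz))

# Option 2: rowwise(), then remove the grouping and return a data.frame
res_row <- df %>%
  rowwise() %>%
  mutate(d1 = getLocTime(d1, tz), d2 = getLocTime(d2, tz)) %>%
  ungroup() %>%
  as.data.frame()

print(res_vec)
```

Both versions still call with_tz() once per row, so neither is fundamentally cheaper; the wrapper mainly lets you keep a plain mutate() without row-wise grouping. Base R's Vectorize(getLocTime, USE.NAMES = FALSE) is another way to build such a wrapper. Timing both on your own data, for example with system.time(), is the reliable way to decide.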
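Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.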