Using dplyr to apply a function of several columns of an R data frame

Using dplyr's “verbs,” how can I apply a (general) function to a column of an R data frame if that function depends on multiple columns of the data frame?

Here is a concrete example of the kind of situation I face. I have a data frame like this:

df <- data.frame(
    d1 = c('2016-01-30 08:40:00 UTC', '2016-03-06 09:30:00 UTC'),
    d2 = c('2016-01-30 16:20:00 UTC', '2016-03-06 13:20:00 UTC'),
    tz = c('America/Los_Angeles', 'America/Chicago'), stringsAsFactors = FALSE)

I want to convert the UTC times to local times, to get a data frame like this:

                   d1                  d2                  tz
1 2016-01-30 00:40:00 2016-01-30 08:20:00 America/Los_Angeles
2 2016-03-06 03:30:00 2016-03-06 07:20:00     America/Chicago

To do this, I would like to apply the following function, which converts UTC time to local time using the lubridate package, to the date columns:

library(lubridate)

# Parse UTC timestamp string(s) d and re-express them in time zone tz
getLocTime <- function(d, tz) {
    as.character(with_tz(ymd_hms(d), tz))
}

Using dplyr, it seems that the transformation

library(dplyr)
df %>% mutate(d1 = getLocTime(d1, tz), d2 = getLocTime(d2, tz))

should do the trick. However, it fails with the error "Error in eval(expr, envir, enclos): invalid 'tz' value".

The only way I've managed to do the conversion to local time is with this rather ungainly assignment:

df[c('d1', 'd2')] <- lapply(c('d1', 'd2'),
                            function(x) unlist(Map(getLocTime, df[[x]], df$tz)))
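For comparison, here is a minimal sketch (not from the original thread) of the same per-row Map() logic moved inside mutate(). It assumes dplyr 1.0.0 or later for across(); inside the lambda, .x is the column being transformed and tz refers to the data frame's tz column. Like the lapply()/Map() assignment, it still calls getLocTime once per row.

```r
suppressPackageStartupMessages({
    library(dplyr)      # assumes dplyr >= 1.0.0 for across()
    library(lubridate)
})

df <- data.frame(
    d1 = c('2016-01-30 08:40:00 UTC', '2016-03-06 09:30:00 UTC'),
    d2 = c('2016-01-30 16:20:00 UTC', '2016-03-06 13:20:00 UTC'),
    tz = c('America/Los_Angeles', 'America/Chicago'), stringsAsFactors = FALSE)

getLocTime <- function(d, tz) {
    as.character(with_tz(ymd_hms(d), tz))
}

# across() applies the same lambda to d1 and d2.
# mapply() walks .x and tz in parallel, so getLocTime always receives
# one timestamp and one time zone; USE.NAMES = FALSE drops the names
# that mapply() would otherwise copy from the character input.
res <- df %>%
    mutate(across(c(d1, d2), ~ mapply(getLocTime, .x, tz, USE.NAMES = FALSE)))

print(res)
```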

Is there in fact a natural way to perform this transformation using dplyr idioms?

As mentioned by lukeA, the problem occurs because getLocTime is not vectorized: with_tz() expects a single time zone, but inside mutate() it receives the entire tz column. So either you vectorize the function as proposed, or you apply it row by row:

 df %>% rowwise() %>% mutate(d1 = getLocTime(d1, tz), d2 = getLocTime(d2, tz))

This ensures that getLocTime is called with a single date string and a single time zone rather than with whole columns. I leave it to you to determine which approach is faster.
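A minimal sketch of the "vectorize" option, assuming lukeA's suggestion amounts to something like base R's Vectorize() (the original proposal is not reproduced here). getLocTimeVec is a hypothetical name. Vectorize() wraps the function in mapply(), so it hides the per-row loop rather than removing it; the sketch also checks that it agrees with the rowwise() result.

```r
suppressPackageStartupMessages({
    library(dplyr)
    library(lubridate)
})

df <- data.frame(
    d1 = c('2016-01-30 08:40:00 UTC', '2016-03-06 09:30:00 UTC'),
    d2 = c('2016-01-30 16:20:00 UTC', '2016-03-06 13:20:00 UTC'),
    tz = c('America/Los_Angeles', 'America/Chicago'), stringsAsFactors = FALSE)

getLocTime <- function(d, tz) {
    as.character(with_tz(ymd_hms(d), tz))
}

# Hypothetical vectorized wrapper: Vectorize() loops over d and tz with
# mapply(), so each call to getLocTime still sees one value of each.
# USE.NAMES = FALSE keeps the result an unnamed character vector.
getLocTimeVec <- Vectorize(getLocTime, USE.NAMES = FALSE)

res_vec <- df %>%
    mutate(d1 = getLocTimeVec(d1, tz), d2 = getLocTimeVec(d2, tz))

# The rowwise() version from the answer, ungrouped afterwards so that
# later verbs are not applied row by row.
res_row <- df %>%
    rowwise() %>%
    mutate(d1 = getLocTime(d1, tz), d2 = getLocTime(d2, tz)) %>%
    ungroup()

print(res_vec)
```

Since both versions still call with_tz() once per row, any speed difference between them is likely to be small; timing each with system.time() on realistic data is the way to check.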
