How do I apply a function to specific columns in a dataframe and replace the original columns?

Question

I have a large dataframe containing medical data (my.medical.data).
A number of columns contain dates (e.g. hospital admission date), and the names of these columns all end in "_date".
I would like to apply the lubridate::dmy() function to the columns that contain dates and overwrite my original dataframe with the output of this function.
It would be great to have a general solution that works with any function, not just my dmy() example.

Essentially, I want to apply the following to all of my date columns:

my.medical.data$admission_date <- lubridate::dmy(my.medical.data$admission_date)
my.medical.data$operation_date <- lubridate::dmy(my.medical.data$operation_date)
etc.

I've tried this:

date.columns <- select(my.medical.data, ends_with("_date"))
date.names <- names(date.columns)
date.columns <- transmute_at(my.medical.data, date.names, lubridate::dmy)

Now date.columns contains my date columns in the "Date" format, rather than as the original factors. Next, I want to replace the date columns in my.medical.data with the new columns in the correct format.

my.medical.data.new <- full_join(x = my.medical.data, y = date.columns)

Now I get:

Error: cannot join a Date object with an object that is not a Date object

I'm a bit of an R novice, but I suspect that there is an easier way to do this (e.g. processing the original dataframe directly), or maybe a correct way to join / merge the two dataframes.

Answer 1

As usual, it's difficult to answer without an example dataset, but this should do the job:

library(dplyr)

my.medical.data <- my.medical.data %>%
  mutate_at(vars(ends_with('_date')), lubridate::dmy)

This will mutate in place each variable whose name ends with '_date', applying the function. It can also apply multiple functions. See ?mutate_at (which is also the help page for mutate_if).
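In dplyr 1.0.0 and later, mutate_at() is superseded by across(). Below is a minimal sketch of the same idea on a made-up data frame; the toy object and its values are hypothetical stand-ins for my.medical.data, and the dates are assumed to be day/month/year strings.

```r
library(dplyr)

# Hypothetical toy data standing in for my.medical.data
toy <- data.frame(
  id = 1:2,
  admission_date = c("01/02/2020", "15/03/2020"),
  operation_date = c("03/02/2020", "20/03/2020"),
  stringsAsFactors = FALSE
)

# Apply dmy() to every column whose name ends in "_date",
# overwriting those columns and leaving the others untouched
toy <- toy %>%
  mutate(across(ends_with("_date"), lubridate::dmy))

str(toy)
```

Any other function can be passed in place of lubridate::dmy, which should cover the "general solution" part of the question.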

Answer 2

There are several ways to do that.

If you work with voluminous data, I think data.table is the best approach (it will bring you flexibility, speed and memory efficiency).

data.table

You can use := (the update-by-reference operator) together with lapply to apply lubridate::ymd to all columns listed in .SDcols (use lubridate::dmy instead if the dates are in day-month-year order, as in the question).

library(data.table)
setDT(my.medical.data)

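# Names of the columns ending in "_date" (endsWith() takes the strings first, then the suffix)
cols_to_change <- colnames(my.medical.data)[endsWith(colnames(my.medical.data), "_date")]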

my.medical.data[, c(cols_to_change) := lapply(.SD, lubridate::ymd), .SDcols = cols_to_change]
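For a self-contained illustration, here is a sketch of the same pattern on a made-up table. The toy object, the date_cols name, and the values are hypothetical; dmy is used because the sample strings are day/month/year.

```r
library(data.table)

# Hypothetical toy data; dates are day/month/year strings as in the question
toy <- data.table(
  id = 1:2,
  admission_date = c("01/02/2020", "15/03/2020"),
  operation_date = c("03/02/2020", "20/03/2020")
)

# Character vector of the column names ending in "_date"
date_cols <- grep("_date$", names(toy), value = TRUE)

# Update those columns by reference; toy does not need to be reassigned
toy[, (date_cols) := lapply(.SD, lubridate::dmy), .SDcols = date_cols]

str(toy)
```

Because := modifies the table in place, this avoids copying the whole dataset, which is the main appeal for large data.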

base R

A standard lapply can also help. You could try something like this (I did not test it):

# Assumes my.medical.data is a plain data.frame (i.e. setDT() was not called on it)
my.medical.data[cols_to_change] <- lapply(my.medical.data[cols_to_change], lubridate::ymd)
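To make the base R approach reusable with any function, the pattern can be wrapped in a small helper. This is only a sketch: apply_to_cols is a hypothetical name, the toy data is made up, and base as.Date() is used here so the example needs no extra packages.

```r
# Hypothetical helper: apply `fun` to every column whose name ends in `suffix`
apply_to_cols <- function(df, suffix, fun, ...) {
  cols <- names(df)[endsWith(names(df), suffix)]
  # Extra arguments in ... are passed through to `fun`
  df[cols] <- lapply(df[cols], fun, ...)
  df
}

# Hypothetical toy data with day/month/year strings
toy <- data.frame(
  id = 1:2,
  admission_date = c("01/02/2020", "15/03/2020"),
  stringsAsFactors = FALSE
)

toy <- apply_to_cols(toy, "_date", as.Date, format = "%d/%m/%Y")
str(toy)
```

Swapping as.Date for lubridate::dmy (and dropping the format argument) would give the behaviour asked for in the question.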
