How to pass column variables to apply function?
I have this data.frame:
id | amount1 | amount2 | day1 | day2
---------------------------------------------
A | 10 | 32 | 0 | 34
B | 54 | 44 | 8 | 43
C | 45 | 66 | 16 | 99
df <- data.frame(id=c('A','B','C'), amount1=c(10,54,45), amount2=c(32,44,66), day1=c(0,8,16), day2=c(34,43,99))
on which I would like to apply a function:
df$res <- apply(df, 1, myfunc)
where
myfunc <- function(x,y) sum(x) * mean(y)
except that I'd like to pass the column variables as arguments to the function, so that it basically reads:
apply(df, 1, myfunc, c(amount1, amount2), c(day1, day2))
For the first row this is:
myfunc(c(10,32),c(0,34))
# [1] 714
Can this be done?
The data.table solution:
require(data.table)
dt <- data.table(df) # don't depend on `id` column as it may not be unique
# instead use 1:nrow(dt) in `by` argument
dt[, res := myfunc(c(amount1,amount2), c(day1, day2)), by=1:nrow(dt)]
> dt
# id amount1 amount2 day1 day2 res
# 1: A 10 32 0 34 714.0
# 2: B 54 44 8 43 2499.0
# 3: C 45 66 16 99 6382.5
When you have many day columns whose mean you want to multiply by the sum of amount1 and amount2, I'd do it in this manner, without using myfunc. It should be straightforward to implement one if you REALLY need a function, though.
# dummy example
set.seed(45)
df <- data.frame(matrix(sample(1:100, 200, replace=T), ncol=10))
names(df) <- c(paste0("amount", 1:2), paste0("day", 1:8))
df$idx <- 1:nrow(df) # idx column for uniqueness
# create a data.table
require(data.table)
calc_res <- function(df) {
  dt <- data.table(df)
  # first get the mean
  id1 <- setdiff(names(dt), grep("day", names(dt), value=TRUE))
  dt[, res := rowMeans(.SD), by=id1]
  # now product of sum(amounts) and current res
  id2 <- setdiff(names(dt), names(dt)[1:2])
  dt[, res := sum(.SD) * res, by=id2]
}
dt.fin <- calc_res(df)
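For comparison, here is a minimal base R sketch of the same idea (sum of the amount columns times the mean of the day columns) that needs no per-row call at all. It assumes the computation really is sum times mean, as in myfunc from the question. The names small_df, amount_cols and day_cols are made up for this illustration, and the "^amount" / "^day" patterns assume the columns are named as in the question.

```r
# Data from the question, under a hypothetical name so it does not clobber df
small_df <- data.frame(id = c("A", "B", "C"),
                       amount1 = c(10, 54, 45), amount2 = c(32, 44, 66),
                       day1 = c(0, 8, 16), day2 = c(34, 43, 99))

# Pick the columns by name pattern (assumed naming scheme)
amount_cols <- grep("^amount", names(small_df), value = TRUE)
day_cols    <- grep("^day", names(small_df), value = TRUE)

# Row-wise sum of amounts times row-wise mean of days, fully vectorized
small_df$res <- rowSums(small_df[amount_cols]) * rowMeans(small_df[day_cols])
small_df$res
```

This only works because rowSums and rowMeans happen to match what myfunc does; for an arbitrary function you would still need one of the row-wise approaches.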
Like this:
df$res <- apply(df, 1, function(x) myfunc(as.numeric(x[c("amount1", "amount2")]),
as.numeric(x[c("day1", "day2")])))
but consider plyr::adply as an alternative:
library(plyr)
adply(df, 1, transform, res = myfunc(c(amount1, amount2), c(day1, day2)))
# id amount1 amount2 day1 day2 res
# 1 A 10 32 0 34 714.0
# 2 B 54 44 8 43 2499.0
# 3 C 45 66 16 99 6382.5
This works for your example. Perhaps the same technique can be used for the real problem:
> apply(df[-1], 1, function(x) myfunc(x[1:2], x[3:4]))
## [1] 714.0 2499.0 6382.5
As flodel indicates, it is best to subset by column name, so that only those columns are passed to apply. The subset is necessary because apply would otherwise convert each row to a character vector (the id column is not numeric), and specifying the columns explicitly means that additional columns in the data frame will not cause this problem.
apply(df[c("amount1", "amount2", "day1", "day2")], 1,
function(x) myfunc(x[1:2], x[3:4])
)
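To see why the subset matters, here is a small sketch of the coercion involved. apply converts a data frame to a matrix before iterating, and a matrix can hold only one type, so a single character column turns every value into a string. The name demo_df is made up for this illustration.

```r
# Data from the question, under a hypothetical name
demo_df <- data.frame(id = c("A", "B", "C"),
                      amount1 = c(10, 54, 45), amount2 = c(32, 44, 66),
                      day1 = c(0, 8, 16), day2 = c(34, 43, 99))

# With the id column included, the whole matrix becomes character
type_all <- typeof(as.matrix(demo_df))
type_all

# With only the numeric columns selected by name, it stays numeric
num_cols <- c("amount1", "amount2", "day1", "day2")
type_num <- typeof(as.matrix(demo_df[num_cols]))
type_num
```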
In practice, I would be more likely to code something like this:
amount <- c("amount1", "amount2")
day <- c("day1", "day2")
df$res <- apply(df[c(amount, day)], 1, function(x) myfunc(x[amount], x[day]))
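Another option, sketched here as an alternative rather than something proposed in the thread, is mapply, which walks several columns in parallel and so sidesteps the matrix conversion entirely. The name demo_df and the argument names a1, a2, d1, d2 are made up for this illustration.

```r
myfunc <- function(x, y) sum(x) * mean(y)

# Data from the question, under a hypothetical name
demo_df <- data.frame(id = c("A", "B", "C"),
                      amount1 = c(10, 54, 45), amount2 = c(32, 44, 66),
                      day1 = c(0, 8, 16), day2 = c(34, 43, 99))

# Call myfunc once per row, passing that row's amounts and days
demo_df$res <- mapply(
  function(a1, a2, d1, d2) myfunc(c(a1, a2), c(d1, d2)),
  demo_df$amount1, demo_df$amount2, demo_df$day1, demo_df$day2
)
demo_df$res
```

The drawback is that every column has to be listed explicitly, so this gets unwieldy when there are many day columns.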