I have this data.frame:
id | amount1 | amount2 | day1 | day2
---------------------------------------------
A | 10 | 32 | 0 | 34
B | 54 | 44 | 8 | 43
C | 45 | 66 | 16 | 99
df <- data.frame(id=c('A','B','C'), amount1=c(10,54,45), amount2=c(32,44,66), day1=c(0,8,16), day2=c(34,43,99))
on which I would like to apply a function:
df$res <- apply(df, 1, myfunc)
where
myfunc <- function(x,y) sum(x) * mean(y)
except that I'd like to pass specific columns as arguments to the function, so that the call would read something like
apply(df, 1, myfunc, c(amount1, amount2), c(day1, day2))
For the first row, this would be
myfunc(c(10,32),c(0,34))
# [1] 714
Can this be done?
The data.table solution:
library(data.table)
dt <- data.table(df)
# Don't group by the `id` column, since it may not be unique;
# use 1:nrow(dt) in the `by` argument instead.
dt[, res := myfunc(c(amount1, amount2), c(day1, day2)), by = 1:nrow(dt)]
dt
# id amount1 amount2 day1 day2 res
# 1: A 10 32 0 34 714.0
# 2: B 54 44 8 43 2499.0
# 3: C 45 66 16 99 6382.5
When you have many day columns whose mean you want to multiply by the sum of amount1 and amount2, I'd do it this way, without using myfunc. It should be straightforward to write one if you really need a function.
# dummy example
set.seed(45)
df <- data.frame(matrix(sample(1:100, 200, replace = TRUE), ncol = 10))
names(df) <- c(paste0("amount", 1:2), paste0("day", 1:8))
df$idx <- 1:nrow(df) # idx column for uniqueness

library(data.table)
calc_res <- function(df) {
  # create a data.table
  dt <- data.table(df)
  # first get the mean of the day columns (.SD holds the columns not in `by`)
  id1 <- setdiff(names(dt), grep("day", names(dt), value = TRUE))
  dt[, res := rowMeans(.SD), by = id1]
  # now take the product of sum(amounts) and the current res
  id2 <- setdiff(names(dt), names(dt)[1:2])
  dt[, res := sum(.SD) * res, by = id2]
}
dt.fin <- calc_res(df)
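For comparison, the same row-wise result (sum of the amount columns times the mean of the day columns) can be sketched in base R with `rowSums` and `rowMeans`. This is an alternative to the data.table code above, not a reproduction of it. The helper name `calc_res_base` and the `^amount` / `^day` column-name patterns are assumptions made for illustration.

```r
# Hypothetical base R helper: sum of amount* columns times mean of day* columns.
# Assumes the relevant columns are numeric and named with these prefixes.
calc_res_base <- function(df) {
  amount_cols <- grep("^amount", names(df), value = TRUE)
  day_cols    <- grep("^day", names(df), value = TRUE)
  # rowSums/rowMeans work on the whole column block at once, no per-row loop
  df$res <- unname(rowSums(df[amount_cols]) * rowMeans(df[day_cols]))
  df
}

# Check against the three-row example from the question
small <- data.frame(id = c("A", "B", "C"),
                    amount1 = c(10, 54, 45), amount2 = c(32, 44, 66),
                    day1 = c(0, 8, 16), day2 = c(34, 43, 99))
calc_res_base(small)$res
# expected values: 714, 2499, 6382.5
```

Because only the matched columns are selected, extra columns such as id or idx do not affect the result.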
Like this:
df$res <- apply(df, 1, function(x) myfunc(as.numeric(x[c("amount1", "amount2")]),
as.numeric(x[c("day1", "day2")])))
But consider plyr::adply as an alternative:
library(plyr)
adply(df, 1, transform, res = myfunc(c(amount1, amount2), c(day1, day2)))
# id amount1 amount2 day1 day2 res
# 1 A 10 32 0 34 714.0
# 2 B 54 44 8 43 2499.0
# 3 C 45 66 16 99 6382.5
This works for your example. Perhaps the same technique can be used for the real problem:
apply(df[-1], 1, function(x) myfunc(x[1:2], x[3:4]))
## [1] 714.0 2499.0 6382.5
As flodel indicates, it is best to select the columns by name, so that only those columns are passed to apply. The subset is necessary because apply converts the data frame to a matrix, and a non-numeric column such as id would turn every row into a character vector. Specifying the columns explicitly also means that additional columns in the data frame will not cause this problem.
apply(df[c("amount1", "amount2", "day1", "day2")], 1,
function(x) myfunc(x[1:2], x[3:4])
)
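To see why the subset matters, here is a minimal sketch using the example data from the question. It only inspects the class of each row as `apply` hands it to the function; the variable names `row_types_all` and `row_types_num` are illustrative.

```r
df <- data.frame(id = c("A", "B", "C"),
                 amount1 = c(10, 54, 45), amount2 = c(32, 44, 66),
                 day1 = c(0, 8, 16), day2 = c(34, 43, 99))

# With the id column included, every row reaches the function as character
row_types_all <- apply(df, 1, class)

# With only the numeric columns selected, rows stay numeric
row_types_num <- apply(df[c("amount1", "amount2", "day1", "day2")], 1, class)

all(row_types_all == "character")  # TRUE
all(row_types_num == "numeric")    # TRUE
```

This is why calling `sum` or `mean` inside `apply(df, 1, ...)` on the full data frame fails unless the values are converted back with `as.numeric`.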
In practice, I would be more likely to code something like this:
amount <- c("amount1", "amount2")
day <- c("day1", "day2")
df$res <- apply(df[c(amount, day)], 1, function(x) myfunc(x[amount], x[day]))