
How to pass column variables to apply function?

I have this data.frame:

  id  |  amount1  | amount2  |  day1  |  day2
 ---------------------------------------------
  A   |    10     |    32    |   0    |   34
  B   |    54     |    44    |   8    |   43
  C   |    45     |    66    |   16   |   99    

df <- data.frame(id=c('A','B','C'), amount1=c(10,54,45), amount2=c(32,44,66),  day1=c(0,8,16), day2=c(34,43,99))

on which I would like to apply a function:

df$res <-  apply(df, 1, myfunc)

where

myfunc <- function(x,y) sum(x) * mean(y)

only I'd like to pass the column variables as arguments to the function, so that the call would basically read:

 apply(df, 1, myfunc, c(amount1, amount2), c(day1, day2))

for the first row this is:

myfunc(c(10,32),c(0,34))
# [1] 714

Can this be done?

The data.table solution:

require(data.table)
dt <- data.table(df) # don't depend on `id` column as it may not be unique
# instead use 1:nrow(dt) in `by` argument
dt[, res := myfunc(c(amount1,amount2), c(day1, day2)), by=1:nrow(dt)]
dt
#    id amount1 amount2 day1 day2    res
# 1:  A      10      32    0   34  714.0
# 2:  B      54      44    8   43 2499.0
# 3:  C      45      66   16   99 6382.5

When you have many day columns whose mean you want to multiply by the sum of amount1 and amount2, I'd do it in this manner, without using myfunc. But it should be straightforward to implement one if you really need a function.

# dummy example
set.seed(45)
df <- data.frame(matrix(sample(1:100, 200, replace=TRUE), ncol=10))
names(df) <- c(paste0("amount", 1:2), paste0("day", 1:8))
df$idx <- 1:nrow(df) # idx column for uniqueness

# create a data.table
require(data.table)
calc_res <- function(df) {
    dt <- data.table(df)
    # first get the mean
    id1 <- setdiff(names(dt), grep("day", names(dt), value=TRUE))
    dt[, res := rowMeans(.SD), by=id1]
    # now product of sum(amounts) and current res
    id2 <- setdiff(names(dt), names(dt)[1:2])
    dt[, res := sum(.SD) * res, by=id2]
}
dt.fin <- calc_res(df)
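
For this particular computation (sum of the amount columns times the mean of the day columns), a per-row loop may not be needed at all. The following is a minimal base R sketch, not part of the answer above. It assumes the relevant columns can be picked out by the name prefixes amount and day, and row_res is a hypothetical helper name introduced only for illustration.

```r
# Hypothetical helper (not from the original answer):
# per-row sum of the amount* columns times per-row mean of the day* columns.
row_res <- function(df) {
  amount_cols <- grep("^amount", names(df), value = TRUE)
  day_cols    <- grep("^day",    names(df), value = TRUE)
  rowSums(df[amount_cols]) * rowMeans(df[day_cols])
}

# The data frame from the question
df <- data.frame(id = c("A", "B", "C"),
                 amount1 = c(10, 54, 45), amount2 = c(32, 44, 66),
                 day1 = c(0, 8, 16), day2 = c(34, 43, 99))

df$res <- row_res(df)  # values: 714, 2499, 6382.5
```

Because rowSums and rowMeans work on whole columns at once, this should scale better than calling a function once per row, at the cost of being tied to this specific formula.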

Like this:

df$res <- apply(df, 1, function(x) myfunc(as.numeric(x[c("amount1", "amount2")]),
                                          as.numeric(x[c("day1", "day2")])))

but consider plyr::adply as an alternative:

library(plyr)
adply(df, 1, transform, res = myfunc(c(amount1, amount2), c(day1, day2)))
#   id amount1 amount2 day1 day2    res
# 1  A      10      32    0   34  714.0
# 2  B      54      44    8   43 2499.0
# 3  C      45      66   16   99 6382.5
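
If an extra package is not wanted, a similar row-wise call can be sketched in base R with mapply, which walks the given columns element by element. This is an illustrative alternative under the question's column names, not something taken from the answer above.

```r
myfunc <- function(x, y) sum(x) * mean(y)

# The data frame from the question
df <- data.frame(id = c("A", "B", "C"),
                 amount1 = c(10, 54, 45), amount2 = c(32, 44, 66),
                 day1 = c(0, 8, 16), day2 = c(34, 43, 99))

# mapply passes the i-th element of each column to the anonymous function,
# which regroups them into the two vectors myfunc expects.
df$res <- mapply(function(a1, a2, d1, d2) myfunc(c(a1, a2), c(d1, d2)),
                 df$amount1, df$amount2, df$day1, df$day2)
```

Since each column is passed separately, the id column never enters the computation and no type conversion of the row takes place.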

This works for your example. Perhaps the same technique can be used for the real problem:

apply(df[-1], 1, function(x) myfunc(x[1:2], x[3:4]))
## [1]  714.0 2499.0 6382.5

As flodel indicates, it is best to select the columns by name in one of the subsetting operations, so that only those columns are passed to apply. The subset is necessary because apply converts the data frame to a matrix, and a non-numeric column such as id would turn every row vector into character. Specifying the columns explicitly also means that additional columns in the data frame will not cause this problem.

apply(df[c("amount1", "amount2", "day1", "day2")], 1, 
      function(x) myfunc(x[1:2], x[3:4])
     )
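
The coercion described above can be checked directly. As a minimal sketch using the question's data, the snippet below records the class of the vector that apply hands to the function, once with the id column included and once with only the numeric columns selected. The variable names with_id and num_only are made up for this illustration.

```r
# The data frame from the question
df <- data.frame(id = c("A", "B", "C"),
                 amount1 = c(10, 54, 45), amount2 = c(32, 44, 66),
                 day1 = c(0, 8, 16), day2 = c(34, 43, 99))

# apply() converts the data frame to a matrix first; with the non-numeric
# `id` column present, every row is passed as a character vector.
with_id <- apply(df, 1, function(x) class(x))

# With only the numeric columns selected, each row stays numeric.
num_only <- apply(df[c("amount1", "amount2", "day1", "day2")], 1,
                  function(x) class(x))
```

Here with_id contains "character" for every row and num_only contains "numeric", which is why the other answer wraps its subsets in as.numeric when it applies over the full data frame.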

In practice, I would be more likely to write something like this:

amount <- c("amount1", "amount2")
day    <- c("day1", "day2")

df$res <- apply(df[c(amount, day)], 1, function(x) myfunc(x[amount], x[day]))
