How to pass column variables to a function in apply?

Question

I have this data.frame:

  id  |  amount1  | amount2  |  day1  |  day2
 ---------------------------------------------
  A   |    10     |    32    |   0    |   34
  B   |    54     |    44    |   8    |   43
  C   |    45     |    66    |   16   |   99    

df <- data.frame(id=c('A','B','C'), amount1=c(10,54,45), amount2=c(32,44,66),  day1=c(0,8,16), day2=c(34,43,99))

I want to apply a function

df$res <-  apply(df, 1, myfunc)

where

myfunc <- function(x,y) sum(x) * mean(y)

Only, I want to pass the columns as arguments to the function, so it should basically read

 apply(df, 1, myfunc, c(amount1, amount2), c(day1, day2))

For the first row, this is

myfunc(c(10,32),c(0,34))
# [1] 714
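Spelled out, 714 is sum(c(10, 32)) = 42 times mean(c(0, 34)) = 17. As a reference to check the answers against, a plain base R loop can compute the expected value for every row; the name expected is only illustrative:

```r
myfunc <- function(x, y) sum(x) * mean(y)

df <- data.frame(id = c('A','B','C'), amount1 = c(10,54,45), amount2 = c(32,44,66),
                 day1 = c(0,8,16), day2 = c(34,43,99))

# one call to myfunc per row: amounts as the first argument, days as the second
expected <- numeric(nrow(df))
for (i in seq_len(nrow(df))) {
  expected[i] <- myfunc(c(df$amount1[i], df$amount2[i]),
                        c(df$day1[i], df$day2[i]))
}
expected
# [1]  714.0 2499.0 6382.5
```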

Can this be done?

Answer 1

A data.table solution.

require(data.table)
dt <- data.table(df) # don't depend on `id` column as it may not be unique
# instead use 1:nrow(dt) in `by` argument
dt[, res := myfunc(c(amount1,amount2), c(day1, day2)), by=1:nrow(dt)]
> dt
#    id amount1 amount2 day1 day2    res
# 1:  A      10      32    0   34  714.0
# 2:  B      54      44    8   43 2499.0
# 3:  C      45      66   16   99 6382.5

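If you have many day columns and want to take the row-wise mean of those columns and multiply it by the sum of amount1 and amount2, I would do it this way, without using myfunc. But if you really do need a function, implementing one should be straightforward.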

# dummy example
set.seed(45)
df <- data.frame(matrix(sample(1:100, 200, replace=TRUE), ncol=10))
names(df) <- c(paste0("amount", 1:2), paste0("day", 1:8))
df$idx <- 1:nrow(df) # idx column for uniqueness

# create a data.table
library(data.table)
calc_res <- function(df) {
    dt <- data.table(df)
    amount_cols <- grep("^amount", names(dt), value=TRUE)
    day_cols    <- grep("^day", names(dt), value=TRUE)
    # first get the mean of the day columns, one row (unique idx) at a time
    dt[, res := rowMeans(.SD), by=idx, .SDcols=day_cols]
    # now multiply the current res by the sum of the amount columns
    dt[, res := sum(.SD) * res, by=idx, .SDcols=amount_cols]
    dt[]
}
dt.fin <- calc_res(df)
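Without data.table, the same row-wise calculation (sum of the amount columns times the mean of the day columns) can be sketched in base R with rowSums() and rowMeans(), which are vectorised and need neither apply() nor grouping. This assumes, as in the dummy example, that the relevant columns are identified by the "amount" and "day" name prefixes:

```r
set.seed(45)
df <- data.frame(matrix(sample(1:100, 200, replace = TRUE), ncol = 10))
names(df) <- c(paste0("amount", 1:2), paste0("day", 1:8))

# select the column groups by name prefix (assumed naming scheme)
amount_cols <- grep("^amount", names(df), value = TRUE)
day_cols    <- grep("^day", names(df), value = TRUE)

# one sum and one mean per row, computed for all rows at once
df$res <- rowSums(df[amount_cols]) * rowMeans(df[day_cols])
head(df$res)
```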

Answer 2

Like this:

df$res <- apply(df, 1, function(x) myfunc(as.numeric(x[c("amount1", "amount2")]),
                                          as.numeric(x[c("day1", "day2")])))

But consider plyr::adply as an alternative:

library(plyr)
adply(df, 1, transform, res = myfunc(c(amount1, amount2), c(day1, day2)))
#   id amount1 amount2 day1 day2    res
# 1  A      10      32    0   34  714.0
# 2  B      54      44    8   43 2499.0
# 3  C      45      66   16   99 6382.5

Answer 3

This works for your example. Perhaps the same technique can be used for your real problem:

> apply(df[-1], 1, function(x) myfunc(x[1:2], x[3:4]))
## [1]  714.0 2499.0 6382.5

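As flodel pointed out, it is better to subset by name in one of the subsetting operations, so that only those columns are used in the apply. The subsetting is needed to stop the vectors that apply passes to the function from being converted to character, and naming the columns explicitly means that other columns in the data frame cannot cause this problem.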

apply(df[c("amount1", "amount2", "day1", "day2")], 1, 
      function(x) myfunc(x[1:2], x[3:4])
     )
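The character conversion mentioned above comes from apply() turning the data frame into a matrix with as.matrix(); a matrix holds a single type, so the character id column forces every value to character. A quick check on the question's df, as a sketch:

```r
df <- data.frame(id = c('A','B','C'), amount1 = c(10,54,45), amount2 = c(32,44,66),
                 day1 = c(0,8,16), day2 = c(34,43,99))

# with the id column included, the whole matrix becomes character
typeof(as.matrix(df))
# [1] "character"

# subsetting to the numeric columns keeps the matrix numeric
num <- df[c("amount1", "amount2", "day1", "day2")]
typeof(as.matrix(num))
# [1] "double"

# so every row handed to the function is a (named) numeric vector
all(apply(num, 1, is.numeric))
# [1] TRUE
```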

In practice, I'd be more likely to write something like this:

amount <- c("amount1", "amount2")
day    <- c("day1", "day2")

df$res <- apply(df[c(amount, day)], 1, function(x) myfunc(x[amount], x[day]))
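The same pattern can be wrapped so that the column groups really are passed as arguments, close to the call sketched in the question. apply_cols is a hypothetical helper, not part of base R or any package; it assumes every listed column is numeric and that f takes one argument per column group:

```r
myfunc <- function(x, y) sum(x) * mean(y)

# hypothetical helper: apply f row-wise, giving it one sub-vector per column group
apply_cols <- function(df, f, ...) {
  groups <- list(...)                  # each element is a character vector of column names
  cols   <- unique(unlist(groups))
  apply(df[cols], 1, function(row) {
    args <- lapply(groups, function(g) row[g])  # split the named row into groups
    do.call(f, args)                   # unnamed groups are passed positionally
  })
}

df <- data.frame(id = c('A','B','C'), amount1 = c(10,54,45), amount2 = c(32,44,66),
                 day1 = c(0,8,16), day2 = c(34,43,99))
df$res <- apply_cols(df, myfunc, c("amount1", "amount2"), c("day1", "day2"))
df$res
# [1]  714.0 2499.0 6382.5
```

Because only the named columns reach apply(), the character id column never triggers the conversion described above.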


3 solutions

Solution 1
4 2013-01-21 15:36:08

Solution 2
3 Accepted 2013-01-21 15:23:47

Solution 3
1 2013-01-21 15:13:16
