Applying a custom function to a data frame using apply or Vectorize

Question

I am attempting to apply a custom function that uses components of a data frame to do a calculation. I have made a trivial example below because my actual problem is very hard to turn into a reproducible example. In the example, I want the two numeric columns (age and income) to be added together to create a new column containing their sum. Below is an example I found online that gets close to what I want:

celebrities=data.frame(name=c("Andrew","matt","Dany","Philip","John","bing","Monica"),
                       age=c(28,23,49,29,38,23,29),
                       income=c(25.2,10.5,11,21.9,44,11.5,45))
f=function(x,output){
  name=x[1]
  income=x[3]
  cat(name,income,"\n")
}
apply(celebrities,1,f)

But when I try to adapt it to apply a mathematical function, it doesn't work:

f2=function(x,output){
  age=x[2]
  income=x[3]
  sum(age,income)
}
apply(celebrities,1,f2)

In essence, what I need is for apply to take a dataset, go through every row using the values in that row as inputs to the function, and add a third column to the dataset with the results of the function. Please let me know how I can clarify this question if needed. I have referred to the questions below, but they don't seem to work for me.

Apply a function to every row of a matrix or a data frame

How to assign new values from lapply to new column in dataframes in list

Call apply-like function on each row of dataframe with multiple arguments from each row

Answer 1

For the particular task requested, it could be:

celebrities$newcol <- with(celebrities, age + income)

The + function is inherently vectorized, so using apply with sum is inefficient. The apply call could have been greatly simplified by omitting the first column, because that avoids the coercion to a character matrix that the name column causes.

celebrities$newcol <- apply(celebrities[-1], 1, function(x) sum(x))

That way you avoid coercing the vectors to "character" and then having to coerce the formerly numeric columns back to numeric. Using sum inside apply does get around the fact that sum is not vectorized, but it is still an example of inefficient R coding.
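To make the coercion point concrete, here is a small sketch on a shortened copy of the question's data. It only illustrates that apply works on the matrix form of the data frame, and that a single character column turns the whole matrix into character; the variable names m_all, m_num and row_totals are made up for this illustration.

```r
# Shortened copy of the question's data (first three rows only).
celebrities <- data.frame(
  name   = c("Andrew", "matt", "Dany"),
  age    = c(28, 23, 49),
  income = c(25.2, 10.5, 11)
)

# apply() works on as.matrix(x); one character column makes the
# whole matrix character, so numeric functions see strings.
m_all <- as.matrix(celebrities)
is.character(m_all)

# Dropping the name column keeps the matrix numeric.
m_num <- as.matrix(celebrities[-1])
is.numeric(m_num)

# With only numeric columns, sum() can be applied to each row directly.
row_totals <- apply(celebrities[-1], 1, sum)
row_totals
```

This is why the function in the question fails: by the time it sees a row, age and income are already strings.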

You get automatic vectorization if the "inner" algorithm can be constructed entirely from vectorized functions, the Math and Ops groups being the usual components. See ?Ops. Otherwise, you may need to use mapply or Vectorize.
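As a rough sketch of the mapply and Vectorize route, suppose the real function contains a scalar-only construct such as if. The function adjusted_total below is hypothetical and stands in for whatever the actual calculation is; the column names via_mapply and via_vectorize are likewise made up.

```r
# Hypothetical scalar-only function: `if` needs a single TRUE/FALSE,
# so it cannot be called on whole columns directly.
adjusted_total <- function(age, income) {
  if (age > 30) age + income else age + 2 * income
}

celebrities <- data.frame(age = c(28, 23, 49), income = c(25.2, 10.5, 11))

# mapply() calls the function once per row, pairing up the two columns.
celebrities$via_mapply <- mapply(adjusted_total, celebrities$age, celebrities$income)

# Vectorize() wraps the same function so it can be called on whole columns.
v_adjusted_total <- Vectorize(adjusted_total)
celebrities$via_vectorize <- with(celebrities, v_adjusted_total(age, income))

celebrities
```

Both calls loop over the rows internally, so they are a convenience rather than a speed-up. When the logic can be rewritten with vectorized tools such as ifelse, that is usually the better choice.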

Answer 2

Taking hints from @r2evans and @user2738526, I have modified your function to explicitly convert the values to numeric. The code snippet below works for your case:

f2=function(x,output){
  age=as.numeric(x[2])
  income=as.numeric(x[3])
  sum(age,income)
}
apply(celebrities,1,f2)

[1] 53.2 33.5 60.0 50.9 82.0 34.5 74.0
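Since the question asks for the result as a new column, the output of apply can be assigned straight back to the data frame. This is only a sketch: the column name age_plus_income is an arbitrary choice, and the unused output argument has been dropped from f2.

```r
celebrities <- data.frame(
  name   = c("Andrew", "matt", "Dany", "Philip", "John", "bing", "Monica"),
  age    = c(28, 23, 49, 29, 38, 23, 29),
  income = c(25.2, 10.5, 11, 21.9, 44, 11.5, 45)
)

# Each row arrives as a character vector, so convert before adding.
f2 <- function(x) {
  age    <- as.numeric(x[2])
  income <- as.numeric(x[3])
  sum(age, income)
}

# apply() returns one value per row, which fits a new column.
celebrities$age_plus_income <- apply(celebrities, 1, f2)
celebrities
```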

Answer 3

Give this a try:

library(dplyr)
celebrities=data.frame(name=c("Andrew","matt","Dany","Philip","John","bing","Monica"),
                       age=c(28,23,49,29,38,23,29),
                       income=c(25.2,10.5,11,21.9,44,11.5,45)) 

celebrities %>% 
  rowwise %>% 
  mutate(age_plus_income = sum(age, income))

(Obviously, for summing two columns, you'd be better off using mutate(celebrities, age_plus_income = age + income), but I assume your real example uses a more complicated function.)


Answer 1
2, accepted, 2018-07-12 06:24:35

Answer 2
1, 2018-07-12 06:19:37

Answer 3
1, 2018-07-12 16:40:53
