I am trying to apply a custom function to each row of a data frame, using values from that row in a calculation. I have made a trivial example below because my actual problem is very hard to turn into a reproducible example. In this example I want to add the two numeric columns (age and income) together to create a new column containing their sum. Below is an example I found online that gets close to what I want:
celebrities=data.frame(name=c("Andrew","matt","Dany","Philip","John","bing","Monica"),
age=c(28,23,49,29,38,23,29),
income=c(25.2,10.5,11,21.9,44,11.5,45))
f=function(x,output){
name=x[1]
income=x[3]
cat(name,income,"\n")
}
apply(celebrities,1,f)
But when I try to adapt it to apply a mathematical function, it doesn't work:
f2=function(x,output){
age=x[2]
income=x[3]
sum(age,income)
}
apply(celebrities,1,f2)
In essence, what I need is for apply to take a dataset, go through every row using the values in that row as inputs to the function, and add a new column to the dataset with the results. Please let me know how I can clarify this question if needed. I have referred to the questions below, but they don't seem to work for me.
Apply a function to every row of a matrix or a data frame
How to assign new values from lapply to new column in dataframes in list
Call apply-like function on each row of dataframe with multiple arguments from each row
For the particular task requested it could be
celebrities$newcol <- with(celebrities, age + income)
The + function is inherently vectorized, so using apply with sum is inefficient. The apply call could have been greatly simplified by omitting the first column, because that would avoid the coercion to a character matrix caused by the first column:
celebrities$newcol <- apply(celebrities[-1], 1, function(x) sum(x))
That way you avoid coercing the vectors to "character" and then needing to coerce the formerly numeric columns back to numeric. Using sum inside apply does get around the fact that sum is not vectorized, but it's an example of inefficient R coding.
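The coercion described here can be checked directly. The following is a minimal sketch using the data frame from the question; it relies on the fact that apply converts a data frame with as.matrix before iterating over rows, so a single character column turns every value into a string.

```r
# Sample data from the question
celebrities <- data.frame(
  name   = c("Andrew", "matt", "Dany", "Philip", "John", "bing", "Monica"),
  age    = c(28, 23, 49, 29, 38, 23, 29),
  income = c(25.2, 10.5, 11, 21.9, 44, 11.5, 45)
)

# With the name column included, the whole matrix is character
with_name <- as.matrix(celebrities)
is.character(with_name)

# With the name column dropped, the matrix stays numeric
without_name <- as.matrix(celebrities[-1])
is.numeric(without_name)

# Row-wise sums now work without any as.numeric() conversion
row_sums <- apply(celebrities[-1], 1, sum)
```

For this particular calculation, rowSums(celebrities[-1]) or age + income would return the same values without the row-by-row loop.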
You get automatic vectorization if the "inner" algorithm can be constructed completely from vectorized functions, the Math and Ops groups being the usual components. See ?Ops. Otherwise, you may need to use mapply or Vectorize.
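As a rough illustration of the mapply and Vectorize route, here is a sketch that assumes a calculation which is not vectorized. The function score_one and the column names score and score2 are hypothetical stand-ins, not part of the original question.

```r
# Sample data from the question
celebrities <- data.frame(
  name   = c("Andrew", "matt", "Dany", "Philip", "John", "bing", "Monica"),
  age    = c(28, 23, 49, 29, 38, 23, 29),
  income = c(25.2, 10.5, 11, 21.9, 44, 11.5, 45)
)

# Hypothetical scalar function: if() only handles a single value,
# so this cannot be called on whole columns directly
score_one <- function(age, income) {
  if (age > 30) income / age else income
}

# mapply passes the i-th age and the i-th income to score_one, one pair at a time
celebrities$score <- mapply(score_one, celebrities$age, celebrities$income)

# Vectorize wraps the same function so it can be called on whole columns
score_vec <- Vectorize(score_one)
celebrities$score2 <- score_vec(celebrities$age, celebrities$income)
```

Both calls still loop over the rows internally, so they help with convenience rather than speed; a version rewritten with ifelse would be properly vectorized.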
Taking hints from @r2evans and @user2738526, I have modified your function to explicitly convert the values to numeric. The code snippet below works for your case:
f2=function(x,output){
age=as.numeric(x[2])
income=as.numeric(x[3])
sum(age,income)
}
apply(celebrities,1,f2)
[1] 53.2 33.5 60.0 50.9 82.0 34.5 74.0
Give this a try:
library(dplyr)
celebrities=data.frame(name=c("Andrew","matt","Dany","Philip","John","bing","Monica"),
age=c(28,23,49,29,38,23,29),
income=c(25.2,10.5,11,21.9,44,11.5,45))
celebrities %>%
rowwise() %>%
mutate(age_plus_income = sum(age, income))
(Obviously, for summing two columns, you'd be better off using mutate(celebrities, age_plus_income = age + income), but I assume your real example uses a more complicated function.)