I have a data frame with 3 columns describing accounts: Age, Users, and Cost.
The Age column ranges from 1 to 20. For each Age, I want to calculate the average Cost and divide it by the average Users.
For example, what is the average number of Users across all accounts of Age 1, and what is the average Cost of accounts of Age 1?
The data frame is huge, and I'd prefer not to subset it one age at a time with df = data[data$age_month == 1,] and then apply mean to each column one by one.
Age Users Cost
1 2 5
2 15 7
2 124 10
2 43 100
3 232 21212
4 234 21212
4 12 10000
4 10 3
5 11 89
6 4 11
6 8 12
6 10 15
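The answers below all assume the sample is stored in a data frame called dat. Here is a minimal sketch that rebuilds it from the table above. The values are copied from the question, and dat is simply the name the answers use.

```r
# Rebuild the question's sample table as a data frame named dat,
# the name the answers below assume.
dat <- data.frame(
  Age   = c(1, 2, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6),
  Users = c(2, 15, 124, 43, 232, 234, 12, 10, 11, 4, 8, 10),
  Cost  = c(5, 7, 10, 100, 21212, 21212, 10000, 3, 89, 11, 12, 15)
)
str(dat)
```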
So I want the mean of the Cost column where Age = 1 divided by the mean of the Users column where Age = 1, and the same for every Age.
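Written out, the requested quantity r_a for Age a is shown below. The symbols are introduced here for illustration only: C_i and U_i are the Cost and Users of account i, and n_a is the number of accounts with Age a. Because both means divide by the same n_a, the ratio of means equals the ratio of the group sums. Age 2 from the sample works out as follows:

```latex
r_a = \frac{\bar{C}_a}{\bar{U}_a}
    = \frac{\frac{1}{n_a}\sum_{i:\,\mathrm{Age}_i = a} C_i}{\frac{1}{n_a}\sum_{i:\,\mathrm{Age}_i = a} U_i}
    = \frac{\sum_{i:\,\mathrm{Age}_i = a} C_i}{\sum_{i:\,\mathrm{Age}_i = a} U_i}

r_2 = \frac{7 + 10 + 100}{15 + 124 + 43} = \frac{117}{182} \approx 0.6429
```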
Thanks in advance,
Try ave(), which computes the mean within each Age group and repeats it on every row:
CostbyAge <- with(dat, ave(Cost, Age, FUN=mean) )
UsersbyAge <- with(dat, ave(Users, Age, FUN=mean))
CostbyAge/UsersbyAge
# [1] 2.5000000 0.6428571 0.6428571 0.6428571 91.4310345 121.9335938
# [7] 121.9335938 121.9335938 8.0909091 1.7272727 1.7272727 1.7272727
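Because ave() repeats the group mean on every row, the output above has 12 values instead of one per Age. If you want a single value per Age, base R's tapply() is an alternative. It is swapped in here for illustration and is not part of the original answer. The following minimal sketch re-creates the sample as dat so that it runs on its own:

```r
# Sample data from the question (same values as the table above).
dat <- data.frame(
  Age   = c(1, 2, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6),
  Users = c(2, 15, 124, 43, 232, 234, 12, 10, 11, 4, 8, 10),
  Cost  = c(5, 7, 10, 100, 21212, 21212, 10000, 3, 89, 11, 12, 15)
)

# tapply() returns one mean per Age level, named by the level, so the
# division lines the groups up element by element: one ratio per Age.
cost_per_user <- with(dat, tapply(Cost, Age, mean) / tapply(Users, Age, mean))
cost_per_user
```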
Here's a way using doBy::summaryBy. Assume dat is your sample data:
library(doBy)
( s <- summaryBy(Users+Cost~Age, data = dat) )
# Age Users.mean Cost.mean
# 1 1 2.000000 5.00000
# 2 2 60.666667 39.00000
# 3 3 232.000000 21212.00000
# 4 4 85.333333 10405.00000
# 5 5 11.000000 89.00000
# 6 6 7.333333 12.66667
s$Cost.mean / s$Users.mean
# [1] 2.5000000 0.6428571 91.4310345 121.9335938 8.0909091 1.7272727
Here's a way to do it with dplyr:
library(dplyr)
dat %>%
  group_by(Age) %>%
  summarize(count = n(),
            users_mean = round(mean(Users), 2),
            cost_mean = round(mean(Cost), 2),
            # divide the unrounded means, then round, so the ratio
            # is not skewed by the rounding applied above
            cost_per_user = round(mean(Cost) / mean(Users), 2))
Age count users_mean cost_mean cost_per_user
1 1 1 2.00 5.00 2.50
2 2 3 60.67 39.00 0.64
3 3 1 232.00 21212.00 91.43
4 4 3 85.33 10405.00 121.93
5 5 1 11.00 89.00 8.09
6 6 3 7.33 12.67 1.73
data.table solution:
library(data.table)
# setDT() converts dat to a data.table by reference, so dat itself is modified
setDT(dat)[, list(User_mean = mean(Users),
                  Mean_Cost = mean(Cost),
                  Cost_per_User = mean(Cost) / mean(Users)),
           by = Age]
Base R, using aggregate:
aggdat <- aggregate(cbind(Users, Cost) ~ Age, dat, mean)
aggdat$Cost_per_User <- aggdat$Cost/aggdat$Users
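Another base R option relies on a simple identity. Within one Age group, mean(Cost) / mean(Users) equals sum(Cost) / sum(Users), because both means divide by the same number of accounts. rowsum() computes per-group column sums in one pass. This is a different function from the aggregate() answer, swapped in here for illustration. The following minimal sketch re-creates the sample as dat so that it runs on its own:

```r
# Sample data from the question (same values as the table above).
dat <- data.frame(
  Age   = c(1, 2, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6),
  Users = c(2, 15, 124, 43, 232, 234, 12, 10, 11, 4, 8, 10),
  Cost  = c(5, 7, 10, 100, 21212, 21212, 10000, 3, 89, 11, 12, 15)
)

# rowsum() adds up matrix rows within each group: one row of sums per Age,
# with the Age values as row names.
sums <- rowsum(as.matrix(dat[c("Users", "Cost")]), dat$Age)

# Ratio of group sums == ratio of group means (the group size cancels).
cost_per_user <- sums[, "Cost"] / sums[, "Users"]
cost_per_user
```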
Since no one has mentioned it, you can also use base R's split in combination with lapply:
> lapply(split(dat,dat$Age),colMeans)
To get the result as a single table rather than a list, add one more step (do.call(rbind, ...) returns a matrix here; wrap it in as.data.frame() if you need a data frame):
> do.call(rbind,lapply(split(dat,dat$Age),colMeans))
Age Users Cost
1 1 2.000000 5.00000
2 2 60.666667 39.00000
3 3 232.000000 21212.00000
4 4 85.333333 10405.00000
5 5 11.000000 89.00000
6 6 7.333333 12.66667
split takes your data frame and creates a list of data frames, one per value of Age. lapply then applies your operation to every sub-data-frame at once (here colMeans is enough to compute the means). Finally, do.call(rbind, ...) binds the resulting list back into one table (a matrix, since colMeans returns numeric vectors).
The last step, dividing the mean Cost by the mean Users, is the same as in the other solutions.
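For completeness, here is that last step applied to the split/lapply result. The following minimal sketch re-creates the sample as dat so that it runs on its own. The names means and result are illustrative and not from the original answer.

```r
# Sample data from the question (same values as the table above).
dat <- data.frame(
  Age   = c(1, 2, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6),
  Users = c(2, 15, 124, 43, 232, 234, 12, 10, 11, 4, 8, 10),
  Cost  = c(5, 7, 10, 100, 21212, 21212, 10000, 3, 89, 11, 12, 15)
)

# One row of column means per Age; rbind-ing named numeric vectors
# produces a matrix with columns Age, Users, Cost.
means <- do.call(rbind, lapply(split(dat, dat$Age), colMeans))

# Final step shared with the other answers: mean Cost / mean Users.
result <- data.frame(Age = means[, "Age"],
                     Cost_per_User = means[, "Cost"] / means[, "Users"])
result
```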