How do I split a data frame by a specific column value, and then apply functions to columns within the data set?

I have a data frame with 3 columns describing accounts: Age, Users, and Cost.

The Age column ranges from 1 to 20. For each Age, I want to calculate the average Cost and divide it by the average Users for that Age.

For example: what is the average number of Users for accounts of Age 1, and what is the average Cost of accounts of Age 1?

The data frame is huge, and I'd prefer not to type df = data[data$age_month == 1,] for every age and then apply mean to each column one by one.

Age  Users   Cost
1     2       5
2     15      7
2     124     10
2     43      100
3     232     21212
4     234     21212 
4     12      10000 
4     10      3
5     11      89
6     4       11
6     8       12
6     10      15
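The answers below assume the sample data is stored in a data frame named dat. One way to recreate it in R so the snippets can be run (the name dat is taken from the answers; building it with data.frame() is just one option):

```r
# Recreate the question's sample table as a data frame named `dat`.
dat <- data.frame(
  Age   = c(1, 2, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6),
  Users = c(2, 15, 124, 43, 232, 234, 12, 10, 11, 4, 8, 10),
  Cost  = c(5, 7, 10, 100, 21212, 21212, 10000, 3, 89, 11, 12, 15)
)
str(dat)
```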

So I want the mean of the Cost column where Age = 1 divided by the mean of the Users column where Age = 1, and the same for every Age.
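Written as a formula (the symbols are notation introduced here, not from the question): let C_i and U_i be the Cost and Users of row i, n_a the number of rows with Age a, and r_a the requested ratio. Because both means divide by the same n_a, the ratio of means equals the ratio of the group sums, assuming no missing values. For Age 2 this gives 117/182, about 0.6429:

```latex
r_a = \frac{\bar{C}_a}{\bar{U}_a}
    = \frac{\frac{1}{n_a}\sum_{i:\,\mathrm{Age}_i = a} C_i}{\frac{1}{n_a}\sum_{i:\,\mathrm{Age}_i = a} U_i}
    = \frac{\sum_{i:\,\mathrm{Age}_i = a} C_i}{\sum_{i:\,\mathrm{Age}_i = a} U_i},
\qquad
r_2 = \frac{7 + 10 + 100}{15 + 124 + 43} = \frac{117}{182} \approx 0.6429
```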

Thanks in advance,

Try:

CostbyAge <- with(dat, ave(Cost, Age, FUN=mean) )
UsersbyAge <- with(dat, ave(Users, Age, FUN=mean))
CostbyAge/UsersbyAge
# [1]   2.5000000   0.6428571   0.6428571   0.6428571  91.4310345 121.9335938
# [7] 121.9335938 121.9335938   8.0909091   1.7272727   1.7272727   1.7272727
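ave() returns a vector as long as dat, so each Age's ratio is repeated once per row above. If you want a single value per Age instead, a base-R sketch with tapply() (the sample data is recreated inline so the block runs on its own; cost_by_age, users_by_age and cost_per_user are illustrative names):

```r
# Sample data from the question.
dat <- data.frame(
  Age   = c(1, 2, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6),
  Users = c(2, 15, 124, 43, 232, 234, 12, 10, 11, 4, 8, 10),
  Cost  = c(5, 7, 10, 100, 21212, 21212, 10000, 3, 89, 11, 12, 15)
)

# tapply() groups each column by Age and applies mean() to every group,
# returning one value per Age (named "1", "2", ...), not one per row.
cost_by_age  <- with(dat, tapply(Cost, Age, mean))
users_by_age <- with(dat, tapply(Users, Age, mean))

cost_per_user <- cost_by_age / users_by_age
print(cost_per_user)
```

The values are the same six numbers as in the summaryBy answer, labelled by Age.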

Here's a way using doBy::summaryBy. Assume dat is your sample data:

> library(doBy)
> ( s <- summaryBy(Users+Cost~Age, data = dat) )
#   Age Users.mean   Cost.mean
# 1   1   2.000000     5.00000
# 2   2  60.666667    39.00000
# 3   3 232.000000 21212.00000
# 4   4  85.333333 10405.00000
# 5   5  11.000000    89.00000
# 6   6   7.333333    12.66667
> s$Cost.mean / s$Users.mean
# [1]   2.5000000   0.6428571  91.4310345 121.9335938   8.0909091   1.7272727
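Since each Age group's mean Cost and mean Users are taken over the same number of rows, the counts cancel and the ratio can also be computed from group sums. A base-R sketch using rowsum(), assuming no missing values (sample data recreated inline; sums and cost_per_user are illustrative names):

```r
# Sample data from the question.
dat <- data.frame(
  Age   = c(1, 2, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6),
  Users = c(2, 15, 124, 43, 232, 234, 12, 10, 11, 4, 8, 10),
  Cost  = c(5, 7, 10, 100, 21212, 21212, 10000, 3, 89, 11, 12, 15)
)

# rowsum() adds up each column within every Age group; rows come out
# sorted by Age, with the Age values as row names.
sums <- rowsum(dat[, c("Users", "Cost")], group = dat$Age)

# Within a group, mean(Cost) / mean(Users) equals sum(Cost) / sum(Users),
# because both means divide by the same group size.
cost_per_user <- sums[, "Cost"] / sums[, "Users"]
print(cost_per_user)
```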

Here's a way to do it with dplyr:

library(dplyr)

dat %>%
  group_by(Age) %>%
  summarize(count=n(),
            users_mean=round(mean(Users),2),
            cost_mean=round(mean(Cost),2),
            # divide the unrounded means; dividing the rounded ones can shift the result
            cost_per_user=round(mean(Cost)/mean(Users),2))

  Age count users_mean cost_mean cost_per_user
1   1     1       2.00      5.00          2.50
2   2     3      60.67     39.00          0.64
3   3     1     232.00  21212.00         91.43
4   4     3      85.33  10405.00        121.93
5   5     1      11.00     89.00          8.09
6   6     3       7.33     12.67          1.73

A data.table solution:

library(data.table)
# setDT() converts dat to a data.table in place (by reference)
setDT(dat)[, list(User_mean = mean(Users),
                  Mean_Cost = mean(Cost),
                  Cost_per_User = mean(Cost)/mean(Users)), by = Age]

Base R, using aggregate:

aggdat <- aggregate(cbind(Users, Cost) ~ Age, dat,  mean)
aggdat$Cost_per_User <- aggdat$Cost/aggdat$Users

Since no one has mentioned it, you can also use split from base R in combination with lapply:

> lapply(split(dat,dat$Age),colMeans)

To collect the result into one table instead of a list, add this step (do.call(rbind, ...) on the numeric vectors returned by colMeans gives a matrix; wrap it in as.data.frame() if you need a data frame):

> do.call(rbind,lapply(split(dat,dat$Age),colMeans))
  Age      Users        Cost
1   1   2.000000     5.00000
2   2  60.666667    39.00000
3   3 232.000000 21212.00000
4   4  85.333333 10405.00000
5   5  11.000000    89.00000
6   6   7.333333    12.66667

split takes your data frame and returns a list of data frames, one per value of Age. lapply then applies your operation to every sub-data frame at once (here, colMeans is all you need to compute the means). Finally, do.call(rbind, ...) takes the resulting list and stacks it row-wise into a single matrix with one row per Age.

The last step, dividing the mean Cost by the mean Users to get the cost per user, is the same as in the other solutions.
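A minimal sketch of that last step for the split approach, assuming dat is the question's sample data (recreated inline so the block runs on its own). Because do.call(rbind, ...) yields a matrix here, the result is first converted with as.data.frame() and the ratio is then added as a new column; means_mat, means_df and Cost_per_User are illustrative names:

```r
# Sample data from the question.
dat <- data.frame(
  Age   = c(1, 2, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6),
  Users = c(2, 15, 124, 43, 232, 234, 12, 10, 11, 4, 8, 10),
  Cost  = c(5, 7, 10, 100, 21212, 21212, 10000, 3, 89, 11, 12, 15)
)

# split() -> one data frame per Age; colMeans() -> named vector of means.
means_list <- lapply(split(dat, dat$Age), colMeans)

# Stacking numeric vectors with rbind() gives a matrix, not a data frame.
means_mat <- do.call(rbind, means_list)
means_df  <- as.data.frame(means_mat)

# Final step: divide mean Cost by mean Users for each Age.
means_df$Cost_per_User <- means_df$Cost / means_df$Users
print(means_df)
```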
