r - Sum based on Distinct values on another Column
I'm looking for a tidyverse solution to sum a column based on unique values of an ID column, while still summing other columns based on all values.
Example data:
dat <- data.frame(
  manager = c("Adam", "Adam", "Adam", "Bill", "Bill", "Charlie", "Dan"),
  manager_age = c(30, 30, 30, 33, 33, 35, 35),
  sales = c(4, 12, 7, 4, 2, 15, 10))
dat
  manager manager_age sales
1    Adam          30     4
2    Adam          30    12
3    Adam          30     7
4    Bill          33     4
5    Bill          33     2
6 Charlie          35    15
7     Dan          35    10
I want to sum all values of sales, but sum only one value of manager_age per manager.
Desired output:
unique_managers total_sales total_age
              4          54       133
I'm most of the way there, but need help with the summed age:
library(dplyr)
results <- dat %>% summarize(unique_managers = n_distinct(manager), total_sales = sum(sales))
results
Thanks in advance!
Edit: Updated example data to include two managers with the same age.
This should do it for you:
library(dplyr)
results <- dat %>%
  summarize(unique_managers = n_distinct(manager),
            total_sales = sum(sales)) %>%
  cbind(dat %>%
          select(manager, manager_age) %>%
          group_by(manager) %>%
          unique() %>%
          ungroup() %>%
          summarize(total_age = sum(manager_age)))
Which gives us:
> results
  unique_managers total_sales total_age
1               3          44        98
Edit:
If you have two managers with the same age:
dat <- data.frame(
manager = c("Adam", "Adam", "Adam", "Bill", "Bill", "Charlie", "Dante"),
manager_age = c(30, 30, 30, 33, 33, 35, 30),
sales = c(4, 12, 7, 4, 2, 15, 14))
Gives us:
unique_managers total_sales total_age
1 4 58 128
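The same idea can be written a little more compactly. The following is only a sketch, not the answerer's code: it assumes each manager has a single, consistent manager_age, and it uses dplyr's distinct() and bind_cols() in place of the group_by()/unique()/cbind() steps. The intermediate name ages is introduced here purely for illustration. It is run on the question's updated data, where Charlie and Dan share an age.

```r
library(dplyr)

dat <- data.frame(
  manager = c("Adam", "Adam", "Adam", "Bill", "Bill", "Charlie", "Dan"),
  manager_age = c(30, 30, 30, 33, 33, 35, 35),
  sales = c(4, 12, 7, 4, 2, 15, 10))

# Keep one row per (manager, manager_age) pair, so two managers who
# share an age are still counted separately, then sum the ages.
ages <- dat %>%
  distinct(manager, manager_age) %>%
  summarize(total_age = sum(manager_age))

# Summarize over all rows, then attach the per-manager age total.
results <- dat %>%
  summarize(unique_managers = n_distinct(manager),
            total_sales = sum(sales)) %>%
  bind_cols(ages)

results
```

Deduplicating on the manager/age pair rather than on age alone is what keeps Charlie and Dan from collapsing into a single 35.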
Note: because different people can have the same age, the unique operation on age should first be applied within each manager's group, and only then should the ages be summed.
library(data.table)
dat <- data.frame(
manager = c("Adam", "Adam", "Adam", "Bill", "Bill", "Charlie"),
manager_age = c(30, 30, 30, 33, 33, 35),
sales = c(4, 12, 7, 4, 2, 15))
setDT(dat)[, .(managers = .NGRP,
               sales = sum(sales),
               age = unique(manager_age)),
           by = manager][, .(unique_managers = unique(managers),
                             total_sales = sum(sales),
                             total_age = sum(age))]
#>    unique_managers total_sales total_age
#> 1:               3          44        98
Created on 2021-05-04 by the reprex package (v2.0.0)
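Since the question asked for a tidyverse solution, the same two-stage pattern (collapse to one row per manager, then total) can be sketched in dplyr. This is an illustrative translation rather than part of the answer above; it assumes manager_age is constant within each manager, so first() is enough to pick it, and the intermediate name per_manager is hypothetical. It uses the question's updated data with two managers aged 35.

```r
library(dplyr)

dat <- data.frame(
  manager = c("Adam", "Adam", "Adam", "Bill", "Bill", "Charlie", "Dan"),
  manager_age = c(30, 30, 30, 33, 33, 35, 35),
  sales = c(4, 12, 7, 4, 2, 15, 10))

# Stage 1: one row per manager, with that manager's sales total
# and a single age value (assumed constant within the group).
per_manager <- dat %>%
  group_by(manager) %>%
  summarize(sales = sum(sales),
            age = first(manager_age),
            .groups = "drop")

# Stage 2: total across the per-manager rows.
results <- per_manager %>%
  summarize(unique_managers = n(),
            total_sales = sum(sales),
            total_age = sum(age))

results
```

Because the ages are taken per manager before summing, two managers with the same age each contribute to total_age.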