r - Sum based on Distinct values on another Column

I'm looking for a tidyverse solution to sum a column based on unique values of an ID column, while still summing other columns based on all values.

Example data:

   dat <- data.frame(
        manager = c("Adam", "Adam", "Adam", "Bill", "Bill", "Charlie", "Dan"),
        manager_age = c(30, 30, 30, 33, 33, 35, 35),
        sales = c(4, 12, 7, 4, 2, 15, 10))
   dat

  manager manager_age sales
1    Adam          30     4
2    Adam          30    12
3    Adam          30     7
4    Bill          33     4
5    Bill          33     2
6 Charlie          35    15
7     Dan          35    10

I want to sum all values of sales, but sum only one manager_age value per manager.

Desired output:

  unique_managers total_sales total_age
                4          54       133

I'm most of the way there, but need help with the summed age:

library(dplyr)
results <- dat %>%
  summarize(unique_managers = n_distinct(manager), total_sales = sum(sales))
results

Thanks in advance!

Edit: Updated example data to include two managers with the same age.

This should do it for you:

library(dplyr)   

results <- dat %>% 
  summarize(unique_managers = n_distinct(manager),
            total_sales = sum(sales)) %>% 
  cbind(dat %>% 
          select(manager, manager_age) %>% 
          group_by(manager) %>% 
          unique() %>% 
          ungroup() %>% 
          summarize(total_age = sum(manager_age)))

On the six-row data without Dan (three managers), this gives:

> results
  unique_managers total_sales total_age
1               3          44        98

Edit:

If you have two managers with the same age:

dat <- data.frame(
  manager = c("Adam", "Adam", "Adam", "Bill", "Bill", "Charlie", "Dante"),
  manager_age = c(30, 30, 30, 33, 33, 35, 30),
  sales = c(4, 12, 7, 4, 2, 15, 14))

The same code then gives:

  unique_managers total_sales total_age
1               4          58       128
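A more compact variant, offered as a sketch rather than as the answer's own method: the per-manager deduplication can be done inside a single `summarize()` by keeping only the age on the first row seen for each manager. It assumes every manager has the same `manager_age` on all of their rows, and it uses the seven-row data from the question.

```r
library(dplyr)

# Example data from the question (Charlie and Dan share an age)
dat <- data.frame(
  manager = c("Adam", "Adam", "Adam", "Bill", "Bill", "Charlie", "Dan"),
  manager_age = c(30, 30, 30, 33, 33, 35, 35),
  sales = c(4, 12, 7, 4, 2, 15, 10))

results <- dat %>%
  summarize(unique_managers = n_distinct(manager),
            total_sales = sum(sales),
            # !duplicated(manager) is TRUE only on each manager's first row,
            # so each manager contributes exactly one age to the sum
            total_age = sum(manager_age[!duplicated(manager)]))
results
# unique_managers = 4, total_sales = 54, total_age = 133
```

Because the filter is on `manager` and not on `manager_age`, two managers with the same age are both counted.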

Note: different people can have the same age, so take the unique age within each manager's group first, rather than across the whole age column.

library(data.table)
dat <- data.frame(
    manager = c("Adam", "Adam", "Adam", "Bill", "Bill", "Charlie"),
    manager_age = c(30, 30, 30, 33, 33, 35),
    sales = c(4, 12, 7, 4, 2, 15))

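# First [ ]: one row per manager with their total sales and single distinct age;
# .NGRP is the total number of groups (managers).
# Second [ ]: collapse those per-manager rows into one summary row.
setDT(dat)[,.(managers=.NGRP,
       sales = sum(sales),
       age=unique(manager_age)),
       by=manager][,.(unique_managers = unique(managers),
                      total_sales = sum(sales),
                      total_age = sum(age))]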
#>    unique_managers total_sales total_age
#> 1:               3          44        98

Created on 2021-05-04 by the reprex package (v2.0.0)
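To illustrate the note above about people who share an age, here is a small sketch on the question's seven-row data, where Charlie and Dan are both 35. The names `naive_total` and `per_manager_total` are made up for this example. Deduplicating the age column on its own silently drops one of the two 35s, while deduplicating manager/age pairs keeps both.

```r
library(dplyr)

# Seven-row data from the question: Charlie and Dan are both 35
dat <- data.frame(
  manager = c("Adam", "Adam", "Adam", "Bill", "Bill", "Charlie", "Dan"),
  manager_age = c(30, 30, 30, 33, 33, 35, 35),
  sales = c(4, 12, 7, 4, 2, 15, 10))

# Unique over the whole column: 30 + 33 + 35, so Dan's age is lost
naive_total <- sum(unique(dat$manager_age))

# Unique per manager: one (manager, manager_age) row each, then sum
per_manager_total <- dat %>%
  distinct(manager, manager_age) %>%
  summarize(total_age = sum(manager_age)) %>%
  pull(total_age)

naive_total        # 98
per_manager_total  # 133
```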
