Sum of unique combination of values in columns in R

My data frame is as below:

df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept = c(101, 101, 101, 102, 102, 103),
                 Emp_Id = c(1, 1, 2, 3, 4, 4),
                 weights = c(5, 5, 2, 3, 4, 5))

Webpage Dept Emp_Id weights
111     101      1       5
111     101      1       5
111     101      2       2
111     102      3       3  
222     102      4       4
222     103      4       5

For each webpage, I want the number of employees who have seen that webpage, expressed as the sum of their weights, and the corresponding weight percentage. A unique employee is a unique combination of Dept and Emp_Id.

For example, webpage 111 is seen by Emp_Id 1, 2 and 3, so the number of employees seen is the sum of their weights, i.e. 5 + 2 + 3 = 10, and the weight percentage is 0.52 (10/19). Here 19 is the total weight of the unique employees (unique combinations of Dept and Emp_Id) across both webpages: 10 for webpage 111 plus 9 for webpage 222.

Webpage    Number_people_seen    seen_percentage
111                 10            0.52
222                  9            0.47
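
The arithmetic behind this expected output can be checked by hand. A minimal sketch, with the weights of the unique (Dept, Emp_Id) pairs typed in manually from the table above (the variable names are illustrative only, not part of the original data):

```r
# Weights of the unique (Dept, Emp_Id) pairs per webpage, copied by hand
w_111 <- c(5, 2, 3)  # pairs (101, 1), (101, 2), (102, 3)
w_222 <- c(4, 5)     # pairs (102, 4), (103, 4)

# Weighted number of employees per webpage
totals <- c(`111` = sum(w_111), `222` = sum(w_222))
totals

# Share of each webpage in the overall total (10/19 and 9/19)
totals / sum(totals)
```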

What I tried is below, but I am not sure how to get the sum of weights.

library(dplyr)
df %>% group_by(Webpage) %>% distinct(Dept, Emp_Id)

df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept = c(101, 101, 101, 102, 102, 103),
                 Emp_Id = c(1, 1, 2, 3, 4, 4),
                 weights = c(5, 5, 2, 3, 4, 5))

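library(tidyverse)
df %>%
  group_by(Webpage) %>%
  # keep one row per (Dept, Emp_Id) within each webpage, retaining weights
  distinct(Dept, Emp_Id, .keep_all = TRUE) %>%
  # sum the weights of the unique employees per webpage
  summarise(Number_people_seen = sum(weights)) %>%
  # share of each webpage in the overall total
  mutate(seen_percentage = prop.table(Number_people_seen))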
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 3
#>   Webpage Number_people_seen seen_percentage
#>     <dbl>              <dbl>           <dbl>
#> 1     111                 10           0.526
#> 2     222                  9           0.474

Created on 2021-04-05 by the reprex package (v0.3.0)
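
A slightly shorter variant of the same idea, offered only as a sketch: after dropping duplicate (Webpage, Dept, Emp_Id) rows, dplyr's count() with its wt argument sums the weights per webpage in one step. The result name res is an illustrative choice, and the sketch assumes each unique employee carries a single weight per webpage.

```r
library(dplyr)

df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept = c(101, 101, 101, 102, 102, 103),
                 Emp_Id = c(1, 1, 2, 3, 4, 4),
                 weights = c(5, 5, 2, 3, 4, 5))

res <- df %>%
  # one row per unique employee (Dept + Emp_Id) within each webpage
  distinct(Webpage, Dept, Emp_Id, .keep_all = TRUE) %>%
  # weighted count: sums `weights` instead of counting rows
  count(Webpage, wt = weights, name = "Number_people_seen") %>%
  # share of each webpage in the overall total
  mutate(seen_percentage = Number_people_seen / sum(Number_people_seen))

res
```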

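# Group by Dept as well, since a unique employee is Dept + Emp_Id;
# this assumes each unique employee has a single weight per webpage
df %>% group_by(Webpage, Dept, Emp_Id) %>%
  summarise(no_of_ppl_seen = unique(weights)) %>%
  group_by(Webpage) %>%
  summarise(no_of_ppl_seen = sum(no_of_ppl_seen)) %>%
  mutate(seen_percentage = no_of_ppl_seen/sum(no_of_ppl_seen))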

# A tibble: 2 x 3
  Webpage no_of_ppl_seen seen_percentage
    <dbl>          <dbl>           <dbl>
1     111             10           0.526
2     222              9           0.474

OR

df %>% filter(!duplicated(across(everything()))) %>%
  group_by(Webpage) %>%
  summarise(number_ppl_seen = sum(weights)) %>%
  mutate(seen_perc = number_ppl_seen/sum(number_ppl_seen))
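
For comparison, the same result can be sketched in base R without dplyr. This is only one possible approach, not taken from the answers above: duplicated() drops repeated (Webpage, Dept, Emp_Id) rows, and aggregate() sums the weights per webpage. The names uniq and res are illustrative.

```r
df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept = c(101, 101, 101, 102, 102, 103),
                 Emp_Id = c(1, 1, 2, 3, 4, 4),
                 weights = c(5, 5, 2, 3, 4, 5))

# Keep the first row for each (Webpage, Dept, Emp_Id) combination
uniq <- df[!duplicated(df[c("Webpage", "Dept", "Emp_Id")]), ]

# Sum the weights of the unique employees per webpage
res <- aggregate(weights ~ Webpage, data = uniq, FUN = sum)
names(res)[2] <- "Number_people_seen"

# Share of each webpage in the overall total
res$seen_percentage <- res$Number_people_seen / sum(res$Number_people_seen)
res
```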
