Sum of unique combination of values in columns in R
My data frame is as below:
df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
Dept = c(101, 101, 101, 102, 102, 103),
Emp_Id = c(1, 1, 2, 3, 4, 4),
weights = c(5,5,2,3,4,5))
Webpage Dept Emp_Id weights
111 101 1 5
111 101 1 5
111 101 2 2
111 102 3 3
222 102 4 4
222 103 4 5
For each webpage, I want the number of employees who have seen that webpage, measured by the sum of their weights, together with the weight percentage. A unique employee is a unique combination of Dept and Emp_Id.
For example, webpage 111 is seen by Emp_Id 1, 2 and 3, so the number of employees who saw it is the sum of their weights, i.e. 5 + 2 + 3 = 10, and the weight percentage is 0.52 (10/19). Here 19 is the total sum of weights over unique employees (unique combinations of Dept and Emp_Id).
Webpage Number_people_seen seen_percentage
111 10 0.52
222 9 0.47
What I tried is below, but I am not sure how to get the sum of weights.
library(dplyr)
df %>% group_by(Webpage) %>% distinct(Dept,Emp_Id)
df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
Dept = c(101, 101, 101, 102, 102, 103),
Emp_Id = c(1, 1, 2, 3, 4, 4),
weights = c(5,5,2,3,4,5))
library(tidyverse)
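df %>%
  group_by(Webpage) %>%
  # keep one row per (Webpage, Dept, Emp_Id), retaining the weights column
  distinct(Dept, Emp_Id, .keep_all = TRUE) %>%
  summarise(Number_people_seen = sum(weights)) %>%
  # share of each webpage in the overall total
  mutate(seen_percentage = prop.table(Number_people_seen))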
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 3
#> Webpage Number_people_seen seen_percentage
#> <dbl> <dbl> <dbl>
#> 1 111 10 0.526
#> 2 222 9 0.474
Created on 2021-04-05 by the reprex package (v0.3.0)
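# Group by Dept as well, since a unique employee is a Dept + Emp_Id combination;
# grouping by Emp_Id alone would merge employees that share an id and a weight.
df %>% group_by(Webpage, Dept, Emp_Id) %>%
  summarise(no_of_ppl_seen = unique(weights)) %>%
  group_by(Webpage) %>%
  summarise(no_of_ppl_seen = sum(no_of_ppl_seen)) %>%
  mutate(seen_percentage = no_of_ppl_seen/sum(no_of_ppl_seen))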
# A tibble: 2 x 3
Webpage no_of_ppl_seen seen_percentage
<dbl> <dbl> <dbl>
1 111 10 0.526
2 222 9 0.474
Or:
# drop rows that are exact duplicates across all columns, then sum per webpage
df %>% filter(!duplicated(across(everything()))) %>%
  group_by(Webpage) %>%
  summarise(number_ppl_seen = sum(weights)) %>%
  mutate(seen_perc = number_ppl_seen/sum(number_ppl_seen))
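For comparison, here is a minimal base R sketch of the same computation, without dplyr. It is not taken from the answers above; the names uniq, per_page and total are only illustrative. It takes the denominator literally from the question (weights summed over unique Dept + Emp_Id combinations), which for this data equals the sum of the per-webpage totals because no employee appears on both webpages. If an employee had seen both pages, the two definitions would differ.

```r
# Same data as in the question
df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept    = c(101, 101, 101, 102, 102, 103),
                 Emp_Id  = c(1, 1, 2, 3, 4, 4),
                 weights = c(5, 5, 2, 3, 4, 5))

# One row per (Webpage, Dept, Emp_Id) combination
uniq <- df[!duplicated(df[c("Webpage", "Dept", "Emp_Id")]), ]

# Sum of weights per webpage (names of the result are "111" and "222")
per_page <- tapply(uniq$weights, uniq$Webpage, sum)

# Denominator as described in the question: weights of unique employees,
# where an employee is a unique (Dept, Emp_Id) pair (assumes each employee
# has a single weight)
total <- sum(df$weights[!duplicated(df[c("Dept", "Emp_Id")])])

result <- data.frame(Webpage            = as.numeric(names(per_page)),
                     Number_people_seen = as.vector(per_page),
                     seen_percentage    = as.vector(per_page) / total)
print(result)
```

With the sample data this gives 10 and 9 for the two webpages, out of a total of 19, matching the tibbles shown above.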