
Count number of unique values in two columns by group

I have a data frame with IDs for web page ('Webpage'), department ('Dept') and employee ('Emp_ID'):

df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222), 
                 Dept = c(101, 101, 101, 102, 102, 103), 
                 Emp_ID = c(1, 1, 2, 3, 4, 4)) 

#   Webpage Dept Emp_ID
# 1     111  101      1
# 2     111  101      1
# 3     111  101      2
# 4     111  102      3
# 5     222  102      4
# 6     222  103      4

I want to know how many unique individuals have seen each web page.


For example, in the dataset above, web page 111 has been seen by three individuals (unique combinations of Dept and Emp_ID): Emp_ID 1 and 2 in Dept 101, and Emp_ID 3 in Dept 102. Similarly, web page 222 has been seen by two different individuals.

My first attempt is:

nrow(unique(df[ , c("Dept", "Emp_ID")]))

Using unique I can do this for one web page, but can someone please suggest how I can calculate it for all web pages?

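For each Webpage, count the unique rows across the two remaining columns using duplicated. Inside a grouped summarise, cur_data() returns the current group's data without the grouping column, so duplicated is evaluated on Dept and Emp_Id together. (cur_data() is deprecated as of dplyr 1.1.0; pick(everything()) is its replacement in newer versions.)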

library(dplyr)

df %>%
  group_by(Webpage) %>%
  summarise(n_viewers = sum(!duplicated(cur_data())))

#  Webpage n_viewers
#    <dbl>     <int>
#1     111         3
#2     222         2

data

Provide data in a reproducible format which is easier to copy rather than an image.

df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222), 
                 Dept = c(101, 101, 101, 102, 102, 103), 
                 Emp_Id = c(1, 1, 2, 3, 4, 4))
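Another way to express the same idea, shown here only as a sketch and not as part of the answer above, is to drop duplicate (Webpage, Dept, Emp_Id) rows first and then count the rows that remain per web page. The names viewers and n_viewers are chosen for illustration.

```r
library(dplyr)

# Same sample data as above.
df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept    = c(101, 101, 101, 102, 102, 103),
                 Emp_Id  = c(1, 1, 2, 3, 4, 4))

viewers <- df %>%
  distinct(Webpage, Dept, Emp_Id) %>%   # one row per unique viewer per page
  count(Webpage, name = "n_viewers")    # remaining rows per page

viewers
```

With the sample data this should give 3 viewers for web page 111 and 2 for web page 222, matching the result above.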
df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222), 
                 Dept = c(101, 101, 101, 102, 102, 103), 
                 Emp_Id = c(1, 1, 2, 3, 4, 4))
library(dplyr)

df %>% 
  group_by(Webpage) %>% 
  summarise(n = n_distinct(Dept, Emp_Id))
#> # A tibble: 2 x 2
#>   Webpage     n
#>     <dbl> <int>
#> 1     111     3
#> 2     222     2

library(data.table)
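# Paste with a separator so that distinct (Dept, Emp_Id) pairs cannot collapse into the same key
setDT(df)[, list(n = uniqueN(paste(Dept, Emp_Id, sep = "_"))), by = Webpage]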
#>    Webpage n
#> 1:     111 3
#> 2:     222 2
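One caveat when counting unique pairs through a concatenated key: if the two columns are pasted together without a separator, different pairs can produce the same string. The small example below uses made-up values (not the question's data) and base R only, as a hedged illustration of the pitfall.

```r
# Hypothetical values chosen to trigger a collision: (1, 11) and (11, 1).
pairs <- data.frame(Dept = c(1, 11), Emp_Id = c(11, 1))

# Without a separator both rows become "111", so only one key is seen.
n_no_sep <- length(unique(paste0(pairs$Dept, pairs$Emp_Id)))

# With a separator the keys are "1_11" and "11_1", which stay distinct.
n_with_sep <- length(unique(paste(pairs$Dept, pairs$Emp_Id, sep = "_")))

# Comparing whole rows avoids building a string key at all.
n_rows <- nrow(unique(pairs))

c(n_no_sep = n_no_sep, n_with_sep = n_with_sep, n_rows = n_rows)
```

For the sample data in the question the issue does not arise, since every Dept value has three digits, but it is worth guarding against on real data.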

Created on 2021-03-30 by the reprex package (v1.0.0)

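I hope aggregate can help. unique(df) drops the duplicate rows first, and length then counts the rows left for each Webpage: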

> aggregate(cbind(n_viewer = Emp_Id) ~ Webpage, unique(df), length)
  Webpage n_viewer
1     111        3
2     222        2
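To see why deduplicating first works, the two steps can be looked at separately. This is only an illustrative base R sketch; table is swapped in for aggregate here, and distinct_rows and counts are names made up for the example.

```r
# Same sample data as above.
df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept    = c(101, 101, 101, 102, 102, 103),
                 Emp_Id  = c(1, 1, 2, 3, 4, 4))

# Step 1: keep one row per unique (Webpage, Dept, Emp_Id) combination.
distinct_rows <- unique(df)

# Step 2: count how many of those rows belong to each web page.
counts <- table(distinct_rows$Webpage)

counts
```

Only the second row of the sample data is a duplicate, so distinct_rows has five rows, three for web page 111 and two for web page 222.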
