Get ID by the group and then count unique value for these IDs

Question

I have a data frame df as below:

Name = c("Tom D Frost","Tom D Frost", "Tom D Frost", "William J Hardy", "William J Hardy", "Steven D Debauche", "Nicholas K Foster", "Sean F Williamson")
Institute = c("ASA", "ASA", "ASA", "BSC", "BSC", "BSC", "AXB", "PSDZ")
ID = c(165, 170, 189, 181, 165, 784, 165, 170)
df = data.frame(Name, Institute, ID)

#df
            Name       Institute  ID
1        Tom D Frost       ASA    165
2        Tom D Frost       ASA    170
3        Tom D Frost       ASA    189
4    William J Hardy       BSC    181
5    William J Hardy       BSC    165
6  Steven D Debauche       BSC    784
7  Nicholas K Foster       AXB    165
8  Sean F Williamson      PSDZ    170

For each Name, I would like to get the group of IDs that belongs to it and then count the unique Names across that group of IDs. For this sample df, I am expecting a result like this:

Name             Institute    UniqueCountofNamebyIDGroup
Tom D Frost        ASA            4
William J Hardy    BSC            3
Steven D Debauche  BSC            1
Nicholas K Foster  AXB            3
Sean F Williamson  PSDZ           2

In the data frame, I would like to count Names for each group of IDs. For example, "Tom D Frost" has three IDs: 165, 170, and 189. So I would like to count the unique names associated with those IDs. IDs 165, 170 and 189 are shared by 4 unique names: "Tom D Frost", "William J Hardy", "Nicholas K Foster" and "Sean F Williamson". Therefore, the unique count for "Tom D Frost" is 4.
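A minimal base R sketch of that rule for a single name, assuming the df built above (rebuilt here so the snippet runs on its own). The names tom_ids and names_for_ids are illustrative only:

```r
# Rebuild the sample data from the question
Name <- c("Tom D Frost", "Tom D Frost", "Tom D Frost", "William J Hardy",
          "William J Hardy", "Steven D Debauche", "Nicholas K Foster",
          "Sean F Williamson")
Institute <- c("ASA", "ASA", "ASA", "BSC", "BSC", "BSC", "AXB", "PSDZ")
ID <- c(165, 170, 189, 181, 165, 784, 165, 170)
df <- data.frame(Name, Institute, ID, stringsAsFactors = FALSE)

# Step 1: collect the IDs that belong to "Tom D Frost"
tom_ids <- unique(df$ID[df$Name == "Tom D Frost"])
tom_ids
# [1] 165 170 189

# Step 2: collect every Name that appears with any of those IDs
names_for_ids <- unique(df$Name[df$ID %in% tom_ids])

# Step 3: count the distinct names
length(names_for_ids)
# [1] 4
```

Repeating these three steps for each Name gives the expected table; a join on ID does the same thing for all names at once.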

I tried using dplyr:

library(dplyr)
df %>%
  group_by(Name) %>%
  summarise(uniqueCount = n())

This just gives me the frequency of each Name:

                Name uniqueCount
              <fctr>       <int>
1  Nicholas K Foster           1
2  Sean F Williamson           1
3  Steven D Debauche           1
4        Tom D Frost           3
5    William J Hardy           2

As mentioned above, I would like to count the unique Names across the group of IDs that belongs to each Name in df, not the frequency of each Name.

Any help and support is greatly appreciated. Thank you very much.

Answer 1

You can do a self-join on ID:

library(dplyr)
df %>%
  inner_join(df, by = "ID") %>%
  group_by(Name.x, Institute.x) %>%
  summarise(UniqueCount = n_distinct(Name.y, Institute.y))

# Source: local data frame [5 x 3]
# Groups: Name.x [?]
# 
#              Name.x Institute.x UniqueCount
#              <fctr>      <fctr>       <int>
# 1 Nicholas K Foster         AXB           3
# 2 Sean F Williamson        PSDZ           2
# 3 Steven D Debauche         BSC           1
# 4       Tom D Frost         ASA           4
# 5   William J Hardy         BSC           3
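For comparison, a rough base R sketch of the same self-join, assuming the sample df from the question (rebuilt here). It is not a drop-in replacement: merge() adds the .x/.y suffixes just like inner_join(), but aggregate() may return the rows in a different order. To mirror n_distinct(Name.y, Institute.y), it counts distinct Name/Institute pairs through a helper column named key, which is an illustrative name:

```r
# Rebuild the sample data from the question
Name <- c("Tom D Frost", "Tom D Frost", "Tom D Frost", "William J Hardy",
          "William J Hardy", "Steven D Debauche", "Nicholas K Foster",
          "Sean F Williamson")
Institute <- c("ASA", "ASA", "ASA", "BSC", "BSC", "BSC", "AXB", "PSDZ")
ID <- c(165, 170, 189, 181, 165, 784, 165, 170)
df <- data.frame(Name, Institute, ID, stringsAsFactors = FALSE)

# Self-join: pair every row with every row that shares its ID;
# duplicated column names get .x (left) and .y (right) suffixes
pairs <- merge(df, df, by = "ID")

# Hypothetical helper column: one string per Name/Institute pair on the right side
pairs$key <- paste(pairs$Name.y, pairs$Institute.y, sep = "|")

# For each left-side Name/Institute, count the distinct right-side pairs
counts <- aggregate(key ~ Name.x + Institute.x, data = pairs,
                    FUN = function(v) length(unique(v)))
names(counts) <- c("Name", "Institute", "UniqueCount")
counts
```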

Accepted answer (2017-03-08 11:12:58)
