
Get IDs by group and then count unique values for these IDs

I have a data frame df as below:

Name = c("Tom D Frost","Tom D Frost", "Tom D Frost", "William J Hardy", "William J Hardy", "Steven D Debauche", "Nicholas K Foster", "Sean F Williamson")
Institute = c("ASA", "ASA", "ASA", "BSC", "BSC", "BSC", "AXB", "PSDZ")
ID = c(165, 170, 189, 181, 165, 784, 165, 170)
df = data.frame(Name, Institute, ID)

#df
               Name Institute  ID
1       Tom D Frost       ASA 165
2       Tom D Frost       ASA 170
3       Tom D Frost       ASA 189
4   William J Hardy       BSC 181
5   William J Hardy       BSC 165
6 Steven D Debauche       BSC 784
7 Nicholas K Foster       AXB 165
8 Sean F Williamson      PSDZ 170

For each Name, I would like to take the set of IDs that belong to that Name and then count the unique Names that share any of those IDs. For this sample df, I expect a result like this:

Name             Institute    UniqueCountofNamebyIDGroup
Tom D Frost        ASA            4
William J Hardy    BSC            3
Steven D Debauche  BSC            1
Nicholas K Foster  AXB            3
Sean F Williamson  PSDZ           2

In other words, I would like to count Names per group of IDs. For example, "Tom D Frost" has three IDs: 165, 170, and 189. Across those three IDs there are 4 unique names: "Tom D Frost", "William J Hardy", "Nicholas K Foster", and "Sean F Williamson". Therefore, the unique count for "Tom D Frost" is 4.

I tried using dplyr:

library(dplyr)
df %>%
  group_by(Name) %>%
  summarise(UniqueCount = n())

This just gives me the frequency of each Name, as below:

           Name                      UniqueCount
         <fctr>                      <int>
1     Nicholas K Foster                1
2     Sean F Williamson                1
3     Steven D Debauche                1
4       Tom D Frost                    3
5     William J Hardy                  2

As mentioned above, I would like to count the unique Names across the group of IDs that belong to each Name in df, not the frequency of each Name.

Any help and support is greatly appreciated. Thank you very much.

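You can do a self-join. Joining df to itself on ID pairs every row with all rows that share its ID, so grouping by the original Name and counting the distinct names on the other side gives the result you want: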

df %>%
  inner_join(df, by="ID") %>% 
  group_by(Name.x, Institute.x) %>% 
  summarise(UniqueCount = n_distinct(Name.y, Institute.y))

# Source: local data frame [5 x 3]
# Groups: Name.x [?]
# 
#              Name.x Institute.x UniqueCount
#              <fctr>      <fctr>       <int>
# 1 Nicholas K Foster         AXB           3
# 2 Sean F Williamson        PSDZ           2
# 3 Steven D Debauche         BSC           1
# 4       Tom D Frost         ASA           4
# 5   William J Hardy         BSC           3
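
For readers without dplyr, the same self-join idea can be sketched in base R. This is only an illustration under a few assumptions: the variable names `pairs` and `result` are made up here, `merge()` is used with its default `.x`/`.y` suffixes, and the count is taken over distinct names only (not name and institute pairs as in the dplyr version), which gives the same numbers for this sample data. Note also that recent dplyr versions may warn about a many-to-many relationship for the join above; base `merge()` does not.

```r
# Rebuild the sample data from the question.
df <- data.frame(
  Name = c("Tom D Frost", "Tom D Frost", "Tom D Frost", "William J Hardy",
           "William J Hardy", "Steven D Debauche", "Nicholas K Foster",
           "Sean F Williamson"),
  Institute = c("ASA", "ASA", "ASA", "BSC", "BSC", "BSC", "AXB", "PSDZ"),
  ID = c(165, 170, 189, 181, 165, 784, 165, 170),
  stringsAsFactors = FALSE
)

# Self-join on ID: every row is paired with each row that shares its ID.
# Non-key columns get the default suffixes .x (left) and .y (right).
pairs <- merge(df, df, by = "ID")

# For each original Name/Institute, count the distinct names on the other side.
result <- aggregate(
  Name.y ~ Name.x + Institute.x,
  data = pairs,
  FUN = function(x) length(unique(x))
)
names(result)[3] <- "UniqueCount"
result
```

The row order of `result` may differ from the dplyr output, but each name should carry the same count as in the expected table from the question.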
