简体   繁体   中英

Check if all rows are equal by group ID and return boolean value

I have a data frame where a unique ID is given to each unique instance where there is a string in either title.1 or title.2 . Each ID is coded with one or more names. See below:

title.1 title.2 name ID
A A1 fruit 1
A A1 fruit 1
B1 fruit 2
B fruit, vegetable 3
C C1 vegetable, poultry, grain 4
C C1 vegetable, poultry 4
C C1 vegetable, poultry 4
D1 poultry 5
D1 vegetable 5

I need to identify which IDs have the same name across rows and which do not. To do this, I'd like to group by ID and test to see if all name values are the same across all rows with that ID. Then, I'd like to append a new column with a boolean value indicating which IDs meet this condition and which do not. The output should look like this:

title.1 title.2 name ID names.equal
A A1 fruit 1 TRUE
A A1 fruit 1 TRUE
B1 fruit 2 TRUE
B fruit, vegetable 3 TRUE
C C1 vegetable, poultry, grain 4 FALSE
C C1 vegetable, poultry 4 FALSE
C C1 vegetable, poultry 4 FALSE
D1 poultry 5 FALSE
D1 vegetable 5 FALSE

We may use n_distinct on name to get the unique count and create logical with the count after grouping by ID

library(dplyr)
df1 %>%
   group_by(ID) %>%
   mutate(names.equal = n_distinct(name) == 1) %>%
   ungroup

-output

# A tibble: 9 × 5
  title.1 title.2 name                         ID names.equal
  <chr>   <chr>   <chr>                     <int> <lgl>      
1 A       A1      fruit                         1 TRUE       
2 A       A1      fruit                         1 TRUE       
3 <NA>    B1      fruit                         2 TRUE       
4 B       <NA>    fruit, vegetable              3 TRUE       
5 C       C1      vegetable, poultry, grain     4 FALSE      
6 C       C1      vegetable, poultry            4 FALSE      
7 C       C1      vegetable, poultry            4 FALSE      
8 <NA>    D1      poultry                       5 FALSE      
9 <NA>    D1      vegetable                     5 FALSE     

data

df1 <- structure(list(title.1 = c("A", "A", NA, "B", "C", "C", "C", 
NA, NA), title.2 = c("A1", "A1", "B1", NA, "C1", "C1", "C1", 
"D1", "D1"), name = c("fruit", "fruit", "fruit", "fruit, vegetable", 
"vegetable, poultry, grain", "vegetable, poultry", "vegetable, poultry", 
"poultry", "vegetable"), ID = c(1L, 1L, 2L, 3L, 4L, 4L, 4L, 5L, 
5L)), class = "data.frame", row.names = c(NA, -9L))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM