简体   繁体   中英

How can I paste text from dataframe rows, keeping only unique values in R

I have a dataframe where each row represents a person, and the columns represent the names they have gone by. Some of the values are NAs or are duplicates. The data looks like the dataframe immediately below.

Name Name1 Name2 Name3 Name4
Tom Tom Thomas Tom Tommy
Jim NA James NA Jimmy
Dave Dave David NA Davey
Tim NA Timothy Tim Timmy
Rob Rob NA Rob Robby
Sam NA NA Sam NA

I would like to combine the unique names from each row and put them into a new column, where each name only occurs once. I know i can use the paste function to produce a column where all the text values appear like this:

Name Name1 Name2 Name3 Name4 unique
Tom Tom Thomas NA Tommy Tom, Tom, Thomas, NA, Tommy

but I don't want the same text to appear multiple times in the unique column. How can I combine the row data such that each name only occurs once in the new $unique cell?

Name Name1 Name2 Name3 Name4 unique
Tom Tom Thomas Tom Tommy Tom, Thomas, Tommy
Jim NA James NA Jimmy Jim, James, Jimmy
Dave Dave David NA Davey Dave, David, Davey
Tim NA Timothy Tim Timmy Tim, Timothy, Timmy
Rob Rob NA Rob Robby Rob, Robert, Robby
Sam NA NA Sam NA Sam

Using tidyverse

library(dplyr)
df1 %>% 
 rowwise %>% 
  mutate(unique = toString(unique(na.omit(c_across(everything()))))) %>% 
  ungroup

-output

# A tibble: 6 × 6
  Name  Name1 Name2   Name3 Name4 unique             
  <chr> <chr> <chr>   <chr> <chr> <chr>              
1 Tom   Tom   Thomas  Tom   Tommy Tom, Thomas, Tommy 
2 Jim   <NA>  James   <NA>  Jimmy Jim, James, Jimmy  
3 Dave  Dave  David   <NA>  Davey Dave, David, Davey 
4 Tim   <NA>  Timothy Tim   Timmy Tim, Timothy, Timmy
5 Rob   Rob   <NA>    Rob   Robby Rob, Robby         
6 Sam   <NA>  <NA>    Sam   <NA>  Sam       

apply ing unique row-wise, na.omit and collapse toString .

transform(dat, unique=apply(dat, 1, \(x) toString(na.omit(unique(x)))))
#   Name Name1   Name2 Name3 Name4              unique
# 1  Tom   Tom  Thomas   Tom Tommy  Tom, Thomas, Tommy
# 2  Jim  <NA>   James  <NA> Jimmy   Jim, James, Jimmy
# 3 Dave  Dave   David  <NA> Davey  Dave, David, Davey
# 4  Tim  <NA> Timothy   Tim Timmy Tim, Timothy, Timmy
# 5  Rob   Rob    <NA>   Rob Robby          Rob, Robby
# 6  Sam  <NA>    <NA>   Sam  <NA>                 Sam

You could additionally implement a sort if you like.


Data:

dat <- structure(list(Name = c("Tom", "Jim", "Dave", "Tim", "Rob", "Sam"
), Name1 = c("Tom", NA, "Dave", NA, "Rob", NA), Name2 = c("Thomas", 
"James", "David", "Timothy", NA, NA), Name3 = c("Tom", NA, NA, 
"Tim", "Rob", "Sam"), Name4 = c("Tommy", "Jimmy", "Davey", "Timmy", 
"Robby", NA)), class = "data.frame", row.names = c(NA, -6L))
df%>%
 rowid_to_column()%>%
 left_join(pivot_longer(.,-rowid)%>%
 group_by(rowid)%>%    
 summarise(value=toString(na.omit(unique(value))), .groups = 'drop'))
  rowid Name Name1   Name2 Name3 Name4               value
1     1  Tom   Tom  Thomas   Tom Tommy  Tom, Thomas, Tommy
2     2  Jim  <NA>   James  <NA> Jimmy   Jim, James, Jimmy
3     3 Dave  Dave   David  <NA> Davey  Dave, David, Davey
4     4  Tim  <NA> Timothy   Tim Timmy Tim, Timothy, Timmy
5     5  Rob   Rob    <NA>   Rob Robby          Rob, Robby
6     6  Sam  <NA>    <NA>   Sam  <NA>                 Sam

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM