R Dataframe - Extract Unique rows from columns

Question

I have a dataframe:

source= c("A", "A", "B") 
target = c("B", "C", "C") 
source_A = c(5, 5, 6) 
target_A = c(6, 7, 7) 
source_B = c(10, 10, 11)
target_B = c(11, 12, 12) 
c = c(0.5, 0.6, 0.7) 
df = data.frame(source, target, source_A, target_A, source_B, target_B, c) 

> df
  source target source_A target_A source_B target_B   c
1      A      B        5        6       10       11 0.5
2      A      C        5        7       10       12 0.6
3      B      C        6        7       11       12 0.7

How can I reduce this dataframe to one row per unique value found in source and target (here A, B and C), keeping the matching _A and _B values and ignoring column c?

At the moment I do something like this:

df1 <- df[,c("source","source_A", "source_B")]
df2 <- df[,c("target","target_A", "target_B")]

names(df1)[names(df1) == 'source'] <- 'id'
names(df1)[names(df1) == 'source_A'] <- 'A'
names(df1)[names(df1) == 'source_B'] <- 'B'
names(df2)[names(df2) == 'target'] <- 'id'
names(df2)[names(df2) == 'target_A'] <- 'A'
names(df2)[names(df2) == 'target_B'] <- 'B'

df3 <- rbind(df1,df2)
df3[!duplicated(df3$id),]

  id A  B
1  A 5 10
3  B 6 11
5  C 7 12

In reality, I have tens of columns so this is non-viable long term.

How can I do this more succinctly (and, ideally, in a way that generalises to more columns)?

Answer 1

library(dplyr)

# Pick columns by name; ls() lists workspace objects, not the columns of df
df1 <- df[, grep("^source", names(df))]
df2 <- df[, grep("^target", names(df))]

# Align the names so the two halves can be stacked, then drop duplicate rows
names(df1) <- names(df2)
res <- bind_rows(df1, df2) %>% distinct()

This should do it, although I am not sure how far you want to generalise it. It is not the most elegant solution, but it serves the purpose, provided the columns you need can be selected by a pattern in their names.
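
To illustrate how the name-pattern idea might generalise to many columns, here is a minimal base R sketch. The helper name stack_unique and its prefixes argument are hypothetical, not part of the original answer. It assumes every prefix has the same set of suffixes (source_A/target_A, source_B/target_B, ...) and, like the approach in the question, keeps the first row seen for each id.

```r
# Rebuild the example data from the question.
df <- data.frame(
  source = c("A", "A", "B"), target = c("B", "C", "C"),
  source_A = c(5, 5, 6), target_A = c(6, 7, 7),
  source_B = c(10, 10, 11), target_B = c(11, 12, 12),
  c = c(0.5, 0.6, 0.7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stack each "<prefix>" / "<prefix>_<suffix>" column group
# under common names and keep one row per id.
stack_unique <- function(data, prefixes = c("source", "target")) {
  parts <- lapply(prefixes, function(p) {
    # Columns named exactly "<prefix>" or starting with "<prefix>_"
    cols <- grep(paste0("^", p, "(_|$)"), names(data), value = TRUE)
    part <- data[, cols, drop = FALSE]
    # Strip the prefix; the bare prefix column becomes "id"
    new_names <- sub(paste0("^", p, "_?"), "", cols)
    new_names[new_names == ""] <- "id"
    names(part) <- new_names
    part
  })
  out <- do.call(rbind, parts)                     # assumes matching suffixes
  out <- out[!duplicated(out$id), , drop = FALSE]  # first row per id wins
  rownames(out) <- NULL
  out
}

result <- stack_unique(df)
print(result)
```

Columns that match no prefix, such as c, are simply left out, so nothing has to be dropped by hand.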

Answer 2

Here's a more general method using dplyr and tidyr functions. You gather everything into a long format, where each variable can be renamed, then spread the result back into id, A and B:

library(dplyr)
library(tidyr)

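df %>% 
  select(-c) %>%                                          # drop the column to ignore
  mutate(index = row_number()) %>%                        # remember the original row
  gather(key, value, -index) %>%                          # long format: one row per cell
  separate(key, c("type", "name"), fill = "right") %>%    # "source_A" -> type "source", name "A"
  mutate(name = ifelse(is.na(name), "id", name)) %>%      # bare "source"/"target" hold the id
  spread(key = name, value = value, convert = TRUE) %>%   # back to wide; restore numeric types
  select(id, matches("[A-Z]", ignore.case = FALSE)) %>%   # keep id plus the upper-case columns
  distinct()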
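
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). As a rough sketch of the same long-format idea, and not the answerer's own code, the reshaping can be done in one step with the ".value" marker in names_to. Renaming source and target to source_id and target_id is an assumption made here so that every column name splits into a prefix and a suffix.

```r
library(dplyr)
library(tidyr)  # assumes tidyr >= 1.0.0 for pivot_longer()

# Rebuild the example data from the question.
df <- data.frame(
  source = c("A", "A", "B"), target = c("B", "C", "C"),
  source_A = c(5, 5, 6), target_A = c(6, 7, 7),
  source_B = c(10, 10, 11), target_B = c(11, 12, 12),
  c = c(0.5, 0.6, 0.7),
  stringsAsFactors = FALSE
)

result <- df %>%
  select(-c) %>%
  # Give the id columns a suffix so all names look like "<type>_<name>"
  rename(source_id = source, target_id = target) %>%
  # ".value" turns each suffix (id, A, B) into its own output column
  pivot_longer(everything(),
               names_to = c("type", ".value"),
               names_sep = "_") %>%
  select(-type) %>%
  distinct()

print(result)
```

Because each suffix becomes its own column, A and B stay numeric and no type conversion is needed. Note that distinct() keeps every distinct row, so an id with inconsistent A or B values would appear more than once.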

2 answers

solution1
0 2017-10-18 09:48:43

solution2
0 2017-10-18 17:35:14