
Removing an extra column from a data frame after full_join in R

I'm working on a data manipulation task in R and trying to combine two datasets on a chosen column (Column2, which acts as the primary and foreign key).

library(dplyr)  # provides full_join()

Column1 <- c("Name1", "Name2", "Name3", "Name4")
Column2 <- c("ID1", "ID2", "ID3", "ID4")
Column3 <- c(4, 5, 6, 7)
Column4 <- c(8, 9, 10, 11)
Column5 <- c(1, 2, 3, 4)

table1 <- data.frame(Column1, Column2, Column3, Column4, Column5)
Column1 <- c("Name1", "Name2", "Name3", "Name4")
Column2 <- c("ID4", "ID5", "ID6", "ID7")
Column3 <- c(22, 33, 44, 66)
Column4 <- c(66, 55, 77, 77)
Column5 <- c(1, 2, 3, 4)

table2 <- data.frame(Column1, Column2, Column3, Column4, Column5)
table3 <- full_join(table1, table2, by = "Column2")

I opted for full_join because it seemed suited to the task, but ran into a problem: R shows a separate Column1.y column from the second table instead of listing that column's values below those in Column1.x.

For example, R produces Column1.x, then Column2, Column3.x, Column4.x and Column5.x. Next to Column5.x I want to see "Column3.y", "Column4.y" and "Column5.y", but "Column1.y" appears right after Column5.x instead of its values being listed under "Column1.x", where all the names are.

How can I fix it? :)

I agree with @DarwinsBeard: you can simply drop the unwanted column, Column1.y. Keep in mind that a join can use more than one key. You get Column1.x and Column1.y because Column1 is not a join key but appears in both tables.
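To make the suggestion concrete, here is a minimal sketch of dropping the duplicated column after joining on Column2 only. The two small tables are hypothetical, cut-down stand-ins for the question's data, not the original tables.

```r
library(dplyr)

# Hypothetical miniature tables: Column1 and Column3 exist in both,
# Column2 is the only join key.
table1 <- tibble(Column1 = c("Name1", "Name2"),
                 Column2 = c("ID1", "ID4"),
                 Column3 = c(4, 7))
table2 <- tibble(Column1 = c("Name2", "Name5"),
                 Column2 = c("ID4", "ID5"),
                 Column3 = c(22, 33))

# Non-key columns present in both tables get .x / .y suffixes.
joined <- full_join(table1, table2, by = "Column2")
names(joined)

# Drop the copy of Column1 that came from the second table.
cleaned <- joined %>% select(-Column1.y)
cleaned
```

Note that dropping Column1.y also discards the names of rows that exist only in the second table (here ID5), so Column1.x is NA for those rows.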

Check the following:

library(dplyr)  # provides tibble() and full_join()

df1 <- tibble( Column1 = c("Name1","Name2","Name3","Name4")
              ,Column2 = c("ID1","ID2","ID3","ID4")
              # Column3 and Column4 are omitted for brevity
              ,Column5 = c(1,2,3,4)
              )
df2 <- tibble( Column1 = c("Name4","Name5","Name6","Name7")
              ,Column2 = c("ID4","ID5","ID6","ID7")
              ,Yes     = c(8,5,6,7) 
              ,No      = c(13,10,11,12)
              ,Neither = NA
              )

# full join keeps columns of both data frames, but replicates Column1
# as the join was only performed on the id-column, i.e. Column2
# as suggested above, remove the unwanted Column1.y with a select(-...) call
df12 <- full_join(df1, df2, by = c("Column2"))
df12

# what I think you want
df12 <- full_join(df1, df2, by = c("Column1","Column2"))
df12

The latter gives you a fully merged data set by keeping both key columns intact.
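If Column1 should not act as a join key (for example, when the same ID might carry slightly different names in the two tables), an alternative is to join on Column2 only and then merge the two name columns. The sketch below assumes that approach and uses hypothetical small tables named left and right; coalesce() takes the first non-missing value per row.

```r
library(dplyr)

# Hypothetical small tables sharing Column1 (names) and Column2 (IDs).
left  <- tibble(Column1 = c("Name1", "Name4"),
                Column2 = c("ID1", "ID4"),
                Column5 = c(1, 4))
right <- tibble(Column1 = c("Name4", "Name5"),
                Column2 = c("ID4", "ID5"),
                Yes     = c(8, 5))

merged <- full_join(left, right, by = "Column2") %>%
  # Take the name from the first table, falling back to the second.
  mutate(Column1 = coalesce(Column1.x, Column1.y)) %>%
  # Remove the suffixed copies and move the merged column to the front.
  select(-Column1.x, -Column1.y) %>%
  select(Column1, everything())
merged
```

This keeps a single Column1 with every name listed in one place, which is closer to what the question describes than simply dropping Column1.y.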


Note: You can reorder the columns to your liking with a select() call. E.g. try: df12 %>% select(Yes, No, Neither, everything()) to see what happens.
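As a small illustration of the reordering, the sketch below uses a hypothetical two-row stand-in for df12 with the same column names. The columns named first in select() move to the front, and everything() keeps the rest in their existing order.

```r
library(dplyr)

# Hypothetical stand-in for the joined result.
df12 <- tibble(Column1 = c("Name4", "Name5"),
               Column2 = c("ID4", "ID5"),
               Column5 = c(4, NA),
               Yes     = c(8, 5),
               No      = c(13, 10),
               Neither = NA)

# Named columns first, then all remaining columns.
reordered <- df12 %>% select(Yes, No, Neither, everything())
names(reordered)
```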

df3 <- df1 %>% 
  full_join(df2,by = c("Column1" = "Column1"))
df3

Output:

  Column1 Column2.x Column5 Column2.y   Yes    No Neither
  <chr>   <chr>       <dbl> <chr>     <dbl> <dbl> <lgl>  
1 Name1   ID1             1 NA           NA    NA NA     
2 Name2   ID2             2 NA           NA    NA NA     
3 Name3   ID3             3 NA           NA    NA NA     
4 Name4   ID4             4 ID4           8    13 NA     
5 Name5   NA             NA ID5           5    10 NA     
6 Name6   NA             NA ID6           6    11 NA     
7 Name7   NA             NA ID7           7    12 NA   
