I'm working on a data manipulation task in R and trying to combine two datasets on a chosen column (Column2, which acts as the primary/foreign key).
Column1 <- c("Name1", "Name2", "Name3", "Name4")
Column2 <- c("ID1", "ID2", "ID3", "ID4")
Column3 <- c(4, 5, 6, 7)
Column4 <- c(8, 9, 10, 11)
Column5 <- c(1, 2, 3, 4)
table1 <- data.frame(Column1, Column2, Column3, Column4, Column5)
Column1 <- c("Name1", "Name2", "Name3", "Name4")
Column2 <- c("ID4", "ID5", "ID6", "ID7")
Column3 <- c(22, 33, 44, 66)
Column4 <- c(66, 55, 77, 77)
Column5 <- c(1, 2, 3, 4)
table2 <- data.frame(Column1, Column2, Column3, Column4, Column5)
library(dplyr)
table3 <- full_join(table1, table2, by = "Column2")
I opted for a full join because it seemed to fit my task, but I ran into a problem: full_join() shows Column1 from the second table as a separate column, Column1.y, instead of listing its values further down in Column1.x.
For example, R produces Column1.x, then Column2, Column3.x, Column4.x and Column5.x. Next to Column5.x I want to see Column3.y, Column4.y and Column5.y, but Column1.y is displayed right after Column5.x instead of its values being appended below those in Column1.x, where all the names are listed.
How can I fix it? :)
I agree with @DarwinsBeard: you can remove the unwanted column, Column1.y. You get Column1.x and Column1.y because Column1 appears in both tables but is not a join key, so full_join() keeps both copies and tells them apart with suffixes. Keep in mind that you can perform joins on more than one key.
Check the following:
library(dplyr)  # provides tibble(), full_join(), select() and %>%
df1 <- tibble( Column1 = c("Name1","Name2","Name3","Name4")
,Column2 = c("ID1","ID2","ID3","ID4")
# Column3 and Column4 are omitted here for brevity
,Column5 = c(1,2,3,4)
)
df2 <- tibble( Column1 = c("Name4","Name5","Name6","Name7")
,Column2 = c("ID4","ID5","ID6","ID7")
,Yes = c(8,5,6,7)
,No = c(13,10,11,12)
,Neither = NA
)
# full join keeps columns of both data frames, but replicates Column1
# as the join was only performed on the id-column, i.e. Column2
# as suggested above, remove the unwanted Column1.y with a select(-...) call
df12 <- full_join(df1, df2, by = c("Column2"))
df12
# what I think you want
df12 <- full_join(df1, df2, by = c("Column1","Column2"))
df12
The latter gives you a fully merged data set by keeping both key columns intact.
Note: You can reorder the columns to your liking with a select() call. E.g., try: df12 %>% select(Yes, No, Neither, everything())
to see what happens.
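If the join has to stay on Column2 alone, another option is to fold the two name columns into one rather than dropping one of them. The following is only a sketch under that assumption: it rebuilds the same df1 and df2 as above, and `merged` is a name made up for illustration. coalesce() takes the first non-missing value per row, so names that exist only in df2 end up in the single Column1.

```r
library(dplyr)

# Same example data as above, rebuilt so the snippet runs on its own
df1 <- tibble(
  Column1 = c("Name1", "Name2", "Name3", "Name4"),
  Column2 = c("ID1", "ID2", "ID3", "ID4"),
  Column5 = c(1, 2, 3, 4)
)
df2 <- tibble(
  Column1 = c("Name4", "Name5", "Name6", "Name7"),
  Column2 = c("ID4", "ID5", "ID6", "ID7"),
  Yes     = c(8, 5, 6, 7),
  No      = c(13, 10, 11, 12),
  Neither = NA
)

# Join on the ID only, then merge Column1.x / Column1.y into one column.
# `merged` is a hypothetical name, not part of the original post.
merged <- full_join(df1, df2, by = "Column2") %>%
  mutate(Column1 = coalesce(Column1.x, Column1.y)) %>%   # prefer df1's name, fall back to df2's
  select(Column1, Column2, Column5, Yes, No, Neither)    # keep the combined column, drop the suffixed ones

merged
```

Note that when both tables carry a different name for the same ID, this keeps the one from df1, so it is worth checking for such conflicts first.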
df3 <- df1 %>%
  full_join(df2, by = "Column1")
df3
Output:
  Column1 Column2.x Column5 Column2.y   Yes    No Neither
  <chr>   <chr>       <dbl> <chr>     <dbl> <dbl> <lgl>
1 Name1   ID1             1 NA           NA    NA NA
2 Name2   ID2             2 NA           NA    NA NA
3 Name3   ID3             3 NA           NA    NA NA
4 Name4   ID4             4 ID4           8    13 NA
5 Name5   NA             NA ID5           5    10 NA
6 Name6   NA             NA ID6           6    11 NA
7 Name7   NA             NA ID7           7    12 NA