I have a data frame with an ID number and corresponding data, in which some ID numbers are repeated across multiple rows. I want to merge it with another data frame that has one ID number per row, so that each ID's repeated rows become extra columns in the result.
I've been experimenting with the merge() and aggregate() functions, but have not come close to what I want. I've also spent a lot of time searching Stack Overflow and haven't been able to find a solution.
This is what the first data frame looks like:
df1 <- data.frame(ID = c(90051, 90051, 90051, 90229, 90229, 91120, 91120, 89649),
                  SPP = c("ABLA", "PICO", "POTR5", "ABLA", "PICO", "ABLA", "POTR5", "ABLA"),
                  COUNT = c(5, 4, 1, 7, 1, 3, 5, 11))
This is what the data frame that I want to modify looks like:
df2 <- data.frame(ID = c(85470, 90051, 90229, 91120, 89649, 84364),
                  COUNTY = c(49, 57, 107, 107, 117, 37),
                  STATUS = c(1, 1, 1, 2, 1, 3))
And this is what I want my resulting data frame to look like:
df3 <- data.frame(ID = c(85470, 90051, 90229, 91120, 89649, 84364),
                  COUNTY = c(49, 57, 107, 107, 117, 37),
                  STATUS = c(1, 1, 1, 2, 1, 3),
                  ABLA = c(NA, 5, 7, 3, 11, NA),
                  PICO = c(NA, 4, 1, NA, NA, NA),
                  POTR5 = c(NA, 1, NA, 5, NA, NA))
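You can start with merge(). Using all.x = TRUE makes the merge behave like a left outer join in SQL, keeping every row of x. With df1 as x, though, the result stays in long format (one row per ID/SPP pair) and IDs found only in df2 are dropped, so df1 still needs to be reshaped to wide first.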
merge(x = df1, y = df2, by = "ID", all.x = TRUE)
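As a rough sketch of how the merge() approach could reach the wide layout asked for in the question, one option is to reshape df1 with base R's reshape() first and then merge with df2 as x. The names wide and result_base are made up for illustration, and the column renaming assumes reshape()'s default "COUNT.<SPP>" naming.

```r
df1 <- data.frame(ID = c(90051, 90051, 90051, 90229, 90229, 91120, 91120, 89649),
                  SPP = c("ABLA", "PICO", "POTR5", "ABLA", "PICO", "ABLA", "POTR5", "ABLA"),
                  COUNT = c(5, 4, 1, 7, 1, 3, 5, 11))
df2 <- data.frame(ID = c(85470, 90051, 90229, 91120, 89649, 84364),
                  COUNTY = c(49, 57, 107, 107, 117, 37),
                  STATUS = c(1, 1, 1, 2, 1, 3))

# Reshape df1 from long to wide: one row per ID, one column per SPP value.
wide <- reshape(df1, idvar = "ID", timevar = "SPP", direction = "wide")

# reshape() names the new columns "COUNT.ABLA", "COUNT.PICO", ...; drop the prefix.
names(wide) <- sub("^COUNT\\.", "", names(wide))

# Left join with df2 as x, so every ID in df2 is kept
# and species columns are NA where df1 has no rows for that ID.
result_base <- merge(x = df2, y = wide, by = "ID", all.x = TRUE)
```

Note that merge() sorts the result by the key column by default, so the rows come back ordered by ID rather than in df2's original order; passing sort = FALSE changes that behaviour.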
I think you can first use spread() and then do a right_join().
library(tidyr)
library(dplyr)
result <- spread(df1, key = SPP, value = COUNT) %>%
  right_join(df2, by = "ID")
Giving you the desired result:
> result
     ID ABLA PICO POTR5 COUNTY STATUS
1 85470   NA   NA    NA     49      1
2 90051    5    4     1     57      1
3 90229    7    1    NA    107      1
4 91120    3   NA     5    107      2
5 89649   11   NA    NA    117      1
6 84364   NA   NA    NA     37      3
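In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), so the same idea can be sketched with the newer function. This version also puts df2 on the left of a left_join(), which should keep df2's row order and place COUNTY and STATUS before the species columns, as in the layout the question asks for. The name result_wide is made up for illustration.

```r
library(tidyr)
library(dplyr)

df1 <- data.frame(ID = c(90051, 90051, 90051, 90229, 90229, 91120, 91120, 89649),
                  SPP = c("ABLA", "PICO", "POTR5", "ABLA", "PICO", "ABLA", "POTR5", "ABLA"),
                  COUNT = c(5, 4, 1, 7, 1, 3, 5, 11))
df2 <- data.frame(ID = c(85470, 90051, 90229, 91120, 89649, 84364),
                  COUNTY = c(49, 57, 107, 107, 117, 37),
                  STATUS = c(1, 1, 1, 2, 1, 3))

# Widen df1 so each SPP value becomes its own column holding COUNT,
# then attach those columns to df2, keeping every row of df2.
result_wide <- df2 %>%
  left_join(pivot_wider(df1, names_from = SPP, values_from = COUNT),
            by = "ID")
```

Both calls assume each ID/SPP pair appears at most once in df1, as in the sample data; with repeated pairs, pivot_wider() would need a values_fn argument to say how to combine them.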