
Combine only the non-matching rows of two data frames in R

I have two data.frames similar to these:

#Dt1
Id  Date Weight  
1    2     10
2    3     20
3    4     30
4    4     30
5    6     40

and

#DT2
Id  Date Weight late 
1    2     10     3
2    3     20     4
3    4     30     5
8    5     10     6

I would like to combine these data frames, keeping only the rows whose Id appears in one of them but not in the other, like this:

#Dt.final
Id  Date Weight late
4    4     30    NA
5    6     40    NA
8    5     10     6

My original files are bigger than these. Thanks.

Besides @yarnabrina's answer, anti_join() in dplyr also does what you need, but it has to be applied twice, once in each direction. anti_join(x, y) drops all rows of x that have a match in y:

> full_join(anti_join(df1, df2, by = 'Id'), anti_join(df2, df1, by = 'Id'))
Joining, by = c("Id", "Date", "Weight")
  Id Date Weight late
1  4    4     30   NA
2  5    6     40   NA
3  8    5     10    6
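A minimal sketch of the same two-way anti_join, re-using the data from the question. Here bind_rows() is swapped in for full_join(): because the two pieces share no Id, they only need to be stacked, and bind_rows() fills the missing late column with NA without printing a "Joining, by" message. The names only_df1, only_df2 and dt_final are illustrative, not from the answer.

```r
library(dplyr)

df1 <- data.frame(Id = c(1, 2, 3, 4, 5),
                  Date = c(2, 3, 4, 4, 6),
                  Weight = c(10, 20, 30, 30, 40))
df2 <- data.frame(Id = c(1, 2, 3, 8),
                  Date = c(2, 3, 4, 5),
                  Weight = c(10, 20, 30, 10),
                  late = c(3, 4, 5, 6))

# Rows of df1 whose Id has no match in df2, and vice versa
only_df1 <- anti_join(df1, df2, by = "Id")
only_df2 <- anti_join(df2, df1, by = "Id")

# Stack the two pieces; 'late' is absent from only_df1, so bind_rows() fills it with NA
dt_final <- bind_rows(only_df1, only_df2)
print(dt_final)
```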

Are you looking for something like this?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df1 <- data.frame(Id = c(1, 2, 3, 4, 5),
                  Date = c(2, 3, 4, 4, 6),
                  Weight = c(10, 20, 30, 30, 40))

df2 <- data.frame(Id = c(1, 2, 3, 8),
                  Date = c(2, 3, 4, 5),
                  Weight = c(10, 20, 30, 10),
                  late = c(3, 4, 5, 6))

full_join(x = filter(.data = df1, Id %in% setdiff(x = df1$Id, y = df2$Id)),
          y = filter(.data = df2, Id %in% setdiff(x = df2$Id, y = df1$Id)))
#> Joining, by = c("Id", "Date", "Weight")
#>   Id Date Weight late
#> 1  4    4     30   NA
#> 2  5    6     40   NA
#> 3  8    5     10    6

Created on 2019-05-03 by the reprex package (v0.2.1)
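If you would rather avoid the dplyr dependency, the same idea can be sketched in base R. This is a swapped-in technique, not the answer's code: merge() with all = TRUE performs an outer join on the shared columns (Id, Date, Weight), and the result is then filtered to the Ids returned by setdiff() in both directions. The names only_one and dt_final are illustrative.

```r
df1 <- data.frame(Id = c(1, 2, 3, 4, 5),
                  Date = c(2, 3, 4, 4, 6),
                  Weight = c(10, 20, 30, 30, 40))
df2 <- data.frame(Id = c(1, 2, 3, 8),
                  Date = c(2, 3, 4, 5),
                  Weight = c(10, 20, 30, 10),
                  late = c(3, 4, 5, 6))

# Ids that appear in exactly one of the two data frames
only_one <- union(setdiff(df1$Id, df2$Id), setdiff(df2$Id, df1$Id))

# Outer join on all common columns (merge's default 'by'); sort = TRUE orders by them
dt_final <- merge(df1, df2, all = TRUE)

# Keep only the non-shared Ids and reset the row names
dt_final <- dt_final[dt_final$Id %in% only_one, ]
rownames(dt_final) <- NULL
print(dt_final)
```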

Maybe this can solve your problem. It is a hand-made, base R answer, but I hope it is not too bad:

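# Example data: IDs 1-5 in df_1 and 4-8 in df_2
df_1 <- data.frame(ID = factor(1:5, levels=1:8),
                   Date = c(2, 3, 4, 4, 6),
                   Weight = c(10, 20, 20, 30, 40))

df_2 <- data.frame(ID = factor(4:8, levels=1:8),
                   Date = c(2, 3, 4, 4, 6),
                   Weight = c(10, 20, 20, 30, 40),
                   late = c(1, 2, 3, 4, 5))

# Temporary data frame: rows of df_1 whose ID is not in df_2,
# plus an empty 'late' column so its columns match df_2
df_temp <- data.frame(
  df_1[!df_1$ID %in% df_2$ID, ],
  late = NA)

# Stack them with the rows of df_2 whose ID is not in df_1
df.final <- rbind(
  df_temp,
  df_2[!df_2$ID %in% df_1$ID, ])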
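The same steps can be wrapped in a small reusable function, sketched below under the assumption that the key column has the same name in both data frames. rows_not_shared() is a hypothetical helper, not part of base R: instead of hard-coding late = NA, it pads whichever columns are missing on either side with NA before calling rbind(), so it should also work when each data frame has extra columns of its own.

```r
# Hypothetical helper: keep the rows whose key appears in only one data frame,
# padding columns missing on either side with NA before stacking.
rows_not_shared <- function(x, y, key = "Id") {
  x_only <- x[!x[[key]] %in% y[[key]], , drop = FALSE]
  y_only <- y[!y[[key]] %in% x[[key]], , drop = FALSE]

  all_cols <- union(names(x), names(y))
  for (col in setdiff(all_cols, names(x_only))) x_only[[col]] <- rep(NA, nrow(x_only))
  for (col in setdiff(all_cols, names(y_only))) y_only[[col]] <- rep(NA, nrow(y_only))

  # Same column order on both sides, then stack and reset the row names
  out <- rbind(x_only[all_cols], y_only[all_cols])
  rownames(out) <- NULL
  out
}

# Usage with the data from the question
df1 <- data.frame(Id = c(1, 2, 3, 4, 5),
                  Date = c(2, 3, 4, 4, 6),
                  Weight = c(10, 20, 30, 30, 40))
df2 <- data.frame(Id = c(1, 2, 3, 8),
                  Date = c(2, 3, 4, 5),
                  Weight = c(10, 20, 30, 10),
                  late = c(3, 4, 5, 6))

dt_final <- rows_not_shared(df1, df2, key = "Id")
print(dt_final)
```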
