
How to combine matching records from two data frames in R

I want to write code that checks whether the records in two data frames match. df1 holds object IDs, the date each object was found, and the weight of each object as sampled from df2. I want to verify that the data in df1 was captured correctly from df2. The first data frame is:

df1 <- structure(list(ID = c("A5", "A8", "A15", "B11", "B35", "B36", "B45", "B50", "C2", "C3"),
DateFound = c("15/4/2020", "16/4/2020", "16/4/2020", "16/4/2020", "16/4/2021", "16/4/2021", "16/4/2021", "16/4/2021", "16/4/2021", "16/4/2021"),
Weight = c(40L, 45L, 36L, 44L, 49L, 34L, 36L, 42L, 46L, 38L)),
class = "data.frame", row.names = c(NA, -10L))

#     ID DateFound Weight
# 1   A5 15/4/2020     40
# 2   A8 16/4/2020     45
# 3  A15 16/4/2020     36
# 4  B11 16/4/2020     44
# 5  B35 16/4/2021     49
# 6  B36 16/4/2021     34
# 7  B45 16/4/2021     36
# 8  B50 16/4/2021     42
# 9   C2 16/4/2021     46
# 10  C3 16/4/2021     38

The second is:

df2 <- structure(list(ID = c("A5", "A8", "A15", "A20", "B6", "B11", 
"B35", "B36", "B37", "B40", "B45", "B50", "C2", "C3"), X13.4.2020 = c(42L, 
45L, 38L, 34L, 39L, 46L, 34L, 44L, 39L, 35L, 39L, 55L, 51L, 55L),
X14.4.2020 = c(0L, 0L, 38L, 0L, 0L, 0L, 0L, 0L, 40L, 0L, 0L, 
50L, 0L, 0L), X15.4.2020 = c(40L, 0L, 0L, 0L, 40L, 0L, 37L, 0L, 
38L, 36L, 0L, 0L, 51L, 54L), X16.4.2020 = c(0L, 46L, 39L, 0L, 
0L, 44L, 0L, 33L, 0L, 40L, 0L, 52L, 52L, 0L), X17.4.2020 = c(NA, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 43L, 0L, 0L, 0L, 42L), X16.4.2021 = c(NA, 
NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 42L, NA, 40L)), class = "data.frame", row.names = c(NA, -14L))

#     ID X13.4.2020 X14.4.2020 X15.4.2020 X16.4.2020 X17.4.2020 X16.4.2021
# 1   A5         42          0         40          0         NA         NA
# 2   A8         45          0          0         46          0         NA
# 3  A15         38         38          0         39          0          0
# ...

The intended output is shown below: it adds a column holding the object's weight from df2 on the date recorded in df1.

df1_2output_

#     ID DateFound Weight df2
# 1   A5 15/4/2020     40  40
# 2   A8 16/4/2020     45  46
# 3  A15 16/4/2020     36  39
# 4  B11 16/4/2020     44  44
# 5  B35 16/4/2021     49   0
# 6  B36 16/4/2021     34   0
# 7  B45 16/4/2021     36   0
# 8  B50 16/4/2021     42  42
# 9   C2 16/4/2021     46  NA
# 10  C3 16/4/2021     38  40

df2 is in wide format, with the dates stored as column names, so you need to pivot it to long format and convert those names to proper Date values. After that, both data frames can be joined on ID and date.

library(dplyr)
library(tidyr)

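df1 %>%
  # parse df1's dates (day/month/year) so they can be matched as Date values
  mutate(DateFound = as.Date(DateFound, '%d/%m/%Y')) %>%
  left_join(
    # reshape df2 to one row per ID and date; the column names carry an "X" prefix
    df2 %>% pivot_longer(-ID, names_to = 'DateFound', values_to = 'Weight',
                         names_transform = list(DateFound = ~ as.Date(.x, 'X%d.%m.%Y'))),
    by = c('ID', 'DateFound')
  )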

    ID  DateFound Weight.x Weight.y
1   A5 2020-04-15       40       40
2   A8 2020-04-16       45       46
3  A15 2020-04-16       36       39
4  B11 2020-04-16       44       44
5  B35 2021-04-16       49        0
6  B36 2021-04-16       34        0
7  B45 2021-04-16       36        0
8  B50 2021-04-16       42       42
9   C2 2021-04-16       46       NA
10  C3 2021-04-16       38       40
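
In that result, Weight.x is the value recorded in df1 and Weight.y is the value looked up in df2. Since the goal is to verify df1 against df2, one possible follow-up is to name the looked-up column df2 (as in the intended output) and add a logical flag for agreement. The sketch below is only one way to do this: the names df2_long, checked and matches are made up for illustration, and treating a missing df2 value as a mismatch is an assumption you may want to change.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df1 <- data.frame(
  ID = c("A5", "A8", "A15", "B11", "B35", "B36", "B45", "B50", "C2", "C3"),
  DateFound = c("15/4/2020", "16/4/2020", "16/4/2020", "16/4/2020", "16/4/2021",
                "16/4/2021", "16/4/2021", "16/4/2021", "16/4/2021", "16/4/2021"),
  Weight = c(40L, 45L, 36L, 44L, 49L, 34L, 36L, 42L, 46L, 38L)
)

df2 <- data.frame(
  ID = c("A5", "A8", "A15", "A20", "B6", "B11", "B35", "B36", "B37", "B40",
         "B45", "B50", "C2", "C3"),
  X13.4.2020 = c(42L, 45L, 38L, 34L, 39L, 46L, 34L, 44L, 39L, 35L, 39L, 55L, 51L, 55L),
  X14.4.2020 = c(0L, 0L, 38L, 0L, 0L, 0L, 0L, 0L, 40L, 0L, 0L, 50L, 0L, 0L),
  X15.4.2020 = c(40L, 0L, 0L, 0L, 40L, 0L, 37L, 0L, 38L, 36L, 0L, 0L, 51L, 54L),
  X16.4.2020 = c(0L, 46L, 39L, 0L, 0L, 44L, 0L, 33L, 0L, 40L, 0L, 52L, 52L, 0L),
  X17.4.2020 = c(NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 43L, 0L, 0L, 0L, 42L),
  X16.4.2021 = c(NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 42L, NA, 40L)
)

# Long version of df2: one row per ID and date, weights in a column named df2
df2_long <- df2 %>%
  pivot_longer(-ID, names_to = "DateFound", values_to = "df2") %>%
  mutate(DateFound = as.Date(DateFound, format = "X%d.%m.%Y"))

# Join on ID and date, then flag rows where the two weights agree.
# Assumption: a missing df2 value (NA) counts as a mismatch.
checked <- df1 %>%
  mutate(DateFound = as.Date(DateFound, format = "%d/%m/%Y")) %>%
  left_join(df2_long, by = c("ID", "DateFound")) %>%
  mutate(matches = !is.na(df2) & Weight == df2)

# Rows of df1 that do not agree with df2
filter(checked, !matches)
```

With the sample data, only A5, B11 and B50 have the same weight in both data frames; every other row is returned by the final filter.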
