I want to write code that checks whether records in two data frames match. df1 holds object IDs, the dates the objects were found, and their weights, all sampled from df2. I want to verify that the data in df1 was captured correctly from df2. The first data frame is:
df1 <- structure(list(ID = c("A5", "A8", "A15", "B11", "B35", "B36", "B45", "B50", "C2", "C3"),
DateFound = c("15/4/2020", "16/4/2020", "16/4/2020", "16/4/2020", "16/4/2021", "16/4/2021", "16/4/2021", "16/4/2021", "16/4/2021", "16/4/2021"),
Weight = c(40L, 45L, 36L, 44L, 49L, 34L, 36L, 42L, 46L, 38L)),
class = "data.frame", row.names = c(NA, -10L))
# ID DateFound Weight
# 1 A5 15/4/2020 40
# 2 A8 16/4/2020 45
# 3 A15 16/4/2020 36
# 4 B11 16/4/2020 44
# 5 B35 16/4/2021 49
# 6 B36 16/4/2021 34
# 7 B45 16/4/2021 36
# 8 B50 16/4/2021 42
# 9 C2 16/4/2021 46
# 10 C3 16/4/2021 38
The second is:
df2 <- structure(list(ID = c("A5", "A8", "A15", "A20", "B6", "B11",
"B35", "B36", "B37", "B40", "B45", "B50", "C2", "C3"), X13.4.2020 = c(42L,
45L, 38L, 34L, 39L, 46L, 34L, 44L, 39L, 35L, 39L, 55L, 51L, 55L),
X14.4.2020 = c(0L, 0L, 38L, 0L, 0L, 0L, 0L, 0L, 40L, 0L, 0L,
50L, 0L, 0L), X15.4.2020 = c(40L, 0L, 0L, 0L, 40L, 0L, 37L, 0L,
38L, 36L, 0L, 0L, 51L, 54L), X16.4.2020 = c(0L, 46L, 39L, 0L,
0L, 44L, 0L, 33L, 0L, 40L, 0L, 52L, 52L, 0L), X17.4.2020 = c(NA,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 43L, 0L, 0L, 0L, 42L), X16.4.2021 = c(NA,
NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 42L, NA, 40L)), class = "data.frame", row.names = c(NA, -14L))
# ID X13.4.2020 X14.4.2020 X15.4.2020 X16.4.2020 X17.4.2020 X16.4.2021
# 1 A5 42 0 40 0 NA NA
# 2 A8 45 0 0 46 0 NA
# 3 A15 38 38 0 39 0 0
# ...
The intended output is shown below: it adds a column holding the object weight from df2 for the same ID and the date recorded in df1.
df1_2output_
# ID DateFound Weight df2
# 1 A5 15/4/2020 40 40
# 2 A8 16/4/2020 45 46
# 3 A15 16/4/2020 36 39
# 4 B11 16/4/2020 44 44
# 5 B35 16/4/2021 49 0
# 6 B36 16/4/2021 34 33
# 7 B45 16/4/2021 36 0
# 8 B50 16/4/2021 42 42
# 9 C2 16/4/2021 46 NA
# 10 C3 16/4/2021 38 40
df2 is in wide format, with the dates stored as column names, so you need to reshape it to long format and convert those column names to proper Date values before joining.
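To make the reshape step concrete, here is a minimal sketch on the first three rows of df2 and two of its date columns. The names `df2_small` and `df2_long` are made up for illustration, and the date conversion is done in a separate step with `as.Date()` rather than inside `pivot_longer()`, purely to make each stage visible.

```r
library(tidyr)

# First three rows of df2 from the question, two date columns only
df2_small <- data.frame(
  ID = c("A5", "A8", "A15"),
  X15.4.2020 = c(40L, 0L, 0L),
  X16.4.2020 = c(0L, 46L, 39L)
)

# Wide to long: one row per ID and date column
df2_long <- pivot_longer(
  df2_small, -ID,
  names_to = "DateFound", values_to = "Weight"
)

# Column names look like "X15.4.2020"; the leading "X" is matched
# literally by the format string
df2_long$DateFound <- as.Date(df2_long$DateFound, format = "X%d.%m.%Y")

df2_long
```

The full solution below performs the same reshape inline and joins the result to df1.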
library(dplyr)
library(tidyr)

df1 %>%
  # Parse df1's dates so they can be compared with df2's
  mutate(DateFound = as.Date(DateFound, '%d/%m/%Y')) %>%
  left_join(
    # Reshape df2 to one row per ID and date; names like X13.4.2020 become Dates
    df2 %>%
      pivot_longer(-ID, names_to = 'DateFound', values_to = 'Weight',
                   names_transform = list(DateFound = ~ as.Date(.x, 'X%d.%m.%Y'))),
    by = c('ID', 'DateFound')
  )
ID DateFound Weight.x Weight.y
1 A5 2020-04-15 40 40
2 A8 2020-04-16 45 46
3 A15 2020-04-16 36 39
4 B11 2020-04-16 44 44
5 B35 2021-04-16 49 0
6 B36 2021-04-16 34 0
7 B45 2021-04-16 36 0
8 B50 2021-04-16 42 42
9 C2 2021-04-16 46 NA
10 C3 2021-04-16 38 40
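In the output above, Weight.x comes from df1 and Weight.y from df2. Since the goal is to verify df1 against df2, one possible follow-up is to flag the rows that disagree. The sketch below rebuilds the joined result from the printed output; the names `joined`, `checked`, `Weight_df1`, `Weight_df2` and `matches` are illustrative, and treating a missing df2 value as a mismatch is an assumption you may want to change.

```r
library(dplyr)

# Joined result copied from the output above (DateFound omitted for brevity)
joined <- data.frame(
  ID = c("A5", "A8", "A15", "B11", "B35", "B36", "B45", "B50", "C2", "C3"),
  Weight.x = c(40L, 45L, 36L, 44L, 49L, 34L, 36L, 42L, 46L, 38L),
  Weight.y = c(40L, 46L, 39L, 44L, 0L, 0L, 0L, 42L, NA, 40L)
)

checked <- joined %>%
  rename(Weight_df1 = Weight.x, Weight_df2 = Weight.y) %>%
  # Assumption: an NA in df2 counts as "not matching"
  mutate(matches = !is.na(Weight_df2) & Weight_df1 == Weight_df2)

# Rows where df1 disagrees with df2, or df2 has no value
filter(checked, !matches)
```

If you prefer clearer column names straight from the join, `left_join()` also takes a `suffix` argument, for example `suffix = c('.df1', '.df2')`.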