I have two data sets with two different types of observations. The observations were made on different days and are recorded at different time intervals.
Both data sets have a serial number that identifies the group of people who made the observations; for example, serial 111 denotes one group. The number of people in a group varies; group 111, for example, consists of 3 people. Within the diaries, a person is identified by the serial and id1 variables: serial 111 and id1 2 means that the observation was made by person number two from group 111. There is also a Day variable that records the weekday on which the observation was made. It takes values from 1 (Monday) to 7 (Sunday).
In df1 there is one observation per person, whereas in df2 each person had to make two observations. In df2 an observation is identified by serial, id1 and id2, where id2 distinguishes between a person's two observation days. For example, serial 111, id1 3 and id2 2 means the second-day observation made by person number 3 from group 111. The weekday of the observation is again stored in the Day variable.
I want to identify the people who recorded information on the same day in both diaries, that is, the individuals who filled in both records on the same day. The problem is that df2 has two observations per person and df1 has just one, which makes merging difficult.
I merged on serial and id1, but together they are not a unique identifier in df2. I also tried to create a new index variable so that I could merge at the Day level (see the code below).
How can I merge the two data sets at the daily level?
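library(dplyr)
# group_indices_() is deprecated; cur_group_id() (dplyr >= 1.0.0) numbers the groups the same way
df1 <- df1 %>%
  group_by(serial, id1) %>%
  mutate(index = cur_group_id()) %>%
  ungroup()
df2 <- df2 %>%
  group_by(serial, id1, id2) %>%
  mutate(index = cur_group_id()) %>%
  ungroup()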
Sample data:
df1 <- structure(list(serial = c(12, 123, 123, 10, 10), id1 = c(1, 1,
2, 1, 2), Day = c(1, 3, 2, 4, 2)), class = "data.frame", row.names = c(NA,
-5L))
df2 <- structure(list(serial = c(12, 12, 123, 123, 123, 123, 10, 10,
10, 10, 10, 10), id1 = c(1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 3, 3),
id2 = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2), Day = c(1, 6,
3, 7, 2, 7, 4, 7, 2, 7, 4, 7), index = c(7L, 8L, 9L, 10L,
11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L)), row.names = c(NA, -12L
), class = "data.frame")
Desired outcome:
serial id1 id2 Day
    12   1   1   1
   123   1   1   3
   123   2   1   2
    10   1   1   4
    10   2   1   2
You can add the corresponding id2 value from df2 to df1 with an update join using data.table:
library(data.table)
setDT(df1)
setDT(df2)
df1[df2, id2 := i.id2, on = .(serial, id1, Day)]
df1
#    serial id1 Day id2
# 1:     12   1   1   1
# 2:    123   1   3   1
# 3:    123   2   2   1
# 4:     10   1   4   1
# 5:     10   2   2   1
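Since the question already uses dplyr, the same idea can be sketched there as an inner join on all three keys. This is only a sketch on the sample data from the question (the index column of df2 is left out because it is not needed); the name same_day is an illustrative choice, not something from the original post.

```r
library(dplyr)

# Sample data from the question (without the index column)
df1 <- data.frame(serial = c(12, 123, 123, 10, 10),
                  id1    = c(1, 1, 2, 1, 2),
                  Day    = c(1, 3, 2, 4, 2))
df2 <- data.frame(serial = c(12, 12, 123, 123, 123, 123, 10, 10, 10, 10, 10, 10),
                  id1    = c(1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 3, 3),
                  id2    = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
                  Day    = c(1, 6, 3, 7, 2, 7, 4, 7, 2, 7, 4, 7))

# Keep only the rows where the same person has the same Day in both diaries
same_day <- inner_join(df1, df2, by = c("serial", "id1", "Day")) %>%
  select(serial, id1, id2, Day)

same_day
```

Unlike the update join, an inner join drops any df1 row that has no same-day match in df2, so the result contains only the people who filled in both diaries on the same day.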
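You can try merge() as below. With no by argument it joins on all columns the two data frames have in common (here serial, id1 and Day):
merge(df1, df2, all.x = TRUE)[1:4]
which gives
> merge(df1, df2, all.x = TRUE)[1:4]
  serial id1 Day id2
1     10   1   4   1
2     10   2   2   1
3     12   1   1   1
4    123   1   3   1
5    123   2   2   1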
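One thing to keep in mind with all.x = TRUE is that a df1 row without a same-day match in df2 is kept, with NA in id2. The sketch below illustrates this on a reduced data set; the df1 row for serial 10, id1 3, Day 5 is a hypothetical addition made up for this illustration and is not part of the question's sample data.

```r
# Reduced sample; the last df1 row (serial 10, id1 3, Day 5) is hypothetical
df1 <- data.frame(serial = c(12, 123, 10),
                  id1    = c(1, 1, 3),
                  Day    = c(1, 3, 5))
df2 <- data.frame(serial = c(12, 12, 123, 123, 10, 10),
                  id1    = c(1, 1, 1, 1, 3, 3),
                  id2    = c(1, 2, 1, 2, 1, 2),
                  Day    = c(1, 6, 3, 7, 4, 7))

keys <- c("serial", "id1", "Day")

# Left join: every df1 row is kept, unmatched rows get id2 = NA
left <- merge(df1, df2, by = keys, all.x = TRUE)

# Inner join (the default): only same-day matches are kept
inner <- merge(df1, df2, by = keys)

left
inner
```

If the goal is only to list the people who filled in both diaries on the same day, the default inner join (or filtering out the NA rows afterwards) may be the more direct choice.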
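Use merge with Day included in the join key, so that only same-day records are matched: out <- merge(df1, df2, by = c('serial', 'id1', 'Day'))
and then select the columns serial, id1, id2, Day: out[, c('serial', 'id1', 'id2', 'Day')]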
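To see why Day matters for the join, here is a small base R sketch on the question's sample data. Merging on serial and id1 alone pairs each df1 row with both df2 rows of that person, so the same-day rows have to be filtered afterwards. The suffixes and the names pairs and same_day are illustrative choices.

```r
# Sample data from the question (without the index column)
df1 <- data.frame(serial = c(12, 123, 123, 10, 10),
                  id1    = c(1, 1, 2, 1, 2),
                  Day    = c(1, 3, 2, 4, 2))
df2 <- data.frame(serial = c(12, 12, 123, 123, 123, 123, 10, 10, 10, 10, 10, 10),
                  id1    = c(1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 3, 3),
                  id2    = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
                  Day    = c(1, 6, 3, 7, 2, 7, 4, 7, 2, 7, 4, 7))

# Join on person only: Day appears twice, once per data set
pairs <- merge(df1, df2, by = c("serial", "id1"), suffixes = c(".df1", ".df2"))
nrow(pairs)  # two df2 rows for each of the five df1 persons

# Keep the pairs where both diaries were filled in on the same day
keep <- pairs$Day.df1 == pairs$Day.df2
same_day <- data.frame(serial = pairs$serial[keep],
                       id1    = pairs$id1[keep],
                       id2    = pairs$id2[keep],
                       Day    = pairs$Day.df1[keep])
same_day
```

Putting Day directly into by gives the same rows in one step; the two-step version is mainly useful when you also want to inspect the non-matching days.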