I am currently using data.table in R and I have a data set like the following:
ID mon age
1 1 22
1 2 56
1 5 106
2 1 34
2 3 65
2 4 76
I would like to create a variable called diff that calculates the difference in age within each ID's observations only if the mon variable is incrementing by 1. If it's not incrementing by 1 then I'd like diff to equal NA.
This is what I'd like the data set to look like:
ID mon age diff
1 1 22 NA
1 2 56 34
1 5 106 NA
2 1 34 NA
2 3 65 NA
2 4 76 11
I know this would need some type of if-else statement, but I'm not sure how to use one to go through each observation and check whether the mon variable increased by exactly 1 from the previous row. Any insight would be greatly appreciated.
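We can group by 'ID', take the difference of adjacent elements of 'age' with diff, and multiply it by a mask built from diff of 'mon'. In the mask, every step that is not exactly 1 is turned into NA and every step of 1 becomes 1 (NA^1 is NA, NA^0 is 1), so the positions where 'mon' does not increment by 1 become NA.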
library(dplyr)
df1 %>%
  group_by(ID) %>%
  mutate(diff = c(NA, diff(age)) * c(NA, NA^(diff(mon) != 1)))
# A tibble: 6 x 4
# Groups: ID [2]
# ID mon age diff
# <int> <int> <int> <dbl>
#1 1 1 22 NA
#2 1 2 56 34
#3 1 5 106 NA
#4 2 1 34 NA
#5 2 3 65 NA
#6 2 4 76 11
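To make the NA^ trick above easier to follow, here is a minimal base R sketch on the rows of ID 1 only. The standalone vectors mon and age and the names mask and result are illustrative assumptions, not part of the original answer.

```r
# Rows for ID 1 from the question's data
mon <- c(1L, 2L, 5L)
age <- c(22L, 56L, 106L)

# diff(mon) != 1 is FALSE where mon advanced by exactly 1, TRUE otherwise.
# NA^FALSE is NA^0, which is 1; NA^TRUE is NA^1, which is NA.
mask <- NA^(diff(mon) != 1)

# Pad with a leading NA (the first row has no previous row) and multiply:
# multiplying by 1 keeps the age difference, multiplying by NA blanks it.
result <- c(NA, diff(age)) * c(NA, mask)
print(result)
```

Because the mask is numeric, the result is a double vector, which matches the <dbl> column type in the tibble output above.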
You can use shift to get the previous value of mon within each ID and check whether the difference is 1.
library(data.table)
df[, diff := ifelse(mon - shift(mon) == 1, age - shift(age), NA), by = ID]
df
# ID mon age diff
#1: 1 1 22 NA
#2: 1 2 56 34
#3: 1 5 106 NA
#4: 2 1 34 NA
#5: 2 3 65 NA
#6: 2 4 76 11
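For anyone who wants to run this end to end, here is a self-contained sketch that builds the question's data and applies the same shift idea. It assumes the table is called df with integer columns, and it swaps ifelse for data.table's fifelse (available in data.table 1.12.4 and later), which is a variation on the answer above rather than the answer's own code.

```r
library(data.table)

# Example data from the question; the name `df` and the integer
# column types are assumptions.
df <- data.table(
  ID  = c(1L, 1L, 1L, 2L, 2L, 2L),
  mon = c(1L, 2L, 5L, 1L, 3L, 4L),
  age = c(22L, 56L, 106L, 34L, 65L, 76L)
)

# Within each ID, keep the age difference only where mon advanced by
# exactly 1 from the previous row. The first row of each group compares
# against NA, so the test is NA and fifelse returns NA there as well.
df[, diff := fifelse(mon - shift(mon) == 1L, age - shift(age), NA_integer_), by = ID]
print(df)
```

fifelse requires both branches to have the same type, which is why NA_integer_ is used here; if age were stored as a double, NA_real_ would be the matching choice.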
Or similarly in dplyr, we can use lag:
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(diff = if_else(mon - lag(mon) == 1, age - lag(age), NA_integer_))