Difference in months between dates with different rows (same identifier) (in R)
  cpf sep_month hire_month hire_year sep_day  hire_date   sep_date
4 123         4          2      2012       1 2012-02-01 2013-04-01
5 123         0          4      2013       1 2013-04-01       <NA>
6 122        10          9      2012       1 2012-09-01 2013-10-01
7 122         0         12      2013       1 2013-12-01       <NA>
structure(list(cpf = c(123L, 123L, 122L, 122L), sep_month = c(4L,
0L, 10L, 0L), hire_month = c(2L, 4L, 9L, 12L), hire_year = c(2012L,
2013L, 2012L, 2013L), sep_day = c(1L, 1L, 1L, 1L), hire_date = structure(c(15371,
15796, 15584, 16040), class = "Date"), sep_date = structure(c(15796,
NA, 15979, NA), class = "Date")), row.names = 4:7, class = "data.frame")
In my dataset, each row is a job contract. I want to see the difference in months between sep_date and hire_date across different rows for the same cpf (identifier). For example, individual 123 separated from his job in 2013-04. In the following row, he was hired (under a different contract/job) in 2013-04. My goal is to create a dummy equal to 1 for individuals who separated from a contract and found a job in the same or following month. That would be the case for individual 123, but not for individual 122.
I appreciate any help.
library(dplyr)
dat %>%
  group_by(cpf) %>%
  mutate(dummy = sapply(sep_date, function(z) any(!is.na(z) & z <= hire_date))) %>%
  ungroup()
# # A tibble: 4 x 8
#     cpf sep_month hire_month hire_year sep_day hire_date  sep_date   dummy
#   <int>     <int>      <int>     <int>   <int> <date>     <date>     <lgl>
# 1   123         4          2      2012       1 2012-02-01 2013-04-01 TRUE
# 2   123         0          4      2013       1 2013-04-01 NA         FALSE
# 3   122        10          9      2012       1 2012-09-01 2013-10-01 TRUE
# 4   122         0         12      2013       1 2013-12-01 NA         FALSE
If you want dummy to be truly just 1s and 0s, then add a + before it, as in dummy = +sapply(...) (+ is a shortcut to convert logical to integer).
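As a quick illustration of that shortcut, here is a minimal sketch on a made-up logical vector (flags, as_int, and same are hypothetical names, not part of the data above). It only shows that unary + and as.integer() give the same integer result:

```r
# Hypothetical logical vector, just to show the coercion
flags <- c(TRUE, FALSE, NA)

as_int <- +flags             # unary plus coerces logical to integer
same   <- as.integer(flags)  # the more explicit spelling

identical(as_int, same)
```

as.integer() is arguably easier to read in shared code; + is just shorter.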
dat$dummy <- ave(
  seq_along(dat$cpf), dat$cpf,
  FUN = function(i) sapply(dat$sep_date[i], function(z) any(!is.na(z) & z <= dat$hire_date[i])))
dat
#   cpf sep_month hire_month hire_year sep_day  hire_date   sep_date dummy
# 4 123         4          2      2012       1 2012-02-01 2013-04-01     1
# 5 123         0          4      2013       1 2013-04-01       <NA>     0
# 6 122        10          9      2012       1 2012-09-01 2013-10-01     1
# 7 122         0         12      2013       1 2013-12-01       <NA>     0
This is a little uglier because ave doesn't like doing multi-column (multi-vector) stuff: it hands FUN a single vector per group, so the code passes row indices and uses them to index both date columns. It gets the same results.
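Note that the condition z <= hire_date only checks that some hire in the same cpf group happens on or after the separation, so it also flags individual 122 (rehired two months later). If the flag should be limited to a rehire in the same or the following calendar month, one possible variant is sketched below. It is an assumption about what was intended, not part of the code above; month_index, next_hire, gap_months, and res are hypothetical names, and the data frame is rebuilt with only the columns the sketch needs.

```r
library(dplyr)

# Same cpf / hire_date / sep_date values as in the question
dat <- data.frame(
  cpf       = c(123L, 123L, 122L, 122L),
  hire_date = as.Date(c("2012-02-01", "2013-04-01", "2012-09-01", "2013-12-01")),
  sep_date  = as.Date(c("2013-04-01", NA, "2013-10-01", NA))
)

# Hypothetical helper: number of calendar months counted from year 0,
# so that subtracting two values gives a difference in months
month_index <- function(d) {
  lt <- as.POSIXlt(d)
  (lt$year + 1900L) * 12L + lt$mon
}

res <- dat %>%
  group_by(cpf) %>%
  arrange(hire_date, .by_group = TRUE) %>%   # contracts in time order per person
  mutate(
    next_hire  = lead(hire_date),            # hire date of the following contract
    gap_months = month_index(next_hire) - month_index(sep_date),
    # 1 only when the next hire falls in the same or the following month
    dummy      = as.integer(!is.na(gap_months) & gap_months >= 0 & gap_months <= 1)
  ) %>%
  ungroup()

res
```

With this rule the flag sits on the contract that ended, and only individual 123's first contract gets a 1. The rows come back sorted by cpf because of arrange(.by_group = TRUE).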