
Difference in months between dates with different rows (same identifier) (in R)


 cpf sep_month hire_month hire_year sep_day  hire_date   sep_date
4 123         4          2      2012       1 2012-02-01 2013-04-01
5 123         0          4      2013       1 2013-04-01       <NA>
6 122        10          9      2012       1 2012-09-01 2013-10-01
7 122         0         12      2013       1 2013-12-01       <NA>

structure(list(cpf = c(123L, 123L, 122L, 122L), sep_month = c(4L, 
0L, 10L, 0L), hire_month = c(2L, 4L, 9L, 12L), hire_year = c(2012L, 
2013L, 2012L, 2013L), sep_day = c(1L, 1L, 1L, 1L), hire_date = structure(c(15371, 
15796, 15584, 16040), class = "Date"), sep_date = structure(c(15796, 
NA, 15979, NA), class = "Date")), row.names = 4:7, class = "data.frame")

In my dataset, each row is a job contract. I want to compute the difference in months between sep_date and hire_date across different rows for the same CPF (identifier). For example, individual 123 separated from his job in 2013-04. On the following row, he was hired (under a different contract/job) in 2013-04. My goal is to create a dummy equal to 1 for individuals who separated from a contract and found a job in the same or following month. That would be the case for individual 123, but not for individual 122.

I appreciate any help.

dplyr

library(dplyr)
dat %>%
  group_by(cpf) %>%
  mutate(dummy = sapply(sep_date, function(z) any(!is.na(z) & z <= hire_date))) %>%
  ungroup()
# # A tibble: 4 x 8
#     cpf sep_month hire_month hire_year sep_day hire_date  sep_date   dummy
#   <int>     <int>      <int>     <int>   <int> <date>     <date>     <lgl>
# 1   123         4          2      2012       1 2012-02-01 2013-04-01 TRUE 
# 2   123         0          4      2013       1 2013-04-01 NA         FALSE
# 3   122        10          9      2012       1 2012-09-01 2013-10-01 TRUE 
# 4   122         0         12      2013       1 2013-12-01 NA         FALSE

If you want dummy to be truly just 1s and 0s, add a + before it, as in dummy = +sapply(...) (+ is a shortcut to convert logical to integer).
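As a quick illustration of the unary + shortcut, here is a minimal sketch on a made-up logical vector (the name flags is hypothetical, not part of the question's data). It compares the shortcut with the more explicit as.integer().

```r
# Hypothetical logical vector, including a missing value
flags <- c(TRUE, FALSE, NA)

# Unary plus coerces logical to integer: TRUE -> 1L, FALSE -> 0L, NA stays NA
ints_shortcut <- +flags

# The explicit equivalent
ints_explicit <- as.integer(flags)

class(ints_shortcut)                      # "integer"
identical(ints_shortcut, ints_explicit)   # TRUE
```

as.integer() is arguably easier to read in shared code. The + form is just shorter.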

base R

dat$dummy <- ave(
  seq_along(dat$cpf), dat$cpf,
  FUN = function(i) sapply(dat$sep_date[i], function(z) any(!is.na(z) & z <= dat$hire_date[i])))
dat
#   cpf sep_month hire_month hire_year sep_day  hire_date   sep_date dummy
# 4 123         4          2      2012       1 2012-02-01 2013-04-01     1
# 5 123         0          4      2013       1 2013-04-01       <NA>     0
# 6 122        10          9      2012       1 2012-09-01 2013-10-01     1
# 7 122         0         12      2013       1 2013-12-01       <NA>     0

This is a little uglier because ave doesn't like doing multi-column (multi-vector) stuff, but it gets the same results.
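Note that the z <= hire_date test above flags a separation whenever any later hire exists for the same cpf, however long the gap. The question asks for a rehire in the same or the following calendar month, so here is a minimal base R sketch of that stricter rule. The helpers month_index and rehired_within_month are hypothetical names introduced for illustration. The sketch rebuilds only the three columns it needs, and it assumes that a contract's own hire_date should not count as a rehire.

```r
# Only the columns needed, with values copied from the question's data
dat <- data.frame(
  cpf       = c(123L, 123L, 122L, 122L),
  hire_date = as.Date(c("2012-02-01", "2013-04-01", "2012-09-01", "2013-12-01")),
  sep_date  = as.Date(c("2013-04-01", NA, "2013-10-01", NA))
)

# Hypothetical helper: number of calendar months since year 0, so that
# subtracting two indices gives a whole-month difference
month_index <- function(d) {
  lt <- as.POSIXlt(d)
  (lt$year + 1900) * 12 + lt$mon
}

# Hypothetical helper: for the rows i of one cpf, flag each separation that
# is followed by a hire (on a different row) 0 or 1 months later
rehired_within_month <- function(i) {
  sep  <- month_index(dat$sep_date[i])
  hire <- month_index(dat$hire_date[i])
  sapply(seq_along(i), function(j) {
    gap <- hire[-j] - sep[j]   # months from this separation to the other hires
    !is.na(sep[j]) && any(gap >= 0 & gap <= 1, na.rm = TRUE)
  })
}

# ave() works on a single vector, so pass row indices and look the dates up
dat$dummy <- ave(seq_len(nrow(dat)), dat$cpf,
                 FUN = function(i) +rehired_within_month(i))
dat
```

With the sample data this sets dummy to 1 only on the first row for cpf 123 (separated and rehired in 2013-04). The 122 separation in 2013-10 is followed by a hire two months later, so it stays 0.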
