Generating a dummy in a panel dataframe in R
I really need your help with this. I have a panel dataframe which looks something like this:
Name A B
1 Marco 01/09/2014 NA
2 Marco NA 01/01/2015
3 Marco 02/01/2015 NA
4 Luca 01/01/2015 NA
5 Luca NA 31/01/2015
6 Silvia NA 15/01/2015
I want to create a dummy variable that takes the value 1 if (condition 1) the observation does not show a 2014 date in column A, OR (condition 2) the observation shows a 2015 date in column B AND, at the same time, there is at least one other observation for that individual and none of that individual's observations has a 2014 date in column A. In other words, I do not know how to impose a condition for the dummy that checks all the other observations related to the same individual (identified in the column "Name"). The result I want is something like this:
Name A B dummy
1 Marco 01/09/2014 NA 0
2 Marco NA 01/01/2015 0
3 Marco 02/01/2015 NA 1
4 Luca 01/01/2015 NA 1
5 Luca NA 31/01/2015 1
6 Silvia NA 15/01/2015 0
In the example above, the dummy is 0 at the first observation because of the 2014 date in column A (condition 1 is not met). At the second observation, the dummy is 0 because, despite the 2015 date in column B, the same individual (Marco) has a 2014 date in column A in at least one of his other observations (observation 1 in this case). Observation 4, by contrast, has the dummy equal to 1 since the date in column A is in 2015. Observation 5 has the dummy equal to 1 since, given the 2015 date in column B, the same individual (Luca) has no other observation with a 2014 date in column A (he has a 2015 date in observation 4). Finally, the dummy associated with Silvia must be 0 since, despite the 2015 date in column B, there is no other observation for Silvia in the dataframe.
I hope it is not too twisted and that I got my idea across. Let me know if anything is unclear. Besides the conditions themselves, if you could just show me how to impose conditions across different observations related to the same individual, that would already help a lot.
Thank you all!
Marco
structure(list(Name = c("Marco", "Marco", "Marco", "Luca", "Luca", "Silvia"), A = structure(c(1409529600, NA, 1420156800, 1420070400, NA, NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), B = structure(c(NA, 1420070400, NA, NA, 1422662400, 1421280000), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
You can use the year function from the lubridate library to extract the year from a date. Also note that a condition involving NA evaluates to NA, which is why it is better to replace NA with some placeholder value before using the column in ifelse conditions. An example of the code:
library(dplyr)      # provides %>%, group_by, mutate, select
library(lubridate)  # provides year()
Marco <- read.csv("Marcoset.csv", stringsAsFactors = FALSE)
# Replace NA with a placeholder date so the comparisons below never return NA
Marco$A[is.na(Marco$A)] <- "01/01/0001"
Marco$B[is.na(Marco$B)] <- "01/01/0001"
Marco$A <- as.Date(Marco$A, "%d/%m/%Y")
Marco$B <- as.Date(Marco$B, "%d/%m/%Y")
Marco <- Marco %>%
  group_by(Name) %>%
  # i2014: 1 if the person has any 2014 date in A; obs: number of rows for that person
  mutate(i2014 = as.integer(any(year(A) == 2014)), obs = n()) %>%
  ungroup() %>%
  # year(A) == 1 marks the placeholder date, i.e. an originally missing A
  mutate(dummy = ifelse((year(A) != 2014 & year(A) != 1) |
                        (year(B) == 2015 & obs >= 2 & i2014 == 0), 1, 0)) %>%
  select(-obs, -i2014)
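To illustrate the note about NA in conditions, here is a minimal base R sketch. The vector yr is a made-up toy example, not the asker's data. It shows that a comparison with NA passes NA through ifelse, and that guarding the test with is.na is one possible alternative to the placeholder date.

```r
# Hypothetical toy vector of years (assumption: not taken from the question's data)
yr <- c(2014, NA, 2015)

# A comparison involving NA is itself NA, and ifelse() passes that NA through
flag_raw <- ifelse(yr != 2014, 1, 0)
print(flag_raw)

# Guarding with !is.na() makes missing values count as FALSE instead
flag_safe <- ifelse(!is.na(yr) & yr != 2014, 1, 0)
print(flag_safe)
```

The guard works because FALSE & NA is FALSE in R, so the missing entry never reaches the comparison result.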
The NAs make it a little tricky, but here's a direct method, adding the implied condition "A is not NA" to the first case. Using %in% instead of == helps with other NA issues because 1 %in% NA is FALSE, but 1 == NA is NA.
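library(dplyr)
# dd is the data frame from the dput() output in the question
dd %>% group_by(Name) %>%
  mutate(dummy = as.integer((
    # condition 1: A is present and is not a 2014 date
    !format(A, "%Y") %in% "2014" & !is.na(A)
  ) | (
    # condition 2: 2015 date in B, more than one row for this Name,
    # and no 2014 date in A anywhere in the group
    format(B, "%Y") %in% "2015"
    & n() > 1
    & !any(format(A, "%Y") %in% "2014")
  )
  ))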
# # A tibble: 6 x 4
# # Groups: Name [3]
# Name A B dummy
# <chr> <dttm> <dttm> <int>
# 1 Marco 2014-09-01 00:00:00 NA 0
# 2 Marco NA 2015-01-01 00:00:00 0
# 3 Marco 2015-01-02 00:00:00 NA 1
# 4 Luca 2015-01-01 00:00:00 NA 1
# 5 Luca NA 2015-01-31 00:00:00 1
# 6 Silvia NA 2015-01-15 00:00:00 0
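For readers without dplyr, the same group-wise logic can be sketched in base R. This is only an illustration under stated assumptions: the data frame is retyped by hand from the question's table as Date columns (not POSIXct as in the dput), and the helper names yearA, yearB, n_obs, has_2014, cond1 and cond2 are introduced here. ave() is used to compute a per-Name value and spread it back over every row of that group.

```r
# Rebuild the question's data by hand (assumption: Date columns instead of POSIXct)
dd <- data.frame(
  Name = c("Marco", "Marco", "Marco", "Luca", "Luca", "Silvia"),
  A = as.Date(c("2014-09-01", NA, "2015-01-02", "2015-01-01", NA, NA)),
  B = as.Date(c(NA, "2015-01-01", NA, NA, "2015-01-31", "2015-01-15"))
)

yearA <- format(dd$A, "%Y")  # character year, NA where A is missing
yearB <- format(dd$B, "%Y")

# ave() applies FUN within each Name group and returns a full-length vector
n_obs    <- ave(seq_along(dd$Name), dd$Name, FUN = length)  # rows per person
has_2014 <- ave(yearA %in% "2014", dd$Name, FUN = any)      # any 2014 date in A?

cond1 <- !is.na(dd$A) & !(yearA %in% "2014")
cond2 <- yearB %in% "2015" & n_obs > 1 & !has_2014
dd$dummy <- as.integer(cond1 | cond2)
print(dd)
```

Because %in% returns FALSE rather than NA for missing years, neither condition can evaluate to NA, so no placeholder dates are needed.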