In R, generate a new variable for each value of a group
I have an id variable and a date variable, with multiple dates for a given id (a panel). I would like to generate a new variable based on whether ANY of the years for a given id meets a logical condition. I am not sure how to code it, so please treat the following as logical pseudocode rather than R code. Something like:
foreach(i in min(id):max(id)) {
if(var1[yearvar[1:max(yearvar)]=="A") then { newvar==1}
}
As an example:
ID Year Letter
1 1999 A
1 2000 B
2 2000 C
3 1999 A
Should return newvar:
1 1 0 1
Since data[ID==1] contains A in some year, newvar should also be 1 in 2000, despite Letter==B that year.
Here's a solution using plyr:
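library(plyr)
# One row per ID: 1 if any Letter for that ID is "A", else 0
a <- ddply(dat, .(ID), summarise, newvar = as.numeric(any(Letter == "A")))
# Attach the per-ID flag to every row of the original data
merge(dat, a, by="ID")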
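For readers without plyr installed, the summarise-then-merge idea above can be sketched in base R with aggregate(). This is only an illustration under the assumption that dat looks like the example in the question; the names flags and merged are made up for this sketch.

```r
# Example data, assumed to match the table in the question
dat <- data.frame(
  ID = c(1, 1, 2, 3),
  Year = c(1999, 2000, 2000, 1999),
  Letter = c("A", "B", "C", "A")
)

# One row per ID: TRUE if any row for that ID has Letter == "A"
flags <- aggregate(dat$Letter == "A", by = list(ID = dat$ID), FUN = any)

# aggregate() names the summarised column "x"; rename it and convert to 0/1
names(flags)[2] <- "newvar"
flags$newvar <- as.numeric(flags$newvar)

# Attach the per-ID flag to every row of the original data
merged <- merge(dat, flags, by = "ID")
print(merged)
```

Note that merge() sorts the result by ID, so the row order can differ from the input if dat is not already sorted.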
Here's a way of approaching it with base R:
# Find which IDs have Letter == "A" in at least one row
withA <- unique(dat$ID[dat$Letter == "A"])
#add new column based on whether ID is in withA
dat$newvar <- as.numeric(dat$ID %in% withA)
# ID Year Letter newvar
# 1 1 1999 A 1
# 2 1 2000 B 1
# 3 2 2000 C 0
# 4 3 1999 A 1
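The same flag can probably also be computed in one step with ave(), which applies a function within each group and returns a vector aligned with the original rows. The sketch below assumes dat matches the example in the question; taking the max of a 0/1 indicator per ID stands in for any().

```r
# Example data, assumed to match the table in the question
dat <- data.frame(
  ID = c(1, 1, 2, 3),
  Year = c(1999, 2000, 2000, 1999),
  Letter = c("A", "B", "C", "A")
)

# 0/1 indicator per row, then the group-wise maximum per ID:
# the max is 1 exactly when at least one row of that ID has Letter == "A"
dat$newvar <- ave(as.numeric(dat$Letter == "A"), dat$ID, FUN = max)
print(dat)
```

Unlike the merge-based approach, this keeps the original row order and does not create an intermediate table.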
Without using a package:
dat <- data.frame(
ID = c(1,1,2,3),
Year = c(1999,2000,2000,1999),
Letter = c("A","B","C","A")
)
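# Count occurrences of each Letter per ID
tableData <- table(dat[,c("ID","Letter")])
# Index rows by ID label (not position) and flag any count above zero
newvar <- as.numeric(tableData[as.character(dat$ID),"A"] > 0)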
dat <- cbind(dat,newvar)
# ID Year Letter newvar
#1 1 1999 A 1
#2 1 2000 B 1
#3 2 2000 C 0
#4 3 1999 A 1