Removing a Group When a Certain Value Is Reached in R

I'm prepping a data frame for event history analysis. The "groups" in question are US states, and the outcome of interest is whether or not a state adopted a specific policy. Because I'm dealing with a non-repeating event (once a state adopts the policy, it is assumed to be binding from the year of adoption to the end of the dataset), I want to remove a state from the panel once it adopts the policy.

Suppose we're looking at Pennsylvania, Arizona, and Georgia with data from 2010-2015. Let's say Arizona adopts the policy in 2012. Setting up the data would look something like this:

# create the panel
year <- rep(2010:2015, times = 3)
state <- rep(c("AZ","PA","GA"), each = 6)

# data.frame() keeps year numeric; cbind() would coerce it to character
panel <- data.frame(year, state)

# create dummy to indicate adoption
panel$adopted <- 0

# set adopted = 1 when AZ adopts the policy
panel$adopted[panel$year == 2012 & panel$state == "AZ"] <- 1

I would then want to remove AZ's observations for the years 2013-2015 but keep all observations for GA and PA.

I've thought about writing some kind of loop that identifies the rows in which the adoption variable equals 1, creates a new variable that flags the subsequent rows as ones that need to be deleted, and then filters out those rows:

library(dplyr)

df$delete <- 0

for (row in seq_len(nrow(df))) {
  if (df$adopted[row] == 1) {
    df$delete[row + 1] <- 1
  }
}

df <- df %>% filter(delete == 0)

However, while I know how to refer to the next row (df$delete[row+1]), I need to know how to refer to every row that follows the observation in which adopted == 1, up to the last row for that state. Any ideas? Happy to clarify if something is unclear.

Try the data.table package:

library(data.table)

# convert to a data.table
panel <- data.table(panel)
# get the year of adoption by state (NA if the state never adopted)
panel[, year_adopted := year[adopted == 1][1], by = .(state)]
# keep rows up to and including the adoption year, plus all rows for states that never adopted
panel[year <= year_adopted | is.na(year_adopted)]
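For comparison, the same filter can be sketched in base R with no extra packages. This is only an illustration and not part of the answer above. It assumes the rows are sorted by year within each state, and `prior_adoptions` and `trimmed` are names made up for this example.

```r
# Rebuild the example panel from the question (year kept numeric)
panel <- data.frame(
  year    = rep(2010:2015, times = 3),
  state   = rep(c("AZ", "PA", "GA"), each = 6),
  adopted = 0
)
panel$adopted[panel$year == 2012 & panel$state == "AZ"] <- 1

# Within each state, count the adoption flags seen in earlier rows.
# cumsum(x) - x is the running total excluding the current row.
# Assumption: rows are in year order within each state.
prior_adoptions <- ave(panel$adopted, panel$state,
                       FUN = function(x) cumsum(x) - x)

# Keep rows with no adoption before them (the adoption year itself stays)
trimmed <- panel[prior_adoptions == 0, ]
```

Because the running total excludes the current row, the adoption year is kept and only the later years for that state are dropped.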

I think it is much easier to tackle this without a loop. Since this will probably be something you'll want to do for every state in the dataset, a function might be useful:

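rm_after_adopted = function (panel, st) {
  # year(s) in which state `st` adopted; empty if it never did
  year_adopted = with(panel, year[adopted == 1 & state == st])
  if (length(year_adopted) == 0) return(panel)
  after_adopted = with(panel, {
    which(year > min(year_adopted) & state == st)
  })
  # panel[-integer(0), ] would drop every row, so guard the empty case
  if (length(after_adopted) == 0) return(panel)
  return(panel[-after_adopted, ])
}

rm_after_adopted(panel, st = 'AZ')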
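To run this for every state rather than one at a time, one option is to fold the function over the state names with `Reduce()`. The sketch below repeats a compact, guarded copy of `rm_after_adopted()` so that it runs on its own. The logical mask and the names `drop` and `trimmed` are choices made for this example, not part of the answer above.

```r
# Example panel from the question, with year kept numeric
panel <- data.frame(
  year    = rep(2010:2015, times = 3),
  state   = rep(c("AZ", "PA", "GA"), each = 6),
  adopted = 0
)
panel$adopted[panel$year == 2012 & panel$state == "AZ"] <- 1

# Compact copy of rm_after_adopted(); the logical mask means that
# "nothing to drop" leaves the panel unchanged
rm_after_adopted <- function(panel, st) {
  year_adopted <- panel$year[panel$adopted == 1 & panel$state == st]
  if (length(year_adopted) == 0) return(panel)
  drop <- panel$state == st & panel$year > min(year_adopted)
  panel[!drop, ]
}

# Call the function once per state, feeding each result into the next call
trimmed <- Reduce(rm_after_adopted, unique(panel$state), panel)
```

`Reduce()` passes the running result as the first argument and the next state as the second, which matches the `(panel, st)` signature.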
