Remove a group once a certain value is reached in R

Question

I'm prepping a data frame for event history analysis. The "group" in question consists of US states, and the outcome of interest is whether or not they adopted a specific policy. Because I'm dealing with a non-repeating event (once a state adopts the policy, it is assumed to be binding from the year of adoption to the end of the dataset), I want to remove a state from the panel once it adopts the policy.

Suppose we're looking at Pennsylvania, Arizona, and Georgia with data from 2010-2015. Let's say Arizona adopts the policy in 2012. Setting up the data would look something like this:

# create the panel
year <- rep(2010:2015, times = 3)
state <- rep(c("AZ","PA","GA"), each = 6)

# data.frame() keeps year numeric; as.data.frame(cbind(...)) would coerce it to character
panel <- data.frame(year, state)

# create dummy to indicate adoption
panel$adopted <- 0

# set adopted = 1 when AZ adopts the policy
panel$adopted[panel$year == 2012 & panel$state == "AZ"] <- 1

I would then want to remove AZ's observations for the years 2013-2015 but keep all observations for GA and PA.

I've thought about writing some kind of loop that identifies the rows in which the adoption variable equals 1, creates a new variable marking the subsequent rows as ones that need to be deleted, and then filters out those rows:

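library(dplyr)

df$delete <- 0

for (row in seq_len(nrow(df))) {
  # flags only the single row after an adoption, not every later row for that state
  if (df$adopted[row] == 1 && row < nrow(df)) {
    df$delete[row + 1] <- 1
  }
}

df <- df %>% filter(delete == 0)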

However, while I know how to refer to the next row (df$delete[row+1]), I need to know how to refer to every row that follows the observation in which adopted == 1, up to the last row for that state. Any ideas? Happy to clarify if something is unclear.

Answer 1

Try the data.table package:

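library(data.table)

# convert to a data.table
panel <- as.data.table(panel)
# make sure year is an integer (cbind() in the question turns it into character)
panel[, year := as.integer(as.character(year))]
# year of first adoption by state; NA for states that never adopt
panel[, year_adopted := if (any(adopted == 1)) min(year[adopted == 1]) else NA_integer_, by = .(state)]
# keep rows up to and including the adoption year, plus every row for states that never adopt
panel[year <= year_adopted | is.na(year_adopted)]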
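If you would rather not add a dependency, the same idea (look up each state's adoption year, then keep only rows up to it) can be sketched in base R. This is only a rough illustration on the toy panel from the question; the helper first_adoption and the names adoption_year and trimmed are made up for this example.

```r
# Toy panel from the question: three states, 2010-2015, AZ adopts in 2012.
panel <- data.frame(
  year  = rep(2010:2015, times = 3),
  state = rep(c("AZ", "PA", "GA"), each = 6),
  stringsAsFactors = FALSE
)
panel$adopted <- as.integer(panel$year == 2012 & panel$state == "AZ")

# Hypothetical helper: first adoption year, or NA if the state never adopts.
first_adoption <- function(years, adopted) {
  hits <- years[adopted == 1]
  if (length(hits) > 0) min(hits) else NA_integer_
}

# One adoption year per state, as a vector named by state.
adoption_year <- sapply(
  split(panel, panel$state),
  function(d) first_adoption(d$year, d$adopted)
)

# Map each row to its state's adoption year and keep rows up to that year.
panel$year_adopted <- unname(adoption_year[panel$state])
trimmed <- panel[is.na(panel$year_adopted) | panel$year <= panel$year_adopted, ]
```

States that never adopt get NA as their adoption year, so the is.na() condition keeps all of their rows.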

Answer 2

I think it is much easier to tackle this without a loop. Since this will probably be something you'll want to do to every state in the dataset, a function might be useful:

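rm_after_adopted <- function(panel, st) {
  yr <- as.integer(as.character(panel$year))  # year may be stored as character
  # year(s) in which state `st` adopted; empty if it never adopted
  year_adopted <- yr[panel$adopted == 1 & panel$state == st]
  if (length(year_adopted) == 0) return(panel)  # nothing to drop
  # drop that state's rows after its first adoption year
  after_adopted <- yr > min(year_adopted) & panel$state == st
  panel[!after_adopted, ]
}

rm_after_adopted(panel, st = "AZ")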
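To handle every state in one pass, and to get at the "flag every later row" idea from the question, one option is a grouped cumulative sum. The sketch below is an assumption about how that could look in base R, not the only way to do it; the column prior_adoptions and the name at_risk are invented for the example, and it relies on rows being sorted by year within each state.

```r
# Toy panel from the question: three states, 2010-2015, AZ adopts in 2012.
panel <- data.frame(
  year  = rep(2010:2015, times = 3),
  state = rep(c("AZ", "PA", "GA"), each = 6),
  stringsAsFactors = FALSE
)
panel$adopted <- as.integer(panel$year == 2012 & panel$state == "AZ")

# Sort so that rows within each state are in chronological order.
panel <- panel[order(panel$state, panel$year), ]

# For each row, count adoptions in strictly earlier rows of the same state.
panel$prior_adoptions <- ave(
  panel$adopted, panel$state,
  FUN = function(x) cumsum(x) - x
)

# Keep rows with no earlier adoption; the adoption year itself is retained.
at_risk <- panel[panel$prior_adoptions == 0, ]
```

Subtracting x from cumsum(x) is what keeps the adoption-year row in the panel while dropping everything after it.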

2 answers

Answer 1   0   2022-09-27 17:38:31

Answer 2   0   2022-09-27 18:09:05
