Assign value in all rows following satisfaction of first instance in R
I've got a data frame with a set of numeric observations (Value) per grouping variable (ID). I'm looking for an elegant way to do the following in a new column, per grouping variable (ID): once Value reaches -40 or lower for the first time, assign a value of 1 to every row after that first instance. Every row before the first instance of -40 or lower, and the row of that first instance itself, should be assigned something other than 1 (i.e., 0).
Example data:
+----+-------+-------+
| ID | Order | Value |
+----+-------+-------+
| 1 | 1 | -40 |
| 1 | 2 | 32 |
| 1 | 3 | -59 |
| 1 | 4 | -35 |
| 2 | 1 | 47 |
| 2 | 2 | 14 |
| 2 | 3 | 0 |
| 3 | 1 | 10 |
| 3 | 2 | 63 |
| 3 | 3 | -32 |
| 3 | 4 | -46 |
| 3 | 5 | -27 |
| 3 | 6 | -42 |
| 3 | 7 | 45 |
+----+-------+-------+
I am looking for something that produces the following:
+----+-------+-------+-------------+
| ID | Order | Value | After_Neg40 |
+----+-------+-------+-------------+
| 1 | 1 | 32 | 0 |
| 1 | 2 | -40 | 0 |
| 1 | 3 | -59 | 1 |
| 1 | 4 | -35 | 1 |
| 2 | 1 | 47 | 0 |
| 2 | 2 | 14 | 0 |
| 2 | 3 | 0 | 0 |
| 3 | 1 | 10 | 0 |
| 3 | 2 | 63 | 0 |
| 3 | 3 | -32 | 0 |
| 3 | 4 | -46 | 0 |
| 3 | 5 | -27 | 1 |
| 3 | 6 | -42 | 1 |
| 3 | 7 | 45 | 1 |
+----+-------+-------+-------------+
I tried searching for this type of problem on SO without much luck, but I also had a hard time knowing how to describe it (maybe it has already been answered and my search terms just didn't uncover it). If you have any elegant ways to solve this, I would appreciate your help. Thanks!
Using data.table, assuming that the data is in a data frame df:
library(data.table)
setDT(df)[, After_Neg40:=ifelse(!is.na(the.row <- which(Value <= -40)[1]) & (1:.N) > the.row,1,0), by=ID][]
## ID Order Value After_Neg40
## 1: 1 2 32 0
## 2: 1 1 -40 0
## 3: 1 3 -59 1
## 4: 1 4 -35 1
## 5: 2 1 47 0
## 6: 2 2 14 0
## 7: 2 3 0 0
## 8: 3 1 10 0
## 9: 3 2 63 0
##10: 3 3 -32 0
##11: 3 4 -46 0
##12: 3 5 -27 1
##13: 3 6 -42 1
##14: 3 7 45 1
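To see what each piece of the one-liner evaluates to, here is a minimal sketch that runs the same expressions on plain vectors outside of data.table. The names v3, v2, first3, first2, flag3 and flag2 are made up for illustration, and seq_along() stands in for 1:.N.

```r
# Values for ID 3 (has rows at or below -40) and ID 2 (has none)
v3 <- c(10, 63, -32, -46, -27, -42, 45)
v2 <- c(47, 14, 0)

# Index of the first value at or below -40; NA when no value qualifies
first3 <- which(v3 <= -40)[1]
first2 <- which(v2 <= -40)[1]

# Flag rows strictly after that index.
# The is.na() guard matters: FALSE & NA is FALSE, so groups without
# a qualifying row get 0 everywhere instead of NA.
flag3 <- ifelse(!is.na(first3) & seq_along(v3) > first3, 1, 0)
flag2 <- ifelse(!is.na(first2) & seq_along(v2) > first2, 1, 0)

print(flag3)
print(flag2)
```

Without the is.na() check, the comparison for v2 would be NA for every row, and ifelse() would return NA rather than 0.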
The logic is:
1. Find the first row for which Value is less than or equal to -40 using which(Value <= -40)[1]. Set this to the.row.
2. If no row within a group (by ID) satisfies that condition, then the.row will be NA, so we check for that with is.na.
3. If there is such a the.row (i.e., is.na returns FALSE), then for the rows whose index is greater than the.row, set the value to 1, else 0. Do this with an ifelse.
The result matches your posted desired output, but uses the following data, which switches the first two rows of your posted input data:
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L), Order = c(2L, 1L, 3L, 4L, 1L, 2L, 3L, 1L, 2L,
3L, 4L, 5L, 6L, 7L), Value = c(32L, -40L, -59L, -35L, 47L, 14L,
0L, 10L, 63L, -32L, -46L, -27L, -42L, 45L)), .Names = c("ID",
"Order", "Value"), class = "data.frame", row.names = c(NA, -14L
))
## ID Order Value
##1 1 2 32
##2 1 1 -40
##3 1 3 -59
##4 1 4 -35
##5 2 1 47
##6 2 2 14
##7 2 3 0
##8 3 1 10
##9 3 2 63
##10 3 3 -32
##11 3 4 -46
##12 3 5 -27
##13 3 6 -42
##14 3 7 45
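If you would rather not load data.table, a similar result can probably be had in base R with ave() and cumsum(). This is only a sketch of an alternative, not the method used above: after_first_hit is a hypothetical helper name, the data frame is rebuilt here so the block runs on its own, and it assumes the rows are already in the order you care about and that Value has no NA.

```r
# Same data as above, rebuilt so this block is self-contained
df <- data.frame(
  ID    = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3),
  Order = c(2, 1, 3, 4, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7),
  Value = c(32, -40, -59, -35, 47, 14, 0, 10, 63, -32, -46, -27, -42, 45)
)

# Hypothetical helper: 1 for rows strictly after the first value
# at or below the threshold, 0 otherwise
after_first_hit <- function(v, threshold = -40) {
  hit <- v <= threshold
  # cumsum(hit) - hit counts qualifying values in earlier rows only,
  # so the row of the first hit itself still gets 0
  as.integer(cumsum(hit) - hit > 0)
}

# ave() applies the helper within each ID and keeps the original row order
df$After_Neg40 <- ave(df$Value, df$ID, FUN = after_first_hit)
print(df)
```

Groups with no value at or below -40 need no special case here, because the running count simply stays at zero.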