Assign value in all rows following satisfaction of first instance in R

I've got a dataframe with a set of numeric observations (Value) per grouping variable (ID). I'm looking for an elegant way to build a new column that, within each ID, is 1 for every row after the first row where Value reaches -40 or lower. Every row before that first instance, and the first instance itself, should be assigned something other than 1 (i.e., 0).

Example data:

+----+-------+-------+
| ID | Order | Value |
+----+-------+-------+
|  1 |     1 |   -40 |
|  1 |     2 |    32 |
|  1 |     3 |   -59 |
|  1 |     4 |   -35 |
|  2 |     1 |    47 |
|  2 |     2 |    14 |
|  2 |     3 |     0 |
|  3 |     1 |    10 |
|  3 |     2 |    63 |
|  3 |     3 |   -32 |
|  3 |     4 |   -46 |
|  3 |     5 |   -27 |
|  3 |     6 |   -42 |
|  3 |     7 |    45 |
+----+-------+-------+

I am looking for something that produces this (below):

+----+-------+-------+-------------+
| ID | Order | Value | After_Neg40 |
+----+-------+-------+-------------+
|  1 |     1 |    32 |           0 |
|  1 |     2 |   -40 |           0 |
|  1 |     3 |   -59 |           1 |
|  1 |     4 |   -35 |           1 |
|  2 |     1 |    47 |           0 |
|  2 |     2 |    14 |           0 |
|  2 |     3 |     0 |           0 |
|  3 |     1 |    10 |           0 |
|  3 |     2 |    63 |           0 |    
|  3 |     3 |   -32 |           0 |
|  3 |     4 |   -46 |           0 |
|  3 |     5 |   -27 |           1 |
|  3 |     6 |   -42 |           1 |
|  3 |     7 |    45 |           1 |
+----+-------+-------+-------------+

I tried searching for this type of problem on SO without much luck, but I also had a hard time knowing how to describe it (maybe it has already been answered and my search terms just didn't uncover it). If you have any elegant ways to solve this, I would appreciate your help. Thanks!

Using data.table, assuming that the data is in a data frame df:

library(data.table)
# Per ID: find the first row with Value <= -40, then flag every later row with 1
setDT(df)[, After_Neg40 := ifelse(!is.na(the.row <- which(Value <= -40)[1]) & (1:.N) > the.row, 1, 0), by = ID][]
##    ID Order Value After_Neg40
## 1:  1     2    32           0
## 2:  1     1   -40           0
## 3:  1     3   -59           1
## 4:  1     4   -35           1
## 5:  2     1    47           0
## 6:  2     2    14           0
## 7:  2     3     0           0
## 8:  3     1    10           0
## 9:  3     2    63           0
##10:  3     3   -32           0
##11:  3     4   -46           0
##12:  3     5   -27           1
##13:  3     6   -42           1
##14:  3     7    45           1

The logic is:

  1. Find the first row for which Value is less than or equal to -40 using which(Value <= -40)[1]. Store this in the.row.
  2. If a group (by ID) has no row for which this condition is true, then the.row will be NA, so we check for that with is.na.
  3. So, if there is such a the.row (i.e., is.na returns FALSE), set the value to 1 for the rows whose position within the group is greater than the.row, and to 0 otherwise. Do this with an ifelse.
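To make the three steps concrete, here is a minimal base R sketch that runs them on two plain vectors holding the Value entries for ID 3 and ID 2 from the data below. The helper name flag_after is hypothetical and is not part of the answer's code; it only mirrors what the data.table expression does inside each group.

```r
# Values for ID 3 (has rows at or below -40) and ID 2 (has none)
value_id3 <- c(10, 63, -32, -46, -27, -42, 45)
value_id2 <- c(47, 14, 0)

# Step 1: position of the first value at or below -40 (NA if there is none)
first_hit_id3 <- which(value_id3 <= -40)[1]
first_hit_id2 <- which(value_id2 <= -40)[1]

# Steps 2 and 3: flag positions strictly after the first hit.
# When first_hit is NA, !is.na(first_hit) is FALSE, so every row gets 0.
# flag_after is a hypothetical helper used only for illustration.
flag_after <- function(value) {
  first_hit <- which(value <= -40)[1]
  ifelse(!is.na(first_hit) & seq_along(value) > first_hit, 1, 0)
}

flag_id3 <- flag_after(value_id3)
flag_id2 <- flag_after(value_id2)
```

Here first_hit_id3 is the position of -46, so only the rows after it are flagged, while ID 2 never reaches -40 and stays all zeros.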

The result matches your posted desired output, but it uses the following data, which swaps the first two rows of your posted input data:

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L), Order = c(2L, 1L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 
3L, 4L, 5L, 6L, 7L), Value = c(32L, -40L, -59L, -35L, 47L, 14L, 
0L, 10L, 63L, -32L, -46L, -27L, -42L, 45L)), .Names = c("ID", 
"Order", "Value"), class = "data.frame", row.names = c(NA, -14L
))
##   ID Order Value
##1   1     2    32
##2   1     1   -40
##3   1     3   -59
##4   1     4   -35
##5   2     1    47
##6   2     2    14
##7   2     3     0
##8   3     1    10
##9   3     2    63
##10  3     3   -32
##11  3     4   -46
##12  3     5   -27
##13  3     6   -42
##14  3     7    45
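As a cross-check that does not need data.table, the same flag can be sketched in base R with ave() and a double cumsum(). This is a different technique from the answer's which()/ifelse() approach, offered only as an assumption-laden alternative; the names df_check and flag_after_first_hit are hypothetical, and the data is rebuilt here so the block runs on its own.

```r
# Same data as above, rebuilt so this block is self-contained
df_check <- data.frame(
  ID    = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
  Order = c(2L, 1L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L, 5L, 6L, 7L),
  Value = c(32L, -40L, -59L, -35L, 47L, 14L, 0L, 10L, 63L, -32L, -46L,
            -27L, -42L, 45L)
)

# Hypothetical helper: 1 for rows strictly after the first Value <= -40
flag_after_first_hit <- function(value) {
  hit_seen <- cumsum(value <= -40) > 0   # TRUE from the first hit onward
  as.integer(cumsum(hit_seen) > 1)       # drop the first hit row itself
}

# ave() applies the helper within each ID and keeps the original row order
df_check$After_Neg40 <- ave(df_check$Value, df_check$ID,
                            FUN = flag_after_first_hit)
```

The first cumsum() marks every row from the first hit onward; the second counts those rows, so requiring a count above 1 excludes the first hit itself.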
