
How to create a new column based on other columns with if conditions in R

I can't find a way to generate a new column using if conditions applied to groups of events in a column.

The column "BF" holds the flow value three rows before the first row of each "event" group (flow[i-3], where i is the group's first row), and that same value is repeated for every row in the group. For example, in row 5 the value of "BF" is 39, which is the flow value in row 2; it is used for all rows where event is 2. The problem is that BF[i] must not be larger than flow[i]. If BF[i] is larger than flow[i], BF should instead come from flow[i-4], flow[i-5], flow[i-6], and so on, stepping back until the value is less than or equal to flow[i]. For example, in row 10 the value of "BF" (46) is larger than the value of "flow" (45), so the value of BF_1 (the column I want to create) in row 10 is 37: the closest earlier flow value that is not larger, in this case flow[i-6].

As an example, we have the following data frame:

flow <- c(40, 39, 38, 37, 50, 49, 46, 44, 43, 45, 40, 30, 80, 75, 50, 55, 53, 51, 49, 100)
event <- c(1,1,1,1,2,2,2,2,2,3,3,3,4,4,4,5,5,5,5,6)
BF <- c(NA, NA, NA, NA, 39, 39, 39, 39, 39, 46, 46, 46, 45, 45, 45, 80, 80, 80, 80, 53)
a <- data.frame(flow, event, BF)
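
As a cross-check on the description of BF, the column can be reproduced from flow and event alone. The following is a minimal base R sketch, assuming each event occupies consecutive rows; the names `starts`, `bf_by_event` and `BF_check` are illustrative and not part of the original post.

```r
flow  <- c(40, 39, 38, 37, 50, 49, 46, 44, 43, 45, 40, 30, 80, 75, 50, 55, 53, 51, 49, 100)
event <- c(1,1,1,1,2,2,2,2,2,3,3,3,4,4,4,5,5,5,5,6)
BF    <- c(NA, NA, NA, NA, 39, 39, 39, 39, 39, 46, 46, 46, 45, 45, 45, 80, 80, 80, 80, 53)

# Row index of the first row of each event
starts <- which(!duplicated(event))

# flow three rows before each start; NA when fewer than three rows precede it
# (pmax() only keeps the index valid; ifelse() discards that value anyway)
bf_by_event <- ifelse(starts > 3, flow[pmax(starts - 3, 1)], NA)

# Repeat the per-event value on every row of the event
BF_check <- bf_by_event[match(event, event[starts])]

# Compare with the BF column given in the question
all.equal(BF_check, BF)
```

If the comparison holds, BF is fully determined by flow and event, so only the "step further back" rule needs extra logic.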

This is the desired output. I want to create the BF_1 column.

   flow event BF  BF_1
1    40   1   NA   NA
2    39   1   NA   NA
3    38   1   NA   NA
4    37   1   NA   NA
5    50   2   39   39
6    49   2   39   39
7    46   2   39   39
8    44   2   39   39
9    43   2   39   39
10   45   3   46   37
11   40   3   46   37
12   30   3   46   37
13   80   4   45   45
14   75   4   45   45
15   50   4   45   45
16   55   5   80   30
17   53   5   80   30
18   51   5   80   30
19   49   5   80   30
20  100   6   53   53

Is there a way to generate the column BF_1? Please let me know any thoughts. I am working with for loops and if conditions, but I am not able to hold the BF value for the entire group in the event column.

The code is a bit inefficient (it could have used dplyr, etc.), but it does the job and matches the given BF_1 column:

flow <- c(40, 39, 38, 37, 50, 49, 46, 44, 43, 45, 40, 30, 80, 75, 50, 55, 53, 51, 49, 100)
event <- c(1,1,1,1,2,2,2,2,2,3,3,3,4,4,4,5,5,5,5,6)
BF <- c(NA, NA, NA, NA, 39, 39, 39, 39, 39, 46, 46, 46, 45, 45, 45, 80, 80, 80, 80, 53)
a <- data.frame(flow, event, BF)

a$BF_1 <- NA #default to NA first

for (ev in unique(a$event)) {
  rows  <- which(a$event == ev)
  first <- rows[1]            # first row of this event
  bf    <- a$BF[first]

  if (is.na(bf)) next         # no BF available: leave BF_1 as NA

  if (bf <= a$flow[first]) {
    a$BF_1[rows] <- bf        # BF is not larger than flow: keep it
  } else {
    # BF is too large: take the minimum flow in the window from 6 rows
    # before the event's first row up to that row (clamped at row 1).
    # This matches the expected output for this data, but it is not the
    # same as stepping back one row at a time.
    head <- max(first - 6, 1)
    a$BF_1[rows] <- min(a$flow[head:first])
  }
}

a
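
The rule in the question (start at flow[i-3] and step back one row at a time until the value is no larger than flow[i]) can also be written out directly. The following is a minimal base R sketch, assuming that i is the first row of each event and that each event occupies consecutive rows; `walk_back` is a hypothetical helper name, not something from the original posts.

```r
flow  <- c(40, 39, 38, 37, 50, 49, 46, 44, 43, 45, 40, 30, 80, 75, 50, 55, 53, 51, 49, 100)
event <- c(1,1,1,1,2,2,2,2,2,3,3,3,4,4,4,5,5,5,5,6)

# Hypothetical helper: for an event starting at row `start`, begin at row
# start - min_lag and step back until a flow value <= flow[start] is found.
walk_back <- function(flow, start, min_lag = 3) {
  k <- start - min_lag
  while (k >= 1) {
    if (flow[k] <= flow[start]) return(flow[k])
    k <- k - 1
  }
  NA_real_  # fewer than min_lag earlier rows, or no earlier value qualifies
}

# First row of each event, then one BF_1 value per event
starts       <- which(!duplicated(event))
bf1_by_event <- vapply(starts, function(s) walk_back(flow, s), numeric(1))

# Repeat the per-event value on every row of the event
BF_1 <- bf1_by_event[match(event, event[starts])]

data.frame(flow, event, BF_1)
```

Because the comparison is only made against the first row of each event, a later row in the same event can still have a flow below BF_1 (row 12 in the example data), which is consistent with the desired output in the question.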

One tidyverse possibility could be:

library(dplyr)
library(tidyr)  # for crossing()

a %>%
 left_join(crossing(a, a) %>%
            filter(event > event1) %>%
            group_by(event) %>%
            filter(flow == first(flow)) %>%
            slice(1:(n() - 3)) %>%
            slice(which.max(cumsum(flow > flow1))) %>%
            ungroup() %>%
            transmute(event,
                      flow_flag = flow1), by = c("event" = "event")) %>%
 mutate(BF_1 = ifelse(lag(flow, 3) > flow, flow_flag, lag(flow, 3))) %>%
 group_by(event) %>%
 mutate(BF_1 = first(BF_1)) %>%
 select(-flow_flag)

    flow event    BF  BF_1
   <dbl> <dbl> <dbl> <dbl>
 1    40     1    NA    NA
 2    39     1    NA    NA
 3    38     1    NA    NA
 4    37     1    NA    NA
 5    50     2    39    39
 6    49     2    39    39
 7    46     2    39    39
 8    44     2    39    39
 9    43     2    39    39
10    45     3    46    37
11    40     3    46    37
12    30     3    46    37
13    80     4    45    45
14    75     4    45    45
15    50     4    45    45
16    55     5    80    30
17    53     5    80    30
18    51     5    80    30
19    49     5    80    30
20   100     6    53    53

It may be overcomplicated, but it works as follows. First, it creates all combinations of values (as the desired value can theoretically be anywhere in the data). Second, it identifies the first case per group that fulfils the condition (not taking the 3rd previous value into account). Finally, it joins the result to the original df: if the 3rd previous value for the group fulfils the condition, it returns that value; otherwise it returns the first value found that is smaller than the actual value.
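
The same "all combinations, then filter" idea can be sketched in base R, which avoids depending on how `crossing()` names duplicated columns. This is an illustrative variant and not the answer's own code: it pairs only the first row of each event with every row of the data, and the names `starts`, `pairs`, `ok` and `best` and the suffixes `.start` / `.cand` are assumptions chosen for the example.

```r
flow  <- c(40, 39, 38, 37, 50, 49, 46, 44, 43, 45, 40, 30, 80, 75, 50, 55, 53, 51, 49, 100)
event <- c(1,1,1,1,2,2,2,2,2,3,3,3,4,4,4,5,5,5,5,6)
a <- data.frame(flow, event)
a$row <- seq_len(nrow(a))

# One row per event: the event's first row and its flow value
starts <- a[!duplicated(a$event), ]

# Cross join: merge() with by = NULL returns the Cartesian product, and
# clashing column names receive the given suffixes
pairs <- merge(starts, a, by = NULL, suffixes = c(".start", ".cand"))

# Keep candidates at least 3 rows before the event start whose flow is not
# larger than the flow at the event start
ok <- pairs[pairs$row.cand <= pairs$row.start - 3 &
            pairs$flow.cand <= pairs$flow.start, ]

# Per event, the closest qualifying candidate is the one with the largest row
ok   <- ok[order(ok$event.start, -ok$row.cand), ]
best <- ok[!duplicated(ok$event.start), c("event.start", "flow.cand")]

# Events without any qualifying candidate get NA through match()
a$BF_1 <- best$flow.cand[match(a$event, best$event.start)]
a
```

Restricting the left side of the join to one row per event keeps the intermediate table at (number of events) x (number of rows) instead of the full rows x rows product.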
