Extract value from one column based on the value in another in the row below, then find duration until value reached again

I have a dataframe in which I need to calculate the duration of a flooding event.

To do this, I need the 'From' Datetime:

the Datetime of the row above a row where Flooded == TRUE

and the 'To' Datetime:

the Datetime at which Temp >= (the Temp of the row above the row where Flooded == TRUE)

   > head(FloodingDuration)
    # A tibble: 6 x 8
    # Groups:   NestID, Nest, Year [1]
      Beach  Nest  Year Datetime             Temp NestID   TempDrop Flooded
      <chr> <dbl> <dbl> <dttm>              <dbl> <fct>       <dbl> <lgl>  
    1 LB        1  2014 2014-01-12 09:00:00  27.2 LB1_2014  0       FALSE  
    2 LB        1  2014 2014-01-12 10:00:00  27.2 LB1_2014 -0.0110  FALSE  
    3 LB        1  2014 2014-01-12 11:00:00  27.2 LB1_2014 -0.0190  FALSE  
    4 LB        1  2014 2014-01-12 12:00:00  27.2 LB1_2014 -0.00300 FALSE  
    5 LB        1  2014 2014-01-12 13:00:00  27.2 LB1_2014 -0.0290  FALSE  
    6 LB        1  2014 2014-01-12 14:00:00  27.1 LB1_2014 -0.00400 FALSE

I have some of the code (see below); I need the 'From' and 'To' steps to work.

library(dplyr)
library(lubridate)

FloodingDuration = group_by(TempData, NestID, Nest, Year) %>%
      filter(minute(Datetime) == 0) %>%
      mutate(TempDrop = Temp - lag(Temp, n = 1, default = first(Temp))) %>%
      mutate(Flooded = TempDrop < -0.45) %>%
      group_by(NestID) %>%
      # Pseudocode for the two steps I cannot get to work:
      # mutate(From = Datetime of the row above "Flooded == TRUE") %>%
      # mutate(To = Datetime where Temp >= Temp of the row above "Flooded == TRUE") %>%
      mutate(Duration = as.numeric(difftime(To, From, units = "days"))) %>%   # later time first
      mutate(MaxDuration = max(Duration, na.rm = TRUE)) %>%                   # closing ")" was missing
      distinct(NestID, MaxDuration)
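The TempDrop and Flooded steps in the pipe above compare each reading with the previous one within a nest. As a base R sketch of the same rule (the nest labels and temperatures below are hypothetical toy values, not the poster's data), ave() applies a first difference per group:

```r
# Hypothetical toy data: two nests, hourly readings
nest_id <- c("A", "A", "A", "A", "B", "B", "B")
temp    <- c(29.0, 28.9, 28.2, 28.5, 27.5, 26.9, 27.1)

# Same as Temp - lag(Temp, default = first(Temp)) within each nest:
# a nest's first reading gets 0, every later one its change from the previous hour
temp_drop <- ave(temp, nest_id, FUN = function(x) c(0, diff(x)))

# Flooding rule from the question: a drop of more than 0.45 degrees in one hour
flooded <- temp_drop < -0.45

print(round(temp_drop, 3))
print(flooded)
```

Nest B's first reading gets a drop of 0 instead of being compared with the last reading of nest A, which is what the grouping is for.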

For example:

Row 8083, where Flooded == TRUE

From = Datetime of row 8082

To = Datetime (in a row not shown) where Temp >= 28.920

FYI:

There are 112 different NestIDs, each with around 1,200 rows of data.

24 of the NestIDs have at least one Flooded == TRUE, sometimes in consecutive rows, sometimes again after the temperature has recovered from the first event.

After each flooding event, the temperature gradually rises again.

If it is only possible to find the Duration of the first Flooded == TRUE event for each flooded NestID, that would still be great.
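Because flooded rows can be consecutive, it can help to mark where each event starts. The sketch below assumes (an assumption, not something stated in the question) that a run of consecutive Flooded == TRUE rows counts as one event; the flags are hypothetical:

```r
# Hypothetical Flooded flags for one nest
flooded <- c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)

# A row starts a new event if it is flooded and the row above it is not
prev_flooded <- c(FALSE, head(flooded, -1))
event_start  <- flooded & !prev_flooded

# Running event number; 0 before the first event
event_id <- cumsum(event_start)

which(event_start)      # 2 6: rows where an event begins
which(event_start)[1]   # first event only (NA if the nest never floods)
```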


@Bex, to get you started, the following might help.
As a rule of thumb, when you have multiple conditions and checks, the problem cannot always be solved in a "vectorised" manner, i.e. with a single pipe.

I'll take you through it step by step; you can trim the code afterwards to make it a bit more elegant.

(I) Reproducible data example and first steps
To allow others to chip in and help work out an elegant solution ... tadaaa ... a reproducible data snapshot. I adapted the Temp column to produce the conditions you are referring to:

library(dplyr)
library(lubridate)   # for ymd_hms()

df <- data.frame(
    Datetime = seq(from = ymd_hms("2014-04-02 05:00:00"), to = ymd_hms("2014-04-02 15:00:00"), by = "1 hour")
    ,Temp = c(29.083, 29.088, 29.091, 28.920, 27.934, 27.359, 27.491, 27.99, 28.92, 28.93, 28.28)
    ,Flooded = c(rep(FALSE, 4), rep(TRUE, 2), rep(FALSE, 5)))

If you are unsure how your algorithm will develop, I recommend coding it step by step and storing interim results in new columns; this is where dplyr and friends shine!

To get the Datetime for one of your instances, just use the lag() construction you applied earlier in your pipe.
Let's do the same for Temp, as you want to check this condition later.
We only want a value where the condition holds: a classic "if-then". Using if_else() requires type consistency, so I wrap the NA in the respective data type.
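For comparison, here is the same lag-and-if-then step as a base R sketch on the answer's toy data, without dplyr or lubridate. lag1() is a hypothetical helper that mirrors lag(x, default = first(x)):

```r
# The answer's toy data, rebuilt in base R (Flooded set by hand)
datetime <- seq(as.POSIXct("2014-04-02 05:00:00", tz = "UTC"),
                by = "1 hour", length.out = 11)
temp    <- c(29.083, 29.088, 29.091, 28.920, 27.934, 27.359,
             27.491, 27.99, 28.92, 28.93, 28.28)
flooded <- c(rep(FALSE, 4), rep(TRUE, 2), rep(FALSE, 5))

# Hypothetical helper: shift by one position, repeating the first element;
# plain indexing keeps the POSIXct class and time zone
lag1 <- function(x) x[c(1, seq_along(x)[-length(x)])]

# "If-then": keep the lagged value on flooded rows only, NA elsewhere
from <- lag1(datetime)
from[!flooded] <- NA
temp_before <- ifelse(flooded, lag1(temp), NA_real_)

data.frame(datetime, temp, flooded, from, temp_before)
```

ifelse() would strip the POSIXct class from the dates (the base R face of the type-consistency issue that if_else() guards against), which is why From is built by assigning NA into the lagged vector instead.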

df <- df %>% 
  mutate(From = if_else(Flooded == TRUE, lag(Datetime, default = first(Datetime)),as.POSIXct(NA))
        ,Temp_before = if_else(Flooded == TRUE, lag(Temp, default = first(Temp)),as.double(NA)))

You now have:

              Datetime   Temp Flooded                From Temp_before
1  2014-04-02 05:00:00 29.083   FALSE                <NA>          NA
2  2014-04-02 06:00:00 29.088   FALSE                <NA>          NA
3  2014-04-02 07:00:00 29.091   FALSE                <NA>          NA
4  2014-04-02 08:00:00 28.920   FALSE                <NA>          NA
5  2014-04-02 09:00:00 27.934    TRUE 2014-04-02 08:00:00      28.920
6  2014-04-02 10:00:00 27.359    TRUE 2014-04-02 09:00:00      27.934
7  2014-04-02 11:00:00 27.491   FALSE                <NA>          NA
8  2014-04-02 12:00:00 27.990   FALSE                <NA>          NA
9  2014-04-02 13:00:00 28.920   FALSE                <NA>          NA
10 2014-04-02 14:00:00 28.930   FALSE                <NA>          NA
11 2014-04-02 15:00:00 28.280   FALSE                <NA>          NA

(II) Check for the next temperature condition

For each Flooded == TRUE row, we need to check the remainder of the dataframe. This is better done in a separate function that you call for each instance of Flooded == TRUE.
The idea is to take the dataframe, trim it to the start datetime, and then check when the temperature condition is next met.

check_next_temp_condition <- function(.start_date, .df = df){
    check_df <- .df %>% 
        filter(Datetime >= .start_date) %>%             # trim to start position
        mutate(Temp_cond = Temp >= first(Temp_before))  # check temp condition
}
#-------- let's quickly test what we have
start <- lubridate::ymd_hms("2014-04-02 09:00:00")
(   check_next_temp_condition(start)   )      # brackets around call will print result

This gives:


             Datetime   Temp Flooded                From Temp_before Temp_cond
1 2014-04-02 09:00:00 27.934    TRUE 2014-04-02 08:00:00      28.920     FALSE
2 2014-04-02 10:00:00 27.359    TRUE 2014-04-02 09:00:00      27.934     FALSE
3 2014-04-02 11:00:00 27.491   FALSE                <NA>          NA     FALSE
4 2014-04-02 12:00:00 27.990   FALSE                <NA>          NA     FALSE
5 2014-04-02 13:00:00 28.920   FALSE                <NA>          NA      TRUE
6 2014-04-02 14:00:00 28.930   FALSE                <NA>          NA      TRUE
7 2014-04-02 15:00:00 28.280   FALSE                <NA>          NA     FALSE

What follows is standard: we keep only the rows where the condition is met and extract the first timestamp.

check_next_temp_condition <- function(.start_date, .df = df){
    check_df <- .df %>% filter(Datetime >= .start_date) %>%
        mutate(Temp_cond = Temp >= first(Temp_before)) %>%
        filter(Temp_cond == TRUE)   # filter for condition
    #---------- extract Datetime
    if(nrow(check_df) >= 1){ To_date = check_df$Datetime[1]   # check for result
    }else{                   To_date = as.POSIXct(NA)}        # ensure if no result
}

# Test again, if you like
(   check_next_temp_condition(start)   ) 
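The same search can also be written without building a filtered copy of the data: which() returns the positions where the condition holds, and [1] picks the first one, giving NA when there is none. A base R sketch; next_recovery() is a hypothetical helper, not part of dplyr:

```r
# The answer's toy data again (base R only)
datetime <- seq(as.POSIXct("2014-04-02 05:00:00", tz = "UTC"),
                by = "1 hour", length.out = 11)
temp <- c(29.083, 29.088, 29.091, 28.920, 27.934, 27.359,
          27.491, 27.99, 28.92, 28.93, 28.28)

# Hypothetical helper: index of the first row at or after `start` (a valid
# row index) whose temp is back at or above `threshold`; NA if it never recovers
next_recovery <- function(start, threshold, temp) {
  candidates <- seq.int(start, length(temp))       # trim to the start position
  hit <- which(temp[candidates] >= threshold)[1]   # NA when there is no match
  as.integer(candidates[hit])                      # an NA index gives NA
}

# Flooded row 5 (09:00): the threshold is the Temp of row 4 (28.920)
i <- next_recovery(start = 5, threshold = temp[4], temp = temp)
datetime[i]   # row 9, 13:00 on 2014-04-02, as in the answer's output
```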

(III) Running the check function over the dataframe

The final stretch is to apply the function to the rows where the condition is met. (This is where the vectorised pipe approach breaks down.)
dplyr offers rowwise(), which lets you iterate over each row. You need to remove the rowwise grouping with ungroup() before continuing further down a pipe. Alternatively, you can use other loop approaches.

Below, I simply iterate over all rows, including rows to which the condition does not apply. To save processing time, you may want to make this conditional again.
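One such conditional loop, sketched in base R: vapply() runs the search only for flooded rows and leaves NA everywhere else. next_recovery() is a hypothetical helper, and the threshold is the Temp of the row above each flooded row, as in Temp_before:

```r
datetime <- seq(as.POSIXct("2014-04-02 05:00:00", tz = "UTC"),
                by = "1 hour", length.out = 11)
temp    <- c(29.083, 29.088, 29.091, 28.920, 27.934, 27.359,
             27.491, 27.99, 28.92, 28.93, 28.28)
flooded <- c(rep(FALSE, 4), rep(TRUE, 2), rep(FALSE, 5))

# Hypothetical helper: first row at or after `start` with temp >= threshold
next_recovery <- function(start, threshold, temp) {
  candidates <- seq.int(start, length(temp))
  as.integer(candidates[which(temp[candidates] >= threshold)[1]])
}

# Only flooded rows that have a row above them are searched
flooded_rows <- which(flooded)
flooded_rows <- flooded_rows[flooded_rows > 1]

to_idx <- rep(NA_integer_, length(temp))
to_idx[flooded_rows] <- vapply(
  flooded_rows,
  function(i) next_recovery(i, temp[i - 1], temp),  # threshold = Temp of row above
  integer(1)
)
to <- datetime[to_idx]   # NA indices give NA datetimes

data.frame(datetime, flooded, to)
```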

df <- df %>% 
    rowwise() %>% 
    mutate(To = check_next_temp_condition(Datetime)) %>% 
    ungroup()     # undo the rowwise()!

You now have a dataframe that allows you to wrap up your other calculations.

df
# A tibble: 11 x 6
   Datetime             Temp Flooded From                Temp_before To                 
   <dttm>              <dbl> <lgl>   <dttm>                    <dbl> <dttm>             
 1 2014-04-02 05:00:00  29.1 FALSE   NA                         NA   NA                 
 2 2014-04-02 06:00:00  29.1 FALSE   NA                         NA   NA                 
 3 2014-04-02 07:00:00  29.1 FALSE   NA                         NA   NA                 
 4 2014-04-02 08:00:00  28.9 FALSE   NA                         NA   NA                 
 5 2014-04-02 09:00:00  27.9 TRUE    2014-04-02 08:00:00        28.9 2014-04-02 13:00:00
 6 2014-04-02 10:00:00  27.4 TRUE    2014-04-02 09:00:00        27.9 2014-04-02 12:00:00
 7 2014-04-02 11:00:00  27.5 FALSE   NA                         NA   NA                 
 8 2014-04-02 12:00:00  28.0 FALSE   NA                         NA   NA                 
 9 2014-04-02 13:00:00  28.9 FALSE   NA                         NA   NA                 
10 2014-04-02 14:00:00  28.9 FALSE   NA                         NA   NA                 
11 2014-04-02 15:00:00  28.3 FALSE   NA                         NA   NA       
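With From and To in place, the remaining steps of the question's pipe reduce to difftime() and a per-nest maximum. A base R sketch using the two From/To pairs from the output above; the NestID label is hypothetical because the toy data has none:

```r
# From/To pairs copied from the two flooded rows in the output above
from <- as.POSIXct(c("2014-04-02 08:00:00", "2014-04-02 09:00:00"), tz = "UTC")
to   <- as.POSIXct(c("2014-04-02 13:00:00", "2014-04-02 12:00:00"), tz = "UTC")
nest_id <- c("LB1_2014", "LB1_2014")   # hypothetical label

# Later time first, so the durations come out positive
duration_days  <- as.numeric(difftime(to, from, units = "days"))
duration_hours <- as.numeric(difftime(to, from, units = "hours"))

# Longest event per nest; na.rm drops rows where From/To are NA
max_days <- tapply(duration_days, nest_id, max, na.rm = TRUE)

duration_hours   # 5 3
max_days
```

In the full data, na.rm = TRUE keeps non-flooded rows from turning the maximum into NA; a nest with no flooding at all would then give -Inf with a warning, so you may want to drop such nests first.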

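Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.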
