R - dplyr lag function

I am trying to calculate the absolute difference between lagged values over several columns. The first row of the resulting data set is NA, which is correct because there is no previous value from which to calculate the lag. What I don't understand is why the lag isn't calculated for the last value. Note that the last value in the example below (temp) is the difference between the 2nd-to-last and 3rd-to-last values; the difference between the last and 2nd-to-last values is missing.

library(tidyverse)
library(purrr)
dim(mtcars) # 32 rows
temp <- map_df(mtcars, ~ abs(diff(lag(.x))))
names(temp) <- paste(names(temp), '.abs.diff.lag', sep= '') 
dim(temp) # 31 rows

It would be an awesome bonus if someone could show me how to pipe the renaming step; I played around with paste and enquo. The real dataset is too long for a gather/newcolumnname/spread approach.

Thanks in advance!

EDIT: added the libraries needed to run the script.

I think the lag call in your existing code is unnecessary, as diff already calculates the lagged difference (although perhaps I don't properly understand what you are trying to do). You can also use rename_all to add a suffix to all the variable names.

library(purrr)
library(dplyr)
mtcars %>%
  map_df(~ abs(diff(.x))) %>%
  rename_all(~ paste0(., ".abs.diff.lag"))  # formula lambda; funs() is deprecated
#> # A tibble: 31 x 11
#>    mpg.abs.diff.lag cyl.abs.diff.lag disp.abs.diff.lag hp.abs.diff.lag
#>               <dbl>            <dbl>             <dbl>           <dbl>
#>  1              0.0                0               0.0               0
#>  2              1.8                2              52.0              17
#>  3              1.4                2             150.0              17
#>  4              2.7                2             102.0              65
#>  5              0.6                2             135.0              70
#>  6              3.8                2             135.0             140
#>  7             10.1                4             213.3             183
#>  8              1.6                0               5.9              33
#>  9              3.6                2              26.8              28
#> 10              1.4                0               0.0               0
#> # ... with 21 more rows, and 7 more variables: drat.abs.diff.lag <dbl>,
#> #   wt.abs.diff.lag <dbl>, qsec.abs.diff.lag <dbl>, vs.abs.diff.lag <dbl>,
#> #   am.abs.diff.lag <dbl>, gear.abs.diff.lag <dbl>,
#> #   carb.abs.diff.lag <dbl>
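
To see why combining lag() with diff() loses the final difference, here is a minimal sketch on a made-up four-element vector. It uses base R only: c(NA, head(x, -1)) stands in for what dplyr::lag(x) returns with its default n = 1, and all variable names are illustrative.

```r
# Toy vector; the values are made up for illustration.
x <- c(10, 13, 11, 18)

# Stand-in for dplyr::lag(x): shift down by one, pad the front with NA.
# Note that the last element (18) falls off the end of the shifted vector.
shifted <- c(NA, head(x, -1))

# diff() already subtracts each element from the one that follows it.
direct <- abs(diff(x))

# Differencing the shifted vector starts with NA and never sees 18.
lagged <- abs(diff(shifted))

print(direct)
print(lagged)
```

Here direct holds all three consecutive differences, while lagged begins with NA and stops one difference short, which is the same pattern as in the question's output.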

Maybe something like this:

library(dplyr)
dataCars <- mtcars %>%
  mutate(diffMPG = abs(mpg - lag(mpg)),  # lag() here is dplyr::lag
         diffHP = abs(hp - lag(hp)))

Then do the same for every column you are interested in.
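
Writing one mutate() argument per column gets tedious on a wide data set. As a hedged sketch, assuming dplyr 1.0.0 or later (for across()), the same calculation can be applied to every column at once, with the suffix added in the same step. The suffix is the one from the question; result is an illustrative name.

```r
library(dplyr)

# Apply abs(x - lag(x)) to every column and name each result with a suffix.
result <- mtcars %>%
  mutate(across(everything(),
                ~ abs(.x - dplyr::lag(.x)),
                .names = "{.col}.abs.diff.lag")) %>%
  # Keep only the newly created difference columns.
  select(ends_with(".abs.diff.lag"))

dim(result)
```

Unlike the diff()-based versions, this keeps all 32 rows: the first row is NA and the last row holds the difference between the last and 2nd-to-last values.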

I was not able to reproduce your issue with the lag function. When I run your sample code, I get a data frame of 31 rows, exactly as you mentioned, but the first row is not NA; it is already the difference between the 1st and 2nd rows.
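
One possible explanation for this discrepancy, not confirmed anywhere in the thread: if dplyr is not attached, lag() resolves to stats::lag(), which shifts the time base of a series rather than the values themselves. The sketch below uses base R only and a made-up vector; the variable names are illustrative.

```r
# Toy vector; the values are made up for illustration.
x <- c(10, 13, 11, 18)

# stats::lag() leaves the values untouched; it only attaches a shifted
# time-series attribute ("tsp") to the vector.
ts_lagged <- stats::lag(x)

# Differencing therefore gives the plain consecutive differences, with no NA.
res <- abs(as.numeric(diff(ts_lagged)))
print(res)
```

With dplyr attached, lag() instead shifts the values and pads with NA, which would produce the leading NA described in the question.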

Regarding your bonus question, the answer is provided here:

temp <- map_df(mtcars, ~ abs(diff(lag(.x)))) %>%
  setNames(paste0(names(.), '.abs.diff.lag'))

This should result in the desired column naming.
