简体   繁体   English

通过线性回归计算每天(组)中每个变量的斜率

[英]Calculate the slope from a linear regression for each variable for each day (group)

Example data: 示例数据:

The dataset has four columns: Time , Var1 , Var2 , Var3 . 数据集有四列: TimeVar1Var2Var3 The Time column grain is 1 minute, but the regression should be performed for each day. Time列的粒度为1分钟,但应每天进行回归。

Time <- format(seq(as.POSIXct("2018-02-01 23:12:00"), as.POSIXct("2018-02-25 08:32:00"), by="min"), tz = "EST")
df <- data.frame(Time, Var1=runif(length(Time)), Var2=runif(length(Time)), Var3=runif(length(Time)))

The question: 问题:

How to run linear regression for each variable for each day? 如何每天对每个变量进行线性回归? The output is the slope for Var1 , Var2 , and Var3 for each day. 输出是每天的Var1Var2Var3的斜率。

The close solution: 紧密的解决方案:

One close solution I can get is from this post . 我可以从这个帖子中得到一个接近的解决方案。 However, the ROC from the TTR package is not the "slope" based on the linear regression analysis. 但是,基于线性回归分析,TTR软件包中的ROC并非“斜率”。

Any ideas for this task -- calculate the slope for each variable for each day? 关于此任务的任何想法-计算每天每个变量的斜率?

My solution: 我的解决方案:

df$Time <- as.Date(df$Time) 
df$year <- format(df$Time,format="%Y") 
df$mth <- format(df$Time,format="%m") 
df$day <- format(df$Time,format="%d") 
aggregate( df$Var1 ~ year + mth + day , df , SLOPE_FUNCTION ) 
aggregate( df$Var2 ~ year + mth + day , df , SLOPE_FUNCTION ) 
aggregate( df$Var3 ~ year + mth + day , df , SLOPE_FUNCTION ) 

Can you also show me how to create the SLOPE_FUNCTION based on the lm to yield the slope result and how to apply the aggregate to each column (ie, Var1, Var2 and Var3) in one line code? 您还可以向我展示如何基于lm创建SLOPE_FUNCTION以产生斜率结果,以及如何在一个行代码中将聚合应用于每一列(即Var1,Var2和Var3)吗?

If you are simply for Time over Time changes, you could do: 如果你只是对Time超过Time的变化,你可以这样做:

library(tidyverse)
as_data_frame(df) %>%
  mutate_if(is.numeric, funs(. / lag(.)))

# # A tibble: 33,681 x 4
#    Time                  Var1   Var2    Var3
#    <fct>                <dbl>  <dbl>   <dbl>
#  1 2018-02-01 18:12:00 NA     NA     NA     
#  2 2018-02-01 18:13:00  1.06   1.17   0.433 
#  3 2018-02-01 18:14:00  0.551  0.647  2.41  
#  4 2018-02-01 18:15:00  3.12   1.34   0.134 
#  5 2018-02-01 18:16:00  1.43   0.344  6.43  
#  6 2018-02-01 18:17:00  0.189  0.790  0.823 
#  7 2018-02-01 18:18:00  0.355  3.39   1.51  
#  8 2018-02-01 18:19:00  3.62   0.604  1.17  
#  9 2018-02-01 18:20:00  0.950  0.505  0.0213
# 10 2018-02-01 18:21:00  3.86   2.34  19.5   
# # ... with 33,671 more rows

If you'd like percentage changes, you would add -1 to the funs() argument: 如果您希望更改百分比,则可以在funs()参数中添加-1

as_data_frame(df) %>%
  mutate_if(is.numeric, funs(. / lag(.) - 1))


For a lm by day, by variable, I would utilize purrr and broom : 对于lm按天,按可变的,我将利用purrrbroom

 library(tidyverse) library(lubridate) as_data_frame(df) %>% mutate(Time = ymd_hms(Time)) %>% mutate(day = floor_date(Time, unit = "day")) %>% gather(variable, value, -day, -Time) %>% nest(-day, -variable) %>% mutate(model = map(data, ~lm(as.numeric(Time) ~ value, data = .))) %>% unnest(model %>% map(broom::tidy)) # # A tibble: 150 x 7 # day variable term estimate std.error statistic p.value # <dttm> <chr> <chr> <dbl> <dbl> <dbl> <dbl> # 1 2018-02-01 00:00:00 Var1 (Intercept) 1517518845 618 2457337 0 # 2 2018-02-01 00:00:00 Var1 value 592 1091 0.543 0.588 # 3 2018-02-02 00:00:00 Var1 (Intercept) 1517571312 1337 1134724 0 # 4 2018-02-02 00:00:00 Var1 value 2902 2318 1.25 0.211 # 5 2018-02-03 00:00:00 Var1 (Intercept) 1517661220 1369 1108633 0 # 6 2018-02-03 00:00:00 Var1 value - 3981 2333 - 1.71 0.0881 # 7 2018-02-04 00:00:00 Var1 (Intercept) 1517744983 1318 1151672 0 # 8 2018-02-04 00:00:00 Var1 value 1170 2275 0.514 0.607 # 9 2018-02-05 00:00:00 Var1 (Intercept) 1517833026 1369 1109079 0 # 10 2018-02-05 00:00:00 Var1 value - 2027 2303 - 0.880 0.379 # # ... with 140 more rows 

If you would strictly like the slopes, you can add %>% filter(term == "value") to the pipeline. 如果严格希望斜率,可以将%>% filter(term == "value")到管道中。


Finally, you may prefer to visualize this data. 最后,您可能希望将这些数据可视化。 You can forgo the model building by using geom_smooth() with method = "lm" -- see below. 您可以通过将geom_smooth()method = "lm"结合使用来放弃模型构建-参见下文。 Note: I'm filtering to just a few days because the plot gets busy quickly. 注意:由于内容快速忙碌,我只过滤了几天。

 as_data_frame(df) %>% mutate(Time = ymd_hms(Time)) %>% mutate(day = floor_date(Time, unit = "day")) %>% filter(day <= ymd("2018-02-05")) %>% gather(variable, value, -day, -Time) %>% ggplot(., aes(x = Time, y = value, color = factor(day))) + geom_point(alpha = 0.1) + geom_smooth(method = "lm", se = FALSE) + facet_wrap(~ variable) 

R图

And alternatively, if you leverage interaction and group , you can plot things a bit differently depending on what you're after when it comes to interpretation: 另外,如果您利用interactiongroup ,则可以根据解释时所要遵循的内容来绘制一些不同的图:

 as_data_frame(df) %>% mutate(Time = ymd_hms(Time)) %>% mutate(day = floor_date(Time, unit = "day")) %>% filter(day <= ymd("2018-02-05")) %>% gather(variable, value, -day, -Time) %>% ggplot(., aes(x = Time, y = value, color = variable, group = interaction(variable, factor(day)))) + geom_point(alpha = 0.1) + geom_smooth(method = "lm", se = FALSE) 

另一个情节

You can do this with nlme::lmList once you organize the data properly. 正确组织数据后,可以使用nlme::lmList进行此操作。

library(tidyverse)
library(lubridate)
df2 <- df %>%
  ## reshape data to get Time repeated for each variable
  gather(var,value,-Time) %>%
  mutate(Time=ymd_hms(Time),   ## convert to date-time variable
         date=date(Time),      ## date info only
         timeval=Time-floor_date(Time,"day"),  ## time since beginning of day
         datevar=interaction(date,var))        ## date/var combo

Now you can fit all the date/var combos at once: 现在,您可以一次放入所有日期/变量组合:

nlme::lmList(value~timeval|datevar,df2)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM