
Anomaly Testing - Linear Regression with t or not with t? Problems understanding the setup

If you want to check for an anomaly in stock data, many studies use a linear regression. Let's say you want to check whether there is a Monday effect, meaning that returns on Mondays are significantly worse than on other days. I understood that we can use a regression like:

$$\text{return} = a + b \cdot \text{DummyMon} + e$$

where a is the constant, b is the regression coefficient, DummyMon is the dummy variable for Monday, and e is the error term. This is what I used in Python. First you add a constant to the anomaly (the Monday dummy):

import statsmodels.api as sm
anomaly = sm.add_constant(anomaly)  # prepend an intercept column named "const"

Then you build the model:

model = sm.OLS(returns, anomaly)  # `return` is a reserved word in Python, so the dependent variable is named `returns`

Then you fit the model:

results = model.fit()
print(results.summary())  # the coefficient and t-statistic on the Monday dummy estimate b
  1. I wonder whether this is the correct model setup.
  2. In this case, a plot of the linear regression would just show two vertical strips of points, one above 0 (non-Monday days) and one above 1 (Mondays), containing all the returns. It looks pretty strange. Is this correct?
  3. Should I somehow include time (t) in the regression? If so, how can I do it in Python? I thought about giving each date an increasing number, but then I wondered how to treat weekends.
  4. I would assume that with many data points both approaches give similar results if the time series is stationary, right? In the end I am doing a cross-sectional analysis and don't care about the time-series aspect in this case, correct? (I have heard about GARCH models etc., where this is different.)

Well, I am just learning and hope someone can give me some ideas on the topic. Thank you very much in advance.

For time series analysis tasks (such as forecasting or anomaly detection), you may need a more advanced model, such as a recurrent neural network (RNN) from deep learning. Each time step of the RNN can correspond to any interval you choose; in your case, one step could represent a day, an hour, half a day, etc.
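To make the "one time step per day" idea concrete, here is a minimal NumPy sketch of a plain (Elman) RNN forward pass with random, untrained weights. The per-day input features (a return and a Monday dummy), the hidden size, and the function name `rnn_forward` are assumptions for illustration. Note that one set of weights is reused at every step, so what changes from day to day is the input and the hidden state, not the cell's parameters.

```python
import numpy as np

def rnn_forward(x, W_xh, W_hh, b_h):
    """Run a simple Elman RNN over a sequence.

    x has shape (T, n_features): one row per time step (here, one trading day).
    Returns the hidden state after each step, shape (T, n_hidden).
    """
    T = x.shape[0]
    n_hidden = W_hh.shape[0]
    h = np.zeros(n_hidden)
    states = np.empty((T, n_hidden))
    for t in range(T):
        # The same weights are applied at every step; only x[t] and h change.
        h = np.tanh(x[t] @ W_xh + h @ W_hh + b_h)
        states[t] = h
    return states

rng = np.random.default_rng(0)
T, n_features, n_hidden = 10, 2, 4
# Hypothetical features per day: [return, Monday dummy],
# assuming the sequence starts on a Monday and has no holidays.
x = np.column_stack([rng.normal(0, 0.01, T), (np.arange(T) % 5 == 0).astype(float)])
W_xh = rng.normal(0, 0.5, (n_features, n_hidden))
W_hh = rng.normal(0, 0.5, (n_hidden, n_hidden))
b_h = np.zeros(n_hidden)

states = rnn_forward(x, W_xh, W_hh, b_h)
print(states.shape)  # (10, 4)
```

In practice a readout layer and training would be added on top, which is what libraries such as TensorFlow, Keras, or PyTorch handle for you.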

The main purpose of RNNs is to let the model capture time dependencies in the data. For example, if Mondays have a negative effect, a trained RNN should be able to pick up that pattern. I would recommend doing some further research on the topic. Here is some documentation that may help:

https://colah.github.io/posts/2015-08-Understanding-LSTMs/ (also covers different types of RNN)

https://towardsdatascience.com/understanding-rnn-and-lstm-f7cdf6dfc14e

You can use the TensorFlow, Keras, or PyTorch libraries for this.
