Simple Regression of Time Series with Apache Maths in Java

I have a question about where the time index should start when doing a simple regression of a time series. Here is my code, which runs the regression once with the time index starting at t = 0 and once with it starting at t = 1.

package main;

import java.util.ArrayList;
import java.util.Arrays;

import org.apache.commons.math3.stat.regression.SimpleRegression;

public class RegressionTest {

    public static void main(String[] args) {

        SimpleRegression simpleRegression = new SimpleRegression();

        ArrayList<Double> timeSeries = new ArrayList<Double>(Arrays.asList(3.0,
                5.0, 1.0, 7.0, 9.0, 2.0, 1.0, 8.0, 11.0));

        for(int i = 0; i < timeSeries.size(); i++) {
            simpleRegression.addData(i, timeSeries.get(i));
        }

        System.out.println("Start date unit at t = 0:");
        System.out.println("Intercept: " + simpleRegression.getIntercept());
        System.out.println("Slope    : " + simpleRegression.getSlope());


        simpleRegression = new SimpleRegression();

        for(int i = 0; i < timeSeries.size(); i++) {
            simpleRegression.addData((i+1), timeSeries.get(i));
        }

        System.out.println("\nStart date unit at t = 1:");
        System.out.println("Intercept: " + simpleRegression.getIntercept());
        System.out.println("Slope    : " + simpleRegression.getSlope());
    }
}

The output I get is:

Start date unit at t = 0:
Intercept: 2.8222222222222224
Slope    : 0.6

Start date unit at t = 1:
Intercept: 2.2222222222222223
Slope    : 0.6

As you can see, the intercept is different. So my question is: what is the correct starting index when no dates are specified for the time series?

Thanks for your answer.

You just moved your line one unit to the right (you changed x for the first point from 0 to 1), so of course the intercept is different and the slope is the same (plot it if you don't see it). Shifting every x by 1 lowers the intercept by exactly one slope: 2.8222 - 0.6 = 2.2222.

A time series, as the name suggests, is a series of data for given times, so each point must have a time (the x, or first parameter of addData) and the function value at that time (the y, or second parameter of addData).

You should know what the times are for your data, whether they start at 0, 1, or maybe 1345454. You must provide pairs of values (x, y) for the regression.
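To make the shift concrete, here is a minimal sketch that fits the same nine values against three different starting times. It uses a hand-rolled least squares fit instead of SimpleRegression so that it runs without any dependency; the class name TimeOriginShift and the helpers fit and times are made up for this illustration.

```java
import java.util.Locale;

class TimeOriginShift {

    /** Ordinary least squares fit of y = intercept + slope * x; returns {intercept, slope}. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double sxx = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - meanX) * (x[i] - meanX);
            sxy += (x[i] - meanX) * (y[i] - meanY);
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    /** Time values start, start + 1, ..., start + n - 1. */
    static double[] times(int n, double start) {
        double[] t = new double[n];
        for (int i = 0; i < n; i++) {
            t[i] = start + i;
        }
        return t;
    }

    public static void main(String[] args) {
        double[] y = {3.0, 5.0, 1.0, 7.0, 9.0, 2.0, 1.0, 8.0, 11.0};
        for (double start : new double[] {0.0, 1.0, 1345454.0}) {
            double[] f = fit(times(y.length, start), y);
            // Fitted value at the first observation: intercept + slope * start
            double fittedFirst = f[0] + f[1] * start;
            System.out.printf(Locale.ROOT,
                    "start=%.0f  intercept=%.4f  slope=%.4f  fitted(first)=%.4f%n",
                    start, f[0], f[1], fittedFirst);
        }
    }
}
```

All three fits should report the same slope and the same fitted value at the first observation; only the intercept moves, because the intercept is the value of the line at x = 0, and x = 0 sits at a different distance from the data in each case.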

I don't think regression is the right thing here. A statistician would say that regression applies when observations are independent. That's not the case with a time series: there's clearly a notion of order in time that breaks the "independent" assumption.

I wonder if a better idea would be a discrete Fourier transform. Examining the frequency content of the signal would be more meaningful.
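As a rough illustration of what looking at the frequency content could mean for the series in the question, here is a sketch of a naive O(n^2) discrete Fourier transform in plain Java. It is not a library routine: the class name NaiveDft and the method magnitudes are made up for this example, and it only reports magnitudes for frequencies 0 up to n/2, which is enough for real-valued input.

```java
import java.util.Locale;

class NaiveDft {

    /**
     * Magnitudes |X_k| for k = 0 .. n/2 of the DFT
     * X_k = sum over t of x[t] * exp(-2 * pi * i * k * t / n).
     */
    static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n / 2 + 1];
        for (int k = 0; k < mag.length; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im += x[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    public static void main(String[] args) {
        double[] series = {3.0, 5.0, 1.0, 7.0, 9.0, 2.0, 1.0, 8.0, 11.0};
        double[] mag = magnitudes(series);
        for (int k = 0; k < mag.length; k++) {
            // k / n is the frequency in cycles per sample; k = 0 is the sum of the series
            System.out.printf(Locale.ROOT, "k=%d  cycles/sample=%.3f  |X_k|=%.4f%n",
                    k, (double) k / series.length, mag[k]);
        }
    }
}
```

One property worth noting for the original question: relabelling the first sample as t = 1 instead of t = 0 only changes the phase of each X_k, not its magnitude, so these magnitudes do not depend on where the time index starts.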
