
Probability of a linear trend

I have a small sample ([10 16 11 16 26 17 16 16 15 13 15 14 12 12 14 20 14 12 16 21 13 13 14 16 17 18 16 14 16 23 24 12 13 13 15 16 15 14 14 16 20 17 17 15 23 18 12 19 12 11 19 17 14 18 15 23 30 24 16 14 22 17 17 17 17 20 19 27 17 36]):

There are two models:

  • Model A – there is no linear trend, so the noise histogram is centered on the mean of the data.
  • Model B – there is a linear trend, so the noise is the distance of each point from a fitted linear trendline.
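The two models above can be sketched as follows. This is only an illustration of the setup, assuming the observations are equally spaced (indexed 0..n-1) and that Model B is an ordinary least-squares line; the names `residual_a` and `residual_b` are hypothetical and do not come from the original code.

```python
import numpy as np

# Sample from the question; the index 0..n-1 is an assumed time axis.
y = np.array([
    10, 16, 11, 16, 26, 17, 16, 16, 15, 13, 15, 14, 12, 12, 14, 20, 14, 12,
    16, 21, 13, 13, 14, 16, 17, 18, 16, 14, 16, 23, 24, 12, 13, 13, 15, 16,
    15, 14, 14, 16, 20, 17, 17, 15, 23, 18, 12, 19, 12, 11, 19, 17, 14, 18,
    15, 23, 30, 24, 16, 14, 22, 17, 17, 17, 17, 20, 19, 27, 17, 36,
], dtype=float)
x = np.arange(len(y))

# Model A: no trend, residuals are deviations from the sample mean.
residual_a = y - y.mean()

# Model B: linear trend, residuals are deviations from the least-squares line.
slope, intercept = np.polyfit(x, y, deg=1)
residual_b = y - (slope * x + intercept)

print("n:", len(y))
print("sigma^2 (A):", residual_a.var())
print("sigma^2 (B):", residual_b.var())
print("fitted slope:", slope)
```

Because Model A is a special case of Model B (slope fixed at zero), the least-squares residual variance of (B) can never exceed that of (A). A smaller sigma^2 on its own therefore says little about whether the trend is real.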

Obviously, I can pick the better model by choosing the one with the smaller sigma^2, which is apparently (B). However, I am not confident that there really is a trend in the data, rather than noise that just happened to fall this way. So I ran a Dickey-Fuller test on both models, and both are below the 1% critical value ('1%': -3.529, A: -5.282, B: -6.149). This tells me it is possible that (A) is the right model.

So I come to the question: what is the probability that (A) is the better model?

I tried to solve it like this: I assume the noise is normally distributed, so I fit the best normal distribution to the residuals of (A) and (B) separately. This gave me two models for the noise. I then drew n samples (n being the original sample length) from each of these two models and compared their sigma^2. If the sigma^2 of model (A) was smaller, I counted that as a case where model (A) is better; otherwise I did not. I repeated this test a reasonable number of times.

In Python code, which is probably clearer:

import numpy as np
from scipy import stats

# Fit a normal distribution to the residuals of each model.
model_b_mu, model_b_sigma = stats.norm.fit(model_b['residual'])
model_a_mu, model_a_sigma = stats.norm.fit(model_a['residual'])

def compare_models(modela_mu, modela_sigma, modelb_mu, modelb_sigma, length):
    repeat = 20000

    modela_better = 0
    for i in range(repeat):
        # Draw one synthetic residual sample from each fitted noise model.
        modela = np.random.normal(modela_mu, modela_sigma, size=length)
        modelb = np.random.normal(modelb_mu, modelb_sigma, size=length)

        # Compare the spread. Note: sqrt(x**2) is |x|, so this sums absolute
        # values rather than computing sigma^2 (np.var would do that).
        sigma_a = np.sum(np.sqrt(np.power(modela, 2)))
        sigma_b = np.sum(np.sqrt(np.power(modelb, 2)))
        if sigma_a < sigma_b:
            modela_better += 1

    # Fraction of repetitions in which model A had the smaller spread.
    return modela_better / repeat

model_a_better = compare_models(model_a_mu, model_a_sigma, model_b_mu, model_b_sigma, len(model_a))
print(model_a_better)

Which gave me: 0.3152. I interpreted this result as: if the noise is normally distributed, there is a 31.52% probability that model (A) is better.

My question is: am I thinking about this the right way? If not, why? And how should I solve the problem?

PS: I am not a statistician, more of a programmer, so it is quite possible that the whole solution above is wrong. Therefore, I am asking for confirmation.

This is a so-called model selection problem. There isn't a single right answer, although the most nearly correct way to go about it is via Bayesian inference, that is, to compute the posterior probability p(model | data) for each of the models under consideration (two or more). Note that the result of Bayesian inference is a probability distribution over models, not a single "this model is correct" selection; any subsequent result which depends on a model is to be averaged over that distribution. Note also that Bayesian inference requires a prior over the models: you have to specify a probability for each model a priori, in the absence of data. This is a feature, not a bug.
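In symbols, the quantities described above can be written as follows. The notation is an assumption made here for illustration: M_k denotes a model, D the data, θ_k the parameters of model M_k, and Δ any downstream quantity that depends on the model.

```latex
\[
p(M_k \mid D) = \frac{p(D \mid M_k)\, p(M_k)}{\sum_j p(D \mid M_j)\, p(M_j)},
\qquad
p(D \mid M_k) = \int p(D \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k
\]
\[
p(\Delta \mid D) = \sum_k p(\Delta \mid M_k, D)\, p(M_k \mid D)
\]
```

The first line is Bayes' rule applied to models, with p(M_k) the prior over models; the second is the averaging over models mentioned above.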

Glancing at the problem as stated, it would probably be straightforward to work out the posterior probability for the two models you mention, but first you'll need to get somewhat familiar with the conceptual framework. A web search for Bayesian model inference should turn up a lot of resources. Also, this question is more suitable for stats.stackexchange.com.
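As a rough starting point, the posterior probabilities of the two models can be approximated with the Bayesian information criterion (BIC), which stands in for the exact marginal likelihood. This is a sketch under stated assumptions, not the exact Bayesian computation: it assumes independent Gaussian noise, equally spaced observations, and a 50/50 prior over the models, and the helper names `gaussian_bic` and `approx_posterior` are hypothetical.

```python
import numpy as np

def gaussian_bic(residuals, n_params):
    """BIC of a Gaussian noise model evaluated at its maximum-likelihood fit."""
    n = len(residuals)
    sigma2 = np.mean(residuals ** 2)  # ML estimate of the noise variance
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return n_params * np.log(n) - 2 * log_lik

def approx_posterior(y, prior_a=0.5):
    """Approximate [p(A | data), p(B | data)] via exp(-BIC / 2) weights."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y))
    res_a = y - y.mean()                    # Model A: constant mean
    slope, intercept = np.polyfit(x, y, 1)  # Model B: least-squares line
    res_b = y - (slope * x + intercept)
    # Parameter counts: A has (mean, sigma); B has (slope, intercept, sigma).
    bic = np.array([gaussian_bic(res_a, 2), gaussian_bic(res_b, 3)])
    prior = np.array([prior_a, 1.0 - prior_a])
    # Subtract the smallest BIC before exponentiating for numerical stability.
    weights = prior * np.exp(-0.5 * (bic - bic.min()))
    return weights / weights.sum()

# Sample from the question.
sample = [
    10, 16, 11, 16, 26, 17, 16, 16, 15, 13, 15, 14, 12, 12, 14, 20, 14, 12,
    16, 21, 13, 13, 14, 16, 17, 18, 16, 14, 16, 23, 24, 12, 13, 13, 15, 16,
    15, 14, 14, 16, 20, 17, 17, 15, 23, 18, 12, 19, 12, 11, 19, 17, 14, 18,
    15, 23, 30, 24, 16, 14, 22, 17, 17, 17, 17, 20, 19, 27, 17, 36,
]
p_a, p_b = approx_posterior(sample)
print("approx. p(A | data):", p_a)
print("approx. p(B | data):", p_b)
```

Unlike comparing sigma^2 directly, the BIC charges Model B for its extra parameter, so the trend model only wins if it improves the fit by more than that penalty. Changing `prior_a` shows how the prior over models shifts the result.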
