Calculate 'R Square' and 'P-Value' for multiple linear regression in TSQL

SQL Server has only a few built-in functions for sophisticated statistical analysis, but I need to calculate a multiple linear regression in TSQL.

Based on this post ( Multiple Linear Regression function in SQL Server ), I was able to get the coefficients for Intercept (Y), X1 and X2.

What I still need is the p-value for X1 and for X2, and also R Square.

Test data:

DECLARE @TestData TABLE (i INT IDENTITY(1, 1), X1 FLOAT, X2 FLOAT, y FLOAT)

INSERT @TestData
    SELECT 0, 17, 210872.3034 UNION ALL
    SELECT 0, 23, 191988.2299 UNION ALL
    SELECT 0, 18, 204564.9455 UNION ALL
    SELECT 0, 4, 189528.9212 UNION ALL
    SELECT 0, 0, 200203.6364 UNION ALL
    SELECT 11, 0, 218814.1701 UNION ALL
    SELECT 5, 0, 220109.2129 UNION ALL
    SELECT 2, 0, 214377.8534 UNION ALL
    SELECT 1, 0, 204926.9208 UNION ALL
    SELECT 0, 0, 202499.4065 UNION ALL
    SELECT 0, 3, 196917.8182 UNION ALL
    SELECT 0, 9, 202286.0012

Desired output:

R Square    0.4991599183412360
p-value X1  0.0264247876580807
p-value X2  0.7817597643898020

I have already been able to get the following coefficients from the above test data.

b               Coefficients
----------------------------------
Intercept (Y)   202119.231151577
X1 C(H)         1992.8421941724
X2 C(C)         -83.8561622730127 

I know TSQL is not a good platform for this, but I need it to be done purely in TSQL.

I am aware of the XLeratorDB Function Packages for SQL Server.

You could calculate R Squared by hand and create a variable 'R2' equal to (N*xysum - xsum*ysum)^2 / ((N*x2sum - xsum*xsum) * (N*y2sum - ysum*ysum)).

Here xsum and ysum are the sums of your x and y values, xysum, x2sum and y2sum are the sums of x*y, x^2 and y^2, and N is the number of observations.

The formula for R Squared is simple enough that you don't necessarily need any function or statistical software. Check out this link for calculating it by hand: http://sciencefair.math.iit.edu/analysis/linereg/hand/

You can apply the same logic to T-SQL.
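As a rough sketch of what that could look like, the batch below first applies the sums formula to a single predictor (X1 only), since that formula is the squared correlation for simple linear regression and does not cover two predictors at once. The second query is an assumption on my part rather than something from the link: it uses the standard definition R^2 = 1 - SS_residual / SS_total together with the coefficients already obtained in the question, which is one way to get an R Squared for the two-predictor model. The variable names @b0, @b1, @b2 and the column aliases are made up for illustration. It assumes SQL Server 2008 or later (row constructors and DECLARE with an initial value).

```sql
-- Test data from the question.
DECLARE @TestData TABLE (i INT IDENTITY(1, 1), X1 FLOAT, X2 FLOAT, y FLOAT);

INSERT @TestData (X1, X2, y)
VALUES (0, 17, 210872.3034), (0, 23, 191988.2299), (0, 18, 204564.9455),
       (0,  4, 189528.9212), (0,  0, 200203.6364), (11, 0, 218814.1701),
       (5,  0, 220109.2129), (2,  0, 214377.8534), (1,  0, 204926.9208),
       (0,  0, 202499.4065), (0,  3, 196917.8182), (0,  9, 202286.0012);

-- 1) Sums formula for ONE predictor (X1 only): squared Pearson correlation.
WITH s AS (
    SELECT CAST(COUNT(*) AS FLOAT) AS N,
           SUM(X1)      AS xsum,
           SUM(y)       AS ysum,
           SUM(X1 * y)  AS xysum,
           SUM(X1 * X1) AS x2sum,
           SUM(y * y)   AS y2sum
    FROM @TestData
)
SELECT POWER(N * xysum - xsum * ysum, 2)
       / NULLIF((N * x2sum - xsum * xsum) * (N * y2sum - ysum * ysum), 0)
       AS R2_X1_only          -- NULL if X1 or y is constant
FROM s;

-- 2) Assumed approach for the two-predictor model:
--    R^2 = 1 - SS_residual / SS_total, using the coefficients from the question.
DECLARE @b0 FLOAT = 202119.231151577,   -- Intercept (Y)
        @b1 FLOAT = 1992.8421941724,    -- X1
        @b2 FLOAT = -83.8561622730127;  -- X2

SELECT 1 - SUM(SQUARE(y - (@b0 + @b1 * X1 + @b2 * X2)))
         / NULLIF(SUM(SQUARE(y - ybar)), 0) AS R2_multiple
FROM (
    SELECT X1, X2, y, AVG(y) OVER () AS ybar   -- overall mean of y on every row
    FROM @TestData
) AS t;
```

Neither query produces the p-values. Those need the standard errors of the coefficients and the t-distribution, which T-SQL has no built-in function for.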
