Regression in SAS and R not matching

I'm trying to rewrite one of my existing SAS programs in R, and I'm checking the output to make sure it matches. I'm starting with a very basic regression, and I can't even get that to match. I also double-checked the results in Excel, and they matched the R output.

My SAS code for the regression is very basic:

Proc Reg data=[data set];
 model DepVar = Reg1 Reg2 Reg3 Reg4 Reg5 Reg6;
run;
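For cross-checking outside SAS, here is a minimal, stdlib-only Python sketch of the same kind of model: ordinary least squares with an intercept (PROC REG and R's `lm()` both include one by default), solved through the normal equations with Gaussian elimination. This is one textbook route, not necessarily the algorithm SAS, R or Excel uses internally (R's `lm()`, for instance, works from a QR decomposition). The names `ols` and `solve` are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(y, X):
    """Least-squares coefficients [intercept, b1, ..., bk].

    X is a list of rows, one list of regressor values per observation;
    an intercept column of ones is prepended."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Toy check: y = 1 + 2*x1 - 3*x2 exactly, so OLS should recover [1, 2, -3].
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1.0, 3.0, -2.0, 0.0, 2.0]
print(ols(y, X))
```

With the posted data, `y` would be the DepVar column and each row of `X` the six values Reg1 to Reg6 for that observation.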

Here's a summary of the parameter estimates (negative values in parentheses):

VAR         SAS         R           Excel
DepVar       0.01748     0.01748     0.01748 
Reg1        (0.24815)   (0.24809)   (0.24809)
Reg2         1.19502     1.19481     1.19481 
Reg3        (0.33029)   (0.33012)   (0.33012)
Reg4         0.80502     0.80507     0.80507 
Reg5        (1.39338)   (1.39345)   (1.39345)
Reg6        (0.13034)   (0.13051)   (0.13051)

And here's the data (only 60 data points):

OBS DepVar  Reg1    Reg2    Reg3    Reg4    Reg5    Reg6
1   -0.0444 -0.0298 -0.0165 0.0266  0.032   0.0019  -0.0035
2   -0.0491 0.0165  -0.0072 0.0283  -0.0298 -0.0165 0.0266
3   0.1208  -0.0215 -0.0138 0.0175  0.0165  -0.0072 0.0283
4   -0.0784 -0.0278 -0.04   -0.0046 -0.0215 -0.0138 0.0175
5   0.2154  0.0353  0.0299  -0.0123 -0.0278 -0.04   -0.0046
6   0.1249  0.0045  0.0256  0.0278  0.0353  0.0299  -0.0123
7   0.0062  0.0379  0.0277  -0.0045 0.0045  0.0256  0.0278
8   0.0359  -0.0127 -0.0088 0.0141  0.0379  0.0277  -0.0045
9   0.2078  0.004   -0.0068 0.0116  -0.0127 -0.0088 0.0141
10  -0.123  -0.0214 -0.0103 -0.007  0.004   -0.0068 0.0116
11  -0.0633 0.0353  0.01    -0.0185 -0.0214 -0.0103 -0.007
12  0.0173  -0.0031 -0.0051 0.0048  0.0353  0.01    -0.0185
13  -0.0204 0.03    0.0533  0.0117  -0.0031 -0.0051 0.0048
14  -0.0143 -0.0033 -0.0031 -0.0085 0.03    0.0533  0.0117
15  0.1663  0.0142  0.0356  -0.0011 -0.0033 -0.0031 -0.0085
16  -0.099  0.0066  -0.0124 0.0308  0.0142  0.0356  -0.0011
17  -0.0148 -0.0358 -0.0304 0.0277  0.0066  -0.0124 0.0308
18  -0.0807 -0.0038 -0.0054 0.0151  -0.0358 -0.0304 0.0277
19  0.1532  -0.008  -0.0399 0.0327  -0.0038 -0.0054 0.0151
20  0.1195  0.0205  0.0083  -0.0176 -0.008  -0.0399 0.0327
21  -0.0581 0.0186  -0.0123 -0.0043 0.0205  0.0083  -0.0176
22  0.0034  0.0325  0.0164  0.0048  0.0186  -0.0123 -0.0043
23  0.0476  0.0175  0.0077  0.0048  0.0325  0.0164  0.0048
24  -0.0413 0.0086  -0.0089 0.0252  0.0175  0.0077  0.0048
25  0.0192  0.0143  0.0009  -0.0002 0.0086  -0.0089 0.0252
26  0.2577  -0.0197 0.0137  0.0024  0.0143  0.0009  -0.0002
27  0.0157  0.0071  -0.0026 0.0039  -0.0197 0.0137  0.0024
28  -0.0012 0.0353  -0.0209 -0.0097 0.0071  -0.0026 0.0039
29  0.0393  0.0323  -0.0003 -0.0015 0.0353  -0.0209 -0.0097
30  -0.0036 -0.0198 0.0076  -0.0107 0.0323  -0.0003 -0.0015
31  -0.0607 -0.0374 -0.0267 -0.0299 -0.0198 0.0076  -0.0107
32  0.0236  0.0094  -0.0014 -0.0236 -0.0374 -0.0267 -0.0299
33  -0.0363 0.0314  -0.0246 -0.0213 0.0094  -0.0014 -0.0236
34  -0.0442 0.0173  0.0021  -0.0197 0.0314  -0.0246 -0.0213
35  0.0758  -0.0485 -0.0277 -0.0109 0.0173  0.0021  -0.0197
36  -0.0076 -0.0097 0.0005  -0.0003 -0.0485 -0.0277 -0.0109
37  -0.0096 -0.065  -0.0078 0.0305  -0.0097 0.0005  -0.0003
38  0.0181  -0.0332 -0.0054 -0.0003 -0.065  -0.0078 0.0305
39  -0.056  -0.0112 0.0083  0.0028  -0.0332 -0.0054 -0.0003
40  -0.0404 0.0441  -0.0149 -0.0003 -0.0112 0.0083  0.0028
41  0.2678  0.0165  0.0298  -0.0034 0.0441  -0.0149 -0.0003
42  -0.0138 -0.0865 0.0107  -0.0102 0.0165  0.0298  -0.0034
43  -0.0568 -0.01   0.0358  0.0369  -0.0865 0.0107  -0.0102
44  -0.0234 0.0129  0.0375  0.0148  -0.01   0.0358  0.0369
45  -0.141  -0.0945 -0.0034 0.044   0.0129  0.0375  0.0148
46  -0.0227 -0.1754 -0.0228 -0.0299 -0.0945 -0.0034 0.044
47  -0.1332 -0.0813 -0.0363 -0.0494 -0.1754 -0.0228 -0.0299
48  0.1535  0.015   0.0397  -0.012  -0.0813 -0.0363 -0.0494
49  0.0309  -0.0844 -0.0098 -0.0986 0.015   0.0397  -0.012
50  0.0529  -0.1042 -0.0035 -0.069  -0.0844 -0.0098 -0.0986
51  -0.0834 0.0868  0.0073  0.026   -0.1042 -0.0035 -0.069
52  0.0413  0.0986  0.054   0.0542  0.0868  0.0073  0.026
53  -0.0006 0.0486  -0.0266 0.0056  0.0986  0.054   0.0542
54  0.0159  0.0009  0.0267  -0.0244 0.0486  -0.0266 0.0056
55  -0.0506 0.0738  0.025   0.0473  0.0009  0.0267  -0.0244
56  0.05    0.0299  -0.0051 0.0759  0.0738  0.025   0.0473
57  0.009   0.0376  0.0247  0.014   0.0299  -0.0051 0.0759
58  0.0344  -0.0293 -0.0422 -0.0437 0.0376  0.0247  0.014
59  0.0038  0.0523  -0.0265 0.0017  -0.0293 -0.0422 -0.0437
60  0.1589  0.0239  0.0579  0.0073  0.0523  -0.0265 0.0017

What am I missing?
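To bring the posted table into another tool exactly as typed, with no spreadsheet or display format in between, a small parser is enough. A sketch, assuming the whitespace-separated layout above with a single header row; `parse_table` is a hypothetical helper name:

```python
def parse_table(text):
    """Parse a whitespace-separated table with one header row.

    Returns a dict mapping each header name to a list of floats."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    for n, row in enumerate(rows, start=2):
        # Catch ragged rows instead of silently misaligning columns.
        if len(row) != len(header):
            raise ValueError(f"line {n}: expected {len(header)} fields, got {len(row)}")
    return {name: [float(row[j]) for row in rows] for j, name in enumerate(header)}


sample = """
OBS DepVar  Reg1    Reg2    Reg3    Reg4    Reg5    Reg6
1   -0.0444 -0.0298 -0.0165 0.0266  0.032   0.0019  -0.0035
2   -0.0491 0.0165  -0.0072 0.0283  -0.0298 -0.0165 0.0266
"""
cols = parse_table(sample)
print(cols["Reg1"])  # [-0.0298, 0.0165]
```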

Double-check your data in SAS and make sure it has the same precision as the copies in R and Excel. I ran your data through SAS and got results identical to your R and Excel outputs:

[Image: SAS output]

And here is the Stata output, in case it helps with verification:

[Image: Stata output]
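One concrete way to do that double-check is to export the dataset from each tool and compare the two copies column by column. A minimal sketch, assuming both copies have already been read into lists of rows in the same order (for example from CSV exports); `column_differences` and the sample rows are illustrative, not taken from the thread:

```python
def column_differences(a, b, names):
    """Largest absolute difference in each column between two tables.

    `a` and `b` are lists of rows (lists of floats) in the same order;
    `names` labels the columns."""
    if len(a) != len(b):
        raise ValueError("tables have different numbers of rows")
    return {
        name: max(abs(ra[j] - rb[j]) for ra, rb in zip(a, b))
        for j, name in enumerate(names)
    }


# Hypothetical example: the second copy carries one extra digit in Reg1.
copy_a = [[-0.0444, -0.0298], [-0.0491, 0.0165]]
copy_b = [[-0.0444, -0.02983], [-0.0491, 0.0165]]
print(column_differences(copy_a, copy_b, ["DepVar", "Reg1"]))
```

Any column whose maximum difference is not exactly zero points to where the two copies diverge.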

If I read your output correctly, the differences show up in the fourth significant digit or later. With only 60 data points, all recorded to no more than two or three significant digits, you should not even look beyond the third significant digit of your output: anything "out there" is swamped by measurement noise.
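To put the discrepancies in the question's table on a relative scale, the sketch below copies the SAS and R columns as posted (parentheses turned into minus signs) and computes |SAS - R| / |R| for each row. The largest gap is for Reg6, at roughly 0.13%, i.e. in the fourth significant digit.

```python
# Estimates as reported in the question (parentheses = negative values).
sas = {"DepVar": 0.01748, "Reg1": -0.24815, "Reg2": 1.19502, "Reg3": -0.33029,
       "Reg4": 0.80502, "Reg5": -1.39338, "Reg6": -0.13034}
r = {"DepVar": 0.01748, "Reg1": -0.24809, "Reg2": 1.19481, "Reg3": -0.33012,
     "Reg4": 0.80507, "Reg5": -1.39345, "Reg6": -0.13051}


def relative_differences(a, b):
    """|a - b| / |b| for every key present in both mappings."""
    return {k: abs(a[k] - b[k]) / abs(b[k]) for k in a if k in b}


rel = relative_differences(sas, r)
worst = max(rel, key=rel.get)
print(worst, f"{rel[worst]:.1e}")  # Reg6 1.3e-03
```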

Matrix inversion (more precisely, solving linear equations) is not an exact science in floating-point arithmetic. Different numerical libraries, which may use different algorithms for solving linear equations, or even the same library on different architectures (which I assume is not your situation), can certainly cause divergences of the order you are observing. See R FAQ 7.31 for more information. Special exact-arithmetic libraries should in principle give identical results, but I don't know whether R, SAS or Excel offer OLS solutions with exact arithmetic.
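The R FAQ 7.31 effect in miniature: floating-point addition is not associative, so two libraries that accumulate the same sums in a different order can disagree in the last bits. A minimal Python sketch; `left_to_right` is an illustrative helper, while `math.fsum` is the standard library's correctly rounded sum:

```python
import math


def left_to_right(values):
    """Sum values strictly in the order given, as a naive loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


vals = [0.1, 0.2, 0.3]
print(left_to_right(vals))            # 0.6000000000000001
print(left_to_right(reversed(vals)))  # 0.6
print(math.fsum(vals))                # 0.6 (correctly rounded)
```

Each such discrepancy is on the order of one part in 10^16; how much it grows by the time it reaches the regression coefficients depends on how well conditioned the cross-products matrix X'X is.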

This is a precision difference. My guess is that PROC REG uses MLE, while R and Excel take the matrix-factorization route. With linear algebra, the precision is essentially fixed close to machine precision; with MLE you set the precision (the convergence tolerance), and the optimization routine tries to meet it.
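Whether PROC REG actually takes an iterative route is the answerer's guess. What holds in general is that, with normally distributed errors, the maximum-likelihood estimates of the coefficients coincide with OLS, so an iterative fit can only differ from the closed-form answer through its stopping rule. A sketch of that effect, assuming a one-regressor model, toy data (not the thread's data), and plain gradient descent on the mean squared error, which for a fixed error variance is equivalent to maximizing the Gaussian log-likelihood; all names are illustrative:

```python
def closed_form(x, y):
    """Exact least-squares intercept and slope for one regressor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def gradient_descent(x, y, tol, lr=0.05, max_iter=100_000):
    """Iteratively minimise mean squared error, stopping once the gradient
    norm falls below `tol` (the optimiser's "precision" setting)."""
    n = len(x)
    a = b = 0.0
    for _ in range(max_iter):
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        grad_a = -2.0 / n * sum(resid)
        grad_b = -2.0 / n * sum(r * xi for r, xi in zip(resid, x))
        if (grad_a ** 2 + grad_b ** 2) ** 0.5 < tol:
            break
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 7.1, 8.8]  # toy data
exact = closed_form(x, y)
for tol in (1e-2, 1e-10):
    a, b = gradient_descent(x, y, tol)
    print(f"tol={tol:g}: intercept off by {a - exact[0]:.1e}, slope off by {b - exact[1]:.1e}")
```

The loose tolerance leaves a visible error in the estimates and the tight one an error far below anything printed, even though both runs target the same least-squares solution.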

Another guess: the conversion from character to numeric values, and the rounding that comes with it.
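The rounding half of that guess is easy to reproduce: write numbers out as text with a fixed number of decimals (as a display format or a fixed-width export might) and read them back. A sketch with hypothetical full-precision values; `export_and_reimport` is an illustrative name:

```python
def export_and_reimport(values, decimals):
    """Format each value with `decimals` places, then parse the text back."""
    return [float(f"{v:.{decimals}f}") for v in values]


stored = [-0.029812, 0.016457, 0.0266]  # hypothetical stored values
print(export_and_reimport(stored, 4))   # [-0.0298, 0.0165, 0.0266]

# repr() gives the shortest string that parses back to the same float,
# so this round trip is lossless.
assert all(float(repr(v)) == v for v in stored)
```

Whichever copy went through the 4-decimal text step is no longer the same data, even though both copies print identically at 4 decimals.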

Thanks, everyone, for your input. The problem appears to be something happening to the data as it works its way through the SAS program. I had originally taken a couple of data sources and combined them into a single SAS dataset, then exported that dataset to R and Excel, which is where the differences arose. I now find that if I combine the original data sets in R and then run the regression, I get the original SAS answer. I also find (as someone noted above) that if I take the copied data and run it through SAS, I get the original R answer.

So the data is being changed somewhere along the way in the SAS program. However, I can't figure out how, since the original data only has the precision shown in my question.

Nevertheless, this is helpful. Thanks!
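A display format (a SAS format or an Excel cell format, for example) controls how many decimals are shown, not how many are stored, so "the precision shown" and "the precision held" can differ. A sketch of a check for values carrying more digits than a 4-decimal display reveals, assuming a column has been read into a list of floats; `extra_precision`, its tolerance and the sample column are illustrative choices:

```python
def extra_precision(values, decimals=4, tol=1e-12):
    """Return indices of values that are not already rounded to `decimals` places."""
    return [i for i, v in enumerate(values) if abs(v - round(v, decimals)) > tol]


column = [-0.0298, 0.0165, 0.016457, 0.0266]  # hypothetical column
print(extra_precision(column))  # [2]
```

An empty result for every column of the SAS dataset would rule out hidden digits as the source of the change.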
