
Statsmodels gives different ANOVA results to SPSS

I'm getting acquainted with Statsmodels so that I can move my more complicated statistics entirely over to Python. I'm being cautious, though, so I'm cross-checking my results against SPSS to make sure I'm not making any obvious blunders. Most of the time there's no difference, but I have one two-way ANOVA that produces very different test statistics in Statsmodels and SPSS. (Relevant point: the sample sizes in the ANOVA are unequal across groups, so ANOVA may not be the appropriate model here.)

I'm specifying my model as follows:

import pandas as pd
import scipy as sp
import numpy as np
import seaborn as sns
import statsmodels
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt

Body = pd.read_csv(filepath)

Body = Body.dropna()

Body_lm = ols('Effect ~ C(Fiction) + C(Condition) + C(Fiction)*C(Condition)', data = Body).fit()

table = sm.stats.anova_lm(Body_lm, typ=2)

The Statsmodels output is as follows:

                            sum_sq     df           F        PR(>F)
C(Fiction)               278.176684    1.0  307.624463  1.682042e-55
C(Condition)               4.294764    1.0    4.749408  2.971278e-02
C(Fiction):C(Condition)   10.776312    1.0   11.917092  5.970123e-04
Residual                 520.861599  576.0         NaN           NaN

The corresponding SPSS results are these:

[Image: SPSS output table, not preserved in this copy]

Can anyone help explain the difference? Is it perhaps that the unequal sample sizes are being treated differently under the hood? Or am I choosing the wrong model?

Any help appreciated!

You should use sum coding when comparing the means of the variables. By the way, you don't need to list each variable in the interaction term separately if the * operator is used:

“:” adds a new column to the design matrix with the product of the other two columns.
“*” will also include the individual columns that were multiplied together.
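To make the operator rule above concrete, here is a small sketch using patsy, the formula engine behind the Statsmodels formula API. The toy DataFrame is hypothetical; only the factor names are borrowed from the question. It builds the design matrix for the short form, the spelled-out form, and the redundant form used in the question, and compares their column names.

```python
import pandas as pd
from patsy import dmatrix

# Hypothetical toy data; only the column names come from the question.
toy = pd.DataFrame({
    "Fiction": ["a", "a", "b", "b", "a", "b"],
    "Condition": ["x", "y", "x", "y", "y", "x"],
})

# Short form: "*" expands to both main effects plus their interaction.
short = dmatrix("C(Fiction) * C(Condition)", toy)

# Spelled-out form: main effects plus an explicit ":" interaction.
spelled_out = dmatrix("C(Fiction) + C(Condition) + C(Fiction):C(Condition)", toy)

# Redundant form from the question: main effects listed and "*" used as well.
redundant = dmatrix("C(Fiction) + C(Condition) + C(Fiction)*C(Condition)", toy)

short_cols = short.design_info.column_names
spelled_out_cols = spelled_out.design_info.column_names
redundant_cols = redundant.design_info.column_names

print(short_cols)
print(short_cols == spelled_out_cols == redundant_cols)
```

All three formulas should describe the same design matrix, so the redundant form in the question is harmless, just longer than it needs to be.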

Your model should be:

Body_lm = ols('Effect ~ C(Fiction, Sum)*C(Condition, Sum)', data = Body).fit()

