Panda> Statsmodel: Syntax error implementing variance_inflation_factor

Question

I have been "using" statsmodels for less than 2 days and am not at all familiar with the import commands etc. I want to run a simple variance_inflation_factor from here but am having some issues. My code follows:

from numpy import *
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import statsmodels.formula.api as sm
from sklearn.linear_model import LinearRegression
import scipy, scipy.stats
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
from statsmodels.api import add_constant
from numpy import linalg as LA
import statsmodels as sm

## I have been adding libraries and modules/packages with the intention of erring on the side of caution 

a = df1.years_exp
b = df1.leg_totalbills
c = df1.log_diff_rgdp
d = df1.unemployment
e = df1.expendituresfor
f = df1.direct_expenditures
g = df1.indirect_expenditures

sm.variance_inflation_factor((['a', 'b', 'c', 'd', 'e', 'f']), g)

Then I get the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-61-bb126535eadd> in <module>()
----> 1 sm.variance_inflation_factor((['a', 'b', 'c', 'd', 'e', 'f']), g)

AttributeError: module 'statsmodels' has no attribute 'variance_inflation_factor'

Can someone direct me to the proper syntax for loading and executing this module? If it is more convenient that I post a link to some source code, please ask. However, I have a feeling that this is just a simple syntax issue.

Answer 1

The function variance_inflation_factor is found in statsmodels.stats.outliers_influence, as seen in the docs, so to use it you must import it correctly. One option would be:

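from statsmodels.stats import outliers_influence
# code here
# first argument: 2-D array with one column per variable; second: index of the column to test
exog = np.column_stack([a, b, c, d, e, f, g])
outliers_influence.variance_inflation_factor(exog, 6)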
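
To make the call signature concrete, here is a minimal sketch on synthetic data (the arrays `x1`, `x2`, `x3` are made up for illustration and are not the question's columns). It assumes the first argument is a 2-D numeric array with one column per variable and the second is the integer index of the column to examine. A column of ones is included so that the auxiliary regression has an intercept.

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                # unrelated to x1
x3 = x1 + 0.1 * rng.normal(size=n)     # nearly collinear with x1

# One row per observation, one column per variable; column 0 is the constant.
exog = np.column_stack([np.ones(n), x1, x2, x3])

vif_x2 = variance_inflation_factor(exog, 2)  # VIF for the x2 column
vif_x3 = variance_inflation_factor(exog, 3)  # VIF for the x3 column
print(vif_x2, vif_x3)
```

In this setup the nearly collinear column should come out with a much larger VIF than the independent one.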

Answer 2

Thanks for asking this question! I had the same question today, except I wanted to calculate the variance inflation factor for each of the features. Here is a programmatic way to do this:

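import pandas as pd
from patsy import dmatrices
from statsmodels.stats.outliers_influence import variance_inflation_factor

# 'feature_1 + feature_2 ... feature_p'
# Index.difference replaces the old `df1.columns - [...]`, which no longer works in current pandas
features_formula = "+".join(df1.columns.difference(["indirect_expenditures"]))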

# get y and X dataframes based on this formula:
# indirect_expenditures ~ feature_1 + feature_2 ... feature_p
y, X = dmatrices('indirect_expenditures ~' + features_formula, df1, return_type='dataframe')

# For each Xi, calculate VIF and save in dataframe
vif = pd.DataFrame() 
vif["vif"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif["features"] = X.columns
vif

Please note that the above code works only if you have imported pandas and df1 is a pandas DataFrame.
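
For reference, the VIF of a column is 1 / (1 - R²), where R² comes from regressing that column on all the other columns. The sketch below recomputes this by hand with NumPy on synthetic data and compares it with the statsmodels result. `manual_vif` is a hypothetical helper written for this illustration, and the comparison assumes the design matrix contains a constant column (as the one built by dmatrices does).

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

def manual_vif(exog, idx):
    """VIF of column idx: regress it on the other columns, return 1 / (1 - R^2)."""
    y = exog[:, idx]
    others = np.delete(exog, idx, axis=1)
    coef, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid = y - others @ coef
    # Centered R^2; appropriate here because `others` includes a constant column.
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # correlated with x1
exog = np.column_stack([np.ones(n), x1, x2, x3])

by_hand = manual_vif(exog, 3)
from_sm = variance_inflation_factor(exog, 3)
print(by_hand, from_sm)
```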

Answer 3

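import numpy as np
from statsmodels.stats import outliers_influence

a = df1.years_exp
b = df1.leg_totalbills
c = df1.log_diff_rgdp
d = df1.unemployment
e = df1.expendituresfor
f = df1.direct_expenditures
g = df1.indirect_expenditures

# stack the Series column-wise: one row per observation, one column per variable
ck = np.column_stack([a, b, c, d, e, f, g])
# VIF of column 6 (g, indirect_expenditures)
outliers_influence.variance_inflation_factor(ck, 6)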

Answer 1
1 2016-05-09 20:03:55

Answer 2
1 2017-02-13 03:44:13

Answer 3
0 2016-05-09 21:54:29
