
Panda > Statsmodel: syntax errors implementing variance_inflation_factor

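I have been "using" Statsmodel for less than 2 days and am not at all familiar with the import commands etc. I want to run a simple variance_inflation_factor from here, but am having some issues. My code follows: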

from numpy import *
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import statsmodels.formula.api as sm
from sklearn.linear_model import LinearRegression
import scipy, scipy.stats
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
from statsmodels.api import add_constant
from numpy import linalg as LA
import statsmodels as sm

## I have been adding libraries and modules/packages with the intention of erring on the side of caution 

a = df1.years_exp
b = df1.leg_totalbills
c = df1.log_diff_rgdp
d = df1.unemployment
e = df1.expendituresfor
f = df1.direct_expenditures
g = df1.indirect_expenditures

sm.variance_inflation_factor((['a', 'b', 'c', 'd', 'e', 'f']), g)

then I get the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-61-bb126535eadd> in <module>()
----> 1 sm.variance_inflation_factor((['a', 'b', 'c', 'd', 'e', 'f']), g)

AttributeError: module 'statsmodels' has no attribute 'variance_inflation_factor'

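Can someone direct me to the proper syntax for loading and executing this module? If it is more convenient that I post a link to some source code, please ask. However, I have a feeling that this is just a simple syntax issue.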

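The function variance_inflation_factor is found in statsmodels.stats.outliers_influence, as seen in the docs, so to use it you must import it correctly. Note also that its first argument is a 2-D array of regressors and its second is the index of the column to test: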

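import numpy as np
from statsmodels.stats import outliers_influence
# code here 
# exog is a 2-D array with one column per variable; the second argument is the index of the column to test
exog = np.column_stack([a, b, c, d, e, f, g])
outliers_influence.variance_inflation_factor(exog, 6)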

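Thanks for asking this question! I had the same question today, except I wanted to calculate the variance inflation factor for each of the features. Here is a programmatic way to do this: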

import pandas as pd
from patsy import dmatrices
from statsmodels.stats.outliers_influence import variance_inflation_factor

# 'feature_1 + feature_2 ... feature_p'
# Index.difference replaces the old `columns - [...]` set subtraction, which current pandas no longer supports
features_formula = "+".join(df1.columns.difference(["indirect_expenditures"]))

# get y and X dataframes based on this formula:
# indirect_expenditures ~ feature_1 + feature_2 ... feature_p
y, X = dmatrices('indirect_expenditures ~' + features_formula, df1, return_type='dataframe')

# For each Xi, calculate VIF and save in dataframe
vif = pd.DataFrame()
vif["vif"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif["features"] = X.columns
vif

Note that the above code works only if you have imported pandas and df1 is a pandas DataFrame.

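import numpy as np
from statsmodels.stats import outliers_influence

a = df1.years_exp
b = df1.leg_totalbills
c = df1.log_diff_rgdp
d = df1.unemployment
e = df1.expendituresfor
f = df1.direct_expenditures
g = df1.indirect_expenditures

# Stack the Series as columns (one column per variable); np.array([a, ..., g]) would put them in rows
ck = np.column_stack([a, b, c, d, e, f, g])
# VIF of the column at index 6, i.e. g (indirect_expenditures)
outliers_influence.variance_inflation_factor(ck, 6)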
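
For intuition about what the index argument selects, here is a rough NumPy-only sketch of the textbook definition VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing column i on the remaining columns. The helper vif_manual and the synthetic a, b and g are hypothetical, and the explicit intercept is an assumption of this sketch rather than a statement about how statsmodels computes its value.

```python
import numpy as np

def vif_manual(exog, idx):
    """VIF of column idx: regress it on the other columns plus an intercept."""
    exog = np.asarray(exog, dtype=float)
    target = exog[:, idx]
    others = np.delete(exog, idx, axis=1)
    design = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    # Centered R^2 of the auxiliary regression
    r_squared = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
    return 1.0 / (1.0 - r_squared)

# Synthetic stand-ins for the Series in the answer (hypothetical data)
rng = np.random.default_rng(1)
a = rng.normal(size=100)
b = rng.normal(size=100)
g = a + b + 0.05 * rng.normal(size=100)  # almost a linear combination of a and b

ck = np.column_stack([a, b, g])   # shape (100, 3): one column per variable
print(ck.shape, np.array([a, b, g]).shape)
print(vif_manual(ck, 2))
```

The shape printout shows why orientation matters: np.array([a, b, g]) has shape (3, 100), so a column index would pick out an observation rather than a variable.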
