Autoregressive parameters for GEE in Python Statsmodels

Question

I'm trying to run a GEE with an autoregressive correlation structure on some panel data in statsmodels, looking at differences in sales between different hours of a shift:

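import statsmodels.api as sm

# BakeSale: the asker's pandas DataFrame; result1: an earlier fitted model (not shown)
ga = sm.families.Gaussian()
ar = sm.cov_struct.Autoregressive()
ar.dep_params = 0.06  # starting value for the autoregressive parameter
times = BakeSale['Hour'].values  # ordinal hour of the shift, 1-8
model2 = sm.GEE.from_formula("CookieSales ~ C(Hour) + Arrivals + TotalSalesPeople",
                             groups=BakeSale["SalesPerson"], data=BakeSale,
                             family=ga, time=times, cov_struct=ar)
result2 = model2.fit(start_params=result1.params)
print(result2.summary())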

This raises a ValueError: Not a bracketing interval.

I currently have the 'Hour' of the shift coded as an ordinal integer (i.e. 1-8), but I also have timestamps.
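If the timestamps are used instead of the ordinal hour, one option is to convert them into hours elapsed since each group's first observation. This gives every observation within a group its own time value. The following is a minimal pandas sketch. It assumes a hypothetical `Timestamp` column of datetimes, and both the column name and the rows are invented for illustration.

```python
import pandas as pd

# Toy data; "Timestamp" is a hypothetical column name, not from the question.
df = pd.DataFrame({
    "SalesPerson": ["A", "A", "A", "B", "B"],
    "Timestamp": pd.to_datetime([
        "2017-03-01 09:00", "2017-03-01 10:00", "2017-03-02 09:00",
        "2017-03-01 09:00", "2017-03-01 11:30",
    ]),
})

# Hours elapsed since each salesperson's first recorded timestamp
first_seen = df.groupby("SalesPerson")["Timestamp"].transform("min")
df["ElapsedHours"] = (df["Timestamp"] - first_seen).dt.total_seconds() / 3600

# Check that no two rows in the same group share a time value
assert not df.duplicated(["SalesPerson", "ElapsedHours"]).any()
print(df)
```

The resulting column could then be passed as `time=df["ElapsedHours"].values`. The sketch does not settle whether an AR(1) structure over elapsed time, including overnight gaps, makes sense for this data.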

Any thoughts on how to overcome this?

Full output:

//anaconda/lib/python3.5/site-packages/statsmodels/genmod/cov_struct.py:724: RuntimeWarning: divide by zero encountered in true_divide
  wts = 1. / var
//anaconda/lib/python3.5/site-packages/statsmodels/genmod/cov_struct.py:725: RuntimeWarning: invalid value encountered in true_divide
  wts /= wts.sum()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-81-d81d0b97546e> in <module>()
      7 #CookieSales ~ C(Hour) + Arrivals + TotalSalesPeople"
      8 # Maybe try without C, or find if any with nan value or such
----> 8 result2 = model2.fit(start_params=result1.params)
      9 print(result2.summary())
     10 print(ar.summary())

//anaconda/lib/python3.5/site-packages/statsmodels/genmod/generalized_estimating_equations.py in fit(self, maxiter, ctol, start_params, params_niter, first_dep_update, cov_type, ddof_scale, scaling_factor)
   1111             if (self.update_dep and (itr % params_niter) == 0
   1112                 and (itr >= first_dep_update)):
-> 1113                 self._update_assoc(mean_params)
   1114                 num_assoc_updates += 1
   1115 

//anaconda/lib/python3.5/site-packages/statsmodels/genmod/generalized_estimating_equations.py in _update_assoc(self, params)
   1259         """
   1260 
-> 1261         self.cov_struct.update(params)
   1262 
   1263     def _derivative_exog(self, params, exog=None, transform='dydx',

//anaconda/lib/python3.5/site-packages/statsmodels/genmod/cov_struct.py in update(self, params)
    766 
    767         from scipy.optimize import brent
--> 768         self.dep_params = brent(fitfunc, brack=[b_lft, b_ctr, b_rgt])
    769 
    770 

//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in brent(func, args, brack, tol, full_output, maxiter)
   2001     options = {'xtol': tol,
   2002                'maxiter': maxiter}
-> 2003     res = _minimize_scalar_brent(func, brack, args, **options)
   2004     if full_output:
   2005         return res['x'], res['fun'], res['nit'], res['nfev']

//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in _minimize_scalar_brent(func, brack, args, xtol, maxiter, **unknown_options)
   2033                   full_output=True, maxiter=maxiter)
   2034     brent.set_bracket(brack)
-> 2035     brent.optimize()
   2036     x, fval, nit, nfev = brent.get_result(full_output=True)
   2037     return OptimizeResult(fun=fval, x=x, nit=nit, nfev=nfev,

//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in optimize(self)
   1839         # set up for optimization
   1840         func = self.func
-> 1841         xa, xb, xc, fa, fb, fc, funcalls = self.get_bracket_info()
   1842         _mintol = self._mintol
   1843         _cg = self._cg

//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in get_bracket_info(self)
   1827             fc = func(*((xc,) + args))
   1828             if not ((fb < fa) and (fb < fc)):
-> 1829                 raise ValueError("Not a bracketing interval.")
   1830             funcalls = 3
   1831         else:

ValueError: Not a bracketing interval.
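The two RuntimeWarnings at the top of the output hint at what may be going wrong. One plausible reading rests on an assumption: that this statsmodels version computes the autoregressive weights as var = (1 - a**(2*d)) / (1 - a**2), where d is the difference between the time values of a pair of observations in the same group. You can check this in `update` in your installed `cov_struct.py`. If two observations in a group share a time value, then d = 0 and var = 0. The weights then become NaN, so the objective passed to `scipy.optimize.brent` is NaN everywhere, and no bracket can satisfy f(b) < f(a) and f(b) < f(c). The sketch below mimics that chain with made-up residuals and does not call statsmodels.

```python
import numpy as np
from scipy.optimize import brent

# Hypothetical time differences between pairs of observations in one group.
# The last pair shares a time value (d = 0), e.g. Hour 1 of two different
# shifts filed under the same group.
lags = np.array([1.0, 2.0, 0.0])

# Made-up standardized residual pairs, for illustration only
resid_pairs = np.array([[0.3, 0.1], [-0.2, 0.4], [0.5, 0.5]])

a0 = 0.06  # starting AR parameter, as in the question

# Assumed weighting, mirroring the "wts = 1. / var" lines in the warnings
with np.errstate(divide="ignore", invalid="ignore"):
    var = (1.0 - a0 ** (2 * lags)) / (1.0 - a0 ** 2)
    wts = 1.0 / var        # d == 0 -> var == 0 -> weight is inf
    wts = wts / wts.sum()  # inf / inf -> nan

def fitfunc(a):
    # Weighted squared error of predicting one residual from a**d times the other
    dif = resid_pairs[:, 0] - (a ** lags) * resid_pairs[:, 1]
    return np.dot(dif ** 2, wts)

print(wts)           # contains nan
print(fitfunc(0.5))  # nan

try:
    brent(fitfunc, brack=(0.0, 0.5, 0.75))
    bracket_failed = False
except ValueError as err:
    # The exact message text depends on the SciPy version
    bracket_failed = True
    print("ValueError:", err)
```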

Answer 1

Often in life, one needs to make sure one is starting from the right data to begin with. For instance, grouping by individual Shifts rather than by Salespeople:

ex = sm.cov_struct.Exchangeable()  # assumed: `ex` is not defined in the original answer
model2 = sm.GEE.from_formula("CookieSales ~ C(Hour) + Arrivals + TotalSalesPeople",
                             groups=BakeSale["Shift"], data=BakeSale,
                             family=ga, time=times, cov_struct=ex)

This showed that the maximum cluster size was suspiciously large, while the mean cluster size was just above 8.
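The cluster sizes reported in the GEE summary can also be checked directly in pandas before fitting. The following is a minimal sketch on a toy stand-in for `BakeSale`. Only the column names come from the post; the rows are invented, with one shift deliberately miscoded.

```python
import pandas as pd

# Toy stand-in for BakeSale: shift 2 has been miscoded with 12 hourly rows
bake_sale = pd.DataFrame({
    "Shift": [1] * 8 + [2] * 12 + [3] * 8,
    "Hour": list(range(1, 9)) + list(range(1, 13)) + list(range(1, 9)),
})

# Number of rows (observations) per cluster
sizes = bake_sale.groupby("Shift").size()
print("max cluster size:", sizes.max())            # 12
print("mean cluster size:", round(sizes.mean(), 2))  # 9.33
```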

A review of how the original dataset had been wrangled revealed that several shifts had mistakenly been coded with far more than the appropriate number of hours for a shift. Once this was corrected, the model ran properly.
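Rows whose time value repeats within a group can be flagged with `DataFrame.duplicated`. The following is a minimal sketch with invented data and a hypothetical helper, `repeated_times`. Running the same check with the question's grouping (`SalesPerson`) also shows that repeated hours arise whenever one salesperson works more than one shift, which may be related to the original error.

```python
import pandas as pd

# Invented toy data: salesperson A works shifts 1 and 2; shift 1 is miscoded
bake_sale = pd.DataFrame({
    "SalesPerson": ["A"] * 18,
    "Shift": [1] * 10 + [2] * 8,
    "Hour": list(range(1, 9)) + [1, 2] + list(range(1, 9)),
})

def repeated_times(df, group_col, time_col="Hour"):
    """Return every row whose (group, time) pair occurs more than once."""
    return df[df.duplicated([group_col, time_col], keep=False)]

by_shift = repeated_times(bake_sale, "Shift")
by_person = repeated_times(bake_sale, "SalesPerson")
print(len(by_shift), "rows with repeated hours within a shift")        # 4
print(len(by_person), "rows with repeated hours within a salesperson")  # 18
```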

Answer 1 (score 0), 2017-04-08 16:22:07