Autoregression Parameter for GEE in Python Statsmodels
I'm trying to run a GEE with an autoregressive covariance structure on some panel data in statsmodels, looking at differences in sales between different hours of a shift:
import statsmodels.api as sm

# BakeSale is the panel DataFrame; result1 comes from an earlier model fit (not shown)
ga = sm.families.Gaussian()
ar = sm.cov_struct.Autoregressive()
times = BakeSale['Hour'].values
ar.dep_params = 0.06  # initial value for the autoregressive dependence parameter
model2 = sm.GEE.from_formula("CookieSales ~ C(Hour) + Arrivals + TotalSalesPeople",
                             groups=BakeSale["SalesPerson"], data=BakeSale,
                             family=ga, time=times, cov_struct=ar)
result2 = model2.fit(start_params=result1.params)
print(result2.summary())
This raises a ValueError: Not a bracketing interval.
I currently have the 'Hour' of the shift coded as an ordinal integer (i.e. 1-8), but I also have timestamps.
Any thoughts on how to overcome this?
Full output:
//anaconda/lib/python3.5/site-packages/statsmodels/genmod/cov_struct.py:724: RuntimeWarning: divide by zero encountered in true_divide
wts = 1. / var
//anaconda/lib/python3.5/site-packages/statsmodels/genmod/cov_struct.py:725: RuntimeWarning: invalid value encountered in true_divide
wts /= wts.sum()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-81-d81d0b97546e> in <module>()
7 #CookieSales ~ C(Hour) + Arrivals + TotalSalesPeople"
8 # Maybe try without C, or find if any with nan value or such
----> 8 result2 = model2.fit(start_params=result1.params)
9 print(result2.summary())
10 print(ar.summary())
//anaconda/lib/python3.5/site-packages/statsmodels/genmod/generalized_estimating_equations.py in fit(self, maxiter, ctol, start_params, params_niter, first_dep_update, cov_type, ddof_scale, scaling_factor)
1111 if (self.update_dep and (itr % params_niter) == 0
1112 and (itr >= first_dep_update)):
-> 1113 self._update_assoc(mean_params)
1114 num_assoc_updates += 1
1115
//anaconda/lib/python3.5/site-packages/statsmodels/genmod/generalized_estimating_equations.py in _update_assoc(self, params)
1259 """
1260
-> 1261 self.cov_struct.update(params)
1262
1263 def _derivative_exog(self, params, exog=None, transform='dydx',
//anaconda/lib/python3.5/site-packages/statsmodels/genmod/cov_struct.py in update(self, params)
766
767 from scipy.optimize import brent
--> 768 self.dep_params = brent(fitfunc, brack=[b_lft, b_ctr, b_rgt])
769
770
//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in brent(func, args, brack, tol, full_output, maxiter)
2001 options = {'xtol': tol,
2002 'maxiter': maxiter}
-> 2003 res = _minimize_scalar_brent(func, brack, args, **options)
2004 if full_output:
2005 return res['x'], res['fun'], res['nit'], res['nfev']
//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in _minimize_scalar_brent(func, brack, args, xtol, maxiter, **unknown_options)
2033 full_output=True, maxiter=maxiter)
2034 brent.set_bracket(brack)
-> 2035 brent.optimize()
2036 x, fval, nit, nfev = brent.get_result(full_output=True)
2037 return OptimizeResult(fun=fval, x=x, nit=nit, nfev=nfev,
//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in optimize(self)
1839 # set up for optimization
1840 func = self.func
-> 1841 xa, xb, xc, fa, fb, fc, funcalls = self.get_bracket_info()
1842 _mintol = self._mintol
1843 _cg = self._cg
//anaconda/lib/python3.5/site-packages/scipy/optimize/optimize.py in get_bracket_info(self)
1827 fc = func(*((xc,) + args))
1828 if not ((fb < fa) and (fb < fc)):
-> 1829 raise ValueError("Not a bracketing interval.")
1830 funcalls = 3
1831 else:
ValueError: Not a bracketing interval.
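The last frame of the traceback shows the condition that failed: the objective value at the middle bracket point must be strictly lower than at both outer points. A minimal sketch of that condition follows (the helper name `is_bracketing` and the quadratic objective are illustrative assumptions, not part of scipy or statsmodels). It shows that NaN objective values can never pass the check; the divide-by-zero warnings above suggest, though do not prove, that this is what happened here.

```python
def is_bracketing(fa, fb, fc):
    """Same test as in the traceback: the middle value must be strictly lowest."""
    return (fb < fa) and (fb < fc)


def objective(x):
    """Illustrative objective with its minimum at x = 0.5 (an assumption)."""
    return (x - 0.5) ** 2


# A well-behaved objective evaluated at points that straddle its minimum passes.
print(is_bracketing(objective(0.1), objective(0.5), objective(0.9)))

# Any comparison involving NaN is False, so a NaN objective always fails,
# and the optimizer raises "Not a bracketing interval."
nan = float("nan")
print(is_bracketing(nan, nan, nan))
print(is_bracketing(1.0, nan, 1.0))
```

If the objective evaluates to NaN, no choice of starting value for `dep_params` will satisfy the check, which points toward a problem in the data or grouping rather than in the optimizer settings.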
It pays to make sure you are starting from the right data in the first place. For instance, I fit the model grouped by individual shifts rather than by salespeople:
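# ex is not defined in this post; presumably an exchangeable structure,
# e.g. ex = sm.cov_struct.Exchangeable() (an assumption)
model2 = sm.GEE.from_formula("CookieSales ~ C(Hour) + Arrivals + TotalSalesPeople",
                             groups=BakeSale["Shift"], data=BakeSale,
                             family=ga, time=times, cov_struct=ex)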
Fitting this model showed that the max cluster size was suspiciously off, while the mean cluster size was just above 8.
A review of how the original dataset had been wrangled revealed that several shifts had been mistakenly coded with far more hours than a shift should have. Once this was corrected, the model ran without error.
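The same check can be done before fitting anything. Below is a minimal sketch, assuming each row is one hour of one shift and that a shift should have at most 8 rows; the function names and the sample labels are hypothetical and only stand in for the real `Shift` column.

```python
from collections import Counter


def cluster_size_summary(group_labels):
    """Return the number of clusters and the min, max and mean rows per cluster."""
    counts = list(Counter(group_labels).values())
    return {
        "n_clusters": len(counts),
        "min": min(counts),
        "max": max(counts),
        "mean": sum(counts) / len(counts),
    }


def oversized_clusters(group_labels, expected_max):
    """Return the clusters that have more rows than expected_max."""
    sizes = Counter(group_labels)
    return {label: n for label, n in sizes.items() if n > expected_max}


# Hypothetical labels: two correct 8-hour shifts and one miscoded shift.
labels = ["s1"] * 8 + ["s2"] * 8 + ["s3"] * 20

print(cluster_size_summary(labels))
print(oversized_clusters(labels, expected_max=8))
```

Both functions accept any iterable of labels, so a DataFrame column such as `BakeSale["Shift"]` can be passed directly. In pandas, `BakeSale["Shift"].value_counts()` gives the same per-cluster counts.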