
Reducing the time of dynamic factor model estimation with statsmodels in Python

I was trying to estimate a dynamic factor model with statsmodels in Python, following the example at https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_dfm_coincident.html. However, instead of the example dataset, I used my own dataset of 282 variables with 124 observations (monthly inflation rates for different countries). After running the code for more than six hours I got no results. Experimenting with different numbers of variables and different solvers, I got these time estimates:

Number of variables    Initial params (seconds)    Model estimate (seconds)
Powell solver:
        10                      57.3                        4.9
        20                     167.6                       19.9
        40                    1498.8                      137.8
BFGS:
        10                       9.1                        6.3
        20                      89.2                       18.5
        40                     597.5                      138.2

According to these measurements the time grows roughly like n^2 * log(n), meaning that estimating the model for all 282 variables with the Powell solver would take around 30 hours (1498.8 s * (282/40)^2 * log(282)/log(40) ≈ 114,000 s ≈ 31.6 hours for the initial parameters alone), which is too long. BFGS is faster, but for 20 and 40 variables I got a warning that the likelihood optimization failed to converge.

I was running it on my laptop (Win 10, 32 GB RAM, i7-4700MQ @ 2.40 GHz) and it didn't look like it used all the resources: only ~10 GB of memory and ~25-50% of the CPU. So the question is: how can I make the estimation of the DFM faster and get it to converge? Would multithreading help if I ran this code in the cloud (say Amazon or Google with 32-64 CPUs), or is there little to gain from parallelizing statsmodels? Does it make sense to switch to MATLAB or other software for this kind of calculation? There are some solvers for large problems in scipy.optimize (like krylov, broyden2, or anderson), but I'm not sure they can be used with statsmodels.LikelihoodModel.fit.
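As far as I understand, krylov, broyden2 and anderson are root finders (scipy.optimize.root) rather than minimizers, so they probably can't be used directly. fit does seem to accept method='minimize' with a min_method argument that routes to scipy.optimize.minimize, so something like the sketch below should run, though I haven't checked whether any of these methods is actually faster:

    # Sketch: route the likelihood optimization through scipy.optimize.minimize;
    # min_method selects the algorithm (here trust-constr, just as an example).
    res_alt = mod.fit(method='minimize', min_method='trust-constr', disp=True)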

I will be very grateful for any thoughts on how to speed up the estimation. The code I run:

    import statsmodels.api as sm
    import time

    # Create the model (data_cpi: monthly inflation rates, 282 series)
    mod = sm.tsa.DynamicFactor(data_cpi, k_factors=3, factor_order=1, error_order=1)

    # Step 1: get starting parameters with the Powell solver
    tic = time.perf_counter()
    initial_res = mod.fit(method='powell', disp=True)
    toc = time.perf_counter()
    print(f"Initial params in {toc - tic:0.4f} seconds")

    # Step 2: refine from the Powell starting values with the default solver (BFGS)
    tic = time.perf_counter()
    res = mod.fit(initial_res.params, disp=True)
    toc = time.perf_counter()
    print(f"Model estimate in {toc - tic:0.4f} seconds")

    print(res.summary(separate_params=False))

One way to reduce the fitting time, if you don't need the parameters' standard errors, is to pass cov_type='none' to the fit method. But it will still be slow: numerically optimizing the parameters of a dynamic factor model with a large number of variables is very slow when using quasi-Newton methods like BFGS or even derivative-free methods like Powell.
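For example, reusing mod and initial_res from the code in the question (a minimal sketch, otherwise unchanged):

    # Skip the parameter covariance matrix (and thus standard errors);
    # computing it needs many extra likelihood evaluations for large models.
    res = mod.fit(initial_res.params, cov_type='none', disp=True)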

Large dynamic factor models are usually made feasible by optimizing the parameters using the EM algorithm. Statsmodels doesn't have that option in v0.11, but it is likely that it will make it into the v0.12 release for dynamic factor models.
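For reference, a sketch of what EM-based estimation looks like, assuming a statsmodels version that includes it (it eventually shipped in v0.12 as DynamicFactorMQ) and reusing data_cpi from the question:

    import statsmodels.api as sm

    # DynamicFactorMQ (statsmodels >= 0.12) estimates parameters with the
    # EM algorithm, which scales much better in the number of series.
    mod_em = sm.tsa.DynamicFactorMQ(data_cpi, factors=3, factor_orders=1,
                                    idiosyncratic_ar1=True)
    res_em = mod_em.fit(disp=10)  # fit() uses EM by default
    print(res_em.summary())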
