
How to solve mlogit Error in solve.default(H, g[!fixed]) : system is computationally singular: reciprocal condition number = 3.03549e-18?

I have data in wide format, which I convert with mlogit.data, and I am trying to fit a mixed logit model with the mlogit package. I have one-hot encoded the categorical columns (color, size_group). Is that what is causing the error below?

The numerical features in model_data are log1p-transformed.

Complete.choice <- mlogit.data(model_data, choice = "y", 
                                 varying = 2:79, shape = "wide", sep = "__", id = "customer_id")
formula <- as.formula("y ~ price + weight + length + height + width + color_white + 
                    color_red + color_black + size_group_1 + size_group_3 + size_group_5 + 
                     size_group_4 + size_group_2 | -1")

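# rpar: one normally distributed ("n") random parameter per feature
features <- c("price", "weight", "length", "height", "width", "color_white",
              "color_red", "color_black", "size_group_1",
              "size_group_3", "size_group_5", "size_group_4", "size_group_2")
# rep() needs a single count here; times = 1:length(features) fails with "invalid 'times' argument"
random_parameter <- rep("n", length(features))
names(random_parameter) <- features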

sample.mxl <- mlogit(formula, Complete.choice , rpar = random_parameter, 
                       R = 40, halton = NA, panel = TRUE, seed = 123, print.level = 0)

Error in solve.default(H, g[!fixed]) : 
  system is computationally singular: reciprocal condition number = 3.23485e-18

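The error means that the Hessian matrix is (numerically) singular: its determinant is effectively zero, so it cannot be inverted. A reciprocal condition number of about 1e-18 is below machine precision (about 2.2e-16). The failing call, solve(H, g[!fixed]), is the optimizer trying to compute its next step from the Hessian, and even if it converged you could not obtain the variance-covariance matrix, which is the inverse of the negative Hessian.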
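To make this concrete, here is a small base-R sketch showing how one-hot dummies that cover every level of a category make the Hessian singular. It uses simulated data with hypothetical names, and a plain conditional logit evaluated at beta = 0 rather than the question's mixed logit. At beta = 0 every alternative has probability 1/J, so the conditional logit Hessian reduces to -(1/J) times the cross-product of the attributes after demeaning them within each choice situation.

```r
# Sketch (simulated data, hypothetical names): why a full set of one-hot
# dummies makes a conditional/mixed logit Hessian singular.
set.seed(1)
n_sets <- 200  # choice situations
n_alt  <- 3    # alternatives per choice situation

sim <- data.frame(
  set   = rep(seq_len(n_sets), each = n_alt),
  price = runif(n_sets * n_alt),
  color = sample(c("white", "red", "black"), n_sets * n_alt, replace = TRUE)
)
# One-hot encode every level, as in the question
for (lev in c("white", "red", "black")) {
  sim[[paste0("color_", lev)]] <- as.numeric(sim$color == lev)
}

# Conditional logit Hessian at beta = 0: all alternatives have probability
# 1 / n_alt, so H = -(1 / n_alt) * crossprod of the attributes after
# subtracting their mean within each choice situation.
logit_hessian_at_zero <- function(X, set, n_alt) {
  X_dm <- X - apply(X, 2, function(col) ave(col, set))
  -crossprod(X_dm) / n_alt
}

full_cols    <- c("price", "color_white", "color_red", "color_black")
reduced_cols <- c("price", "color_white", "color_red")  # black = reference level

H_full    <- logit_hessian_at_zero(as.matrix(sim[full_cols]), sim$set, n_alt)
H_reduced <- logit_hessian_at_zero(as.matrix(sim[reduced_cols]), sim$set, n_alt)

# Reciprocal condition numbers (the same kind of quantity as in the error message)
print(c(full = rcond(H_full), reduced = rcond(H_reduced)))
# Numerical rank: 3 of 4 with every level, 3 of 3 after dropping one level
print(c(full = qr(H_full)$rank, reduced = qr(H_reduced)$rank))
```

Because the three color dummies sum to 1 for every alternative, adding the same constant to all three color coefficients leaves every utility difference unchanged, so one direction of the Hessian is flat. Dropping one level per category (here black) restores full rank; the size_group dummies in the question behave the same way.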

There are several reasons why this might happen:

  1. There is not enough variation in your data to identify the model. The model you are trying to estimate is very complex, and it requires a lot from your data, in terms of both variation and number of observations.
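  2. The model is over-specified: have you made the correct normalizations? With one-hot encoding this is a likely culprit. If color_white, color_red and color_black cover every color, and size_group_1 to size_group_5 cover every size group, each set of dummies sums to 1 for every alternative. Adding the same constant to all coefficients of one category then leaves every utility difference unchanged, so those coefficients are not identified. Drop one dummy per category (for example color_black and size_group_1) and treat it as the reference level.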
  3. You are estimating 13 random parameters, which asks a lot from your data. I would start with a single random parameter and add more gradually to see at which point your model fails. Also, with more than 4-5 random parameters you should not be using standard Halton draws, because the sequences for higher dimensions become strongly correlated; you need some type of scrambling procedure. I would recommend scrambled Sobol draws, MLHS draws or scrambled Halton draws.
  4. You are only using R = 40 draws. This is a very low number and will give a poor approximation of the multidimensional integral that defines the mixed logit probability. The number of draws needed increases with the complexity of the model, the number of available alternatives, etc. Many people consider 500 to 1,000 draws sufficient, whereas others use 5,000 or more. I start at 1,000 and increase the number until my parameter estimates stabilize. Too few draws could also cause the error you are seeing.
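Point 4 can be illustrated with a minimal base-R sketch. The setup is hypothetical (three alternatives, one attribute x with a normally distributed coefficient) and it uses plain pseudo-random draws, not the Halton draws mlogit uses with halton = NA. It simulates the same choice probability 200 times and measures how much the estimate varies with R = 40 versus R = 1000 draws.

```r
# Sketch (hypothetical setup): Monte Carlo approximation of a mixed logit
# probability with R pseudo-random draws of beta ~ N(mu, sigma^2).
simulated_prob <- function(x, mu, sigma, R, chosen = 1) {
  beta <- rnorm(R, mean = mu, sd = sigma)  # R draws of the random coefficient
  util <- outer(beta, x)                   # R x J matrix: util[r, j] = beta_r * x_j
  p    <- exp(util) / rowSums(exp(util))   # logit probabilities for each draw
  mean(p[, chosen])                        # simulated probability = average over draws
}

set.seed(123)
x <- c(0.5, 1.0, 2.0)
# Standard deviation of the simulated probability across 200 re-simulations
spread <- sapply(c(40, 1000), function(R) {
  sd(replicate(200, simulated_prob(x, mu = -1, sigma = 2, R = R)))
})
names(spread) <- c("R = 40", "R = 1000")
print(spread)
```

With pseudo-random draws the simulation noise shrinks at roughly a 1/sqrt(R) rate, so the spread for R = 1000 should be about five times smaller than for R = 40. Quasi-random draws (Halton, Sobol, MLHS) typically reach a given accuracy with fewer draws, which is why they are preferred in practice.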

It is impossible to diagnose the reason without testing on the actual data, but these are at least some pointers to get you started.
