
Reverting to previous ordering of factor after using relevel() and fitting GLM in R

A convoluted question and I'm not sure I'm expressing it as concisely as I could, but...

I want to fit multivariate generalised linear models, and because of the size and complexity of my models I have to use rxGlm() from the RevoScaleR package rather than the built-in glm() function.

It's important that each factor in the model has a reference level of my choosing, which I can set using relevel(). The nuisance is that relevel() moves the chosen reference level to the front of the factor's levels, so the GLM output is no longer in the original order and is confusing to work with. I'd like to be able to retrieve the original factor level ordering after I've fitted the model, for presentation purposes.

A simple example:

library(RevoScaleR) # from Microsoft R Client

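# stringsAsFactors = TRUE so that relevel() below works in R >= 4.0,
# where strings are no longer converted to factors by default
x <- data.frame(country = c("Australia", "Belgium", "Chile", "Belgium", "Belgium"),
                degree = c("Y", "Y", "N", "Y", "N"),
                salary = c(10000, 15000, 5000, 20000, 4000),
                stringsAsFactors = TRUE)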

model <- rxGlm(salary ~ country + degree, data = x, dropFirst = TRUE)


This gives

(Intercept) country=Australia   country=Belgium     country=Chile    degree=N    degree=Y
      -3500                NA              7500              8500          NA       13500

Both factors are ordered alphabetically here, so the reference levels are country = Australia and degree = N. Suppose I'd like to have my reference levels as country = Belgium and degree = Y. I can set them and then rerun the model:

x$country <- relevel(x$country, ref = "Belgium")
x$degree <- relevel(x$degree, ref = "Y")

model <- rxGlm(salary ~ country + degree, data = x, dropFirst = TRUE)


This now gives the same model, but presented differently:

(Intercept)   country=Belgium country=Australia     country=Chile    degree=Y    degree=N 
      17500                NA             -7500              1000          NA      -13500 
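
The reordering comes from relevel() itself, which moves the reference level to the front of the levels and leaves the rest in place. A minimal base R check, using a standalone copy of the country column (the names `country` and `releveled` are only for illustration):

```r
# Standalone copy of the country column from the example data
country <- factor(c("Australia", "Belgium", "Chile", "Belgium", "Belgium"))

# Levels start out in alphabetical order
levels(country)

# relevel() moves the chosen reference level to the first position
releveled <- relevel(country, ref = "Belgium")
levels(releveled)
```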

These are the coefficients I want, but now the ordering is wrong. Is there a simple way to rearrange the coefficients using the factor ordering I had before the relevel() commands?

Thank you.

Create a vector of names in the order you want, then index your coefficients using those names. For example, with the coefficients from your releveled model:

coefs <- c(
  `(Intercept)` = 17500L, `country=Belgium` = NA, `country=Australia` = -7500L,
  `country=Chile` = 1000L, `degree=Y` = NA, `degree=N` = -13500L
)

# sort() restores the alphabetical order the levels had before relevel()
Names <- c(
  '(Intercept)',
  paste('country', sort(levels(x$country)), sep = '='),
  paste('degree', sort(levels(x$degree)), sep = '=')
)
coefs2 <- coefs[Names]
coefs2

This gives

(Intercept) country=Australia   country=Belgium     country=Chile          degree=N          degree=Y 
      17500             -7500                NA              1000            -13500                NA 
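
Note that sort() only recovers the original ordering here because the levels happened to be alphabetical before relevel(). If the original order is arbitrary, one option is to record the levels before releveling and build the names from that record. The following is a sketch under that assumption: `orig_levels` and `ordered_names` are names invented for the illustration, and `coefs` is a hand-typed stand-in for the fitted coefficients, with values copied from the question.

```r
# Example data with factor columns, as in the question
x <- data.frame(
  country = factor(c("Australia", "Belgium", "Chile", "Belgium", "Belgium")),
  degree  = factor(c("Y", "Y", "N", "Y", "N"))
)

# Record the level order BEFORE relevel() changes it
orig_levels <- lapply(x[c("country", "degree")], levels)

x$country <- relevel(x$country, ref = "Belgium")
x$degree  <- relevel(x$degree, ref = "Y")

# Stand-in for the coefficients of the refitted model (values from the question)
coefs <- c(
  `(Intercept)` = 17500, `country=Belgium` = NA, `country=Australia` = -7500,
  `country=Chile` = 1000, `degree=Y` = NA, `degree=N` = -13500
)

# Build "factor=level" names in the recorded order, intercept first
ordered_names <- c(
  "(Intercept)",
  unlist(lapply(names(orig_levels),
                function(f) paste(f, orig_levels[[f]], sep = "=")))
)

# Index by name to restore the original presentation order
coefs[ordered_names]
```

This does not depend on the levels being sortable, and the same `orig_levels` record can be reused for any number of factors in the model.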
