
R glm function changing my column names

I have what I think is a relatively simple question, but I can't seem to find the answer.

I have a 200 x 8 matrix temp and a 200 x 1 response matrix BinomialVector. When I run the following line:

CLog=glm(BinomialVector~temp,family= binomial(logit)) 

The logistic regression runs. As I understand it, this is really fitting BinomialVector ~ tempcol1 + tempcol2 + tempcol3 and so on, with one term per column.

However, when I run summary(CLog), the names of my variables have changed: if the first column was called trees, it now shows up as temptrees. Is there a way to prevent this?
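The renaming can be reproduced on a small scale. The sketch below uses made-up data (the column names trees and grass and the simulated values are hypothetical). When a whole matrix enters a formula as a single term, model.matrix() names each resulting column by pasting the term name (temp) onto the matrix's column names, so the coefficients come out as temptrees and tempgrass:

```r
# Hypothetical data: 20 observations, two named predictor columns
set.seed(1)
temp <- matrix(rnorm(40), ncol = 2,
               dimnames = list(NULL, c("trees", "grass")))
BinomialVector <- matrix(rbinom(20, size = 1, prob = 0.5), ncol = 1)

# The whole matrix enters the formula as one term called `temp`
CLog <- glm(BinomialVector ~ temp, family = binomial(logit))

# Coefficient names are the term name pasted onto each column name
names(coef(CLog))
# returns c("(Intercept)", "temptrees", "tempgrass")
```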

As requested:

  BinomialVector
   [,1]
  [1,]    0
  [2,]    1
  [3,]    1
  [4,]    0
  [5,]    0
  [6,]    0
  [7,]    1



temp

    Net.Income.Y06. Return.on.Assets.Y06.
A         0.1929241                27.947
AA        1.1405694                12.427
AAP       1.0302481                17.117
ABT       2.1006512                13.826

    Return.on.Investment.Y06. Total.Current.Assets.Y06.
A                      39.844                 0.9274886
AA                     20.003                 0.8830403
AAP                    30.927                 1.0439536
ABT                    21.376                 1.2447154

    Total.Current.Liabilities.Y06. IntersectionMostAdmired.2006.
A                        1.0812744                         0.000
AA                       0.9842055                         7.255
AAP                      1.1010472                         0.000
ABT                      0.7617044                         6.715

These are some representative columns of my temp matrix. The reason I don't want to use the additive notation is that the number of columns varies: I am running this inside a user-defined function that receives the temp matrix as an argument. As for using a data frame, I was under the impression that a data frame is indeed the right thing to use, but I get an error unless I convert it with as.matrix. :s
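Both points can be illustrated with hypothetical data (the names trees, grass, water, temp_df and fit_logit are made up, and the exact error from the question is not quoted, so it is only assumed to match). A data frame cannot stand in for a matrix as a single formula term, which is one likely source of the error. One way around the varying column count, sketched here, is to put everything into one data frame and build the formula from the column names with reformulate():

```r
# Hypothetical example data with named columns
set.seed(2)
temp <- matrix(rnorm(60), ncol = 3,
               dimnames = list(NULL, c("trees", "grass", "water")))
BinomialVector <- matrix(rbinom(20, size = 1, prob = 0.5), ncol = 1)

# A data frame used as a single formula term is rejected by model.frame()
# with an error such as "invalid type (list) for variable 'temp_df'"
temp_df <- as.data.frame(temp)
msg <- tryCatch(glm(BinomialVector ~ temp_df, family = binomial(logit)),
                error = function(e) conditionMessage(e))

# Hypothetical wrapper: one data frame, formula built from whatever
# columns are present, e.g. y ~ trees + grass + water
fit_logit <- function(temp, response) {
  mydf <- data.frame(y = as.vector(response), temp)
  form <- reformulate(setdiff(names(mydf), "y"), response = "y")
  glm(form, family = binomial(logit), data = mydf)
}

CLog <- fit_logit(temp, BinomialVector)
names(coef(CLog))
# returns c("(Intercept)", "trees", "grass", "water")
```

Because the formula refers to the data frame's own columns, the coefficients keep the original names without the temp prefix.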

Can you post a representative subset of your data and also the actual output glm gives you for that subset?

Then it will be easier to diagnose/replicate.

In the meantime, I suggest you use a data frame instead of a matrix. Here is how:

mydf <- data.frame(y = BinomialVector, temp)
# tempcol1, tempcol2, ... stand for the actual column names of temp
CLog <- glm(y ~ tempcol1 + tempcol2 + tempcol3, data = mydf, family = binomial(logit))

Matrices are a poor format for the data behind a regression model (for one thing, a matrix coerces all of its columns to a single data type, which may or may not be part of the problem here), so I never use them. But if I had to guess, your model might be converting the matrix into one long vector? And perhaps there's a variable somewhere in there that has the value "tree"? Without example data and output, though, it's all guesswork. Quite likely, running the commands above will reveal the nature of the problem right away.
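The type-coercion point is easy to check. In this sketch (the values and column names are hypothetical), binding a numeric column and a character column into a matrix turns every entry into a string, while a data frame keeps each column's own type:

```r
# Mixed-type columns combined into a matrix with cbind()
m <- cbind(income = c(0.19, 1.14), sector = c("tech", "metals"))
typeof(m)
# returns "character": the numeric column was coerced to text

# The same columns in a data frame keep their own types
d <- data.frame(income = c(0.19, 1.14), sector = c("tech", "metals"))
sapply(d, class)
```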

Using a data frame is the way to go. For one, it makes getting predictions on new data much easier, and it also lets you use nominal predictors (factors) without coding up the dummy variables yourself. If the number of predictors is not fixed and you want to fit a model on all of them, use . in the formula: it stands for every column in data other than the response.

df <- data.frame(y=BinomialVector, temp)
glm(y ~ ., family=binomial, data=df)
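To illustrate the two advantages mentioned above, here is a sketch with hypothetical data (the columns trees, grass and region and their values are made up). The . picks up every predictor, the factor region is dummy-coded automatically, and predict() only needs a new data frame with matching column names:

```r
# Hypothetical data: two numeric predictors and one factor predictor
set.seed(3)
df <- data.frame(
  y      = rbinom(40, size = 1, prob = 0.5),
  trees  = rnorm(40),
  grass  = rnorm(40),
  region = factor(sample(c("east", "west"), 40, replace = TRUE))
)

# `.` expands to every column except the response
fit <- glm(y ~ ., family = binomial, data = df)
names(coef(fit))
# returns c("(Intercept)", "trees", "grass", "regionwest")

# New data needs the same column names (and the same factor levels)
newdata <- data.frame(trees = 0.5, grass = -1,
                      region = factor("west", levels = c("east", "west")))
p <- predict(fit, newdata = newdata, type = "response")
p  # predicted probability that y = 1
```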
