
How to turn just one row of a matrix into a vector so I can do a linear regression on it?

I'm trying to do my first ever project in R and I just don't know the language, so it's really killing me here. This is the most frustrating thing I've ever encountered, mostly because it seems like there is absolutely nowhere on the internet that caters to people who don't know the language to teach you how to do things.

I am trying to run a linear regression with the data that I'm using being one of the built-in datasets that RStudio has. This is my line of code:

    lm(Income ~ Illiteracy, data=florida)

But I keep coming up with this error:

    Error in model.frame.default(formula = Income ~ Illiteracy, data = florida, : 'data' must be a data.frame, not a matrix or an array

(friend who was helping me renamed state.x77 into "florida").

After getting this error and deciding that I would prefer to either do each state individually in the regression or at least a couple sample states, I decided I wanted to take the Florida row and turn it into its own vector to do the analysis on. However, I have NO idea how to do that. I keep seeing suggestions on this website but they're all talking about "naming" things and a lot of the commands have "dim" which no one explains.

Please help. I'm a total beginner, and I have a textbook that assumes you know R, and I found another "Learn R" book that somehow also assumes you know R.

R has several data structures for handling datasets. A matrix is one of them: it restricts you to a single type of variable (usually numeric), and must have a rectangular shape.

A data.frame is similar in shape to a matrix, but each column can be a different type (e.g. numeric, character, or factor). This is closer to a typical dataset, where you have a mixture of continuous / numeric, ordinal, and categorical / nominal variables.
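To make the difference concrete, here is a small sketch using the same built-in data. The name `states` is just an illustrative choice, and `state.region` is another object from the same built-in datasets package, added here only to show a non-numeric column.

```r
# state.x77 ships with R in the datasets package
is.matrix(state.x77)      # TRUE
typeof(state.x77)         # "double": every cell holds a number
dim(state.x77)            # number of rows, then number of columns

# Convert to a data frame; columns may now differ in type
states <- as.data.frame(state.x77)
states$name <- rownames(states)     # a character column
states$region <- state.region       # a factor column

# str() prints the type of each column
str(states[, c("Income", "Illiteracy", "name", "region")])
```

A matrix could not hold the `name` and `region` columns next to the numeric ones without converting everything to text, which is why modelling functions such as lm expect a data frame.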

You can check what sort of input a function requires by typing ?functionname, e.g. ?lm, and inspecting the Arguments section:

data
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lm is called.

Before experimenting with regression, you can learn the basic building blocks of R with a good introductory course. One free option is DataCamp's Introduction to R, but there are many others. Once you understand the different variable types, data structures, and syntax of R, these errors are easy to correct.

In this case, you just need to "coerce" the matrix object to a data.frame object with as.data.frame(florida), and pass the result to lm (for example, data = as.data.frame(florida)).
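As a minimal sketch of that fix, assuming `florida` is simply a copy of state.x77 as described in the question (`florida_df` and `fit` are illustrative names):

```r
# Recreate the renamed matrix from the question
florida <- state.x77

# Coerce the matrix to a data frame and keep the result
florida_df <- as.data.frame(florida)

# lm now finds Income and Illiteracy as columns of the data frame
fit <- lm(Income ~ Illiteracy, data = florida_df)

coef(fit)      # intercept and slope
summary(fit)   # standard errors, R-squared, and so on
```

Storing the converted data frame under its own name means you only convert once and can reuse it for later models.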

If you want to get a model for each state, try this:

data(state)

# state.x77 is a matrix; work on a data frame copy and keep the state names
states <- as.data.frame(state.x77)
states$name <- rownames(states)

# Fit one model per state and store each fit under the state's name.
# Note: each subset has a single row, so the Illiteracy slope comes out as NA.
mod_list <- list()
for (s in states$name) {
    mod_list[[s]] <- lm(Income ~ Illiteracy, data = subset(states, name == s))
}

For a linear regression of Income on Illiteracy, you should do:

lm(Income ~ Illiteracy, data=as.data.frame(state.x77))

because lm accepts dataframes, not matrices.

friend who was helping me renamed state.x77 into "florida"

I don't know why he or she would do that. state.x77 is a dataset of 8 variables for 50 different states. Florida is just one of them, so why on earth would they call it "florida"? Suppose you have a dataset of population and income of 200 different countries. Would you call it "india" because India is one of the countries in the dataset?

After getting this error and deciding that I would prefer to either do each state individually in the regression

You cannot "do a state individually in the regression". It is not that you cannot do it in R; you cannot do it at all, because it is mathematically absurd. Florida has (in this matrix) an income of 4815 and an illiteracy of 1.3. That is a single point, and infinitely many lines pass through a single point, so no slope can be estimated. How do you do a regression between two numbers? It is absurd.
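You can check this yourself. The sketch below first fits the model on the Florida row alone and then on a handful of states; the five states are an arbitrary pick for illustration, and `states`, `one_state`, and `sample_states` are made-up names.

```r
states <- as.data.frame(state.x77)

# One state = one observation: the slope cannot be estimated
one_state <- states["Florida", ]
fit_one <- lm(Income ~ Illiteracy, data = one_state)
coef(fit_one)    # the Illiteracy coefficient is NA

# Several states give lm enough points to fit a line
sample_states <- states[c("Florida", "Georgia", "Alabama", "Texas", "Ohio"), ]
fit_few <- lm(Income ~ Illiteracy, data = sample_states)
coef(fit_few)    # both coefficients are now estimated
```

With only five points the estimates are still very rough, so using all 50 states is the more sensible default.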

I decided I wanted to take the Florida row and turn it into its own vector to do the analysis on.

You can take the Florida row:

foo <- state.x77["Florida",]

Now foo is the vector of 8 parameters for Florida, but what can you do with it?
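If you just want to look at Florida's numbers, here is a short sketch of what you can do with that vector. It also shows what "names" and "dim" mean; `foo_df` is an illustrative name.

```r
foo <- state.x77["Florida", ]

# "dim" is the shape of an object: rows and columns for a matrix
dim(state.x77)    # the full matrix has a shape
dim(foo)          # NULL: a plain vector has no rows or columns

# "names" are the labels attached to each element of the vector
names(foo)

# Pick single values by name
foo["Income"]          # keeps the label
foo[["Illiteracy"]]    # just the number

# If a one-row data frame is needed, transpose the vector first
foo_df <- as.data.frame(t(foo))
foo_df$Income
```

These are useful for inspecting one state, but a regression still needs many rows, not one.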


 