I'm trying to do my first ever project in R and I just don't know the language, so it's really killing me here. This is the most frustrating thing I've ever encountered, mostly because it seems like there is absolutely nowhere on the internet that caters to people who don't know the language to teach you how to do things.
I am trying to run a linear regression with the data that I'm using being one of the built-in datasets that RStudio has. This is my line of code:
lm(Income ~ Illiteracy, data=florida)
But I keep coming up with this error:
Error in model.frame.default(formula = Income ~ Illiteracy, data = florida,: 'data' must be a data.frame, not a matrix or an array
(friend who was helping me renamed state.x77 into "florida").
After getting this error and deciding that I would prefer to either do each state individually in the regression or at least a couple sample states, I decided I wanted to take the Florida row and turn it into its own vector to do the analysis on. However, I have NO idea how to do that. I keep seeing suggestions on this website but they're all talking about "naming" things and a lot of the commands have "dim" which no one explains.
Please help. I'm a total beginner and I have a textbook that assumes you know R, and I found another "Learn R" book that somehow also assumes you know R.
R has several data structures for handling datasets. A matrix is one of them: it restricts you to a single type of value (usually numeric) and must have a rectangular shape.
A data.frame is similar in shape to a matrix, but each column can be a different type (e.g. numeric, character, or factor). This is closer to a typical dataset, where you have a mixture of continuous / numeric, ordinal, and categorical / nominal variables.
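To make the difference concrete, here is a minimal sketch using the built-in state.x77 data from the question. The names m, mixed and df are only illustrative, and adding state.region as an extra column is an assumption made to show a mixed-type table.

```r
# state.x77 ships with base R (package "datasets"), so nothing needs downloading.
m <- state.x77

is.matrix(m)       # TRUE
is.data.frame(m)   # FALSE
dim(m)             # 50 8: 50 rows (states), 8 columns (variables)

# A matrix holds a single type: adding a text column turns every value into text.
mixed <- cbind(m[1:3, c("Income", "Illiteracy")], label = c("a", "b", "c"))
typeof(mixed)      # "character"

# A data.frame lets each column keep its own type.
df <- as.data.frame(m)
df$region <- as.character(state.region)   # state.region is another built-in object
sapply(df[c("Income", "region")], class)  # Income is numeric, region is character
```

Here dim() simply reports the number of rows and columns, which is what the "dim" in many R commands refers to.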
You can check what sort of input a function requires by typing ?functionname, e.g. ?lm, and inspecting the Arguments section:
data
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lm is called.
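If you only want a quick reminder of the argument names without opening the help page, something like the following may be enough. This is a small sketch; arg_names is just an illustrative variable name.

```r
# Interactive use: ?lm or help("lm") opens the full help page.
# From the console, the argument names can also be listed directly:
arg_names <- names(formals(lm))
print(arg_names)

# Check that lm() really has an argument called "data".
"data" %in% arg_names   # TRUE
```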
Before experimenting with regression, you can learn the basic building blocks of R with a good introductory course. One free option is DataCamp's Introduction to R, but there are many others. Once you understand the different variable types, data structures, and syntax of R, these errors are easy to correct.
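In this case, you just need to write as.data.frame(florida) to "coerce" the matrix object to a data.frame object. The call returns a new object rather than changing florida in place, so either assign the result to a name or pass it directly as the data argument.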
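Putting that together with the line from the question, a sketch could look like this. It assumes florida is simply a copy of state.x77, as the question describes; florida_df and fit are illustrative names.

```r
# Recreate the renamed copy described in the question.
florida <- state.x77

# Coerce the matrix to a data.frame and keep the result.
florida_df <- as.data.frame(florida)
is.data.frame(florida_df)   # TRUE

# lm() now accepts it; all 50 states are used as observations.
fit <- lm(Income ~ Illiteracy, data = florida_df)
coef(fit)      # intercept and slope
summary(fit)   # fuller output: standard errors, R-squared, and so on
```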
If you want to get a model for each state, try this:
data(state)
# lm() needs a data.frame, so convert the matrix first
state.x77 <- as.data.frame(state.x77)
state.x77$name <- rownames(state.x77)
mod_list <- list()
for (s in state.x77$name) {
  # each state is a single row, so only the intercept can be estimated (the slope is NA)
  m <- lm(Income ~ Illiteracy, data = subset(state.x77, name == s))
  mod_list[[s]] <- m  # store the fit under the state's name
}
For a linear regression of Income on Illiteracy, you should do:
lm(Income ~ Illiteracy, data=as.data.frame(state.x77))
because lm accepts data frames, not matrices.
friend who was helping me renamed state.x77 into "florida"
I don't know why he or she would do that. state.x77 is a dataset of 8 variables for 50 different states. Florida is just one of them, so why on earth would anyone call it "florida"? Suppose you have a dataset of population and income of 200 different countries. Would you call it "india" because India is one of the countries in the dataset?
After getting this error and deciding that I would prefer to either do each state individually in the regression
You cannot "do a state individually in the regression". It is not that you cannot do it in R; you cannot do it at all, because it makes no mathematical sense. Florida has (in this matrix) an income of 4815 and an illiteracy of 1.3. How do you fit a regression line to a single pair of numbers? A line needs at least two points.
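You can see this for yourself with a short sketch. R does not raise an error for a one-row model, but the slope comes back as NA. Using the South region as a "couple of sample states" is an assumption made here for illustration, and the variable names are arbitrary.

```r
states <- as.data.frame(state.x77)

# One state = one observation.
one_state <- states["Florida", ]
nrow(one_state)   # 1

fit_one <- lm(Income ~ Illiteracy, data = one_state)
coef(fit_one)     # intercept is just Florida's Income; the Illiteracy slope is NA

# A group of states gives lm() enough points to estimate a slope.
south <- states[state.region == "South", ]
fit_south <- lm(Income ~ Illiteracy, data = south)
coef(fit_south)
```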
I decided I wanted to take the Florida row and turn it into its own vector to do the analysis on.
You can take the Florida row:
foo <- state.x77["Florida",]
Now foo is the vector of 8 parameters for Florida, but what can you do with it?
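Since the question mentions being confused by "naming" and "dim", here is a small sketch of what those mean for this row. The name row_mat is illustrative only.

```r
foo <- state.x77["Florida", ]

is.vector(foo)    # TRUE: a plain numeric vector
length(foo)       # 8
names(foo)        # each value carries the column name it came from
foo[["Income"]]   # pick a single value by its name
dim(foo)          # NULL: a vector has no rows or columns

# drop = FALSE keeps the result as a 1-row matrix instead of a vector.
row_mat <- state.x77["Florida", , drop = FALSE]
dim(row_mat)      # 1 8
```

"Naming" refers to these labels attached to the values, and dim() reports rows and columns for matrices and data frames.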