How can I perform a simple linear regression on this data?

I would like to create a linear regression model of rocket price (the Rocket column) against the date of launch (the Datum column). I believe I can do this with lm(Y ~ X). However, how can I convert the prices from chr to num, and likewise convert the dates?

Thank you!

[Screenshot of the space data]

Data: https://www.kaggle.com/agirlcoding/all-space-missions-from-1957

Effectively you are asking 3 different but very basic questions, which would be better learned by reading an introductory text than by posting a question on Stack Overflow.

  1. How do I convert character data to numeric data for the Rocket column?

Depending on what version of R you are using, the column spaceData$Rocket will be either a character vector or a factor vector. To cover both eventualities, you can do:

spaceData$Rocket <- as.numeric(as.character(spaceData$Rocket))

This will give you a warning that some NA values were produced. That's OK: there are some blank cells in the column, so you want these to be NA.
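
To see why the as.character step matters, here is a minimal sketch on a made-up vector (the values below are illustrative, not taken from the dataset). It also shows a hypothetical extra step, stripping thousands separators with gsub, which only applies if your prices happen to be written that way:

```r
# Toy stand-in for spaceData$Rocket; these values are invented for illustration.
rocket_chr <- c("50.0", "", "29.75", "1,160.0")
rocket_fct <- factor(rocket_chr)

# as.numeric() applied directly to a factor returns the internal level codes,
# not the prices.
codes <- as.numeric(rocket_fct)

# Going through as.character() first works for both character and factor input.
# The blank cell becomes NA; a value that is not a plain number (here the one
# with a comma) also becomes NA and is what triggers the coercion warning.
prices <- suppressWarnings(as.numeric(as.character(rocket_fct)))

# Hypothetical extra step (an assumption about the data): remove thousands
# separators before converting, so "1,160.0" is read as 1160.
prices_clean <- as.numeric(gsub(",", "", as.character(rocket_fct)))
```

Comparing sum(is.na(prices)) with sum(is.na(prices_clean)) is a quick way to check whether any non-blank values are being lost in the conversion.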

  2. How do I convert the column spaceData$Datum from text to actual date times?

In this case, you can use strptime, and specify how the date string is formatted. We will also wrap this in as.POSIXct to ensure that the data is formatted in a way that is easier to plot:

spaceData$Datum <- as.POSIXct(strptime(spaceData$Datum, "%a %b %d, %Y %H:%M"))
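
As a rough illustration of what the format string does, here is the same call on a single made-up string. The sample string and the tz = "UTC" argument are assumptions of this sketch; note also that %a and %b are matched against day and month names in the current locale, so this assumes an English locale:

```r
# Made-up example string in the layout the format string expects.
datum_chr <- "Fri Aug 07, 2020 05:12 UTC"

# %a = abbreviated weekday, %b = abbreviated month, %d = day of month,
# %Y = four-digit year, %H:%M = 24-hour time.
# Text after the matched portion (" UTC" here) is ignored by strptime().
# tz = "UTC" is an assumption; without it the local time zone is used.
datum <- as.POSIXct(strptime(datum_chr, "%a %b %d, %Y %H:%M", tz = "UTC"))

# A string that does not match the format yields NA rather than an error,
# so it is worth counting NAs after converting the real column.
bad <- as.POSIXct(strptime("2020-08-07", "%a %b %d, %Y %H:%M", tz = "UTC"))
```

After converting the real column, sum(is.na(spaceData$Datum)) tells you how many rows did not match the format.
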

  3. How do I do a linear regression using these two variables?

Before you attempt a linear regression, it is a good idea to make sure it is sensible to do a linear regression. For a linear regression to make sense, you should know that there is an approximately linear relationship between the two variables, and that the residuals are approximately normally distributed. An easy way to examine these assumptions is to plot the two variables:

plot(spaceData$Datum, spaceData$Rocket)

You don't need to be a statistician to see that any straight line through these points is going to be pretty hopeless as a description of the relationship. If we try it, we can see that:

abline(lm(Rocket ~ Datum, data = spaceData), col = "red")

So, by running a linear regression on this data, we can predict that the price of rockets will fall to zero on the 13th May 2036. Clearly this is nonsense.
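
A date like that comes from solving the fitted line for the point where the predicted price is zero. Here is a minimal sketch of the arithmetic on invented data (the toy data frame and its values are hypothetical, so the date it produces has nothing to do with the real dataset):

```r
# Made-up, roughly declining prices, purely for illustration.
toy <- data.frame(
  Datum  = as.POSIXct(c("2000-01-01", "2005-01-01", "2010-01-01", "2015-01-01"),
                      tz = "UTC"),
  Rocket = c(100, 80, 70, 40)
)

fit <- lm(Rocket ~ Datum, data = toy)

# POSIXct values are stored as seconds since 1970-01-01 UTC, so the slope
# is in price units per second.
b <- unname(coef(fit))

# The fitted line is Rocket = b[1] + b[2] * Datum.
# Setting Rocket = 0 and solving for Datum gives the crossing time in seconds.
zero_secs <- -b[1] / b[2]

# Convert the number of seconds back into a date-time.
zero_date <- as.POSIXct(zero_secs, origin = "1970-01-01", tz = "UTC")
```

Printing zero_date gives the date at which the straight line hits zero, which is an extrapolation well outside the data and should be read with the same scepticism.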
