Regression model to predict student's grade in R

Please, I need your help!

I have data for 2017 with the following variables:

Age : Numeric

Gender : M = Male, F = Female, X = Indeterminate/Intersex/Unspecified

Postal Postcode : Numeric code

Residential postcode : 1 = Major Cities, 2 = Inner Regional, 3 = Outer Regional, 4 = Remote, 5 = Very Remote

Socio-Economic : 0-99, where 0 is low socio-economic status and 99 is high

School Code : Numeric code

Educational attainment of first parent : Numeric

Educational attainment of second parent : Numeric

Grade : Numeric, between 0 and 100

I would like to train on the 2017 data to predict students' grades in 2018. For example, if a student in 2017 got a grade of 80 and a student in 2018 has the same or very similar variables, the predicted grade should be close to 80.

//////////////////////////////////////////////////////////////////////////////// ///////////////////////////////////////////////// //////////////////////////////

Thank you, vitalious! I used your script and got results. Here are the script I used and the data:

data<-read.csv("Olddata.csv")
newdata<-read.csv("Newdata.csv")

model <- lm(Age~., data=data)
nextYear <- data
nextYear$Age <- nextYear$Age + 1
results <- predict(model, newdata=nextYear, type='response')

Assume that we have only the following variables:

Age  Gender  Postal.Postcode  Grade
20   F       3191             89.6
20   M       3930             99
20   F       3126             99.2
21   M       3910             94.65

And the new data could be anything with the same number of variables.

The output was something like:

       1        2        3        4
20.09547 20.48317 19.82224 20.55038

But the output I actually want is each student's grade out of 100!

What you're looking for is a linear regression model. In R, it's invoked with lm(). You can read more here. You'd want to fit a model predicting the grade, and then run the model on the data with Age incremented by one, since presumably that is the only attribute that will change next year.

Assuming your data is in a data frame called data, it would look something like this:

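# Fit a linear model with Grade as the response and all other columns as predictors
model <- lm(Grade ~ ., data = data)

# Copy the data and advance Age by one year, then predict from the copy
nextYear <- data
nextYear$Age <- nextYear$Age + 1
results <- predict(model, newdata = nextYear, type = 'response')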
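As a self-contained illustration of the fit-then-predict approach, here is a minimal sketch built from the four sample rows in the question, with Grade as the response. The new_students data frame and the clipping step are assumptions added for illustration; they are not part of the original data or script, and Postal.Postcode is left out only to keep the example small.

```r
# Sample rows from the question; Postal.Postcode is omitted to keep the sketch small.
train <- data.frame(
  Age    = c(20, 20, 20, 21),
  Gender = factor(c("F", "M", "F", "M")),
  Grade  = c(89.6, 99, 99.2, 94.65)
)

# Grade is the response; Age and Gender are the predictors.
model <- lm(Grade ~ Age + Gender, data = train)

# Hypothetical 2018 students: same predictor columns, no Grade column.
new_students <- data.frame(
  Age    = c(20, 21),
  Gender = factor(c("F", "F"), levels = levels(train$Gender))
)

# One predicted grade per row of new_students.
predicted <- predict(model, newdata = new_students)

# A linear model is not bounded, so clip predictions to the 0-100 grade scale.
predicted <- pmin(pmax(predicted, 0), 100)
print(predicted)
```

With real data, the same pattern would apply to the frames read from the two CSV files: fit on the 2017 frame and pass the 2018 frame as newdata, provided its predictor columns have the same names and types.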

Make sure that all non-numeric columns are factors.
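One possible way to do that conversion is sketched below. The small data frame is a stand-in built from the question's sample rows (an assumption for illustration); with real data the same two lines would be applied to the frame returned by read.csv().

```r
# Stand-in for the data read from the CSV file; Gender arrives as plain text.
data <- data.frame(
  Age    = c(20, 20, 20, 21),
  Gender = c("F", "M", "F", "M"),
  Grade  = c(89.6, 99, 99.2, 94.65),
  stringsAsFactors = FALSE
)

# Find the character columns and convert each one to a factor.
is_text <- vapply(data, is.character, logical(1))
data[is_text] <- lapply(data[is_text], factor)

# Inspect the resulting column types.
str(data)
```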
