
R linear model (lm) predict function with one single array

I have an lm model in R that I have trained and serialized. Inside a function that takes the model and a feature vector (one single array) as input, I have:

CREATE OR REPLACE FUNCTION lm_predict(
    feat_vec float[],
    model bytea
)
RETURNS float
AS
$$
    #R-code goes here.
    mdl <- unserialize(model)
    # class(feat_vec) outputs "array"
    y_hat <- predict.lm(mdl, newdata = as.data.frame.list(feat_vec))
    return (y_hat)
$$ LANGUAGE 'plr';

This returns the wrong y_hat! I know this because the following alternative works (the inputs are still the model, as a byte array, and one feat_vec, an array):

CREATE OR REPLACE FUNCTION lm_predict(
    feat_vec float[],
    model bytea
)
RETURNS float
AS
$$
    #R-code goes here.
    mdl <- unserialize(model)
    coef = mdl$coefficients
    y_hat = coef[1] + as.numeric(coef[-1]%*%feat_vec)
    return (y_hat)
$$ LANGUAGE 'plr';

What am I doing wrong? It is the same unserialized model, so the first option should give me the right answer as well.

The problem seems to be the use of newdata = as.data.frame.list(feat_vec). As discussed in your previous question, this returns ugly column names. When you call predict, however, newdata must have column names consistent with the covariate names in your model formula. You should get a warning message when you call predict.

## example data
set.seed(0)
x1 <- runif(20)
x2 <- rnorm(20)
y <- 0.3 * x1 + 0.7 * x2 + rnorm(20, sd = 0.1)

## linear model
model <- lm(y ~ x1 + x2)

## new data
feat_vec <- c(0.4, 0.6)
newdat <- as.data.frame.list(feat_vec)
#  X0.4 X0.6
#1  0.4  0.6

## prediction
y_hat <- predict.lm(model, newdata = newdat)
#Warning message:
#'newdata' had 1 row but variables found have 20 rows 
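
The warning is worth reading closely. As far as I can tell, when the columns of newdata do not match the covariate names, predict quietly falls back on the x1 and x2 it finds in the environment of the model formula, so the result is the vector of fitted values for the training data rather than one prediction. The sketch below re-creates the example to check this; it assumes, as in the example, that the model was fitted on variables that are still reachable from the formula's environment.

```r
## Re-create the example data and model
set.seed(0)
x1 <- runif(20)
x2 <- rnorm(20)
y <- 0.3 * x1 + 0.7 * x2 + rnorm(20, sd = 0.1)
model <- lm(y ~ x1 + x2)

## One-row data frame with auto-generated names (not x1 / x2)
feat_vec <- c(0.4, 0.6)
newdat <- as.data.frame(as.list(feat_vec))

## The warning is suppressed here only to keep the output quiet
y_hat <- suppressWarnings(predict(model, newdata = newdat))

length(y_hat)                       # 20 values, not 1
same_as_fitted <- isTRUE(all.equal(unname(y_hat), unname(fitted(model))))
same_as_fitted                      # TRUE: newdata was effectively ignored
```

If a function declared as RETURNS float receives such a vector, it would plausibly hand back only one of these training-set values, which would explain a y_hat that looks valid but is wrong.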

What you need is:

newdat <- as.data.frame.list(feat_vec,
                             col.names = attr(model$terms, "term.labels"))
#   x1  x2
#1 0.4 0.6

y_hat <- predict.lm(model, newdata = newdat)
#        1 
#0.5192413 

This is the same as what you can compute manually:

coef = model$coefficients
unname(coef[1] + sum(coef[-1] * feat_vec))
#[1] 0.5192413 
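
For use inside the PL/R function, the fix can be wrapped in a small helper. This is only a sketch: predict_one is a hypothetical name, and it assumes the formula contains plain variables and that feat_vec lists them in the same order as the formula. It takes the names from all.vars(delete.response(terms(mdl))) rather than from term.labels, since term labels such as x1:x2 or log(x1) would not be usable as column names.

```r
## Hypothetical helper (not from the original post): predict for a single
## feature vector, naming the columns after the model's predictor variables.
predict_one <- function(mdl, feat_vec) {
  ## Predictor variable names from the model's terms, response dropped
  nms <- all.vars(delete.response(terms(mdl)))
  ## Assumption: one value per predictor, in formula order
  stopifnot(length(feat_vec) == length(nms))
  ## as.numeric() drops the "array" class that PL/R passes in
  newdat <- as.data.frame(as.list(setNames(as.numeric(feat_vec), nms)))
  unname(predict(mdl, newdata = newdat))
}

## Check against the manual computation on simulated data
set.seed(0)
d <- data.frame(x1 = runif(20), x2 = rnorm(20))
d$y <- 0.3 * d$x1 + 0.7 * d$x2 + rnorm(20, sd = 0.1)
fit <- lm(y ~ x1 + x2, data = d)

feat_vec <- c(0.4, 0.6)
y_hat <- predict_one(fit, feat_vec)
manual <- unname(coef(fit)[1] + sum(coef(fit)[-1] * feat_vec))
isTRUE(all.equal(y_hat, manual))
```

Inside lm_predict, the body would then reduce to unserializing the model and returning predict_one(mdl, feat_vec).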

