

How to interpolate and `predict` using mgcv::gam?

I began by learning how to use splines to interpolate a one-dimensional function:

model = spline(bdp[,4]~bdp[,1])

I could then use

predict(model, c(0))

to predict the function value at point 0.
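For comparison with the 3-D case, here is a minimal sketch of 1-D spline interpolation in base R. The data are made up, standing in for bdp[,1] and bdp[,4]. Note that spline() returns a list of interpolated x/y points rather than a model object; splinefun() instead returns a function that passes exactly through the data points and can be evaluated at new points, and spline(..., xout = ) evaluates the same interpolant at chosen points.

```r
# Hypothetical 1-D data standing in for bdp[,1] (x) and bdp[,4] (y)
x <- c(-2, -1, 1, 2, 3)
y <- x^2

# splinefun() returns an interpolating function (default method "fmm")
f <- splinefun(x, y)
f(0)                       # interpolated value at x = 0

# spline() with xout evaluates the same interpolant at the requested points
spline(x, y, xout = 0)$y
```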

Then I searched the Internet for a way to fit splines to 3-dimensional data and came across an answer on Stack Overflow suggesting that mgcv::gam is the best choice.

So I tried:

model=gam(bdp[,4]~s(bdp[,1],bdp[,2],bdp[,3]))

and then I did:

predict(model, newdata=c(0,0,0), type="response")

hoping that it would return the spline-interpolated value at point (0,0,0). Instead, it computed for a while and returned a lot of multidimensional data that I could not understand.

I must be doing something wrong. How do I get the value at a single point from a gam object? And, just to be sure: do you agree that gam is the right choice for spline interpolation of 3D data, or would you suggest something else?

I'm adding a reproducible example.

This is the data file (please unpack it in c:/r/): https://www.sendspace.com/file/b4mazl

# install.packages("mgcv")

library(mgcv)

bdp = read.table("c:/r/temp_bdp.csv")
bdg=gam(bdp[,4]~s(bdp[,1],bdp[,2],bdp[,3]))

# This returns lots of data, not just the single function value I wanted.
predict(bdg, newdata=data.frame(0,0,0,0), type="response")

Minimal reproducible example:

tmp = t(matrix(runif(4*200),4))
tmpgam=gam(tmp[,4]~s(tmp[,1],tmp[,2],tmp[,3]))
predict(tmpgam, newdata=data.frame(0,0,0,0), type="response")

For predict(bdg, newdata=data.frame(0,0,0,0), type="response") it returns a lot of numbers and warns that newdata didn't have enough data.

For

predict(bdg, c(0,0,0,0), type="response")

it returns nothing and gives the same warning.

With nearly every type of model you fit, if you plan to use the predict function, it's best to use a "proper" formula with column names rather than matrix/data.frame slices. The reason is that when predict runs, it matches the values in newdata to the model by name, so the names in the formula and in newdata must match exactly. When you index the data.frame like that, the model's variables become expressions such as bdp[,1], and predict cannot match your newdata columns to them. So the best way to fit the model and predict is

bdg <- gam(V4~s(V1,V2,V3), data=bdp)
predict(bdg, newdata=data.frame(V1=0, V2=0, V3=0))
#           1 
# 85431440244 

That's assuming

names(bdp)
# [1] "V1" "V2" "V3" "V4"

So here we fit with "V1", "V2", "V3", and newdata has columns "V1", "V2" and "V3", so the names match exactly.
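To check the fix without the external data file, here is a sketch applying it to the minimal reproducible example from the question. It assumes only that mgcv is installed; set.seed() and the prediction grid are illustrative additions. The last two lines use all.vars() to list the variable names each formula depends on: the column-name formula depends on V1 to V4, which newdata can supply, while the sliced formula depends only on the object tmp itself, so a newdata frame has no matching columns to offer.

```r
library(mgcv)

set.seed(1)  # added for reproducibility; not in the original example
# The question's random 200 x 4 data, converted to a data frame;
# as.data.frame() names unnamed matrix columns V1..V4
tmp <- as.data.frame(t(matrix(runif(4 * 200), 4)))

# Fit using column names and the data= argument
tmpgam <- gam(V4 ~ s(V1, V2, V3), data = tmp)

# One row of newdata -> one predicted value
p1 <- predict(tmpgam, newdata = data.frame(V1 = 0, V2 = 0, V3 = 0),
              type = "response")

# Several points at once (hypothetical grid): one prediction per row
grid <- expand.grid(V1 = c(0.25, 0.5), V2 = c(0.25, 0.5), V3 = 0.5)
p_grid <- predict(tmpgam, newdata = grid, type = "response")

# Variable names each formula refers to
all.vars(V4 ~ s(V1, V2, V3))
all.vars(tmp[, 4] ~ s(tmp[, 1], tmp[, 2], tmp[, 3]))
```

Because newdata is an ordinary data frame, any number of rows can be passed, and predict returns one value per row.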

I've only focused on the R coding part. Whether this is an appropriate analysis is a question better suited to https://stats.stackexchange.com/

