
How to interpolate and `predict` using mgcv::gam?

I began by learning how to use splines to interpolate a 1-dimensional function.

model = spline(bdp[,4]~bdp[,1])

I could then use

predict(model, c(0))

to predict the function value at point 0.

Then I searched the Internet for a way to fit a spline to 3-dimensional data and came across an answer on Stack Overflow suggesting that mgcv::gam is the best choice.

And so I tried:

model=gam(bdp[,4]~s(bdp[,1],bdp[,2],bdp[,3]))

and then I did:

predict(model, newdata=c(0,0,0), type="response")

hoping that it would return the spline-interpolated value at the point (0,0,0). It computed for a while and returned lots of multidimensional data that I could not understand.

I must be doing something wrong. How do I get the value at a single point from a gam object? And, just to be sure, do you agree that gam is the right choice for spline interpolation of 3D data, or would you suggest something else?

I'm adding a reproducible example.

This is a data file (please unpack in c:/r/) https://www.sendspace.com/file/b4mazl

# install.packages("mgcv")

library(mgcv)

bdp = read.table("c:/r/temp_bdp.csv")
bdg=gam(bdp[,4]~s(bdp[,1],bdp[,2],bdp[,3]))

#this returns lots of data, not just function value that I wanted.
predict(bdg, newdata=data.frame(0,0,0,0), type="response")

Minimal reproducible example:

tmp = t(matrix(runif(4*200),4))
tmpgam=gam(tmp[,4]~s(tmp[,1],tmp[,2],tmp[,3]))
predict(tmpgam, newdata=data.frame(0,0,0,0), type="response")

For predict(bdg, newdata=data.frame(0,0,0,0), type="response")

it returns a lot of numbers and warns that newdata didn't have enough data

for

predict(bdg, c(0,0,0,0), type="response")

it returns nothing and also warns about the same.

With nearly all types of models, if you plan to use the predict function, it's best to fit with a "proper" formula that uses column names rather than matrix/data.frame slices. The reason is that when predict runs, it matches the values in newdata to the model variables by name, so the names must match exactly. When you index the data.frame like that, it creates weird variable names in the model, derived from the indexing expressions themselves, which newdata cannot match. So the best way to fit the model and predict is

bdg <- gam(V4~s(V1,V2,V3), data=bdp)
predict(bdg, newdata=data.frame(V1=0, V2=0, V3=0))
#           1 
# 85431440244 

That's assuming

names(bdp)
# [1] "V1" "V2" "V3" "V4"

So here we fit with "V1", "V2" and "V3", and newdata has columns named "V1", "V2" and "V3".
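To see the name matching at work without the data file, here is a minimal sketch on simulated data in the spirit of the question's tmp example. The data frame demo and the fitted objects fit_slices and fit_named are made-up names for illustration, and the V1..V4 column names are simply what as.data.frame assigns to an unnamed matrix.

```r
library(mgcv)

set.seed(1)
# Simulated stand-in for bdp (an assumption): 200 rows of uniform noise.
# as.data.frame() on an unnamed matrix names the columns V1..V4.
demo <- as.data.frame(matrix(runif(4 * 200), ncol = 4))

# Fit with slices: the smooth's variables are named after the indexing expressions.
fit_slices <- gam(demo[, 4] ~ s(demo[, 1], demo[, 2], demo[, 3]))
print(fit_slices$smooth[[1]]$term)

# Fit with column names: the smooth's variables are plain V1, V2, V3.
fit_named <- gam(V4 ~ s(V1, V2, V3), data = demo)
print(fit_named$smooth[[1]]$term)

# newdata carries the same names, so predict returns one value for one point.
p <- predict(fit_named, newdata = data.frame(V1 = 0.5, V2 = 0.5, V3 = 0.5))
print(length(p))
```

Comparing the two printed term vectors shows why a newdata with ordinary column names cannot line up with the slice-based fit, while the named fit gives a single prediction per row of newdata.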

I've only focused on the R coding part. The question of whether this is an appropriate analysis is better suited for https://stats.stackexchange.com/
