
Finding piecewise polynomial coefficients from R's B-spline bs() function

I'm writing a (much larger) data analysis and graphing program whose details don't matter here. The dataset is Google Trends data for the term "Artificial Intelligence" worldwide since 2004, which gives two columns: months since 2004 and search interest level. I am trying to extract the piecewise polynomials from the built-in B-spline function bs(), because I need them to graph the fit. Specifically, I have been using the R package SplinesUtils, written by Zheyuan Li and referenced in another Stack Overflow thread.

My problem is not getting the package working or using its functions; it's that the supposedly correct functions don't seem to give me accurate polynomials. Here's why I think that: [images: the Google Trends data; the polynomials given by R; those polynomials plotted in Desmos]. You can see that the generated polynomials don't match the data. Obviously I haven't added the boundaries (the interval each piece applies to), but even so the pieces should match the data fairly closely.

I have emailed the creator of the package and explained the problem. However, I am not sure whether the problem lies with the package or with my use of bs(). Have I got x and y the wrong way round? Is the syntax slightly off? I'm new to both R and splines, so I'm not sure about any of this.

I downloaded the data from Google and saved it as AIData.csv. I wasn't sure how to host it for anyone answering this question, so I put it in a pastebin: https://pastebin.com/itQcWWSg

library(splines)  # provides bs(); SplinesUtils also loads it
library(SplinesUtils)
pyin <- c("AIData.csv", "the directory you save this R file in (which should also have AIData.csv in it)")
setwd(pyin[2])  # sets the working directory to the above string
csvfile <- read.csv(file = pyin[1], header = TRUE)  # reads the CSV file into a data frame with headers
names(csvfile) <- c("months", "searchInterest")  # renames the headers because they're very long and cause formatting issues
model <- lm(searchInterest ~ bs(months, df = 5), data = csvfile)  # linear model of search interest on a B-spline basis of months
piecewisePoly <- RegBsplineAsPiecePoly(model, "bs(months, df = 5)", shift = FALSE)  # creates the piecewise polynomials
piecewisePoly
piecewisePoly$PiecePoly$coef
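As background on where the "boundaries" come from: with df = 5 and the default cubic degree, bs() places df - 3 = 2 interior knots at the 1/3 and 2/3 quantiles of the predictor, which splits its range into 3 pieces (matching the "3 piecewise polynomials" in the output). The sketch below assumes a hypothetical months vector 0:200 in place of the real column and simply reads the knots back from the basis attributes; each polynomial is only meant to be used between two consecutive breaks.

```r
library(splines)

# Hypothetical stand-in for the months column (the real range may differ).
months <- 0:200

B <- bs(months, df = 5)                # cubic by default, no intercept column
interior <- attr(B, "knots")           # df - degree = 5 - 3 = 2 interior knots
boundary <- attr(B, "Boundary.knots")  # defaults to range(months)

# Consecutive breaks bound the cubic pieces: 4 breaks -> 3 pieces.
breaks <- unname(c(boundary[1], interior, boundary[2]))
breaks  # 0, 200/3, 400/3, 200
```

When graphing, restrict each polynomial to its own interval between breaks rather than drawing it over the whole range.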

I expected the piecewise polynomials to roughly follow the graph of the Google search trend. They didn't; see the Desmos plot above. The direct output of running the code above is:

Loading required package: splines
3 piecewise polynomials of degree 3 are constructed!
Use 'summary' to export all of them.
The first 3 are printed below.
3.1 - 3.14 * x - 0.047 * x ^ 2 - 0.000246 * x ^ 3
-34.5 - 1.16 * x - 0.0123 * x ^ 2 - 4.27e-05 * x ^ 3
-544 + 12.3 * x + 0.107 * x ^ 2 + 0.00031 * x ^ 3
              [,1]          [,2]          [,3]
[1,]  3.0953478761 -3.448227e+01 -5.435058e+02
[2,] -3.1420823054 -1.164313e+00  1.234959e+01
[3,]  0.0469800796  1.228237e-02 -1.073097e-01
[4,] -0.0002456503 -4.273970e-05  3.100391e-04
[Finished in 0.7s]
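In the coefficient matrix, each column is one piece and the rows are the coefficients of 1, x, x^2 and x^3 (with shift = FALSE, in powers of x itself). As an independent cross-check that does not depend on SplinesUtils, here is a minimal sketch using only splines and base R: on each knot interval it evaluates the fitted curve at four points and solves for the cubic through them. The helpers bs_fit_to_piecewise and eval_piecewise are hypothetical names; the sketch assumes a model of the form y ~ bs(x, ...) with a cubic basis and no other terms, and returns coefficients in powers of (x - left break), closer to shift = TRUE, because that is better conditioned than raw powers of x. Unlike the printed output above, it includes the model intercept.

```r
library(splines)

# Hypothetical helper (not part of SplinesUtils): one cubic per knot interval.
# Piece j holds coefficients on 1, t, t^2, t^3 with t = x - breaks[j];
# the model intercept IS included, so the pieces reproduce the fitted values.
bs_fit_to_piecewise <- function(model) {
  B <- model.frame(model)[[2]]              # the bs() basis, with its knots
  stopifnot(inherits(B, "bs"), attr(B, "degree") == 3)
  breaks <- unname(c(attr(B, "Boundary.knots")[1], attr(B, "knots"),
                     attr(B, "Boundary.knots")[2]))
  beta <- coef(model)
  s <- c(0.2, 0.4, 0.6, 0.8)                # relative positions inside a piece
  V <- outer(s, 0:3, "^")                   # small, well-conditioned Vandermonde
  coefs <- matrix(NA_real_, 4, length(breaks) - 1)
  for (j in seq_len(ncol(coefs))) {
    h <- breaks[j + 1] - breaks[j]
    pts <- breaks[j] + s * h
    # intercept + spline contribution = fitted value at pts
    yhat <- beta[1] + drop(predict(B, pts) %*% beta[-1])
    coefs[, j] <- solve(V, yhat) / h^(0:3)  # rescale to powers of (x - breaks[j])
  }
  list(breaks = breaks, coef = coefs)
}

# Hypothetical helper: evaluate the pieces at x (inside the boundary knots).
eval_piecewise <- function(pp, x) {
  i <- findInterval(x, pp$breaks, rightmost.closed = TRUE)
  t <- x - pp$breaks[i]
  vapply(seq_along(x), function(k) sum(pp$coef[, i[k]] * t[k]^(0:3)),
         numeric(1))
}

# Usage on hypothetical synthetic data standing in for AIData.csv:
dat <- data.frame(x = 0:200)
dat$y <- 50 + 30 * sin(dat$x / 30)
model <- lm(y ~ bs(x, df = 5), data = dat)
pp <- bs_fit_to_piecewise(model)
pp$coef  # one column per piece, rows = powers 0..3 of (x - breaks[j])
all.equal(eval_piecewise(pp, dat$x), unname(fitted(model)))  # TRUE
```

Because the fit is exactly cubic on each interval, four points per piece determine it; solving in the rescaled variable s and dividing by h^k avoids the large x^3 values that make raw-power coefficients hard to read.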

Zheyuan Li replied to my email with some clarification. I'll post it below for anyone with the same query.

" You forget about the model intercept in model$coefficients[1]. You need to add this intercept to each piecewise polynomial in order to recover the fitted values. You can do it with

finalcoef <- piecewisePoly$PiecePoly$coef
finalcoef[1, ] <- finalcoef[1, ] + model$coefficients[1] 
finalcoef

I think this is the most confusing part of the package: the reported spline is not the fitted values. I mentioned this only weakly in examples under ?RegBsplineAsPiecePoly, in a way that is probably not clear enough. "
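Li's point can be checked with plain lm() and model.matrix(), without SplinesUtils: the fitted values are the intercept plus the B-spline basis times the remaining coefficients, so the spline term on its own is shifted by exactly model$coefficients[1]. A minimal sketch, assuming hypothetical synthetic data (a sine curve) in place of AIData.csv:

```r
library(splines)

# Hypothetical synthetic data standing in for AIData.csv.
dat <- data.frame(months = 0:200)
dat$searchInterest <- 50 + 30 * sin(dat$months / 30)

model <- lm(searchInterest ~ bs(months, df = 5), data = dat)

X <- model.matrix(model)  # column 1 is the intercept, the rest the B-spline basis
beta <- coef(model)

spline_only <- drop(X[, -1] %*% beta[-1])  # contribution of the spline term alone
with_intercept <- beta[1] + spline_only     # add the intercept back

# Adding the intercept recovers the fitted values.
all.equal(unname(fitted(model)), unname(with_intercept))  # TRUE
```

On the real data the same constant shift applies to every column of piecewisePoly$PiecePoly$coef, which is what adding model$coefficients[1] to the first row does.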
