I've been experimenting with several of the spline functions available in R to characterize a very small data set. With a much larger data set, I imagine many kinds of curves would behave as I expect, but the data in this case are limited. The code below shows an example of the kind of data I am working with:
```r
library(ggplot2)  # stats is attached by default, so library(stats) is optional

dat <- data.frame(x = c(0.333, 0.5, 1, 2, 3, 4, 5),
                  y = c(5.875e-03, 1.225e-02, 3.902e-02, 8.942e-03,
                        4.277e-03, 1.938e-03, 1.131e-03))

# Monotone (Fritsch-Carlson) Hermite interpolating spline
f <- splinefun(dat$x, dat$y, method = "monoH.FC")
xgrid <- seq(0.333, 5, by = 0.1)
mod <- data.frame(x = xgrid, y = f(xgrid))

ggplot() + geom_point(data = dat, aes(x = x, y = y)) +
  geom_line(data = mod, aes(x = x, y = y))
```
So far, the monotone Hermite spline is what fits best, but it still has some problems.
Intuitively, I can tell you what the curve here should look like: it should have its maximum at `x = 1` and should not have that dip at `x = 2.5`. The curve does not seem like it should be difficult to recreate; it is asymmetric, peaking toward the left, with a predictable tail to the right.
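A quick way to see where the peak problem comes from is to look at the fitted slope at `x = 1`. My reading of the R source for `splinefun()` is that `monoH.FC` starts each knot slope as the average of the two neighbouring secant slopes and only rescales slopes that fall outside the Fritsch-Carlson monotonicity region, which does not force a zero slope at a local peak; the curve can therefore keep rising just past `x = 1`. The sketch below uses only base R, and the names `f`, `slope_at_peak` and `peak` are illustrative.

```r
dat <- data.frame(x = c(0.333, 0.5, 1, 2, 3, 4, 5),
                  y = c(5.875e-03, 1.225e-02, 3.902e-02, 8.942e-03,
                        4.277e-03, 1.938e-03, 1.131e-03))
f <- splinefun(dat$x, dat$y, method = "monoH.FC")

# Slope of the fitted curve at the observed peak; a true maximum there would give 0
slope_at_peak <- f(1, deriv = 1)

# Numerically locate the maximum of the fitted curve between 0.5 and 2
peak <- optimize(f, interval = c(0.5, 2), maximum = TRUE)

cat("slope at x = 1:", signif(slope_at_peak, 3), "\n")
cat("fitted maximum at x =", signif(peak$maximum, 3),
    "with y =", signif(peak$objective, 4), "\n")
```

A positive slope and a fitted maximum slightly to the right of 1 would confirm the overshoot described above.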
Is there a "better" way to produce a spline function that properly fits what I assume is a common type of data set, or, alternatively, is there a better tool than splines for fitting curves to small data sets?
It sounds like what you're after is a fit that stays closer to linear between the data points. I think you can force that by linearly interpolating the midpoint of each interval and adding it as a real data point:
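```r
library(pracma)  # interp1() comes from pracma; xi = original x followed by interval midpoints
xi <- union(dat$x, dat$x - c(0, diff(dat$x) / 2))
dat2 <- data.frame(x = xi, y = interp1(dat$x, dat$y, xi = xi))  # linear interpolation by default
```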
(`interp1` may be unnecessary here; `union(dat$y, dat$y - c(0, diff(dat$y)/2))` should do the same, but the code above works.)
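The claim that the `union()` shortcut matches `interp1` can be checked directly. This sketch swaps in base R's `approx()` as the linear interpolator (to avoid the extra package) and assumes `x` is already sorted. One caveat: `union()` drops duplicated values, so the shortcut would silently lose a point if a midpoint `y` happened to equal an existing `y`.

```r
dat <- data.frame(x = c(0.333, 0.5, 1, 2, 3, 4, 5),
                  y = c(5.875e-03, 1.225e-02, 3.902e-02, 8.942e-03,
                        4.277e-03, 1.938e-03, 1.131e-03))

# Original x values followed by the midpoint of each interval
xi <- union(dat$x, dat$x - c(0, diff(dat$x) / 2))

# Linear interpolation at xi (base-R stand-in for pracma::interp1)
y_interp <- approx(dat$x, dat$y, xout = xi)$y

# The shortcut: original y values followed by the midpoint of each pair of y values
y_union <- union(dat$y, dat$y - c(0, diff(dat$y) / 2))

# Same length and, up to floating-point rounding, the same values
length(y_union) == length(y_interp)
all.equal(y_union, y_interp)
```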
EDIT: Note that for `diff` to produce the right midpoints, your data need to be sorted by `x` first.
This creates a new data.frame with a point between each pair of original points; if you now spline it, you are weighting the fit toward the straight lines between the original points.
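A minimal base-R sketch of the whole augment-then-spline step, assuming `approx()` is an acceptable stand-in for `interp1` and sorting the augmented points so they alternate original/midpoint. In my reading this mostly straightens the curve between points; it does not by itself put the maximum exactly on `x = 1`, because the secant slopes on either side of that point are unchanged by the inserted midpoints.

```r
dat <- data.frame(x = c(0.333, 0.5, 1, 2, 3, 4, 5),
                  y = c(5.875e-03, 1.225e-02, 3.902e-02, 8.942e-03,
                        4.277e-03, 1.938e-03, 1.131e-03))
dat <- dat[order(dat$x), ]                    # diff() assumes x is sorted

# Midpoint of each interval, merged with the original x and sorted
mid_x <- head(dat$x, -1) + diff(dat$x) / 2
xi <- sort(c(dat$x, mid_x))

# Augmented data: midpoints sit on the straight line between neighbours
dat2 <- data.frame(x = xi, y = approx(dat$x, dat$y, xout = xi)$y)

# Monotone Hermite spline through the original points plus the midpoints
f2 <- splinefun(dat2$x, dat2$y, method = "monoH.FC")
xgrid <- seq(min(dat$x), max(dat$x), by = 0.01)
fit2 <- data.frame(x = xgrid, y = f2(xgrid))

# Slope at the observed peak
f2(1, deriv = 1)
```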
EDIT2: You could also use a smoothing spline with weights, setting the weights of the in-between points lower than those of the primary points:
```r
# Monotone Hermite fit to the original points, for comparison
mod <- splinefun(dat$x, dat$y, method = 'monoH.FC')
mod2 <- data.frame(x = seq(0.333, 5, by = 0.1), y = mod(seq(0.333, 5, by = 0.1)))
# The weights below assume primary points and midpoints alternate, so sort by x first
dat2 <- dat2[order(dat2$x), ]
# A set of weights, where each point in-between is weighted half as much
dat2$w <- rep(c(0.5, 1), ceiling(length(dat2$x) / 2))[-1]
# Smoothing spline
modelspline <- smooth.spline(dat2$x, dat2$y, w = dat2$w)
# Points at which to evaluate the smoothing spline
xplot <- seq(min(dat2$x), max(dat2$x), by = 0.1)
# And plot the comparison (smoothing spline in red)
ggplot() +
  geom_point(data = dat, aes(x = x, y = y)) +
  geom_line(data = mod2, aes(x = x, y = y)) +
  geom_line(data = data.frame(predict(modelspline, xplot)),
            aes(x = x, y = y), color = 'red')
```
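The positional `rep()` weight vector only lines up if the rows alternate original/midpoint in sorted order. A sketch of a sturdier alternative, assuming the augmented `x` values were built from the original ones (so exact matching with `%in%` is safe), assigns each weight by whether the row is an original data point. The names `is_primary` and `pred` are illustrative, and `approx()` again stands in for `interp1`.

```r
dat <- data.frame(x = c(0.333, 0.5, 1, 2, 3, 4, 5),
                  y = c(5.875e-03, 1.225e-02, 3.902e-02, 8.942e-03,
                        4.277e-03, 1.938e-03, 1.131e-03))

# Augmented points: originals followed by interval midpoints (row order does not matter here)
xi <- union(dat$x, dat$x - c(0, diff(dat$x) / 2))
dat2 <- data.frame(x = xi, y = approx(dat$x, dat$y, xout = xi)$y)

# Weight 1 for original points, 0.5 for inserted midpoints, matched by value, not position
is_primary <- dat2$x %in% dat$x
dat2$w <- ifelse(is_primary, 1, 0.5)

# Weighted smoothing spline; smooth.spline() accepts x in any order
modelspline <- smooth.spline(dat2$x, dat2$y, w = dat2$w)
xplot <- seq(min(dat2$x), max(dat2$x), by = 0.1)
pred <- as.data.frame(predict(modelspline, xplot))
```

`pred` has `x` and `y` columns, so it can be passed to `geom_line()` exactly like `data.frame(predict(modelspline, xplot))` in the plotting code above.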