
Spline functions for small data sets in R

I've been experimenting with several of the spline functions available in R to characterize a very small data set. I imagine that with a much larger data set any number of curves would behave as I expect, but the data in this case are limited. The code below shows an example of the type of data I am working with:

library(ggplot2); library(stats)

dat <- data.frame(x = c(0.333, 0.5, 1, 2, 3, 4, 5),
                  y = c(5.875e-03, 1.225e-02, 3.902e-02, 8.942e-03,
                        4.277e-03, 1.938e-03, 1.131e-03))


mod <- splinefun(dat$x, dat$y, method = "monoH.FC")
mod <- data.frame(x = seq(0.333, 5, by = 0.1), y = mod(seq(0.333, 5, by = 0.1)))

ggplot() + geom_point(data = dat, aes(x = x, y = y)) +
geom_line(data = mod, aes(x = x, y = y))

[Figure: example curve]

So far, the monotone Hermite spline fits best, but it still has some problems.

Intuitively, I can tell you what the curve here should look like. It should have a maximum at x = 1 and should not have that dip at x = 2.5. The curve does not seem like it should be difficult to recreate; it is asymmetric, with its peak toward the left and a predictable tail.

Is there a "better" way to produce a spline function that fits (what I assume is) a common kind of data set more closely, or alternatively, is there a better tool than splines for fitting curves to small data sets?

It sounds like what you're after is to make the fit closer to linear in between the points. I think you can force that by adding the interpolated midpoint of each interval as a real point:

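library(pracma)  # assumption: interp1() is taken from pracma; it is not in base R
xs <- union(dat$x, dat$x - c(0, diff(dat$x) / 2))  # original x values plus interval midpoints
dat2 <- data.frame(x = xs, y = interp1(dat$x, dat$y, xi = xs))
dat2 <- dat2[order(dat2$x), ]  # sort by x so original points and midpoints alternate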

(interp1 may be unnecessary here; union(dat$y, dat$y - c(0, diff(dat$y)/2)) should give the same values, but the code above works.)

EDIT: Note that for diff to work, your data must be ordered by x first.
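
As a minimal sketch of that ordering step: the shuffled data frame dat_unsorted and the names dat_sorted and mid_x are made up for illustration, using the same points as in the question. Sorting by x before calling diff could look roughly like this:

```r
# Hypothetical example: the question's points, deliberately shuffled
dat_unsorted <- data.frame(
  x = c(2, 0.333, 5, 1, 4, 0.5, 3),
  y = c(8.942e-03, 5.875e-03, 1.131e-03, 3.902e-02,
        1.938e-03, 1.225e-02, 4.277e-03)
)

# Sort the rows by x so that diff() returns the gaps between neighbouring points
dat_sorted <- dat_unsorted[order(dat_unsorted$x), ]

# Midpoint of each interval: right endpoint minus half the gap to its left neighbour
mid_x <- dat_sorted$x[-1] - diff(dat_sorted$x) / 2
print(mid_x)
```

On unsorted input, diff would return gaps between arbitrary pairs of points, so the "midpoints" would land in the wrong places.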

This creates a new data.frame with extra points in between the original ones. If you now fit a spline to it, you are pushing the fit toward linear between the original points.
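
A sketch of that last step, under some assumptions: base R's approx() stands in for interp1 here so that no extra package is needed, and the names mid_x, mid_y, dat_aug, f_orig and f_aug are only illustrative. It fits the same monotone Hermite spline to the original and to the augmented points and compares them at x = 2.5:

```r
dat <- data.frame(x = c(0.333, 0.5, 1, 2, 3, 4, 5),
                  y = c(5.875e-03, 1.225e-02, 3.902e-02, 8.942e-03,
                        4.277e-03, 1.938e-03, 1.131e-03))

# Midpoints between neighbouring x values (dat must already be sorted by x)
mid_x <- dat$x[-1] - diff(dat$x) / 2

# Linear interpolation of y at the midpoints; approx() is part of base R (stats)
mid_y <- approx(dat$x, dat$y, xout = mid_x)$y

# Combine original and midpoint rows, then sort by x
dat_aug <- rbind(dat, data.frame(x = mid_x, y = mid_y))
dat_aug <- dat_aug[order(dat_aug$x), ]

# Same spline type as in the question, once per data set
f_orig <- splinefun(dat$x, dat$y, method = "monoH.FC")
f_aug  <- splinefun(dat_aug$x, dat_aug$y, method = "monoH.FC")

# The augmented spline must pass through the midpoint of the segment at x = 2.5
print(c(original = f_orig(2.5), augmented = f_aug(2.5)))
```

Because the spline interpolates every point it is given, the augmented curve is pinned to the straight line at each midpoint, which is what limits how far it can sag between the original points.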

EDIT2: You could also use smoothing splines with weights in this way, setting the weights of the in-between points lower than the weights of the primary points:

mod <- splinefun(dat$x, dat$y,method = 'monoH.FC')
mod2 <- data.frame(x = seq(0.333, 5, by = 0.1), y = mod(seq(0.333, 5, by = 0.1)))


# A set of weights, where each in-between point is weighted half as much
# as an original point (matched by x value, so row order does not matter)
dat2$w <- ifelse(dat2$x %in% dat$x, 1, 0.5)

# Smoothing spline
modelspline <- smooth.spline(dat2$x, dat2$y, w = dat2$w)

# x grid on which to evaluate the smoothing spline
xplot <- seq(min(dat2$x),max(dat2$x),by = 0.1)

# Plot the comparison: interpolating spline in black, smoothing spline in red
ggplot() + 
  geom_point(data = dat, aes(x = x, y = y)) + 
  geom_line(data = mod2, aes(x = x, y = y)) + 
  geom_line(data = data.frame(predict(modelspline,xplot)),
        aes(x = x, y = y),color = 'red')
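
To go beyond eyeballing the plot, the two fits can also be compared numerically. This is only a rough sketch under stated assumptions: approx() again replaces interp1, the weights are matched by x value, and the names aug, fit_interp, fit_smooth, grid, curves and peak_x are invented for this example. It reports where each curve peaks on a fine grid and how far the smoothing spline sits from the original points:

```r
dat <- data.frame(x = c(0.333, 0.5, 1, 2, 3, 4, 5),
                  y = c(5.875e-03, 1.225e-02, 3.902e-02, 8.942e-03,
                        4.277e-03, 1.938e-03, 1.131e-03))

# Augment with linearly interpolated midpoints (approx() is base R)
mid_x <- dat$x[-1] - diff(dat$x) / 2
aug <- rbind(dat, data.frame(x = mid_x,
                             y = approx(dat$x, dat$y, xout = mid_x)$y))
aug <- aug[order(aug$x), ]
aug$w <- ifelse(aug$x %in% dat$x, 1, 0.5)  # midpoints count half as much

fit_interp <- splinefun(dat$x, dat$y, method = "monoH.FC")
fit_smooth <- smooth.spline(aug$x, aug$y, w = aug$w)

# Fine grid that also contains the original x values
grid <- sort(unique(c(dat$x, seq(min(dat$x), max(dat$x), length.out = 500))))
curves <- data.frame(x = grid,
                     interp = fit_interp(grid),
                     smooth = predict(fit_smooth, grid)$y)

# Location of the maximum of each curve on the grid
peak_x <- c(interp = curves$x[which.max(curves$interp)],
            smooth = curves$x[which.max(curves$smooth)])
print(peak_x)

# A smoothing spline need not pass through the data; show the largest miss
print(max(abs(predict(fit_smooth, dat$x)$y - dat$y)))
```

If the reported peak is not at x = 1, or the smoothing spline misses the original points by more than you can accept, that is a sign to adjust the weights or the smoothing parameter rather than trust the default fit.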
