
Splines inside nonlinear least squares in R

Consider a nonlinear least squares model in R, for example one of the following form:

 y ~ theta / ( 1 + exp( -( alpha + beta * x) ) )

(My real problem has several variables, and the outer function is not logistic but somewhat more involved. This one is simpler, but I think that if I can solve it, my case should follow almost immediately.)

I'd like to replace the term "alpha + beta * x" with (say) a natural cubic spline.
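Written out, the target model replaces the linear predictor with a spline expanded in the basis that ns() produces. The symbols s, N_j, b_j and k are introduced here only for illustration: N_1, ..., N_k stand for the k columns of ns(x, df = k), and the b_j are the spline coefficients that nls would have to estimate.

```latex
y_i = \frac{\theta}{1 + \exp\{-s(x_i)\}} + \varepsilon_i,
\qquad
s(x) = \alpha + \sum_{j=1}^{k} b_j \, N_j(x)
```

With k = 1 and N_1(x) = x (so b_1 = beta) this reduces to the original alpha + beta * x model.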

Here's some code to create example data with a nonlinear function inside the logistic:

set.seed(438572L)
x <- seq(1, 10, by = 0.25)
y <- 8.6 / (1 + exp(-(-3 + x/4.4 + sqrt(x*1.1) * (1 - sin(1 + x/2.9))))) + rnorm(length(x), sd = 0.2)

Without the logistic around it, i.e. in lm, I could easily replace a linear term with a spline term, so a linear model like this:

 lm( y ~ x ) 

then becomes

 library("splines")
 lm( y ~ ns( x, df = 5 ) )

Generating fitted values is simple, and getting predicted values with the aid of (for example) the rms package seems simple enough.
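For the lm case, base R's predict() can already evaluate the spline at new x values, because ns() used inside a model formula records its knots (the splines package supplies a makepredictcall method for this). A minimal sketch on the question's toy data; the values in new_x are arbitrary illustration points:

```r
library(splines)

# The question's toy data
set.seed(438572L)
x <- seq(1, 10, by = 0.25)
y <- 8.6 / (1 + exp(-(-3 + x/4.4 + sqrt(x*1.1) * (1 - sin(1 + x/2.9))))) +
  rnorm(length(x), sd = 0.2)

fit <- lm(y ~ ns(x, df = 5))

# ns() inside the formula stores its knots, so predict() rebuilds the same
# basis at the new x values instead of recomputing knots from new_x
new_x <- c(1.1, 3.7, 5.5, 8.25, 9.9)
pred <- predict(fit, newdata = data.frame(x = new_x))
pred
```

Predicting at the original x reproduces fitted(fit), which is a quick check that the same knots are being used.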

Indeed, fitting the original data with that lm-based spline isn't too bad, but there's a reason I need the spline inside the logistic function (or rather, inside its equivalent in my problem).

The problem with nls is that I need to provide names for all the parameters. I'm quite happy to call them, say, b1, ..., b5 for one spline fit (and c1, ..., c6 for another variable; I'll need to be able to make several of them).

Is there a reasonably neat way to generate the corresponding formula for nls, so that I can replace the linear term inside the nonlinear function with a spline?

The only ways I can think of are awkward and clunky, and they don't generalize nicely without writing a whole bunch of code.

(Edit for clarification) For this small problem I can of course do it by hand: write out an expression for the inner product of the columns of the matrix generated by ns with the vector of parameters. But then I have to write the whole thing out term by term again for each spline in every other variable, again every time I change the df in any of the splines, and again if I want to use cs instead of ns. And when I want to do some prediction (or interpolation), a whole new slew of issues has to be dealt with. I need to keep doing this over and over, potentially with a substantially larger number of knots and over several variables, for analysis after analysis, and I wondered whether there is a neater, simpler way than writing out each individual term, without having to write a great deal of code. I can see a fairly bull-at-a-gate way to do it that would take a fair bit of code to get right, but this being R, I suspect there's a much neater way (or more likely 3 or 4 neater ways) that is simply eluding me. Hence the question.

I thought I had seen someone do something like this in a fairly nice way in the past, but for the life of me I can't find it now, although I've tried a number of times to locate it.

[More particularly, I'd generally like to be able to try fitting any of several different splines in each variable (trying a couple of possibilities) to see whether I can find a simple model whose fit is still adequate for the purpose. Noise is really quite low; some bias in the fit is okay to achieve a nice smooth result, but only up to a point. This is more 'find a nice, interpretable, but adequate fitting function' than anything approaching inference, and data mining isn't really an issue for this problem.]

Alternatively, if this would be much easier in, say, gnm or ASSIST or one of the other packages, that would be useful knowledge, but then some pointers on how to proceed with them on the toy problem above would help.
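A different route from the ones discussed in this thread, offered only as a sketch: nls() accepts vector-valued (indexed) parameters, so the spline coefficients can enter as a single parameter b multiplied against the ns() basis matrix. Changing df then means changing one number, and prediction reuses the basis through predict() on the ns object. The names k, M, theta0 and the logit-scale lm() recipe for starting values are assumptions made for this example; convergence is not guaranteed for other data or df.

```r
library(splines)

# The question's toy data
set.seed(438572L)
x <- seq(1, 10, by = 0.25)
y <- 8.6 / (1 + exp(-(-3 + x/4.4 + sqrt(x*1.1) * (1 - sin(1 + x/2.9))))) +
  rnorm(length(x), sd = 0.2)

k <- 5               # spline df: the only thing to change between runs
M <- ns(x, df = k)   # n x k natural cubic spline basis

# Assumed starting values: fix theta a little above max(y), take the logit of
# y / theta (clipped into (0, 1)) and regress it on the basis with lm()
theta0 <- 1.1 * max(y)
z <- qlogis(pmin(pmax(y / theta0, 0.01), 0.99))
lin <- lm(z ~ M)

# b is ONE vector-valued parameter (nls supports indexed parameters),
# so no b1, ..., bk names have to be written into the formula
fit <- nls(y ~ theta * plogis(a + drop(M %*% b)),
           start = list(theta = theta0,
                        a = unname(coef(lin)[1]),
                        b = unname(coef(lin)[-1])),
           # warnOnly: return the last iterate with a warning instead of
           # stopping; check fit$convInfo$isConv when looping over several k
           control = nls.control(maxiter = 200, warnOnly = TRUE))

# Prediction: predict() on the ns basis rebuilds it with the same knots at new x
cf <- coef(fit)                      # named theta, a, b1, ..., bk
new_x <- seq(1, 10, length.out = 7)
M_new <- predict(M, new_x)
pred <- cf[["theta"]] * plogis(cf[["a"]] + drop(M_new %*% cf[grep("^b", names(cf))]))
pred
```

Because b is a single vector, the same formula works unchanged for any k; only the length of its starting value changes.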

Answer from Hong Ooi: ns actually generates a matrix of predictors. What you can do is split that matrix into individual variables and feed them to nls.

m <- ns(x, df=5)
df <- data.frame(y, m)  # X-variables will be named X1, ... X5
# starting values should be set as appropriate for your data
nls(y ~ theta * plogis(alpha + b1*X1 + b2*X2 + b3*X3 + b4*X4 + b5*X5), data=df,
        start=list(theta=1, alpha=0, b1=1, b2=1, b3=1, b4=1, b5=1))

ETA: here's an attempt at automating this for different values of df. It constructs the formula by text munging and then uses do.call to call nls. Caveat: untested.

my.nls <- function(x, y, df)
{
    m <- ns(x, df=df)
    dat <- data.frame(y, m)    # columns: y, X1, ..., Xdf
    xn <- names(dat)[-1]       # colnames(m) are just "1", "2", ..., so take the X-names
    b <- paste("b", seq_along(xn), sep="")
    # linear predictor: alpha + b1*X1 + ... + bdf*Xdf
    fm <- formula(paste("y ~ theta * plogis(alpha + ", paste(b, xn, sep="*",
          collapse=" + "), ")", sep=""))
    start <- c(1, 1, rep(1, length=length(b)))
    names(start) <- c("theta", "alpha", b)
    do.call(nls, list(fm, data=dat, start=start))
}
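The string produced by the text munging can be checked on its own. Below, a hypothetical helper, nls_formula_text(), repeats just the paste() step for df = 3, naming the basis columns X1, X2, ... the way data.frame() names the columns of an ns() matrix; the constant offset is left to alpha.

```r
# Hypothetical helper: only the formula-building step, no fitting.
# Column names follow data.frame()'s naming of an ns() basis: X1, X2, ...
nls_formula_text <- function(df) {
  xn <- paste0("X", seq_len(df))
  b  <- paste0("b", seq_along(xn))
  paste0("y ~ theta * plogis(alpha + ",
         paste(b, xn, sep = "*", collapse = " + "), ")")
}

nls_formula_text(3)
# returns "y ~ theta * plogis(alpha + b1*X1 + b2*X2 + b3*X3)"

# as.formula() turns the text into a formula object that nls() accepts;
# all.vars() lists both the parameters and the data columns it refers to
fm <- as.formula(nls_formula_text(3))
all.vars(fm)
```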

Answer from the question's author: a realization I came to while clarifying my own question made me see that there's a less clunky way than any I had seen before.

Even with a bit of obvious streamlining, this is still somewhat inelegant to my eye, but at least it is bearable enough to use on a repeated basis, so I regard it as an adequate answer. I'm still interested in a neater way than the one below.

Hong Ooi's trick of calling data.frame on the matrix generated by ns to auto-name the columns is kind of cute, and I have used it below. In general I'll likely use paste to build the names, because I have several variables to play with.
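To see what the data.frame trick does: the columns of an ns() basis are named just "1", "2", ..., which are not syntactic R names, and data.frame()'s default check.names = TRUE passes them through make.names(), which prefixes an X. The response y here is a made-up placeholder; only its name matters.

```r
library(splines)

set.seed(1)
x <- seq(1, 10, by = 0.25)
y <- rnorm(length(x))        # placeholder response for the illustration
m <- ns(x, df = 5)

colnames(m)                  # "1" "2" "3" "4" "5"
names(data.frame(y, m))      # "y" "X1" "X2" "X3" "X4" "X5"
```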

Assuming the data set-up given in the question:

lin.expr <- function(p,xn) {
  pn<-paste(p, 1:length(xn), sep = "")
  paste(paste(pn,xn,sep=" * "),collapse=" + ")
  }


m <- ns(x, df=3)
mydf <- data.frame(y, m)  # X-variables will be named X1, X2, ... 
xn <- names(mydf)[2:dim(mydf)[2]]

nspb <- lin.expr("b",xn)

c.form <- paste("y ~ theta * plogis( a + ",nspb,")",sep="")
stl <- list(theta=2, a=-5,b1=10, b2=10, b3=10)
nls( c.form, data=mydf, start= stl)

My actual formula will have several terms like nspb. Substantive improvements are appreciated; I'd prefer not to choose my own answer, but I guess I will pick it if nothing further turns up in a day or two.
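For several terms like nspb, one possible pattern is to give each variable's basis columns explicit names, so that two ns() matrices cannot collide when they are combined in one data frame. A sketch with a hypothetical helper spline_term() and made-up predictors x1 and x2; the starting values are placeholders that would need to be chosen for real data.

```r
library(splines)

# Hypothetical helper (not from the thread): one spline term with explicitly
# named basis columns, its parameter names, and the "p1 * col1 + ..." text
spline_term <- function(x, var, prefix, df) {
  m <- ns(x, df = df)
  cols <- paste0(var, "_ns", seq_len(ncol(m)))
  colnames(m) <- cols
  pars <- paste0(prefix, seq_len(ncol(m)))
  list(basis = m, pars = pars,
       expr = paste(pars, cols, sep = " * ", collapse = " + "))
}

# Illustrative predictors, made up for this sketch
set.seed(1)
x1 <- runif(50, 1, 10)
x2 <- runif(50, 0, 5)

sp <- list(spline_term(x1, "x1", "b", df = 3),
           spline_term(x2, "x2", "c", df = 2))

# Basis columns for the data argument of nls, and the combined formula text
basis_df <- do.call(data.frame, lapply(sp, function(t) t$basis))
form_text <- paste0("y ~ theta * plogis(a + ",
                    paste(vapply(sp, function(t) t$expr, character(1)),
                          collapse = " + "), ")")

# Placeholder starting values: one entry per parameter, names generated
all_pars <- unlist(lapply(sp, function(t) t$pars))
start <- c(list(theta = 1, a = 0),
           as.list(setNames(rep(0.1, length(all_pars)), all_pars)))

form_text
```

Changing a df or adding a variable then only touches the list passed to sp; the formula text, the data columns and the start list are regenerated from it, ready to be passed to nls together with the response as in the answers.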

Edit: Hong Ooi's addition (which was posted while I was typing mine, and uses similar ideas but adds a couple of nice extras) pretty much does it; it's an acceptable answer, so I have accepted it.

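Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.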

 
© 2020-2024 STACKOOM.COM