mgcv gam() error: model has more coefficients than data

I am using GAM (generalized additive models) for my dataset. This dataset has 32 observations, with 6 predictor variables and a response variable (power). I am using the gam() function of the mgcv package to fit the models. Whenever I try to fit a model, I get the following error message:

Error in gam(formula.hh, data = data, na.action = na.exclude,  : 
  Model has more coefficients than data

From this error message, I infer that I have more predictor variables than observations. I guess this error is generated during the cross-validation procedure. Is there any way to handle this error?

I am using the following code:

library(mgcv)
formula.hh <- as.formula(power ~ s(temperature) 
                                + s(prevday1) + s(prevday2)
                                + s(prev_2_hour) + s(prev_instant1))
model <- gam(formula.hh, data = data, na.action = na.exclude)

Here is the data, attached with the dput() function:

> dput(data)
data <- structure(list(power = c(250.615931666667, 252.675878333333, 
1578.209605, 186.636575166667, 1062.07912666667, 1031.481235, 
1584.38902166667, 276.973836666667, 401.620463333333, 1622.50827666667, 
273.825153333333, 1511.37474333333, 291.460865, 215.138178333333, 
247.509348333333, 1140.21383833333, 1680.63441666667, 1742.44168333333, 
592.162706166667, 1610.7307, 615.857495, 1664.13551, 464.973065, 
1956.2482, 1767.94469333333, 1869.02678333333, 1806.731, 1746.3731, 
549.216605, 1425.42390166667, 1900.32575, 1766.18103333333), 
    temperature = c(31, 30, 28, 28, 27, 31, 32, 32, 30.5, 33, 
    33, 30, 32, 24, 30, 26, 28, 32, 34, 25, 32, 33, 35, 36, 36, 
    37, 35, 33, 35, 33, 35, 32), prevday1 = c(NA, 250.615931666667, 
    252.675878333333, 1578.209605, 186.636575166667, 1062.07912666667, 
    1031.481235, 1584.38902166667, 276.973836666667, 401.620463333333, 
    1622.50827666667, 273.825153333333, 1511.37474333333, 291.460865, 
    215.138178333333, 247.509348333333, 1140.21383833333, 1680.63441666667, 
    1742.44168333333, 592.162706166667, 1610.7307, 615.857495, 
    1664.13551, 464.973065, 1956.2482, 1767.94469333333, 1869.02678333333, 
    1806.731, 1746.3731, 549.216605, 1425.42390166667, 1900.32575
    ), prevday2 = c(NA, NA, 250.615931666667, 252.675878333333, 
    1578.209605, 186.636575166667, 1062.07912666667, 1031.481235, 
    1584.38902166667, 276.973836666667, 401.620463333333, 1622.50827666667, 
    273.825153333333, 1511.37474333333, 291.460865, 215.138178333333, 
    247.509348333333, 1140.21383833333, 1680.63441666667, 1742.44168333333, 
    592.162706166667, 1610.7307, 615.857495, 1664.13551, 464.973065, 
    1956.2482, 1767.94469333333, 1869.02678333333, 1806.731, 
    1746.3731, 549.216605, 1425.42390166667), prev_instant1 = c(NA, 
    237.211388333333, 455.932271666667, 367.837349666667, 1230.40137333333, 
    1080.74080166667, 1898.06056666667, 326.103031666667, 302.770571666667, 
    1859.65283333333, 281.700161666667, 1684.32288333333, 291.448878333333, 
    214.838578333333, 254.042623333333, 1380.14074333333, 824.437228333333, 
    1660.46284666667, 268.004111666667, 1715.02763333333, 1853.08503333333, 
    1821.31845, 1173.91945333333, 1859.87353333333, 1887.67635, 
    1760.29563333333, 1876.05421666667, 1743.10665, 366.382048333333, 
    1185.16379, 1713.98534666667, 1746.36006666667), prev_instant2 = c(NA, 
    275.55167, 242.638122833333, 220.635857, 1784.77271666667, 
    1195.45020333333, 590.114391666667, 310.141536666667, 1397.3184605, 
    1747.44398333333, 260.10318, 1521.77355833333, 283.317726666667, 
    206.678135, 231.428693833333, 235.600631666667, 232.455201666667, 
    281.422625, 256.470893333333, 1613.82088333333, 1564.34841666667, 
    1795.03498333333, 1551.64725666667, 1517.69289833333, 1596.66556166667, 
    2767.82433333333, 2949.38005, 328.691775, 389.83789, 1805.71815333333, 
    1153.97645666667, 1752.75968333333), prev_2_hour = c(NA, 
    219.024983, 313.393630708333, 263.748829166667, 931.193606666667, 
    699.399163791667, 754.018962083334, 272.22309625, 595.954508875, 
    1597.21487208333, 512.64361, 1236.42579666667, 281.200373333334, 
    196.983981666666, 230.327737625, 525.483920416666, 391.120302791667, 
    610.101280416667, 247.710625543785, 978.741044166665, 979.658926666667, 
    1189.25306041667, 814.840889166667, 989.059700416665, 1352.2367025, 
    1770.20417833333, 1847.11590666667, 843.191556416666, 363.50806625, 
    904.924465041666, 841.746712500002, 1747.73452958333)), .Names = c("power", 
"temperature", "prevday1", "prevday2", "prev_instant1", "prev_instant2", 
"prev_2_hour"), class = "data.frame", row.names = c(NA, 32L))

This dataset has 32 observations.

Actually, only 30, as two rows have NA.

From this error message, I infer that I have more predictor variables as compared to the number of observations.

Yes. By default, s() chooses a basis dimension (or rank) of 10 for a 1D smoother, giving 10 raw parameters. After the centering constraint (see ?identifiability) you lose one parameter, but you still have 9 parameters for each smooth. Note that you have 5 smooths! So you have 45 parameters for the smooth terms, plus an intercept: 46 in total. This is greater than your 30 data.
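To make the arithmetic concrete, here is a small R sketch. gam_coef_count() is a hypothetical helper, not part of mgcv, and it assumes every term is a 1D s() smooth sharing the same k, each losing one coefficient to the centering constraint, plus a single intercept:

```r
# gam_coef_count() is a hypothetical helper, not an mgcv function.
# It assumes every term is a 1D s() smooth with the same basis dimension k:
# each smooth keeps k - 1 coefficients after the centering constraint,
# and the model adds one intercept.
gam_coef_count <- function(n_smooths, k = 10) {
  n_smooths * (k - 1) + 1
}

n_obs <- 30                      # complete rows left after dropping the NAs
p_default <- gam_coef_count(5)   # five s() terms at the default k = 10
p_default                        # 46
p_default > n_obs                # TRUE: more coefficients than data
```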

I guess this error is generated during cross-validation procedures.

No. This error is detected as soon as the GAM formula has been interpreted and the model frame has been constructed, before any smoothing-parameter estimation (GCV or REML) takes place. Even before the actual basis construction, we already know n (the number of data) and p (the number of parameters).
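Because n and p are both available before any fitting, you can run the same comparison yourself before calling gam(). This is a minimal sketch under stated assumptions: size_check() is a hypothetical helper, not an mgcv function; it assumes each covariate enters through its own 1D s() term with a common k; and the toy data frame is made up purely to show incomplete rows being dropped:

```r
# size_check() is a hypothetical helper, not an mgcv function. It assumes each
# covariate enters through its own 1D s() term with the same basis dimension k.
size_check <- function(formula, df, k = 10) {
  vars <- all.vars(formula)                            # response first, then covariates
  plain <- reformulate(vars[-1], response = vars[1])   # same variables, s() wrappers dropped
  n <- nrow(model.frame(plain, data = df, na.action = na.omit))  # rows with NAs removed
  p <- (length(vars) - 1) * (k - 1) + 1                # centred smooths plus intercept
  c(n = n, p = p)
}

# Toy data, made up for illustration: 32 rows, the first two missing x2
set.seed(1)
toy <- data.frame(y = rnorm(32), x1 = rnorm(32), x2 = c(NA, NA, rnorm(30)))
size_check(y ~ s(x1) + s(x2), toy)   # n = 30, p = 19
```

Dropping incomplete rows gives the same row count that na.exclude leaves for fitting, and since the helper never evaluates s(), it runs without mgcv loaded.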

Is there any way to handle this error?

Reduce k manually rather than using the default. However, for a cubic regression spline the minimum k is 3. For example, use s(temperature, bs = 'cr', k = 3). Note that I have also set bs = 'cr' to use a natural cubic regression spline instead of the default bs = 'tp' (thin-plate regression spline). You can of course keep 'tp', but for a 1D smooth "cr" is a natural choice.
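A sketch of the reduced model, building the five s() terms programmatically with bs = "cr" and k = 3. The predictor names simply mirror the question's formula; the final gam() call is left as a comment because it needs mgcv and the data frame created from the dput() output:

```r
# Predictor names copied from the question's formula (prev_instant2 is not used there)
predictors <- c("temperature", "prevday1", "prevday2", "prev_2_hour", "prev_instant1")
k <- 3L  # the minimum basis dimension for bs = "cr"

# Build one s(x, bs = "cr", k = 3) term per predictor and combine them into a formula
smooth_terms <- sprintf('s(%s, bs = "cr", k = %d)', predictors, k)
formula.cr <- reformulate(smooth_terms, response = "power")
print(formula.cr)

# Each "cr" smooth keeps k - 1 = 2 coefficients after centering, plus one intercept
p <- length(predictors) * (k - 1) + 1
p  # 11, well below the 30 complete rows

# With mgcv installed and `data` created from the dput() output:
# library(mgcv)
# model <- gam(formula.cr, data = data, na.action = na.exclude)
```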


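Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.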

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM