
Limit maximum df of smooth in ggplot?

Firstly, I am very new to R and have only very basic statistical knowledge, so I have been winging it when it comes to my analysis. This means googling the code I need for the results, and because some samples are so small, I will have to check later whether they are of any statistical relevance. For now, though, I'm just trying to reach my goal of displaying graphs on the screen.

I have two datasets I want to run GAMs for: one with 9 obs. of 22 variables, the other with 4 obs. of 22 variables (both filtered from a source table of 44 obs. of 22 variables). Example:

Flight_Dur    Distance
 429            2396
 59.2           1096
 26.6           1174

I'm fitting the GAM with mgcv using this code:

GAMM_Plot <- gam(Flight_Dur ~ s(Distance, k = 4), data = my_table, method = "REML")

Since I was getting the error message "A term has fewer unique covariate combinations than specified maximum degrees of freedom", I followed this guide and added k = [number of observations I have], so 4 for one dataset and 9 for the other, to limit my df. Again, I don't know what this does to the relevance of my results; I'm just trying to make the graphs work for now.

To visualise scatterplots along with the lines, however, I used:

GAMM_Plot2 <- ggplot(my_table, aes(x=Distance, y=Flight_Dur)) + 
  geom_point()+
  geom_smooth(method=gam)

Interestingly, plotting the latter doesn't give me an error message, but the two graphs are clearly different, since the second one has no limit set for df. I would like to set this limit in the ggplot code as well. How would this be possible?

Thank you.

You can specify the method as mgcv::gam and supply a formula that includes k = 4.

my_table <- data.frame(
  Flight_Dur = c(429, 59.2, 26.6, 30),
  Distance = c(2396, 1096, 1174, 1000)
)

library(ggplot2)
library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-33. For overview type 'help("mgcv-package")'.

ggplot(my_table, aes(x=Distance, y=Flight_Dur)) + 
  geom_point()+
  geom_smooth(method = mgcv::gam, formula = y ~ s(x, k = 4))

Created on 2022-09-13 by the reprex package (v1.0.0)

However, I would be careful about using a GAM with so few observations.
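As a rough sketch of why so few observations are a constraint: the basis dimension k in s(x, k = ...) cannot exceed the number of unique covariate values, which is what the error message in the question reports. The helper safe_k() and the object smooth_formula below are hypothetical names introduced only for illustration; they are not part of mgcv or ggplot2. The cap of 10 reflects the default basis dimension of a one-dimensional s() term.

```r
# Hypothetical helper (not part of mgcv): cap k at the number of
# unique covariate values so the smooth basis can be constructed.
safe_k <- function(x, k_max = 10) {
  min(k_max, length(unique(x)))
}

my_table <- data.frame(
  Flight_Dur = c(429, 59.2, 26.6, 30),
  Distance = c(2396, 1096, 1174, 1000)
)

# Four unique distances, so k is capped at 4
k <- safe_k(my_table$Distance)

# Build the smoothing formula in terms of x and y, as geom_smooth() expects
smooth_formula <- as.formula(sprintf("y ~ s(x, k = %d)", as.integer(k)))
smooth_formula
```

The resulting formula could then be passed as geom_smooth(method = mgcv::gam, formula = smooth_formula). Capping k only makes the fit possible; it does not make a smooth estimated from four points reliable.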

You can simply specify the formula in terms of y and x with method = "gam":

my_table <- data.frame(
  Flight_Dur = c(429, 59.2, 26.6, 30),
  Distance = c(2396, 1096, 1174, 1000)
)

library(ggplot2)
ggplot(my_table, aes(x=Distance, y=Flight_Dur)) + 
  geom_point()+
  geom_smooth(method = "gam", formula = y ~ s(x, k = 4))

If you are going to do as @starja suggests and pass a function to method, you should also set the method.args argument to pass method = "REML" through to mgcv::gam(), so as to match your actual gam() call:

ggplot(my_table, aes(x=Distance, y=Flight_Dur)) + 
  geom_point()+
  geom_smooth(method = mgcv::gam, formula = y ~ s(x, k = 4),
              method.args = list(method = "REML"))

I don't expect much difference without setting method.args for smooths with k = 4, but in general GCV and REML smoothness selection can give noticeably different fits.
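A minimal sketch of how one might compare the two criteria directly in mgcv. The data are simulated purely for illustration (they are an assumption, not the asker's data), and the names sim, fit_gcv, fit_reml and edf are made up for this example. "GCV.Cp" is mgcv's default smoothness selection method.

```r
library(mgcv)

set.seed(1)

# Simulated data (illustrative assumption, not the data from the question)
n <- 200
sim <- data.frame(x = runif(n))
sim$y <- sin(2 * pi * sim$x) + rnorm(n, sd = 0.3)

# Same model, two smoothness selection criteria
fit_gcv  <- gam(y ~ s(x), data = sim, method = "GCV.Cp")
fit_reml <- gam(y ~ s(x), data = sim, method = "REML")

# Effective degrees of freedom of the smooth (intercept dropped)
edf <- c(GCV = sum(fit_gcv$edf[-1]), REML = sum(fit_reml$edf[-1]))
edf
```

How far the two effective degrees of freedom differ depends on the data, so it is worth running the comparison on your own table rather than relying on this simulated case.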
