How to plot a polynomial regression line on a time series in R?

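I occasionally work with time series in R for data analysis, but I am not familiar with plotting using functions such as ARIMA.

The following question stems from a comment on the daily number of COVID cases in the US. The trend does look that way, so I would simply like to run a cubic regression in order to draw the polynomial curve on top of the scatter plot. Since this is a time series, I don't think using the lm() function will work.

Here is the code: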

options(repr.plot.width=14, repr.plot.height=10)

# install.packages('RCurl')  # run once if RCurl is not installed
library(repr)      # Enables resizing of the plots.
library(RCurl)     # getURL() downloads the raw CSV text
library(tidyverse) # read_csv(), pivot_longer(), pivot_wider() and the dplyr verbs

# Extracting the number of confirmed cumulative cases by country from the Johns Hopkins repository:

x = getURL("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")
 
corona = (read_csv(x)
          %>% pivot_longer(cols = -c(`Province/State`, `Country/Region`, Lat, Long),
                           names_to = "date",
                           values_to = "cases")
          %>% select(`Province/State`,`Country/Region`, date, cases)
          %>% mutate(date=as.Date(date,format="%m/%d/%y"))
          %>% drop_na(cases)
          %>% rename(country="Country/Region", provinces="Province/State")
)
 
cc <- (corona
       %>% filter(country %in% c("US"))
)
 
ccw <- (cc
        %>% pivot_wider(names_from="country",values_from="cases")
        %>% filter(US>5)
)

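# Daily new cases are the first difference of the cumulative counts
first.der<-diff(ccw$US, lag = 1, differences = 1)

# diff() drops the first observation, so pair each difference with the later date.
# (The original index 2:length(ccw$date)-1 parses as (2:n)-1 and shifts every point one day back.)
plot(ccw$date[-1], first.der, 
     pch = 19, cex = 1.2,
     ylab='', 
     xlab='',
     main ='Daily COVID-19 cases in US',
     col="firebrick",
     axes=FALSE,
     cex.main=1.5)
abline(h=0)
abline(v=max(ccw$date), col='gray90')                         # most recent date
abline(h=tail(first.der, 1), col='firebrick', lty=2, lwd=.5)  # most recent daily count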

at1 <- seq(min(ccw$date), max(ccw$date), by=2);
axis.Date(1, at=at1, format="%b %d", las=2, cex.axis=0.7)
axis(side=2, seq(min(first.der),max(first.der),1000), 
     las=2, cex.axis=1)

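For the intended polynomial regression, we simply regress on the index and its polynomial terms. For the polynomial we can conveniently use poly(), and plot the fitted values with lines(). However, the cases seem to follow a quartic rather than a cubic curve.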

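ccw$first.der <- c(NA, diff(ccw$US))  ## better add an NA and integrate in data frame
ccw$index <- seq_along(ccw$US)

## na.exclude pads the fitted values with NA so they line up with ccw$index
fit3 <- lm(first.der ~ poly(index, 3, raw=TRUE), ccw, na.action=na.exclude)  ## cubic
fit4 <- lm(first.der ~ poly(index, 4, raw=TRUE), ccw, na.action=na.exclude)  ## quartic

## plot the data-frame column rather than a global first.der vector
plot(ccw$index, ccw$first.der, main="US covid-19", xaxt="n", xlab="", ylab="first.der")
tck <- c(1, 50, 100, 150)  ## assumes the series has at least 150 rows
axis(1, tck, labels=FALSE)
mtext(format(ccw$date[tck]), 1, 1, at=tck)
lines(ccw$index, fitted(fit3), col=3, lwd=2)
lines(ccw$index, fitted(fit4), col=2, lwd=2)
legend("topleft", c("cubic", "quartic"), lwd=2, col=3:2)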
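
The impression that a quartic fits better than a cubic can also be checked numerically, since the cubic model is nested inside the quartic one. The following is only a sketch under stated assumptions: `index` and `daily` are synthetic stand-ins (not the Johns Hopkins series), generated from a quartic trend plus noise so the example runs without a download.

```r
# Synthetic stand-in for the daily counts (hypothetical data, NOT the JHU series)
set.seed(1)
index <- 1:150
daily <- 1e-4 * (index - 75)^4 + rnorm(length(index), sd = 100)

# Nested polynomial fits on the time index
fit3_sim <- lm(daily ~ poly(index, 3))  # cubic
fit4_sim <- lm(daily ~ poly(index, 4))  # quartic

# F-test for the extra quartic term, plus AIC for both models
cmp <- anova(fit3_sim, fit4_sim)
print(cmp)
print(AIC(fit3_sim, fit4_sim))
```

A small p-value in the second row of the `anova()` table, or a lower AIC for the quartic model, would support keeping the fourth-degree term. With real case counts the residuals are autocorrelated, so such tests should be read as a rough guide only.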

I wasn't able to download your data, so I am providing an example using the mtcars dataset. You can get a polynomial regression using either poly() or I():

cubic_model <- lm(mpg ~ hp + I(hp^2) + I(hp^3), data = mtcars)
min_hp <- min(mtcars$hp)
max_hp <- max(mtcars$hp)
grid_hp <- seq(min_hp, max_hp, by = 0.1)
# predict() rebuilds the I(hp^2) and I(hp^3) terms itself, so newdata only needs hp
cubic_model_line <- predict(cubic_model, newdata = data.frame(hp = grid_hp))

plot(mtcars$hp, mtcars$mpg, col='red',main='mpg vs hp', xlab='hp', ylab = 'mpg', pch=16)
lines(grid_hp, cubic_model_line, col='green', lwd = 3)
# points get a symbol, the fitted curve gets a line in the legend
legend(80, 15, legend=c("Data", "Cubic fit"),
       col=c("red", "green"), pch=c(16, NA), lty=c(NA, 1), lwd=c(NA, 3), cex=0.8)
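
To illustrate the "poly() or I()" remark, here is a small sketch comparing the two spellings on mtcars. The object names (`fit_I`, `fit_raw`, `fit_orth`) are made up for this illustration. Note that `poly()` returns orthogonal polynomials by default, so the coefficients only match the `I()` version when `raw = TRUE` is set. The fitted curve should be the same either way.

```r
# The same cubic model written three ways on the built-in mtcars data
fit_I    <- lm(mpg ~ hp + I(hp^2) + I(hp^3), data = mtcars)   # explicit powers
fit_raw  <- lm(mpg ~ poly(hp, 3, raw = TRUE), data = mtcars)  # raw polynomial basis
fit_orth <- lm(mpg ~ poly(hp, 3), data = mtcars)              # orthogonal basis (default)

# Raw poly() and I() use the same design matrix, so the coefficients agree
same_coef <- all.equal(unname(coef(fit_I)), unname(coef(fit_raw)))

# All three describe the same cubic curve, so the fitted values agree
same_fit_raw  <- all.equal(unname(fitted(fit_I)), unname(fitted(fit_raw)))
same_fit_orth <- all.equal(unname(fitted(fit_I)), unname(fitted(fit_orth)),
                           tolerance = 1e-6)

print(c(same_coef, same_fit_raw, same_fit_orth))
```

The orthogonal basis is usually the numerically safer choice for higher degrees. The raw basis is convenient when the coefficients need to be read as multipliers of hp, hp^2 and hp^3.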

If you only want to include an illustration of the trend, you can use local polynomial regression, such as the LOESS method that ggplot2 uses.
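
A minimal sketch of that idea in base R, again on mtcars because the original data could not be downloaded. The `span` and `degree` values below are simply the `loess()` defaults written out, not tuned choices, and `lo_fit`, `lo_grid` and `lo_line` are names chosen for this illustration.

```r
# Local polynomial (LOESS) trend on the built-in mtcars data
lo_fit <- loess(mpg ~ hp, data = mtcars, span = 0.75, degree = 2)

# Evaluate the smoother on a grid inside the observed hp range
lo_grid <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 200)
lo_line <- predict(lo_fit, newdata = data.frame(hp = lo_grid))

plot(mtcars$hp, mtcars$mpg, col = "red", pch = 16,
     main = "mpg vs hp with LOESS trend", xlab = "hp", ylab = "mpg")
lines(lo_grid, lo_line, col = "blue", lwd = 3)
```

In ggplot2 the comparable layer is `geom_smooth(method = "loess")` added to a scatter plot of the same variables. A smaller `span` follows the data more closely, and a larger one gives a smoother trend.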
