Comparing two curves for difference in trend
I have some data about trends over time in drug use across the state. I want to know whether there have been changes in the gender difference in intravenous drug use versus gender differences in all recreational drug use over time. My data is below. I think I might need to use time-series analysis, but I'm not sure. Any help would be much appreciated.
The description in the question does not match the data, since the data contains no information on gender. We will therefore assume from the subject line that the goal is to determine whether the trends of illicit and iv are the same. Note that there is no autocorrelation in the detrended values of iv or illicit, so we will use ordinary linear models.
iv <- c(0.4, 0.3, 0.4, 0.3, 0.2, 0.2)
illicit <- c(5.5, 5.7, 4.8, 4.7, 6.1, 5.3)
time <- 2011:2016
ar(resid(lm(iv ~ time)))
## Call:
## ar(x = resid(lm(iv ~ time)))
##
## Order selected 0 sigma^2 estimated as 0.0024
ar(resid(lm(illicit ~ time)))
## Call:
## ar(x = resid(lm(illicit ~ time)))
##
## Order selected 0 sigma^2 estimated as 0.287
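As a complementary check, the detrended residuals can also be passed to a Ljung-Box test. This is only a sketch: the choice of Box.test with lag = 1 is an assumption made for illustration and is not part of the original answer, and with six observations per series any autocorrelation test has very little power.

```r
# Data as given in the answer
iv <- c(0.4, 0.3, 0.4, 0.3, 0.2, 0.2)
illicit <- c(5.5, 5.7, 4.8, 4.7, 6.1, 5.3)
time <- 2011:2016

# Residuals after removing a linear trend from each series
res_iv <- resid(lm(iv ~ time))
res_illicit <- resid(lm(illicit ~ time))

# Ljung-Box test for lag-1 autocorrelation (lag = 1 is an illustrative choice)
lb_iv <- Box.test(res_iv, lag = 1, type = "Ljung-Box")
lb_illicit <- Box.test(res_illicit, lag = 1, type = "Ljung-Box")

print(lb_iv)
print(lb_illicit)
```

A large p-value here is consistent with the order-0 selection made by ar above, but it should not be read as strong evidence of independence given the sample size.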
Create a 12x3 data frame long with columns time, values and ind (iv or illicit). Then fit one linear model with two slopes and another with one slope. Both have two intercepts. Then compare them using anova. Evidently they are not significantly different, so we cannot reject the hypothesis that the slopes are the same.
wide <- data.frame(iv, illicit)
long <- cbind(time, stack(wide))
fm2 <- lm(values ~ ind/(time + 1) + 0, long)
fm1 <- lm(values ~ ind + time + 0, long)
anova(fm1, fm2)
giving:
Analysis of Variance Table
Model 1: values ~ ind + time + 0
Model 2: values ~ ind/(time + 1) + 0
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1      9 1.4629
2      8 1.4469  1  0.016071 0.0889 0.7732
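The same comparison can be written with an explicit interaction term, which some readers may find easier to interpret. The following is a sketch under that assumption; the names fit_common and fit_inter are illustrative and do not appear in the original answer. The two parametrizations span the same model spaces, so the F test should agree with the table above.

```r
# Data as given in the answer
iv <- c(0.4, 0.3, 0.4, 0.3, 0.2, 0.2)
illicit <- c(5.5, 5.7, 4.8, 4.7, 6.1, 5.3)
time <- 2011:2016
long <- cbind(time, stack(data.frame(iv, illicit)))

# One common slope, separate intercepts
fit_common <- lm(values ~ ind + time, data = long)
# Separate slopes and intercepts, via an ind-by-time interaction
fit_inter <- lm(values ~ ind * time, data = long)

# F test for the interaction, i.e. for a difference in slopes
cmp <- anova(fit_common, fit_inter)
print(cmp)

# The interaction coefficient estimates slope(illicit) - slope(iv)
slope_diff <- coef(fit_inter)[["indillicit:time"]]
print(slope_diff)
```

In this form the interaction coefficient is the estimated difference between the two slopes, and its t test in summary(fit_inter) is equivalent to the F test from anova.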
In fact, the slopes are not significant in the first place: we cannot reject the hypothesis that both slopes are zero. Compare to a two-intercept model with no slopes.
fm0 <- lm(values ~ ind + 0, long)
anova(fm0, fm2)
giving:
Analysis of Variance Table
Model 1: values ~ ind + 0
Model 2: values ~ ind/(time + 1) + 0
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     10 1.4750
2      8 1.4469  2  0.028143 0.0778 0.9258
Alternatively, running a stepwise regression, we find that its favored model is one with two intercepts and no slopes:
step(fm2)
giving:
Start:  AIC=-17.39
values ~ ind/(time + 1) + 0

           Df Sum of Sq    RSS     AIC
- ind:time  2  0.028143 1.4750 -21.155
<none>                  1.4469 -17.386

Step:  AIC=-21.15
values ~ ind - 1

       Df Sum of Sq     RSS     AIC
<none>                1.475 -21.155
- ind   2    172.28 173.750  32.073

Call:
lm(formula = values ~ ind - 1, data = long)

Coefficients:
     indiv  indillicit
      0.30        5.35
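The AIC values in the step trace can be reproduced directly, which makes it clearer what is being compared. This is a sketch assuming the same models as above; extractAIC is the function step uses internally for lm fits, and its values differ from those of AIC() by an additive constant, so only differences between models are meaningful.

```r
# Data and models as in the answer
iv <- c(0.4, 0.3, 0.4, 0.3, 0.2, 0.2)
illicit <- c(5.5, 5.7, 4.8, 4.7, 6.1, 5.3)
time <- 2011:2016
long <- cbind(time, stack(data.frame(iv, illicit)))

fm0 <- lm(values ~ ind + 0, long)               # two intercepts, no slopes
fm2 <- lm(values ~ ind/(time + 1) + 0, long)    # two intercepts, two slopes

# extractAIC returns c(edf, AIC), with AIC = n * log(RSS / n) + 2 * edf
aic0 <- extractAIC(fm0)
aic2 <- extractAIC(fm2)
print(aic0)
print(aic2)

# The same quantity computed by hand for the no-slope model
n <- nrow(long)
aic0_manual <- n * log(deviance(fm0) / n) + 2 * 2
print(aic0_manual)
```

The model with the lower AIC is preferred, which here is the two-intercept model without slopes.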
If we use log(values) then we similarly find no autocorrelation (not shown), but we do find that the slopes of the log-transformed values are significantly different.
fm2log <- lm(log(values) ~ ind/(time + 1) + 0, long)
fm1log <- lm(log(values) ~ ind + time + 0, long)
anova(fm1log, fm2log)
giving:
Analysis of Variance Table
Model 1: log(values) ~ ind + time + 0
Model 2: log(values) ~ ind/(time + 1) + 0
  Res.Df     RSS Df Sum of Sq      F  Pr(>F)
1      9 0.35898
2      8 0.18275  1   0.17622 7.7141 0.02402 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
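To see what the log-scale result means, it can help to look at the two fitted slopes themselves. The sketch below fits each series separately, which yields the same slope estimates as the two-slope model because that model has a separate intercept and slope per group; the names fit_iv, fit_illicit, slopes and pct are illustrative only. A slope b on the log scale corresponds to a multiplicative change of exp(b) per year.

```r
# Data as given in the answer
iv <- c(0.4, 0.3, 0.4, 0.3, 0.2, 0.2)
illicit <- c(5.5, 5.7, 4.8, 4.7, 6.1, 5.3)
time <- 2011:2016

# Separate log-linear fits for each series
fit_iv <- lm(log(iv) ~ time)
fit_illicit <- lm(log(illicit) ~ time)

# Slopes on the log scale
slopes <- c(iv = coef(fit_iv)[["time"]],
            illicit = coef(fit_illicit)[["time"]])
print(slopes)

# Implied percentage change per year
pct <- 100 * (exp(slopes) - 1)
print(round(pct, 1))
```

On this scale the iv series declines by a noticeable percentage each year while illicit is essentially flat, which is the difference in relative trend that the anova on the log-transformed values detects.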