R: paired t.test removing observations with no pairs (long format)
I have the following data with paired observations in long format. I am trying to run a paired t-test across the time variable in R while keeping the data in long format. Before the test I need to detect the observations that are not present at both time 1 and time 2 (B and E in this case), and then perhaps create a new data frame with the remaining observations in matching order. Is there a way to do this without first reshaping the data into wide format? Help and suggestions would be appreciated; I am new to R.
obs time value
A 1 5.5
B 1 7.1
C 1 4.3
D 1 6.4
E 1 6.6
F 1 5.6
G 1 6.6
A 2 6.5
C 2 6.7
D 2 7.8
F 2 5.7
G 2 8.9
As an alternative to the use of duplicated in @CPak's long-format answer, you can group by observation and keep only the groups whose row count is not equal to 1:
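library(dplyr)
# Keep only observations that occur more than once, i.e. at both time points
p <-
  group_by(df, obs) %>%
  filter(n() != 1) %>%
  arrange(time, obs) %>%  # same obs order within each time point
  ungroup()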
Either way, this leads to the same result when the t-test is applied as shown in @CPak's answer:
ans <- with(p, t.test(value ~ time, paired=TRUE))
> ans
Paired t-test
data: value by time
t = -3.3699, df = 4, p-value = 0.02805
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.6264228 -0.2535772
sample estimates:
mean of the differences
-1.44
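The same group-count filter can be sketched in base R, in case dplyr is not available. This is only an illustration: it rebuilds the question's data so that it runs on its own, and the helper name n_per_obs is made up for the example.

```r
# Rebuild the data from the question so the sketch is self-contained
df <- read.table(text = "obs time value
A 1 5.5
B 1 7.1
C 1 4.3
D 1 6.4
E 1 6.6
F 1 5.6
G 1 6.6
A 2 6.5
C 2 6.7
D 2 7.8
F 2 5.7
G 2 8.9", header = TRUE, stringsAsFactors = FALSE)

# For every row, count how many rows share its obs label
n_per_obs <- ave(seq_along(df$obs), df$obs, FUN = length)

# Keep only observations that appear at both time points, then
# sort so the obs order is identical within time 1 and time 2
p <- df[n_per_obs == 2, ]
p <- p[order(p$time, p$obs), ]
```

Testing for a count of exactly 2 is slightly stricter than n() != 1: it would also drop an observation that was accidentally recorded three times.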
You can use duplicated in both the forward and the reverse (fromLast=TRUE) direction to filter your data:
library(dplyr)
p <- df %>%
filter(duplicated(obs) | duplicated(obs, fromLast=TRUE)) %>%
arrange(time, obs)
# obs time value
# 1 A 1 5.5
# 2 C 1 4.3
# 3 D 1 6.4
# 4 F 1 5.6
# 5 G 1 6.6
# 6 A 2 6.5
# 7 C 2 6.7
# 8 D 2 7.8
# 9 F 2 5.7
# 10 G 2 8.9
Then perform the paired t.test:
ans <- with(p, t.test(value ~ time, paired=TRUE))
# Paired t-test
# data: value by time
# t = -3.3699, df = 4, p-value = 0.02805
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -2.6264228 -0.2535772
# sample estimates:
# mean of the differences
# -1.44
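As far as I am aware, recent R releases (4.4.0 and later) no longer accept paired=TRUE in the formula method of t.test, so the call above may stop with an error there. A minimal sketch that should work across versions passes the two time points as separate vectors instead. The names x1 and x2 are made up for the example, and the filtered data p is rebuilt so the block runs on its own.

```r
# Filtered, ordered data as produced by the steps above
p <- read.table(text = "obs time value
A 1 5.5
C 1 4.3
D 1 6.4
F 1 5.6
G 1 6.6
A 2 6.5
C 2 6.7
D 2 7.8
F 2 5.7
G 2 8.9", header = TRUE, stringsAsFactors = FALSE)

# Split the values by time point
x1 <- p$value[p$time == 1]
x2 <- p$value[p$time == 2]

# Pairing is positional, so check that both vectors follow the same obs order
stopifnot(identical(p$obs[p$time == 1], p$obs[p$time == 2]))

# Paired t-test on the differences x1 - x2
ans <- t.test(x1, x2, paired = TRUE)
ans
```

Because the vectors are matched by position, the arrange(time, obs) step matters here just as it does for the formula call.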
Your original data:
df <- read.table(text="obs time value
A 1 5.5
B 1 7.1
C 1 4.3
D 1 6.4
E 1 6.6
F 1 5.6
G 1 6.6
A 2 6.5
C 2 6.7
D 2 7.8
F 2 5.7
G 2 8.9", header=TRUE, stringsAsFactors=FALSE)
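To see which observations get dropped before filtering, one possible sketch counts the rows per obs in the long-format data. The names counts and unpaired are made up for the example, and the data is repeated so the block runs on its own.

```r
df <- read.table(text = "obs time value
A 1 5.5
B 1 7.1
C 1 4.3
D 1 6.4
E 1 6.6
F 1 5.6
G 1 6.6
A 2 6.5
C 2 6.7
D 2 7.8
F 2 5.7
G 2 8.9", header = TRUE, stringsAsFactors = FALSE)

# Number of rows recorded for each observation
counts <- table(df$obs)

# Observations with fewer than two rows have no partner at the other time point
unpaired <- names(counts)[as.vector(counts) < 2]
unpaired
# [1] "B" "E"
```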