R: paired t.test removing observations with no pairs (long format)

I have the following data with paired observations in long format. I am trying to run a paired t-test across the time variable in R while keeping the data in long format. To do that, I first need to detect the observations that are not present at both time 1 and time 2 (obs B and E in this case), and then perhaps create a new data frame with the remaining observations in matching order. Is there a way to do this without reshaping the data into wide format first? Help and suggestions would be appreciated; I'm an R newbie.

obs time value
A   1    5.5
B   1    7.1
C   1    4.3
D   1    6.4
E   1    6.6
F   1    5.6
G   1    6.6
A   2    6.5
C   2    6.7
D   2    7.8
F   2    5.7
G   2    8.9   
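
For reference, a quick way to see which observations lack a partner is to count the rows per obs. This is only a base R sketch on the data above; the names counts and unpaired are illustrative and not part of the original post.

```r
# Example data copied from the question
df <- read.table(text="obs time value
A   1    5.5
B   1    7.1
C   1    4.3
D   1    6.4
E   1    6.6
F   1    5.6
G   1    6.6
A   2    6.5
C   2    6.7
D   2    7.8
F   2    5.7
G   2    8.9", header=TRUE, stringsAsFactors=FALSE)

# Count the rows per obs; a complete pair has two rows
counts <- table(df$obs)

# Observations with fewer than two rows have no partner
unpaired <- names(counts[counts < 2])
unpaired
# [1] "B" "E"
```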

As an alternative to using duplicated as in @CPak's long-format answer, you can group by the observation and keep only the groups whose row count is not equal to 1:

library(dplyr)

p = 
  group_by(df, obs) %>%
  filter(n() != 1) %>%
  arrange(time, obs) %>%
  ungroup()

Either way, this leads to the same result when the t-test is applied as shown in @CPak's answer:

ans <- with(p, t.test(value ~ time, paired=TRUE))

> ans

    Paired t-test

data:  value by time
t = -3.3699, df = 4, p-value = 0.02805
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.6264228 -0.2535772
sample estimates:
mean of the differences 
                  -1.44 
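
One caveat that is not part of the original answers: as far as I know, from R 4.4.0 onward the formula method of t.test no longer accepts paired=TRUE, so the call above may stop with an error on newer installations. A hedged sketch that avoids the formula interface is to split the values by time and pass two vectors. It uses base R for the filtering step so that it runs without dplyr; the names paired_obs, x1 and x2 are illustrative.

```r
# Example data copied from the question
df <- read.table(text="obs time value
A   1    5.5
B   1    7.1
C   1    4.3
D   1    6.4
E   1    6.6
F   1    5.6
G   1    6.6
A   2    6.5
C   2    6.7
D   2    7.8
F   2    5.7
G   2    8.9", header=TRUE, stringsAsFactors=FALSE)

# Keep only obs that occur exactly twice (assumes one row per obs and time)
paired_obs <- names(which(table(df$obs) == 2))
p <- df[df$obs %in% paired_obs, ]

# Sort so that both time levels list the obs in the same order
p <- p[order(p$time, p$obs), ]

# Split into two aligned vectors and run the paired test on them
x1 <- p$value[p$time == 1]
x2 <- p$value[p$time == 2]
ans <- t.test(x1, x2, paired=TRUE)
ans
```

On the example data this should reproduce the values shown above (t = -3.3699, df = 4, p-value = 0.02805).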

You can use duplicated in both the forward and the reverse (fromLast=TRUE) direction to filter your data:

library(dplyr)
p <- df %>%
       filter(duplicated(obs) | duplicated(obs, fromLast=TRUE)) %>%
       arrange(time, obs)

#    obs time value
# 1    A    1   5.5
# 2    C    1   4.3
# 3    D    1   6.4
# 4    F    1   5.6
# 5    G    1   6.6
# 6    A    2   6.5
# 7    C    2   6.7
# 8    D    2   7.8
# 9    F    2   5.7
# 10   G    2   8.9
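
To see why both directions are needed, here is a small illustration on a made-up vector (the vector obs and the name keep are only for demonstration). duplicated on its own flags the second occurrence of a value but not the first, so combining it with the fromLast=TRUE scan marks every member of a repeated value.

```r
# Toy vector: A and C occur twice, B only once
obs <- c("A", "B", "C", "A", "C")

# Forward scan: flags only the later occurrences
duplicated(obs)
# [1] FALSE FALSE FALSE  TRUE  TRUE

# Reverse scan: flags only the earlier occurrences
duplicated(obs, fromLast=TRUE)
# [1]  TRUE FALSE  TRUE FALSE FALSE

# Either flag set means the value has a partner somewhere
keep <- duplicated(obs) | duplicated(obs, fromLast=TRUE)
obs[keep]
# [1] "A" "C" "A" "C"
```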

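Then perform the paired t.test. The pairs are matched by position within each time level, which is why the rows were sorted with arrange(time, obs) first: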

ans <- with(p, t.test(value ~ time, paired=TRUE))

#         Paired t-test
#
# data:  value by time
# t = -3.3699, df = 4, p-value = 0.02805
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -2.6264228 -0.2535772
# sample estimates:
# mean of the differences 
#                   -1.44
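
As a sanity check, the statistic can also be computed by hand from the pairwise differences. This is just a sketch with the five paired values typed in from the filtered data (obs A, C, D, F, G); the variable names are illustrative.

```r
# Paired values for obs A, C, D, F, G, in the same order at both times
time1 <- c(5.5, 4.3, 6.4, 5.6, 6.6)
time2 <- c(6.5, 6.7, 7.8, 5.7, 8.9)

# A paired t-test is a one-sample t-test on the differences
d <- time1 - time2
n <- length(d)

# t = mean difference divided by its standard error
t_stat <- mean(d) / (sd(d) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = n - 1)

round(c(mean_diff = mean(d), t = t_stat, df = n - 1, p = p_val), 4)
```

The mean difference is -1.44, and t and p agree with the t.test output above after rounding.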

Your original data:

df <- read.table(text="obs time value
A   1    5.5
B   1    7.1
C   1    4.3
D   1    6.4
E   1    6.6
F   1    5.6
G   1    6.6
A   2    6.5
C   2    6.7
D   2    7.8
F   2    5.7
G   2    8.9", header=TRUE, stringsAsFactors=FALSE)
