t-test across all combinations of all levels of all factors

I have a dataframe with the following structure:

> str(data_l)
'data.frame':   800 obs. of  5 variables:
 $ Participant: int  1 2 3 4 5 6 7 8 9 10 ...
 $ Temperature: Factor w/ 4 levels "35","37","39",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Region     : Factor w/ 5 levels "Eyes","Front",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Time       : Factor w/ 5 levels "0","15","30",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Rating     : num  5 5 5 4 5 5 5 5 5 5 ...

I want to run a one-sample t-test for each combination of levels of all three factors, for a total of 4 * 5 * 5 = 100 t-tests, with Rating as the dependent variable (y).

I am stuck on looping through the combinations and performing a t-test on each one.

I tried splitting the data frame by the factors and then applying t.test() to each element of the list with lapply(), but without success.

Does anyone have a better approach? Cheers!

Edit

My ultimate goal is to calculate a confidence interval for Rating in every combination of factor levels. For instance, I was able to do this for one combination:

subset1 <- data_l$Rating[data_l$Temperature == 35 & data_l$Region == "Front" & data_l$Time == 0]

Then,

t.test(subset1)$conf.int

But the problem is I will have to do this 100 times.
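For reference, the manual subset-and-test step above can be repeated for every cell by splitting Rating on all three factors and applying a small wrapper to each piece, which is essentially the split-then-lapply idea from the question. This is a minimal sketch, assuming data_l has the columns shown by str() above. The first lines build a made-up 800-row stand-in for data_l so the sketch runs on its own (skip them and use your real data), and the helper name ci_or_na is hypothetical. The guard returns NA bounds for cells where t.test() would stop with an error.

```r
# Made-up stand-in for data_l: 4 x 5 x 5 cells with 8 random ratings each (800 rows).
# Skip this part and use your own data_l instead.
set.seed(1)
data_l <- expand.grid(Temperature = factor(c(35, 37, 39, 41)),
                      Region      = factor(c("Eyes", "Front", "Back", "Left", "Right")),
                      Time        = factor(c(0, 15, 30, 45, 60)))
data_l <- data_l[rep(seq_len(nrow(data_l)), each = 8), ]
data_l$Rating <- sample(1:5, nrow(data_l), replace = TRUE)

# Hypothetical helper: 95% CI of the mean, or NA bounds where t.test() would fail.
# length(unique(x)) > 1 rules out empty cells, single values and constant data.
ci_or_na <- function(x) {
  if (length(unique(x)) > 1) {
    t.test(x)$conf.int[1:2]
  } else {
    c(NA_real_, NA_real_)
  }
}

# One list element per Temperature x Region x Time combination (100 in total)
cells <- split(data_l$Rating, data_l[, c("Temperature", "Region", "Time")])

# vapply() gives a 2 x 100 matrix; t() turns it into one row per combination
ci <- t(vapply(cells, ci_or_na, numeric(2)))
colnames(ci) <- c("lower", "upper")
head(ci)
```

The row names of ci identify each combination (for example "35.Back.0"), so no explicit loop over the 100 cells is needed.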

Edit 2

Here is code that recreates a smaller version of the data frame:

Temperature <- rep(seq(35, 41, 2), 10)
Region <- rep(c("Front", "Back", "Eyes", "Left", "Right"), 8)
Time <- rep(seq(0, 60, 15), 8)
Rating <- sample(1:5, 40, replace = TRUE)
data_l <- data.frame(Region = factor(Region), Temperature = factor(Temperature), Time = factor(Time), Rating = as.numeric(Rating))

Two things.

  1. Can this be done? Certainly. Should it? Many of your combinations may have too little data to give a meaningful confidence interval. Your sample data is certainly reduced and simplified, but I have no assurance that your real factor combinations are sufficiently populated.

     table(sapply(split(data_l$Rating, data_l[,c("Temperature","Region","Time")]), length))
     #  0  2
     # 80 20

    (80 of the 100 combinations of your factor levels are empty; the remaining 20 have only two observations each.)

  2. Let's try this:

     outs <- aggregate(data_l$Rating, data_l[,c("Temperature","Region","Time")],
                       function(x) if (length(unique(x)) > 1) t.test(x)$conf.int else c(NA, NA))
     nrow(outs)
     # [1] 20
     head(outs)
     #   Temperature Region Time        x.1       x.2
     # 1          35  Front    0         NA        NA
     # 2          37  Front    0  -9.706205 15.706205
     # 3          39  Front    0  -2.853102  9.853102
     # 4          41  Front    0 -15.559307 22.559307
     # 5          35   Back   15 -15.559307 22.559307
     # 6          37   Back   15  -4.853102  7.853102

    Note that this is not five columns: the fourth column, x, is actually a two-column matrix embedded in a single data-frame column:

     head(outs$x)
     #            [,1]      [,2]
     # [1,]         NA        NA
     # [2,]  -9.706205 15.706205
     # [3,]  -2.853102  9.853102
     # [4,] -15.559307 22.559307
     # [5,] -15.559307 22.559307
     # [6,]  -4.853102  7.853102

    It's easy enough to extract:

     outs$conf1 <- outs$x[,1]
     outs$conf2 <- outs$x[,2]
     outs$x <- NULL
     head(outs)
     #   Temperature Region Time      conf1     conf2
     # 1          35  Front    0         NA        NA
     # 2          37  Front    0  -9.706205 15.706205
     # 3          39  Front    0  -2.853102  9.853102
     # 4          41  Front    0 -15.559307 22.559307
     # 5          35   Back   15 -15.559307 22.559307
     # 6          37   Back   15  -4.853102  7.853102

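    (If you're wondering why I have a conditional on length(unique(x)) > 1, see what happens without it:

     aggregate(data_l$Rating, data_l[,c("Temperature","Region","Time")], function(x) t.test(x)$conf.int)
     # Error in t.test.default(x) : data are essentially constant

    This is because some combinations contain invariant data (for example two identical ratings), so the standard error is zero. aggregate() only visits combinations that actually occur in the data, but if you loop over every combination yourself (for example via split()), empty or single-observation cells fail as well, with "not enough 'x' observations".)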
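An alternative to the length(unique(x)) > 1 guard is to let t.test() decide and catch its error with tryCatch(), so any cell it rejects (constant data, or fewer than two observations) gets NA bounds. This is a minimal sketch, assuming the small data_l from "Edit 2" of the question; the wrapper name safe_ci is hypothetical, and set.seed() is added only to make the made-up ratings reproducible.

```r
# Recreate the small example data from "Edit 2" of the question
set.seed(42)  # added for reproducibility; not part of the original code
Temperature <- rep(seq(35, 41, 2), 10)
Region <- rep(c("Front", "Back", "Eyes", "Left", "Right"), 8)
Time <- rep(seq(0, 60, 15), 8)
Rating <- sample(1:5, 40, replace = TRUE)
data_l <- data.frame(Region = factor(Region), Temperature = factor(Temperature),
                     Time = factor(Time), Rating = as.numeric(Rating))

# Hypothetical wrapper: the CI bounds, or NA bounds if t.test() signals an error
safe_ci <- function(x) {
  tryCatch(t.test(x)$conf.int[1:2],
           error = function(e) c(NA_real_, NA_real_))
}

outs <- aggregate(data_l$Rating, data_l[, c("Temperature", "Region", "Time")], safe_ci)

# Flatten the embedded matrix column x into two ordinary columns
outs$conf1 <- outs$x[, 1]
outs$conf2 <- outs$x[, 2]
outs$x <- NULL
head(outs)
```

The trade-off is that tryCatch() silently turns every error into NA, including ones you might want to see, so the explicit guard is arguably the clearer choice when you know which failure cases to expect.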

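You wrote that you are stuck at looping through the combinations and performing a t-test at each one.

I'm not sure if this is what you wanted, but the function below runs a Welch two-sample t-test for every pair of levels of one factor (here Temperature).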

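N <- 800
# Simulated data with the same shape as data_l (Region and Time as factors)
df <- data.frame(Participant = 1:N,
                 Temperature = gl(4, N / 4),
                 Region      = factor(sample(1:5, N, replace = TRUE)),
                 Time        = factor(sample(1:5, N, replace = TRUE)),
                 Rating      = sample(1:5, N, replace = TRUE))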
head(df)

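t_test <- function(data, y, x){
  # Look up the unquoted column names (e.g. Rating, Temperature) inside `data`
  x <- eval(substitute(x), data)
  y <- eval(substitute(y), data)

  comb <- combn(levels(x), m = 2)  # one column per pair of levels
  n <- ncol(comb)
  res <- vector(n, mode = "list")

  for (i in seq_len(n)) {
    xlevs <- comb[, i]
    keep <- x %in% xlevs  # rows belonging to the current pair of levels
    # Local data frame with only those rows and a two-level grouping factor
    DATA <- data.frame(y = y[keep], x2 = factor(x[keep], levels = xlevs))
    res[[i]] <- t.test(y ~ x2, data = DATA)  # Welch two-sample t-test by default
    names(res)[i] <- toString(xlevs)
  }
  res
}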

T.test <- t_test(df, Rating, Temperature)

T.test[1]
$`1, 2`

    Welch Two Sample t-test

data:  y by x2
t = -1.0271, df = 396.87, p-value = 0.305
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.4079762  0.1279762
sample estimates:
mean in group 1 mean in group 2 
           2.85            2.99 
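The list of test results can be collapsed into one table by pulling the standard components (estimate, conf.int, p.value) out of each htest object. This is a minimal sketch on made-up data shaped like df above; to keep it self-contained it rebuilds the list with combn(..., simplify = FALSE) instead of calling t_test(), and the names level_pairs, tests and summary_tab are arbitrary. The summarising step should work the same way on T.test.

```r
# Made-up data shaped like df above: Temperature has 4 levels, 200 rows each
set.seed(1)
N <- 800
df <- data.frame(Temperature = gl(4, N / 4),
                 Rating = sample(1:5, N, replace = TRUE))

# Welch two-sample t-test for every pair of Temperature levels
level_pairs <- combn(levels(df$Temperature), 2, simplify = FALSE)
tests <- lapply(level_pairs, function(lv) {
  sub <- df[df$Temperature %in% lv, ]
  sub$Temperature <- factor(sub$Temperature, levels = lv)  # keep only the two levels
  t.test(Rating ~ Temperature, data = sub)
})
names(tests) <- vapply(level_pairs, toString, character(1))

# One row per comparison: group means, CI of the difference in means, p-value
summary_tab <- data.frame(
  comparison = names(tests),
  mean1   = vapply(tests, function(tt) unname(tt$estimate[1]), numeric(1)),
  mean2   = vapply(tests, function(tt) unname(tt$estimate[2]), numeric(1)),
  lower   = vapply(tests, function(tt) tt$conf.int[1], numeric(1)),
  upper   = vapply(tests, function(tt) tt$conf.int[2], numeric(1)),
  p.value = vapply(tests, function(tt) tt$p.value, numeric(1)),
  row.names = NULL
)
summary_tab
```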
