I have a dataframe with the following structure:
> str(data_l)
'data.frame': 800 obs. of 5 variables:
$ Participant: int 1 2 3 4 5 6 7 8 9 10 ...
$ Temperature: Factor w/ 4 levels "35","37","39",..: 3 3 3 3 3 3 3 3 3 3 ...
$ Region : Factor w/ 5 levels "Eyes","Front",..: 3 3 3 3 3 3 3 3 3 3 ...
$ Time : Factor w/ 5 levels "0","15","30",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Rating : num 5 5 5 4 5 5 5 5 5 5 ...
I want to run one-sample t-test for each combination of all factors all levels, for a total of 4*5*5 = 100 t-tests, with Rating as dependent variables, or y
.
I am stuck at looping through the combinations, and performing t-test at each combo.
I tried splitting the dataframe by the factors, then lapply
t.test()
through the list, but to no avail.
Does anyone have a better approach? Cheers!
Edit
My ultimate intention is to calculate confidence interval for arrays in all factors all levels. For instance, I was able to do this:
subset1 <- data_l$Rating[data_l$Temperature == 35 & data_l$Region == "Front" & data_l$Time == 0]
Then,
t.test(subset1)$conf.int
But the problem is I will have to do this 100 times.
Edit 2
I am recreating the dataframe.
Temperature <- rep(seq(35, 41, 2), 10)
Region <- rep(c("Front", "Back", "Eyes", "Left", "Right"), 8)
Time <- rep(seq(0, 60, 15), 8)
Rating <- sample(1:5, 40, replace = TRUE)
data_l <- data.frame(Region = factor(Region), Temperature = factor(Temperature), Time = factor(Time), Rating = as.numeric(Rating))
Two things.
Can this be done? Certainly. Should it? Many of your combinations may have insufficient data to find a reasonable confidence interval. While your data sample is certainly reduced and simplified, I don't have assurances that there will be sufficient fillingness of your factor combinations.
table(sapply(split(data_l$Rating, data_l[,c("Temperature","Region","Time")]), length)) # 0 2 # 80 20
(There are 80 "empty" combinations of your factor levels.)
Let's try this:
outs <- aggregate(data_l$Rating, data_l[,c("Temperature","Region","Time")], function(x) if (length(unique(x)) > 1) t.test(x)$conf.int else c(NA, NA)) nrow(outs) # [1] 20 head(outs) # Temperature Region Time x.1 x.2 # 1 35 Front 0 NA NA # 2 37 Front 0 -9.706205 15.706205 # 3 39 Front 0 -2.853102 9.853102 # 4 41 Front 0 -15.559307 22.559307 # 5 35 Back 15 -15.559307 22.559307 # 6 37 Back 15 -4.853102 7.853102
Realize that this is not five columns; the fourth is really a matrix embedded in a frame column:
head(outs$x) # [,1] [,2] # [1,] NA NA # [2,] -9.706205 15.706205 # [3,] -2.853102 9.853102 # [4,] -15.559307 22.559307 # [5,] -15.559307 22.559307 # [6,] -4.853102 7.853102
It's easy enough to extract:
outs$conf1 <- outs$x[,1] outs$conf2 <- outs$x[,2] outs$x <- NULL head(outs) # Temperature Region Time conf1 conf2 # 1 35 Front 0 NA NA # 2 37 Front 0 -9.706205 15.706205 # 3 39 Front 0 -2.853102 9.853102 # 4 41 Front 0 -15.559307 22.559307 # 5 35 Back 15 -15.559307 22.559307 # 6 37 Back 15 -4.853102 7.853102
(If you're wondering why I have a conditional on length(unique(x)) > 1
, then see what happens without it:
aggregate(data_l$Rating, data_l[,c("Temperature","Region","Time")], function(x) t.test(x)$conf.int) # Error in t.test.default(x) : data are essentially constant
This is because there are combinations with empty data. You'll likely see something similar with not-empty but still invariant data.)
I am stuck at looping through the combinations, and performing t-test at each combo.
I'm not sure if this is what you wanted.
N <- 800
df <- data.frame(Participant=1:N,
Temperature=gl(4,200),
Region=sample(1:5, 800, TRUE),
Time=sample(1:5, 800, TRUE),
Rating=sample(1:5, 800, TRUE))
head(df)
t_test <- function(data, y, x){
x <- eval(substitute(x), data)
y <- eval(substitute(y), data)
comb <- combn(levels(x), m=2) # this gives all pair-wise combinations
n <- dim(comb)[2]
t <- vector(n, mode="list")
for(i in 1:n){
xlevs <- comb[,i]
DATA <- subset(data, subset=x %in% xlevs)
x2 <- factor(x, levels=xlevs)
tt <- t.test(y~x2, data=DATA)
t[[i]] <- tt
names(t)[i] <- toString(xlevs)
}
t
}
T.test <- t_test(df, Rating, Temperature)
T.test[1]
$`1, 2`
Welch Two Sample t-test
data: y by x2
t = -1.0271, df = 396.87, p-value = 0.305
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.4079762 0.1279762
sample estimates:
mean in group 1 mean in group 2
2.85 2.99
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.