
t-test across selected columns of R data frame

I've got a relatively simple problem, which I don't think I'm approaching properly in R.

I have a data frame with several observations stored in rows, plus a bunch of annotations in other columns of the same data frame that I don't want to lose.

I would like to run a t-test across the values in several columns of the data frame, and have the results written to (ideally) the same data frame.

A simple example would be:

# Generate the data
experimentName <- paste(rep("name",20), c(1:20), sep="")
experimentAnno1 <- rep(paste(rep("anno",5), c(1:5), sep=""), 4)
a1 <- rnorm(n=20, mean=10, sd=5)
a2 <- rnorm(n=20, mean=11, sd=5)
a3 <- rnorm(n=20, mean=12, sd=5)
b1 <- rnorm(n=20, mean=20, sd=5)
b2 <- rnorm(n=20, mean=21, sd=5)
b3 <- rnorm(n=20, mean=19, sd=5)

sampledata <- cbind(experimentName, experimentAnno1, a1,a2,a3,b1,b2,b3)

So I've tried a very simple

ttestfun = function(x) t.test(x[,c("a1", "a2", "a3")], x[,c("b1", "b2", "b3")])$p.value
p.value = apply(sampledata, 1, ttestfun)

Which doesn't work :(

I've also tried a whole bunch of combinations of by(), melt(), apply() etc., all of which I think I'm somehow doing wrong.

The outcome I'm hoping to get is additional columns in the sampledata data frame, which are:

# pValue
p.value
# LoConf
a$conf.int[1]
# UpConf
a$conf.int[2]

etc.

What is the most efficient way to do this?

Thanks in advance!

You'll need to make sampledata a data.frame first, to get numeric values in the "a" and "b" columns.

sampledata <- data.frame(experimentName, experimentAnno1, a1,a2,a3,b1,b2,b3)

If you are trying to get per-row statistics based on a Welch two-sample t-test, this way is fast and relatively simple.

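# unlist() flattens the htest result; its first five elements are the
# t statistic, degrees of freedom, p-value, and the two confidence limits
stats <- as.data.frame(do.call(rbind, lapply(1:nrow(sampledata), function(i){
    as.numeric(unlist(t.test(sampledata[i, 3:5], sampledata[i, 6:8]))[1:5])
    })))
names(stats) <- c("t.stat", "param.df", "p.val", "ci.left", "ci.right")
cbind(sampledata, stats)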
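If speed matters on a large data frame, the same Welch statistics can also be computed for all rows at once with plain matrix arithmetic, without calling t.test() per row. The following is only a sketch: row_welch and demo are hypothetical names introduced here, and it assumes complete numeric data (no NA values) and a two-sided test of a zero mean difference.

```r
# Hypothetical helper: Welch two-sample t-test for every row at once.
# x and y hold the two groups, one observation set per row (no NAs assumed).
row_welch <- function(x, y, conf.level = 0.95) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  nx <- ncol(x)
  ny <- ncol(y)
  mx <- rowMeans(x)
  my <- rowMeans(y)
  # Row-wise sample variances (the row means recycle down each column)
  vx <- rowSums((x - mx)^2) / (nx - 1)
  vy <- rowSums((y - my)^2) / (ny - 1)
  se2x <- vx / nx
  se2y <- vy / ny
  se <- sqrt(se2x + se2y)
  t.stat <- (mx - my) / se
  # Welch-Satterthwaite degrees of freedom
  df <- (se2x + se2y)^2 / (se2x^2 / (nx - 1) + se2y^2 / (ny - 1))
  half <- qt(1 - (1 - conf.level) / 2, df) * se
  data.frame(t.stat   = t.stat,
             param.df = df,
             p.val    = 2 * pt(-abs(t.stat), df),
             ci.left  = (mx - my) - half,
             ci.right = (mx - my) + half)
}

# Small made-up data set, only to exercise the helper
set.seed(1)
demo <- data.frame(a1 = rnorm(5, 10, 5), a2 = rnorm(5, 11, 5), a3 = rnorm(5, 12, 5),
                   b1 = rnorm(5, 20, 5), b2 = rnorm(5, 21, 5), b3 = rnorm(5, 19, 5))
res <- row_welch(demo[, 1:3], demo[, 4:6])
```

The result has the same five columns as stats above, so cbind(demo, res) attaches it in the same way.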

Probably not the most efficient, but here's one way that builds on your initial effort.

Your example data:

experimentName <- paste(rep("name",20), c(1:20), sep="")
experimentAnno1 <- rep(paste(rep("anno",5), c(1:5), sep=""), 4)
a1 <- rnorm(n=20, mean=10, sd=5)
a2 <- rnorm(n=20, mean=11, sd=5)
a3 <- rnorm(n=20, mean=12, sd=5)
b1 <- rnorm(n=20, mean=20, sd=5)
b2 <- rnorm(n=20, mean=21, sd=5)
b3 <- rnorm(n=20, mean=19, sd=5)

I use data.frame rather than cbind so we can keep the numbers as numerics (cbind coerces them to character).

# sampledata <- cbind(experimentName, experimentAnno1, a1,a2,a3,b1,b2,b3)
sampledata <- data.frame(experimentName, experimentAnno1, a1,a2,a3,b1,b2,b3)
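The coercion is easy to check on a tiny example. The vectors id_demo and x_demo below are made up for illustration and are not part of the question's data.

```r
# Made-up vectors: one character, one numeric
id_demo <- c("name1", "name2")
x_demo  <- c(10.5, 12.1)

m  <- cbind(id_demo, x_demo)        # a matrix holds one type, so all cells become character
df <- data.frame(id_demo, x_demo)   # a data frame keeps a separate type per column

is.character(m)       # TRUE
is.numeric(df$x_demo) # TRUE
```

This is why t.test() cannot work on the cbind() version: the "a" and "b" values are strings by the time the test sees them.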

It seems your goal is, within each row, to test the set of a1, a2, a3 against the set of b1, b2, b3.

Here are some sapply calls that get those values:

sampledata$pvalue <- sapply(1:nrow(sampledata), function(i) t.test(sampledata[i,c("a1", "a2", "a3")], sampledata[i,c("b1", "b2", "b3")])$p.value)

sampledata$LoConf <- sapply(1:nrow(sampledata), function(i) t.test(sampledata[i,c("a1", "a2", "a3")], sampledata[i,c("b1", "b2", "b3")])$conf.int[1])

sampledata$UpConf <- sapply(1:nrow(sampledata), function(i) t.test(sampledata[i,c("a1", "a2", "a3")], sampledata[i,c("b1", "b2", "b3")])$conf.int[2])
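The three calls above run the same t-test three times per row. As a variation, the test can be run once per row and all three values kept together. This is a sketch rather than the original answer's code: a_cols, b_cols and res are names introduced here, and the data are regenerated with a fixed seed only so the block runs on its own.

```r
# Rebuild data shaped like the question's, with a fixed seed (an assumption
# made here for reproducibility; the original uses unseeded rnorm calls)
set.seed(42)
sampledata <- data.frame(
  experimentName  = paste0("name", 1:20),
  experimentAnno1 = rep(paste0("anno", 1:5), 4),
  a1 = rnorm(20, 10, 5), a2 = rnorm(20, 11, 5), a3 = rnorm(20, 12, 5),
  b1 = rnorm(20, 20, 5), b2 = rnorm(20, 21, 5), b3 = rnorm(20, 19, 5)
)

a_cols <- c("a1", "a2", "a3")
b_cols <- c("b1", "b2", "b3")

# One t.test() call per row; each call returns a named vector of three values
res <- do.call(rbind, lapply(seq_len(nrow(sampledata)), function(i) {
  tt <- t.test(unlist(sampledata[i, a_cols]), unlist(sampledata[i, b_cols]))
  c(pvalue = tt$p.value, LoConf = tt$conf.int[1], UpConf = tt$conf.int[2])
}))

# Attach the results next to the annotations
sampledata <- cbind(sampledata, res)
```

Selecting columns by name rather than position keeps the code correct if annotation columns are added or reordered later.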
