T-test on two columns in R
I am trying to run a t-test to see whether the values in two columns, one in each of two dfs, are statistically different.
I want to compare the "Duration" column in two dfs, "Tokens" and "Tokens.Single". Both dfs have the same number of values in their Duration columns.
Here is the code I am trying:
# T-test for duration.
t.test(Tokens$Duration ~ Tokens.Single$Duration, paired=FALSE, var.equal=TRUE)
And this is the error message I received:
Error in t.test.formula(Tokens$Duration ~ Tokens.Single$Duration, paired = FALSE, :
grouping factor must have exactly 2 levels
Any insight is appreciated!
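Without a peek at your data it's hard to say for sure, but the formula syntax you are using in t.test (response ~ group) is meant for a numeric response split by a factor with exactly two levels. R treats whatever is on the right of ~ as the grouping variable, so Tokens.Single$Duration is turned into a factor with one level per distinct duration value, which is why you get the "exactly 2 levels" error.
Based on your description of your data, you would be better off using the two-vector syntax: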
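y <- rnorm(50)  # 50 random values for one sample
x <- rnorm(50)  # 50 random values for the other sample
t.test(x, y)    # two-sample t-test on two numeric vectors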
This compares the means of the numeric vectors x and y. In your case:
t.test(Tokens$Duration, Tokens.Single$Duration, paired = FALSE, var.equal = TRUE)
Just for completeness: if you had a factor variable indicating the run or experiment number, you could use the formula syntax, e.g.
y <- rnorm(50)             # numeric response
z <- rep(c("A", "B"), 25)  # grouping variable with exactly two levels
t.test(y ~ z)              # formula form: response ~ group
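Yielding (your numbers will differ, since rnorm() draws random values and no seed is set; the fractional df is because var.equal defaults to FALSE, i.e. Welch's test):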
data: y by z
t = -2.0418, df = 47.504, p-value = 0.04675
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.07859422 -0.00814587
sample estimates:
mean in group A mean in group B
0.1162672 0.6596372
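If you want to use the formula interface with your own data, one option is to stack the two Duration columns into a single long data frame, with a two-level factor recording which df each value came from. This is a minimal sketch: the Tokens and Tokens.Single data frames below are simulated stand-ins, and the column name `source` is an assumption for illustration, not something from your data.

```r
# Hypothetical stand-ins for the question's data frames (simulated values)
set.seed(42)
Tokens        <- data.frame(Duration = rnorm(30, mean = 0.20, sd = 0.05))
Tokens.Single <- data.frame(Duration = rnorm(30, mean = 0.22, sd = 0.05))

# Stack both Duration columns into one long data frame, with a
# two-level factor ("source", a made-up name) marking where each value came from
long <- data.frame(
  Duration = c(Tokens$Duration, Tokens.Single$Duration),
  source   = factor(rep(c("Tokens", "Tokens.Single"),
                        times = c(nrow(Tokens), nrow(Tokens.Single))),
                    levels = c("Tokens", "Tokens.Single"))
)

# Formula interface: numeric response ~ two-level grouping factor
res_formula <- t.test(Duration ~ source, data = long, var.equal = TRUE)

# Two-vector interface on the same values, for comparison
res_vectors <- t.test(Tokens$Duration, Tokens.Single$Duration, var.equal = TRUE)

print(res_formula)
```

Because the factor levels are listed in the same order as the two vectors, both calls should report the same t statistic, df and p-value; the formula form is mainly convenient once your data are already in this long layout.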