
T-test on two columns in R

I am trying to do a t-test to see whether the values in two columns of two data frames (dfs) are statistically different.

I am trying to run code that compares the "Duration" column in two dfs, "Tokens" and "Tokens.Single". Both dfs have the same number of values in their respective Duration columns.

Here is the code I am trying:

# T-test for duration.
t.test(Tokens$Duration ~ Tokens.Single$Duration, paired=FALSE, var.equal=TRUE)

And this is the error message I received:

Error in t.test.formula(Tokens$Duration ~ Tokens.Single$Duration, paired = FALSE,  : 
  grouping factor must have exactly 2 levels

Any insight is appreciated!

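Without a peek at your data it's hard to say for sure, but the formula syntax you are using in t.test (response ~ group) is meant for a numeric response split by a grouping factor. R treats the right-hand side, Tokens.Single$Duration, as that grouping variable, and a numeric column with many distinct values produces many levels instead of the two that t.test requires.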
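To see the problem concretely, here is a minimal sketch with made-up data. The two data frames below are hypothetical stand-ins for the asker's Tokens and Tokens.Single (random numbers, not the real durations); the point is only that a continuous numeric column used as the grouping side of the formula has one level per distinct value.

```r
# Hypothetical stand-ins for the asker's data frames (random values, not real data)
set.seed(1)
Tokens <- data.frame(Duration = rnorm(20, mean = 0.30, sd = 0.05))
Tokens.Single <- data.frame(Duration = rnorm(20, mean = 0.32, sd = 0.05))

# The right-hand side of the formula is converted to a grouping factor;
# a continuous numeric column has one "level" per distinct value.
n_levels <- length(unique(Tokens.Single$Duration))
print(n_levels)  # 20 distinct values, not 2

# Reproduce the error without stopping the script, keeping only its message
err <- tryCatch(
  t.test(Tokens$Duration ~ Tokens.Single$Duration),
  error = function(e) conditionMessage(e)
)
print(err)
```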

Based on your description of your data, you would be better off using the two-vector syntax:

y <- rnorm(50)
x <- rnorm(50)

t.test(x,y)

This compares the means of the two numeric vectors x and y. In your case:

t.test(Tokens$Duration, Tokens.Single$Duration, paired = FALSE, var.equal = TRUE)
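One detail worth knowing about the call above: var.equal = TRUE requests the pooled-variance (Student) t-test, whereas the default is Welch's test, which does not assume equal variances. The sketch below compares the two on hypothetical vectors a and b (random stand-ins for the two Duration columns) and shows how to pull numbers out of the returned htest object instead of reading the printed output.

```r
# Hypothetical data standing in for Tokens$Duration and Tokens.Single$Duration
set.seed(42)
a <- rnorm(30, mean = 0.30, sd = 0.05)
b <- rnorm(30, mean = 0.33, sd = 0.08)

student <- t.test(a, b, var.equal = TRUE)  # pooled-variance (Student) t-test
welch   <- t.test(a, b)                    # default: Welch, no equal-variance assumption

# htest objects are lists, so individual results can be extracted by name
print(student$parameter)  # df = 30 + 30 - 2 = 58
print(welch$parameter)    # Welch-Satterthwaite df, usually non-integer and <= 58
print(c(student = student$p.value, welch = welch$p.value))
```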

Just for completeness: if you had a factor variable indicating the run or experiment number, you could use the formula syntax, e.g.:

y <- rnorm(50)
z <- rep(c("A","B"), 25)
t.test(y ~ z)

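Yielding output like the following (the exact numbers differ from run to run because rnorm() draws random values; call set.seed() first to make it reproducible):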

data:  y by z
t = -2.0418, df = 47.504, p-value = 0.04675
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.07859422 -0.00814587
sample estimates:
mean in group A mean in group B 
      0.1162672       0.6596372 
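If you do want the formula syntax for the original two data frames, one option is to stack them into a single long data frame with an explicit two-level grouping column. This is a sketch under assumptions: the data frames are filled with hypothetical random values, and the column name Source is made up for illustration. Because the formula method splits the response by group and then runs the same two-sample test, both calls below give the same t statistic.

```r
# Hypothetical data frames mirroring the question's structure (random values)
set.seed(7)
Tokens <- data.frame(Duration = rnorm(25, 0.30, 0.05))
Tokens.Single <- data.frame(Duration = rnorm(25, 0.32, 0.05))

# Stack into one long data frame with a two-level grouping factor ("Source" is a made-up name)
long <- data.frame(
  Duration = c(Tokens$Duration, Tokens.Single$Duration),
  Source = factor(rep(c("Tokens", "Tokens.Single"),
                      times = c(nrow(Tokens), nrow(Tokens.Single))))
)

res_formula <- t.test(Duration ~ Source, data = long, var.equal = TRUE)
res_vectors <- t.test(Tokens$Duration, Tokens.Single$Duration, var.equal = TRUE)

# Both calls test the same hypothesis, so the statistics should match
print(all.equal(unname(res_formula$statistic), unname(res_vectors$statistic)))
```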
