T-test on two columns in R

Question

I am trying to run a t-test to see whether the values in one column of two data frames are statistically different.

My code compares the "Duration" column in two data frames: "Tokens" and "Tokens.Single". Both data frames have the same number of values in their respective Duration columns.

Here is the code I am trying:

# T-test for duration.
t.test(Tokens$Duration ~ Tokens.Single$Duration, paired=FALSE, var.equal=TRUE)

And this is the error message I received:

Error in t.test.formula(Tokens$Duration ~ Tokens.Single$Duration, paired = FALSE,  : 
  grouping factor must have exactly 2 levels

Any insight is appreciated!

Answer 1

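Without a peek at your data it's hard to say for certain, but the formula syntax you are using in t.test is meant for a response split by a grouping factor (response ~ group). In your call, Tokens.Single$Duration is treated as the grouping variable, and because it holds many distinct numeric values rather than exactly two groups, t.test stops with "grouping factor must have exactly 2 levels".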
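The error can be reproduced with simulated data. This is only a minimal sketch: the two data frames below are hypothetical stand-ins for Tokens and Tokens.Single filled with random values, not the asker's data.

```r
set.seed(1)

# Hypothetical stand-ins for the asker's data frames (random values).
Tokens <- data.frame(Duration = rnorm(50))
Tokens.Single <- data.frame(Duration = rnorm(50))

# In the formula interface the right-hand side is the grouping variable.
# Here it is a numeric column with many distinct values, not a 2-level factor.
n_groups <- nlevels(factor(Tokens.Single$Duration))

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    t.test(Tokens$Duration ~ Tokens.Single$Duration)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(n_groups)
print(msg)
```

The number of levels printed is far more than two, which is what the formula method objects to.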

Based on your description of your data, you would be better off passing the two numeric vectors directly:

y <- rnorm(50)
x <- rnorm(50)

t.test(x,y)

This compares the means of the numeric vectors x and y. In your case:

t.test(Tokens$Duration, Tokens.Single$Duration, paired=FALSE, var.equal=TRUE)
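As a side note on the var.equal argument used above: with var.equal=TRUE, t.test runs the classic pooled-variance test, while the default (var.equal=FALSE) is Welch's test, which does not assume equal variances. A small sketch on simulated vectors (the data and the 0.3 mean shift are made up purely for illustration):

```r
set.seed(42)

# Simulated data, for illustration only.
x <- rnorm(50)
y <- rnorm(50, mean = 0.3)

pooled <- t.test(x, y, var.equal = TRUE)  # Student's t-test, pooled variance
welch  <- t.test(x, y)                    # default: Welch's t-test

# The pooled test uses n1 + n2 - 2 degrees of freedom;
# Welch's degrees of freedom are estimated and usually fractional.
print(unname(pooled$parameter))
print(unname(welch$parameter))
```

Which variant is appropriate depends on whether the equal-variance assumption is reasonable for your durations.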

Just for completeness, if you had a factor variable indicating the run or experiment number, you could use the formula syntax, e.g.

y <- rnorm(50)
z <- rep(c("A","B"), 25)
t.test(y ~ z)

Yielding (exact numbers will differ from run to run, since y is random):

data:  y by z
t = -2.0418, df = 47.504, p-value = 0.04675
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.07859422 -0.00814587
sample estimates:
mean in group A mean in group B 
      0.1162672       0.6596372
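If you would rather use the formula syntax with your two data frames, one option is to stack the Duration columns into a single long data frame with a two-level factor recording where each value came from. A hedged sketch, again using hypothetical random stand-ins for Tokens and Tokens.Single; the names long and Source are made up for this example:

```r
set.seed(1)

# Hypothetical stand-ins for the asker's data frames (random values).
Tokens <- data.frame(Duration = rnorm(50))
Tokens.Single <- data.frame(Duration = rnorm(50))

# Stack both columns and record the source of each value as a 2-level factor.
long <- data.frame(
  Duration = c(Tokens$Duration, Tokens.Single$Duration),
  Source = factor(rep(c("Tokens", "Tokens.Single"),
                      times = c(nrow(Tokens), nrow(Tokens.Single))))
)

# Formula interface on the stacked data versus the two-vector call.
by_formula <- t.test(Duration ~ Source, data = long, var.equal = TRUE)
by_vectors <- t.test(Tokens$Duration, Tokens.Single$Duration, var.equal = TRUE)

# Both calls should report the same test statistic and p-value.
print(all.equal(unname(by_formula$statistic), unname(by_vectors$statistic)))
print(all.equal(by_formula$p.value, by_vectors$p.value))
```

Both forms run the same test, so the choice is mostly a matter of how your data are already laid out.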
