
R, comparing significance of difference between two 2x2 matrices/tables

In my statistical analysis, I'm comparing how likely a respondent is to have answered Y or N to Question 2 given their answer (Y/N) to Question 1.

millenials_q1q2 <- matrix(c(25, 150, 100, 25),ncol=2, byrow=FALSE)
babyboomers_q1q2 <- matrix(c(100,75,60,60),ncol=2, byrow=FALSE)

I've been able to use prop.table to compute the row and column percentages:

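# test_data1 / test_data2 refer to the two matrices defined above
test_data1 <- millenials_q1q2
test_data2 <- babyboomers_q1q2

prop.table(test_data1, 1)  # row proportions
prop.table(test_data1, 2)  # column proportions

prop.table(test_data2, 1)
prop.table(test_data2, 2)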

What I'm hoping to do is compare the two matrices directly, to assess whether the difference between the two response patterns is statistically significant.

I hope that this makes sense and gives enough context!

EDIT (for further context):

I've subsetted the dataset by demographic (i.e. Millennials, Baby Boomers), and I'm interested in exploring if and how these sub-samples answered Q1 and Q2 differently.

The matrices above show a distinct difference in how the two groups answered the questions, and I'm interested in measuring that difference (compared to, say, the following matrices, which are similar):

millenials_same <- matrix(c(55, 45, 55, 45),ncol=2, byrow=FALSE)
babyboomers_same <- matrix(c(57, 44, 53, 46),ncol=2, byrow=FALSE)
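
To make the contrast between the "distinct" and "similar" pairs concrete, here is a rough descriptive sketch. The helper name max_row_prop_diff is hypothetical (it is not from any package), and the largest gap in row proportions is only an illustrative summary, not a significance test.

```r
# Matrices from the post (filled column by column).
millenials_q1q2  <- matrix(c(25, 150, 100, 25), ncol = 2, byrow = FALSE)
babyboomers_q1q2 <- matrix(c(100, 75, 60, 60), ncol = 2, byrow = FALSE)
millenials_same  <- matrix(c(55, 45, 55, 45), ncol = 2, byrow = FALSE)
babyboomers_same <- matrix(c(57, 44, 53, 46), ncol = 2, byrow = FALSE)

# Hypothetical helper: largest absolute gap between the row proportions
# of two tables with the same dimensions.
max_row_prop_diff <- function(a, b) {
  max(abs(prop.table(a, 1) - prop.table(b, 1)))
}

gap_distinct <- max_row_prop_diff(millenials_q1q2, babyboomers_q1q2)
gap_similar  <- max_row_prop_diff(millenials_same, babyboomers_same)

print(c(distinct = gap_distinct, similar = gap_similar))
```

The first pair differs by tens of percentage points in its row proportions, while the second pair differs by only a couple of points.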

Does that help clarify my question? Thanks!

Assuming the cells of these matrices are paired data, you can construct a confidence interval on the mean of the pairwise differences.

  1. Create a matrix D holding the element-wise differences test_data1 - test_data2 (so the first element of D is test_data1[1] - test_data2[1]).
  2. Construct a confidence interval on the mean of the values in D.

More details on step 2 can be found here: http://www.cyclismo.org/tutorial/R/confidence.html

If the confidence interval does not include 0, you can conclude that the mean difference between the paired values is statistically significant at the chosen confidence level. If it does include 0, the data do not show a significant difference.
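
The two steps can be sketched in R as follows. This is a minimal illustration, assuming test_data1 and test_data2 are the Millennial and Baby Boomer matrices from the question and that a one-sample t interval (via t.test) is an acceptable way to build the confidence interval; the 95% level and the variable names ci and excludes_zero are choices made here, not part of the original answer.

```r
# Assumption: test_data1 / test_data2 are the two matrices from the question.
test_data1 <- matrix(c(25, 150, 100, 25), ncol = 2, byrow = FALSE)
test_data2 <- matrix(c(100, 75, 60, 60), ncol = 2, byrow = FALSE)

# Step 1: element-wise differences between the paired cells.
D <- test_data1 - test_data2

# Step 2: one-sample t interval on the mean of the differences.
ci <- t.test(as.vector(D), conf.level = 0.95)$conf.int

# The difference counts as significant only if the interval excludes 0.
excludes_zero <- ci[1] > 0 || ci[2] < 0

print(D)
print(ci)
print(excludes_zero)
```

With only four cell differences the t interval is very wide, so this check has little power; for these counts the interval spans 0 even though the two tables look quite different.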
