
Compare frequencies of samples in R

I would like to compare the frequencies of samples from two different observations. The problem is that the first does not contain the whole range of values of the second. How can I combine the two counts without writing a for loop that sorts them by the x values returned by count? Here is an MWE for clarification:

library(plyr)
a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)

a.count <- count(a)  # data frame: x = distinct value, freq = how often it occurs
b.count <- count(b)

My desired result should look something like this:

  freq.a freq.b
1             1
2      1      1
3      2      3
4     10      2
5     13      2
6      4      7
7      3      2
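
One base R way to get this layout without a loop, sketched here as an assumption rather than something from the original thread, is to give both vectors the same factor levels before tabulating; a value absent from one sample is then counted as 0 instead of being dropped. The names lv and freq are illustrative only.

```r
# Data from the question's MWE
a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)

# Shared set of values seen in either sample, in increasing order
lv <- sort(union(a, b))

# table() on a factor reports every level, including those with zero counts
freq <- data.frame(
  freq.a = as.vector(table(factor(a, levels = lv))),
  freq.b = as.vector(table(factor(b, levels = lv))),
  row.names = lv
)
freq
```

Because both factors share lv, the two count vectors line up row by row, which is what makes the loop unnecessary.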

If you put your data in long format (one row per observation, with a variable recording which sample it comes from), you can simply build a contingency table:

    library(magrittr)  # provides the %>% pipe
    data.frame(v = a, s = 'a') %>%
      rbind(data.frame(v = b, s = 'b')) %>%
      xtabs(formula = ~ v + s)  # count each value v per sample s

Produces:

   s
v    a  b
  1  0  1
  2  1  1
  3  2  3
  4 10  2
  5 13  2
  6  4  7
  7  3  2
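
If the exact freq.a / freq.b columns from the question are wanted, the table above can be reshaped. This is a minimal base R sketch (no pipe, so no extra package), assuming the column names from the question; long, tab and res are illustrative names.

```r
a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)

# Long format: one row per observation, s records which sample it came from
long <- rbind(data.frame(v = a, s = "a"), data.frame(v = b, s = "b"))

# Contingency table of value (rows) by sample (columns); missing combinations are 0
tab <- xtabs(~ v + s, data = long)

# Pull the two sample columns out into the layout asked for in the question
res <- data.frame(
  freq.a = as.vector(tab[, "a"]),
  freq.b = as.vector(tab[, "b"]),
  row.names = rownames(tab)
)
res
```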
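
Alternatively, merge the two count tables by x, keeping values that occur in only one sample (all = TRUE):

df <- merge(a.count, b.count, by = 'x', all = TRUE)[2:3]  # [2:3] drops the x column
names(df) <- c('freq.a', 'freq.b')
df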

  freq.a freq.b
1     NA      1
2      1      1
3      2      3
4     10      2
5     13      2
6      4      7
7      3      2
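
The NA in the first row marks a value that never occurs in a. If zeros are preferred, and the x column is worth keeping, a sketch could fill the NAs after the merge. To stay self-contained it uses count_df, a hypothetical base R stand-in for plyr's count() that returns the same x and freq columns; merge's suffixes argument names the two freq columns directly.

```r
a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)

# Hypothetical stand-in for plyr::count(): one row per distinct value x with its frequency
count_df <- function(v) {
  tab <- table(v)
  data.frame(x = as.numeric(names(tab)), freq = as.vector(tab))
}

# Full outer join on x; suffixes turn the two freq columns into freq.a and freq.b
df <- merge(count_df(a), count_df(b), by = "x", all = TRUE, suffixes = c(".a", ".b"))

# Values missing from one sample come back as NA; count them as 0 instead
df[is.na(df)] <- 0
df
```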

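Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.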

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM