简体   繁体   中英

Sampling a contingency table in R

I'd like to know if anyone can suggest an efficient method to sample a contingency table such that both the total number of observations and the column totals remain the same.

For example, in the following table where the rows are cases and the columns observations, I'd like to "scramble" the observations such that (a) the total number of observations is 54, and (b) the total number of observations in a variable (eg, A) is 16 18, the same as the original column total for A.

x<-matrix(c(
4,6,0,0,8,0,0,
1,1,1,1,4,0,0,
3,0,1,1,6,0,1,
2,1,0,0,1,0,0,
1,1,0,1,0,1,1,
2,0,0,2,1,2,0),
ncol=6,byrow=F)

colnames(x)<-c("A","B","C","D","E","F")

I've seen a discussion of contingency table sampling in which the cell frequencies are the source of the sampling probabilities for a sample(...) call. This won't work for my purposes because, among other reasons, the column totals do not remain equal to the original column totals.

Any help would be greatly appreciated, Patrick

EDIT

If there isn't an easy solution to this problem, perhaps someone can help me with my overly complicated (and failed) attempt. I first create a vector composed of the number of observations of each variable, eg,

m <- matrix()
v <- matrix()
for (h in 1:cols) {
    m <- rep(colnames(x)[h], sum(x[, h]))
    v <- c(v, m)}

I then sample it to randomly shuffle the observations, and bind it to a sample of values equal to the number of cases

    v<-sample(v,length(v))
    p<-sample(seq(1:nrow(x)),length(v),T)
    n<-as.data.frame(cbind(v,p))

    t(table(n))

      v
    p A B C D E F
    1 3 1 3 1 1 1
    2 1 1 0 0 0 0
    3 3 0 3 0 2 1
    4 3 2 1 2 1 2
    5 2 1 0 0 0 1
    6 3 2 3 1 1 1
    7 3 1 2 0 0 1

    colSums(t(table(n)))
    A  B  C  D  E  F 
   18  8 12  4  5  7 

This works great except when the sample p fails to contain one of the values in the sequence (ie, a "case" is missing), which as I've learned happens quite frequently, particularly when there are many iterations of the sample (eg, 1000).

Thanks again, Patrick

Another way would be:

indx <- cbind(c(replicate(ncol(x), sample(1:nrow(x)))), c(col(x)))
x1 <- x
x1[] <- x[indx]

colSums(x1)
# A  B  C  D  E  F 
#18  8 12  4  5  7 

colSums(x)
#A  B  C  D  E  F 
#18  8 12  4  5  7 

sum(x1)
#[1] 54

Update

Based on the new info, which is confusing, may be this helps:

 cSum <- colSums(x)
 ind1 <- vector("list", length=ncol(x))
 for(i in seq_along(cSum)){
 repeat{ind1[[i]] <- sample(0:cSum[i], nrow(x)-1, replace=TRUE)
 if(sum(ind1[[i]]) <=cSum[i]) break
  }
 }

 x1 <- do.call(cbind, ind1)
 x2 <- rbind(x1,cSum-colSums(x1))
 colSums(x2)
# A  B  C  D  E  F 
#18  8 12  4  5  7 

sum(colSums(x2))
#[1] 54

  x2
 #    A B C D E F
 #[1,] 0 0 0 0 0 0
 #[2,] 9 5 1 2 0 1
 #[3,] 0 1 1 1 0 2
 #[4,] 0 0 4 0 0 1
 #[5,] 8 0 5 0 4 2
 #[6,] 0 0 1 0 1 1
 #[7,] 1 2 0 1 0 0

You can use

x.swapped <- apply(x, MARGIN=2, FUN=sample)

apply applies the function passed in the parameter FUN to the columns (if MARGIN is 2, rows when it is 1) of the matrix x .
In this case we apply the sample function.
When called without extra parameters sample just reorders the element in the vector (see ?sample for more help).

We can check that the totals in each column remain the same.

colSums(x)

 A  B  C  D  E  F 
18  8 12  4  5  7 

colSums(x.swapped)
 A  B  C  D  E  F 
18  8 12  4  5  7 

And obviously

sum(x)
[1] 54

sum(x.swapped)
[1] 54

An example of output may be (note that, unless you fix the RNG seed using set.seed the result from sample will differ each time).

x

     A B C D E F
[1,] 4 1 3 2 1 2
[2,] 6 1 0 1 1 0
[3,] 0 1 1 0 0 0
[4,] 0 1 1 0 1 2
[5,] 8 4 6 1 0 1
[6,] 0 0 0 0 1 2
[7,] 0 0 1 0 1 0

x.swapped

     A B C D E F
[1,] 6 4 1 0 1 0
[2,] 0 1 3 2 0 0
[3,] 0 0 1 1 1 2
[4,] 0 0 0 0 1 2
[5,] 8 1 1 1 1 2
[6,] 4 1 0 0 1 0
[7,] 0 1 6 0 0 1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM