`for` loop coercing matrix into large list in R

I have a large dataset (4352 observations) that I am trying to split into continuous and discrete data in preparation for Bayesian analysis. So far, I have tried two different methods of doing this: an if-then statement and an if else, both within for loops.

I have my observations as proportions in the object y :

> head(y,10)
      A    B    C DEF
1  0.50 0.50 0.00 0.0
2  0.95 0.00 0.05 0.0
3  0.10 0.00 0.00 0.9
4  0.70 0.00 0.30 0.0
5  0.95 0.00 0.05 0.0
6  0.60 0.00 0.40 0.0
7  0.95 0.00 0.05 0.0
8  0.95 0.05 0.00 0.0
9  1.00 0.00 0.00 0.0
10 1.00 0.00 0.00 0.0

I also have a vector with one entry per row of y, which I will later use to index whether a row is discrete (0,1) or continuous.

y.discrete <- rep(0,dim(y)[1])

My first method is the if-then statement:

y.d <- matrix(NA,n,ncat)

for (i in 1:n){
y.d[i,][max(y[i,])==1]=y[i,]
y.discrete[i][!is.na(y.d[i,])]=1
}

The for loop produces Error in yd[i, 1] : incorrect number of dimensions. If you call out one single element (e.g., y.d[i,1]) in the if-then statement, then it runs without error. Also, once the loop has been run, the object y.d is changed from a matrix to a Large list. I believe this is what is causing the error in the number of dimensions. When the error occurs, i is 1.

I have also tried an if else :

y.d <- matrix(NA,n,4)

for (i in 1:n){
  if (max(y[i,])==1) {
    y.d[i,]<-y[i,]    
  } else {
    if (!is.na(y.d[i,1])) {
      y.discrete[i]<-1
    } 
  }
}

This gives the same error in the loop, but here the last value of i is 10. The class of y.d still changes, too.
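One possible explanation for the class change, assuming y is a data frame rather than a matrix (the post does not say which): a single row y[i, ] of a data frame is itself a list, and assigning a list into a matrix coerces the whole matrix to a list. The sketch below reproduces this on a small made-up example; df, m and m2 are hypothetical names, not objects from the post.

```r
# Assumption: y is a data frame; df stands in for it here.
df <- data.frame(A = c(0.5, 1), B = c(0.5, 0))

m <- matrix(NA, nrow = 2, ncol = 2)
m[2, ] <- df[2, ]            # df[2, ] is a one-row data frame, i.e. a list
typeof(m)                    # "list": the matrix has been coerced

# Converting the row to a plain numeric vector keeps the matrix numeric.
m2 <- matrix(NA, nrow = 2, ncol = 2)
m2[2, ] <- unlist(df[2, ])
typeof(m2)                   # "double"
```

If this is the cause, converting y once with as.matrix(y) before the loop should have the same effect as calling unlist() on each row.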

Does anyone have any thoughts on what is happening inside here? I have already asked two colleagues for help, and we are all stumped. I appreciate your help. I am running R 3.0.3 on a Windows 7, 64-bit machine.

Edit: To clarify, I would like y.d to contain the corresponding rows from y where one of the values (A, B, C, DEF) is exactly equal to 1. Otherwise, the row should remain NA.

Edit 2: I have been trying to get the inverse of the answer supplied by @joran to work for the continuous observations (where the values are strictly between 0 and 1), and indexing with the same vector isn't working. When I try:

y.c<-y
y.c[y.discrete,] <- NA

I still have rows with 1s in my data (see rows 9 and 10), and the result is not the inverse of what y.d delivered:

> head(y.d,10)
    A  B  C DEF
1  NA NA NA  NA
2  NA NA NA  NA
3  NA NA NA  NA
4  NA NA NA  NA
5  NA NA NA  NA
6  NA NA NA  NA
7  NA NA NA  NA
8  NA NA NA  NA
9   1  0  0   0
10  1  0  0   0

> head(y.c, 10)
      A    B    C DEF
1    NA   NA   NA  NA
2  0.95 0.00 0.05 0.0
3  0.10 0.00 0.00 0.9
4  0.70 0.00 0.30 0.0
5  0.95 0.00 0.05 0.0
6  0.60 0.00 0.40 0.0
7  0.95 0.00 0.05 0.0
8  0.95 0.05 0.00 0.0
9  1.00 0.00 0.00 0.0
10 1.00 0.00 0.00 0.0

Sorry if this is a stupid question, but do you know why I can't just index with the same vector whose negation we used previously?

I'm sort of guessing here because your question leaves out some details. I think what you're actually trying to do is something like this:

> y.discrete <- apply(y,1,function(x) as.integer(any(x == 1)))
> y.discrete
 1  2  3  4  5  6  7  8  9 10 
 0  0  0  0  0  0  0  0  1  1 
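As a self-contained check, the sketch below rebuilds four rows of the sample data (rows 1, 2, 3 and 9 of the question) and compares the apply() call with a vectorised alternative based on rowSums(). The rowSums() variant and the names y.discrete.apply and y.discrete.rowsums are additions for illustration, not part of the original answer.

```r
# Rows 1, 2, 3 and 9 of the sample data from the question.
y <- data.frame(
  A   = c(0.50, 0.95, 0.10, 1.00),
  B   = c(0.50, 0.00, 0.00, 0.00),
  C   = c(0.00, 0.05, 0.00, 0.00),
  DEF = c(0.00, 0.00, 0.90, 0.00)
)

# Row-wise flag: 1 if any value in the row is exactly 1, otherwise 0.
y.discrete.apply <- apply(y, 1, function(x) as.integer(any(x == 1)))

# Vectorised alternative: count the 1s in each row, then test for at least one.
y.discrete.rowsums <- as.integer(rowSums(y == 1) > 0)

all(y.discrete.apply == y.discrete.rowsums)
```

Both versions rely on an exact comparison with 1, which is fine for values entered as 1 but may need a tolerance if the proportions come out of floating-point arithmetic.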

And then:

> y.d <- y
> y.d[!y.discrete,] <- NA
> y.d
    A  B  C DEF
1  NA NA NA  NA
2  NA NA NA  NA
3  NA NA NA  NA
4  NA NA NA  NA
5  NA NA NA  NA
6  NA NA NA  NA
7  NA NA NA  NA
8  NA NA NA  NA
9   1  0  0   0
10  1  0  0   0
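For the continuous rows asked about in Edit 2, the likely issue is that y.discrete is a vector of 0s and 1s, so y.c[y.discrete,] indexes by position: every 1 selects row 1 and every 0 selects nothing, which matches row 1 being the only NA row in the output shown. !y.discrete works only because ! converts the numbers to logical values first. A minimal sketch using an explicit logical comparison, on four rows of the sample data (rows 1, 2, 9 and 10 of the question):

```r
# Rows 1, 2, 9 and 10 of the sample data from the question.
y <- data.frame(
  A   = c(0.50, 0.95, 1.00, 1.00),
  B   = c(0.50, 0.00, 0.00, 0.00),
  C   = c(0.00, 0.05, 0.00, 0.00),
  DEF = c(0.00, 0.00, 0.00, 0.00)
)

y.discrete <- apply(y, 1, function(x) as.integer(any(x == 1)))

# Discrete rows kept, everything else set to NA (as in the answer).
y.d <- y
y.d[!y.discrete, ] <- NA

# Continuous rows kept: index with a logical vector, not with the 0/1 numbers.
y.c <- y
y.c[y.discrete == 1, ] <- NA
```

as.logical(y.discrete) would work in place of y.discrete == 1; the point is only that the index must be logical for the two results to be complements of each other.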
