I have a large dataset (4352 observations) that I am trying to split into continuous and discrete data in preparation for a Bayesian analysis. So far, I have tried two different methods: an if-then statement and an if else, both inside for loops.
My observations are stored as proportions in the object y:
> head(y,10)
A B C DEF
1 0.50 0.5 0.00 0.0
2 0.95 0.0 0.05 0.0
3 0.10 0.0 0.00 0.9
4 0.70 0.0 0.30 0.0
5 0.95 0.0 0.05 0.0
6 0.60 0.0 0.40 0.0
7 0.95 0.00 0.05 0.0
8 0.95 0.05 0.00 0.0
9 1.00 0.00 0.00 0.0
10 1.00 0.00 0.00 0.0
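To make the problem reproducible, the ten rows shown above can be rebuilt by hand. This is a sketch that assumes y is a data.frame (the printed row names suggest it, but the question does not say so); the values are copied from the head(y, 10) output.

```r
# The ten rows printed by head(y, 10), rebuilt as a data.frame
# (assumption: the real object y is a data.frame with these columns).
y <- data.frame(
  A   = c(0.50, 0.95, 0.10, 0.70, 0.95, 0.60, 0.95, 0.95, 1.00, 1.00),
  B   = c(0.50, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.05, 0.00, 0.00),
  C   = c(0.00, 0.05, 0.00, 0.30, 0.05, 0.40, 0.05, 0.00, 0.00, 0.00),
  DEF = c(0.00, 0.00, 0.90, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00)
)

str(y)                 # a data.frame with 10 rows and 4 numeric columns
summary(rowSums(y))    # each row of proportions should sum to 1
```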
I also have a vector with one element per row of y, which I will later use to flag whether a row is discrete (0,1) or continuous:
y.discrete <- rep(0,dim(y)[1])
My first method is the if-then statement:
y.d <- matrix(NA,n,ncat)
for (i in 1:n){
  y.d[i,][max(y[i,])==1]=y[i,]
  y.discrete[i][!is.na(y.d[i,])]=1
}
The for loop produces the error "Error in yd[i, 1] : incorrect number of dimensions". If I refer to a single element in the if-then statement instead (e.g., y.d[i,1]), it runs without error. Also, once the loop has run, y.d has changed from a matrix to a Large list, which I believe is what causes the error about the number of dimensions. If you look at i here, it is 1.
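A likely explanation for the changing class, assuming y is a data.frame: y[i,] is then a one-row data frame, which is a list, and assigning a list into a logical matrix makes R coerce the whole target to a list. This minimal sketch reproduces the coercion with rows 1 and 9 of the data and shows that converting y with as.matrix() beforehand keeps the target numeric; ym and y.d2 are hypothetical names used only for illustration.

```r
# Rows 1 and 9 from the question, in the same layout (y assumed to be a data.frame)
y <- data.frame(A = c(0.50, 1.00), B = c(0.5, 0), C = c(0, 0), DEF = c(0, 0))

# Problem: y[2, ] is a one-row data.frame, i.e. a list of 4 columns.
y.d <- matrix(NA, nrow(y), ncol(y))   # logical matrix of NA
y.d[2, ] <- y[2, ]                    # list value: R coerces y.d to a list
print(is.list(y.d))

# Fix: convert the data.frame to a numeric matrix before the loop,
# so each row is a plain numeric vector.
ym <- as.matrix(y)
y.d2 <- matrix(NA_real_, nrow(ym), ncol(ym))
y.d2[2, ] <- ym[2, ]                  # numeric into numeric: stays a matrix
print(is.matrix(y.d2) && is.numeric(y.d2))
```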
I have also tried an if else:
y.d <- matrix(NA,n,4)
for (i in 1:n){
  if (max(y[i,])==1) {
    y.d[i,]<-y[i,]
  } else {
    if (!is.na(y.d[i,1])) {
      y.discrete[i]<-1
    }
  }
}
This produces the same error in the loop, but here the last value of i is 10. The class of y.d still changes, too.
Does anyone have any thoughts on what is happening here? I have already asked two colleagues for help, and we are all stumped. I appreciate your help. I am running R 3.0.3 on a 64-bit Windows 7 machine.
Edit: To clarify, I would like y.d to contain the corresponding rows of y in which one of the values (A, B, C, DEF) is exactly equal to 1. Otherwise, the row should remain NA.
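A sketch of a loop that does what this edit describes, under the same assumption that y is a data.frame. It converts y to a numeric matrix (ym, a hypothetical name) before the loop and sets y.d and y.discrete in the same if, so no separate NA check is needed.

```r
# Same ten rows as in the question (y assumed to be a data.frame)
y <- data.frame(
  A   = c(0.50, 0.95, 0.10, 0.70, 0.95, 0.60, 0.95, 0.95, 1.00, 1.00),
  B   = c(0.50, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.05, 0.00, 0.00),
  C   = c(0.00, 0.05, 0.00, 0.30, 0.05, 0.40, 0.05, 0.00, 0.00, 0.00),
  DEF = c(0.00, 0.00, 0.90, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00)
)

ym <- as.matrix(y)                  # numeric matrix: ym[i, ] is a plain vector
n  <- nrow(ym)
y.d <- matrix(NA_real_, n, ncol(ym), dimnames = list(NULL, colnames(ym)))
y.discrete <- rep(0, n)

for (i in seq_len(n)) {
  if (max(ym[i, ]) == 1) {          # one category is exactly 1
    y.d[i, ] <- ym[i, ]             # copy the row into y.d
    y.discrete[i] <- 1              # and flag it as discrete
  }                                 # otherwise the row stays NA
}

print(y.d)
print(y.discrete)
```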
Edit 2: I have been trying to get the inverse of the answer supplied by @joran to work for the continuous observations (where the values fall between, but do not include, 0 and 1), and indexing with the same vector isn't working. When I try:
y.c<-y
y.c[y.discrete,] <- NA
I still have rows with 1s in my data (see rows 9 and 10), and the result is not the inverse of what y.d delivered:
> head(y.d,10)
A B C DEF
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
7 NA NA NA NA
8 NA NA NA NA
9 1 0 0 0
10 1 0 0 0
> head(y.c, 10)
A B C DEF
1 NA NA NA NA
2 0.95 0.00 0.05 0.0
3 0.10 0.00 0.00 0.9
4 0.70 0.00 0.30 0.0
5 0.95 0.00 0.05 0.0
6 0.60 0.00 0.40 0.0
7 0.95 0.00 0.05 0.0
8 0.95 0.05 0.00 0.0
9 1.00 0.00 0.00 0.0
10 1.00 0.00 0.00 0.0
Sorry if this is a stupid question, but do you know why I can't just index with the same vector whose negation we used previously?
I'm sort of guessing here because your question leaves out some details. I think what you're actually trying to do is something like this:
y.discrete <- apply(y,1,function(x) as.integer(any(x == 1)))
> y.discrete
1 2 3 4 5 6 7 8 9 10
0 0 0 0 0 0 0 0 1 1
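The apply() call can also be written without a per-row function. This is an alternative sketch, not part of the original answer: y == 1 on a data.frame returns a logical matrix, and rowSums() counts the TRUE values in each row. The tolerance version is only worth considering if the proportions were produced by arithmetic rather than entered directly, since exact equality with 1 can then fail by rounding error; the 1e-8 threshold is an arbitrary assumption.

```r
# Same ten rows as in the question (y assumed to be a data.frame)
y <- data.frame(
  A   = c(0.50, 0.95, 0.10, 0.70, 0.95, 0.60, 0.95, 0.95, 1.00, 1.00),
  B   = c(0.50, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.05, 0.00, 0.00),
  C   = c(0.00, 0.05, 0.00, 0.30, 0.05, 0.40, 0.05, 0.00, 0.00, 0.00),
  DEF = c(0.00, 0.00, 0.90, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00)
)

# Exact comparison: y == 1 gives a logical matrix, rowSums() counts the 1s.
y.discrete <- as.integer(rowSums(y == 1) > 0)

# Tolerance-based variant (assumed threshold) for computed proportions.
y.discrete.tol <- as.integer(rowSums(abs(y - 1) < 1e-8) > 0)

print(y.discrete)
```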
And then:
> y.d <- y
> y.d[!y.discrete,] <- NA
> y.d
A B C DEF
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
7 NA NA NA NA
8 NA NA NA NA
9 1 0 0 0
10 1 0 0 0
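This also explains the result in Edit 2. In y.d[!y.discrete,], the ! turns the 0/1 integers into logicals (0 becomes TRUE), so the vector acts as a row filter. Used without !, the same integers are treated as row positions: 0 is dropped and 1 refers to row 1, which is why only row 1 was set to NA. A sketch of both subsets, comparing the flag with 0 or 1 to get logical indices (y.c.wrong is a hypothetical name for the failing version):

```r
# Same ten rows as in the question (y assumed to be a data.frame)
y <- data.frame(
  A   = c(0.50, 0.95, 0.10, 0.70, 0.95, 0.60, 0.95, 0.95, 1.00, 1.00),
  B   = c(0.50, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.05, 0.00, 0.00),
  C   = c(0.00, 0.05, 0.00, 0.30, 0.05, 0.40, 0.05, 0.00, 0.00, 0.00),
  DEF = c(0.00, 0.00, 0.90, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00)
)
y.discrete <- as.integer(rowSums(y == 1) > 0)   # 0/1 flag, 1 = discrete row

# Numeric indices are row positions: 0 is dropped and 1 means "row 1",
# so c(0, ..., 0, 1, 1) only ever selects row 1.
y.c.wrong <- y
y.c.wrong[y.discrete, ] <- NA                    # blanks row 1, keeps rows 9-10

# Comparing the flag turns it into TRUE/FALSE, i.e. a row filter.
y.d <- y
y.d[y.discrete == 0, ] <- NA                     # keep discrete rows only
y.c <- y
y.c[y.discrete == 1, ] <- NA                     # keep continuous rows only
```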