I'm trying to find a good set of variables for classification, using a large dataset of expression data (all categorical variables, one per column) to predict a binary outcome. Each subject is measured at several, but not all, of the study's time points (T1-T7), and each subject has a unique ID. To do this, I've decided to use MXM::MMPC.timeclass(). However, it produces negative p-values. As far as I understand, p-values are probabilities and therefore, by definition, cannot be negative.
I have tried MMPC.timeclass() and have searched the literature extensively for another method that might be appropriate, but nothing has come up so far.
library(MXM)  ## provides MMPC.timeclass()
set.seed(5)
## assume these are longitudinal data; each column is a variable (or feature)
dataset <- matrix( rnorm(400 * 100), ncol = 100 )
id <- rep(1:80, each = 5)  ## 80 subjects
reps <- rep( seq(4, 12, by = 2), 80)  ## 5 time points for each subject
## dataset contains the regression coefficients of each subject's values on
## reps (which is assumed to be time in this example)
target <- rep(0:1, each = 200)
a <- MMPC.timeclass(target, reps, id, dataset)
summary(a@pvalues)  ## same as a@pvalues %>% summary(), without needing magrittr
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-4.01762 -1.39835 -0.68720 -0.98512 -0.37326 -0.01365
The expected result is p-values in the 0-1 range or, even better, some kind of ranking of each variable from the screening procedure. I've used VariableScreening::ScreenLD() before, but the outcome here is categorical, so it's not appropriate for these data.
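Whatever scale the screening scores are on, a ranking only needs their order. Below is a minimal base-R sketch of such a ranking; `log_p` is a hypothetical, simulated stand-in for `a@pvalues`, and the variable names `V1`, `V2`, ... are made up. Because `log()` is monotone increasing, sorting log p-values gives exactly the same ranking as sorting the p-values themselves.

```r
## Hypothetical stand-in for a@pvalues: simulated log p-values for 10 variables
set.seed(5)
log_p <- log(runif(10))                       # runif() never returns exactly 0 or 1
names(log_p) <- paste0("V", seq_along(log_p))   # made-up variable names

## Rank variables from most to least significant (smallest value first).
## log() is monotone increasing, so the order is the same on either scale.
ranking <- names(sort(log_p))
stopifnot(identical(order(log_p), order(exp(log_p))))
print(ranking)
```

With a real result, the same idea would be `order(a@pvalues)` (or `sort()` on a named vector), and the first entries are the strongest candidates.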
The answer is that they are log p-values: because 0 < p <= 1 implies log(p) <= 0, every value is negative (or zero). The documentation will be updated accordingly. See https://github.com/mensxmachina/MXM-R-Package/issues/2 for a response from the package author.
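A minimal sketch of the conversion back to ordinary p-values, assuming the stored values are natural-log p-values (the usual convention in R, as with the `log.p = TRUE` argument of `pt()` or `pchisq()`). The base-R part computes one t-test p-value on both scales to show that a log-scale p-value is never positive and that `exp()` recovers the ordinary value. With the object from the question, the analogous call would be `exp(a@pvalues)`.

```r
## A one-sample t-test p-value computed two ways in base R
set.seed(5)
x  <- rnorm(30)
df <- length(x) - 1
tstat <- mean(x) / (sd(x) / sqrt(length(x)))

## two-sided p-value on the ordinary scale (matches t.test(x)$p.value)
p <- 2 * pt(-abs(tstat), df = df)
## the same p-value on the log scale: log(2) + log(P(T <= -|t|))
log_p <- log(2) + pt(-abs(tstat), df = df, log.p = TRUE)

stopifnot(log_p <= 0)                         # log p-values are never positive
stopifnot(isTRUE(all.equal(exp(log_p), p)))   # exp() undoes the log

## Smallest and largest values reported in the question, assuming they are
## natural-log p-values; exp() maps them back into (0, 1).
## With the real object: pvals <- exp(a@pvalues)
reported <- c(min = -4.01762, max = -0.01365)
print(exp(reported))
```

Note that `exp()` of the reported Mean is not the mean p-value (it is a geometric mean), so the conversion should be applied to the individual values rather than to the summary.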