Variable screening continuous outcomes, categorical predictor, negative p-values

I'm trying to find a good set of variables for classification, using a large expression dataset (all categorical variables, one per column) to predict a binary outcome. Each subject is measured at several, but not all, timepoints (T1-T7 in the study), and each subject has a unique ID. To accomplish this, I decided to use MXM::MMPC.timeclass(). However, it produces negative p-values. As far as I understand, p-values are probabilities and so, by definition, cannot be negative.

I have tried MMPC.timeclass() and searched the literature extensively for another method that might be appropriate, but nothing has come up yet.

library(MXM)       ## provides MMPC.timeclass()
library(magrittr)  ## provides the %>% pipe

set.seed(5)
## assume these are longitudinal data; each column is a variable (or feature)
dataset <- matrix(rnorm(400 * 100), ncol = 100)
id <- rep(1:80, each = 5)  ## 80 subjects
reps <- rep(seq(4, 12, by = 2), 80)

## 5 time points for each subject
## dataset contains the regression coefficients of each subject's values on the
## reps (which is assumed to be time in this example)
target <- rep(0:1, each = 200)
a <- MMPC.timeclass(target, reps, id, dataset)
a@pvalues %>% summary()

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-4.01762 -1.39835 -0.68720 -0.98512 -0.37326 -0.01365 

Expected results should include p-values (in the 0-1 range) or, even better, a ranking of some kind for each variable from the screening procedure. I've used VariableScreening::ScreenLD() before, but the outcome here is categorical, so it is not appropriate for these data.

The answer is that they are log p-values, not raw p-values, which is why they are all negative. The documentation will be updated accordingly. See https://github.com/mensxmachina/MXM-R-Package/issues/2 for the package author's response.
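
As a minimal sketch of what that means in practice: if the values are natural-log p-values (an assumption here; the author's reply only says "log"), exponentiating them should recover p-values in the 0-1 range. The example below does not call MXM at all. It uses the five numbers from the summary output above as stand-in values, and the names V1-V5 are hypothetical labels added for illustration.

```r
## Stand-in log p-values copied from the summary() output above
## (min, 1st quartile, median, 3rd quartile, max).
## The names V1..V5 are hypothetical, purely for illustration.
log_p <- c(V1 = -4.01762, V2 = -1.39835, V3 = -0.68720,
           V4 = -0.37326, V5 = -0.01365)

## Back-transform to the usual 0-1 scale (assumes natural logarithms).
p <- exp(log_p)

## Rank variables from most to least significant. Because exp() is
## monotone, sorting the log p-values gives the same order as sorting p.
ranking <- names(sort(log_p))

print(round(p, 4))
print(ranking)
```

With the object from the question, the same idea would be exp(a@pvalues). For ranking alone, the log values can be sorted directly, which also avoids underflow to zero when p-values are extremely small.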
