
Principal component analysis and elastic net regression

I have identified genes of interest in disease cases and controls within a microarray gene expression set and have applied PCA. I want to use elastic net regression to build a model that determines which principal components are predictive of the source (case versus control), but I'm unsure how to do this, i.e. what to input as the x and y variables. Any help at all would be much appreciated!

Variable selection with a penalised model (such as the elastic net regression you refer to), where you fit the model and see which predictors survive the penalty, isn't usually applied to PCA or PCR (principal component regression). PCR reduces the data set to a chosen number of components, and the principal components correspond to different 'directions' of variance within the data. The first principal component is the direction with the most variance, the second is the direction (orthogonal to the first) with the second most variance, and so on. Note that this ordering reflects variance in the predictors only, not how strongly each component is associated with the response.
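As a minimal sketch of the variance ordering described above, the following uses base R's prcomp on simulated data. The matrix expr, its dimensions, and the seed are made up for illustration and simply stand in for a samples-by-genes expression matrix.

```r
# Simulated stand-in for an expression matrix: 40 samples x 10 genes (hypothetical data)
set.seed(1)
expr <- matrix(rnorm(40 * 10), nrow = 40, ncol = 10)
colnames(expr) <- paste0("gene", 1:10)

# PCA on centred, scaled data
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Proportion of total variance carried by each component, and the running total
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_explained <- cumsum(var_explained)

print(round(var_explained, 3))
print(round(cum_explained, 3))

# Component scores: one row per sample, one column per PC
scores <- pca$x
```

The proportions come out in non-increasing order, which is the "most variance first" property; the scores matrix is what you would plot (PC1 against PC2) or pass on to a downstream model.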

If you were to type:

summary(pcr.model)

Assuming pcr.model was fitted with pcr() from the pls package, this prints a table of the percentage of variance explained, both in the predictors and in the response (i.e. your y), as each principal component is added. The figures are cumulative, so each column gives the total explained by all components up to that point.
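The answer does not show how pcr.model was created. One plausible way, sketched here under the assumption that the pls package is installed, is to fit it with pcr(). The data frame dat and the simulated response y are hypothetical, and the response is continuous because pcr() fits a linear model.

```r
library(pls)  # assumption: the pls package is installed

# Simulated data: 60 samples, 8 predictors, continuous response (hypothetical)
set.seed(1)
n <- 60
X <- matrix(rnorm(n * 8), nrow = n)
colnames(X) <- paste0("gene", 1:8)
y <- X[, 1] - 0.5 * X[, 2] + rnorm(n)
dat <- data.frame(y = y, X)

# Principal component regression with cross-validation
pcr.model <- pcr(y ~ ., data = dat, scale = TRUE, validation = "CV")

# Prints cross-validated error and cumulative % variance explained in X and y
summary(pcr.model)

# Percentage of X variance explained by each individual component
x_var <- explvar(pcr.model)
print(round(x_var, 1))
```

Here summary() reports cumulative percentages, while explvar() gives the per-component share of predictor variance.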

The idea of PCR is that you can select a subset of these components (if your data suit it, i.e. most of the variance is captured in the first few principal components), which greatly reduces the dimensionality of your data and lets you, say, plot PC1 against PC2. Note that PCR as usually implemented is a linear regression on the component scores and so assumes a continuous response; for a binary outcome such as case versus control, a logistic model on the scores is the more usual choice.

If, however, you want to know which of the original predictors are useful and apply an elastic-net type regression, I would recommend the lasso, which is the elastic net with its mixing parameter set to 1. I would also recommend the ISLR book (An Introduction to Statistical Learning with Applications in R), which contains excellent R walkthroughs of all of the essential frequentist modelling techniques.
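On the question of what to use as x and y: if you do want to try a penalised fit on the components, one possible setup is to pass the matrix of component scores as x and the case/control label as y to a logistic elastic net. The sketch below assumes the glmnet package is installed; the simulated data, the choice of the first 10 components, and alpha = 0.5 are all illustrative assumptions rather than recommendations.

```r
library(glmnet)  # assumption: the glmnet package is installed

# Simulated expression data: 80 samples x 20 genes, half cases (hypothetical)
set.seed(1)
n <- 80
expr <- matrix(rnorm(n * 20), nrow = n)
status <- factor(rep(c("control", "case"), each = n / 2),
                 levels = c("control", "case"))
# Shift three genes in the cases so there is some signal to find
expr[status == "case", 1:3] <- expr[status == "case", 1:3] + 1.5

# x: principal component scores; y: case/control label
pca <- prcomp(expr, center = TRUE, scale. = TRUE)
x <- pca$x[, 1:10]
y <- status

# Logistic elastic net; alpha = 1 would be the lasso, alpha = 0 ridge
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 0.5, nfolds = 5)

# Components with non-zero coefficients at the cross-validated lambda
coefs <- coef(cv_fit, s = "lambda.min")
print(coefs)
```

Components whose coefficients are shrunk to zero are dropped by the penalty. To ask which genes matter instead, the same call can be made with the expression matrix itself as x.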


 