
Fitting PCA using the model.matrix function in R

I am working with a dataset about the passengers on the Titanic, which you may find here.

I am using the train data provided. I would like to use the model.matrix function to create a model matrix of the dataset that contains only numbers (no factors!).
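To illustrate what model.matrix does with factors, here is a minimal sketch on a made-up data frame. The toy data and its values are invented for illustration and are not taken from the Titanic file. Each factor is expanded into 0/1 indicator columns, so the result is a purely numeric matrix.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  Survived = c(0, 1, 1, 0),
  Sex      = factor(c("male", "female", "female", "male")),
  Age      = c(22, 38, 26, 35)
)

# "~ . - Survived" uses every column except Survived as a predictor
X <- model.matrix(~ . - Survived, data = toy)

# Drop the intercept column; Sex is now coded as the numeric dummy "Sexmale"
X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
print(colnames(X))
print(is.numeric(X))
```

Removing columns by name rather than by position avoids dropping the wrong variable when the column order changes.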

After that, I want to remove the Survived variable from this matrix.

I would then like to fit a PCA to the matrix from the previous step, plot the scores of the observations (using only the first 2 dimensions), and color them according to the Survived variable.
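The PCA and plotting step can be sketched on synthetic data as follows. The variables a, b, c and the grouping vector grp are hypothetical stand-ins, not the Titanic columns. The sketch fits the PCA with prcomp, takes the first two score columns, and maps a 0/1 variable to explicit colors.

```r
set.seed(1)  # reproducible synthetic data, not the Titanic data

# Two groups of 50 points in 3 numeric variables
grp <- rep(c(0, 1), each = 50)
X <- cbind(a = rnorm(100, mean = 2 * grp),
           b = rnorm(100),
           c = rnorm(100, mean = -grp))

# Center and scale inside prcomp, then take the scores on PC1 and PC2
pca <- prcomp(X, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# In base graphics col = 0 is the background color, so a 0/1 variable
# should be mapped to explicit colors rather than passed directly
cols <- c("red", "blue")[grp + 1]
plot(scores, col = cols, pch = 19, xlab = "PC1", ylab = "PC2")
legend("topright", legend = c("0", "1"), col = c("red", "blue"), pch = 19)
```

Passing the raw 0/1 vector as col would draw the 0 group in the background color, which is effectively invisible on a white plot.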

I have tried a few ways of doing this, but the result does not seem accurate, and the points are not colored.

 library(readr)
 library(dplyr)
 titanic_train <- read_csv("C:/Users/johnt/Desktop/Statistical Data Mining/HW 1/train.csv")

 titanic_train <- titanic_train %>% 
   select(Survived, Pclass, Sex, Age, SibSp, Parch, Fare, Embarked) %>% 
   mutate(Fare = log1p(Fare))  # log(1 + Fare); plain log() would give -Inf for any fare of 0


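 ###### Model Matrix

 mm <- titanic_train %>% 
   select(Pclass, Age, SibSp, Parch, Fare, Survived) 

 # model.matrix drops rows with missing values by default; building the
 # model frame with na.pass keeps them, so rows stay aligned with mm$Survived
 mf <- model.frame(~ ., data = mm, na.action = na.pass)
 titan <- model.matrix(~ ., mf)

 # Clean it up
 titan <- titan[, -1]  # remove intercept column
 titan <- titan[, colnames(titan) != "Survived"]  # remove Survived by name, not by position
 titan <- scale(titan)
 titan[is.na(titan)] <- 0  # missing values become the column mean (0 after scaling)

 # PCA
 titan2 <- prcomp(titan)  # data were already centered and scaled above
 titan2

 # Survived is 0/1 and color 0 is the background color, so map it to explicit colors
 plot(titan2$x[, 1:2], col = c("red", "blue")[factor(mm$Survived)])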
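One likely reason the plot "does not seem accurate" is that model.matrix silently drops rows containing missing values, so the matrix no longer lines up with the original Survived column. The following sketch shows the effect on a made-up data frame (toy values invented for illustration), along with one way to keep every row, assuming the default na.action option of na.omit.

```r
# Toy data with one missing Age, invented for illustration only
toy <- data.frame(Age  = c(22, NA, 26, 35),
                  Fare = c(7.25, 71.28, 7.92, 53.10))

# Default behavior: the row with the NA is dropped
dropped <- model.matrix(~ ., data = toy)
print(nrow(dropped))  # 3 rows

# Building the model frame with na.pass keeps every row (the NA stays in place)
mf   <- model.frame(~ ., data = toy, na.action = na.pass)
kept <- model.matrix(~ ., data = mf)
print(nrow(kept))     # 4 rows
```

With the rows kept, a later imputation step such as setting scaled NA values to 0 actually has something to act on, and a coloring vector of the original length matches the scores row by row.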

Did you try ggbiplot(titan2)?

Of course, if you do, you will have to filter down to only the first two columns before running it on the prcomp output.

Also, could you give an example of how you would like the PCA plot to look?

