
Fitting PCA using the model.matrix function in R

I am working with a dataset about the passengers on the Titanic, which you may find here.

I am using the train data provided. I would like to use the model.matrix function to create a model matrix of the dataset that contains only numbers (no factors!).

After that, I want to remove the Survived variable from this dataset.

I would then like to fit a PCA to the matrix from the previous step, plot the scores of the observations (using only the first 2 dimensions), and color them according to the Survived variable.

I have tried a few ways of doing this, but the result does not look accurate and the points are not colored.

 library(readr)
 library(dplyr)
 titanic_train <- read_csv("C:/Users/johnt/Desktop/Statistical Data Mining/HW 1/train.csv")

 titanic_train <- titanic_train %>% 
   select(Survived, Pclass, Sex, Age, SibSp, Parch, Fare, Embarked) %>% 
   mutate(Fare = log(Fare))


 ###### Model Matrix

 mm <- titanic_train %>% 
   select(Pclass, Age, SibSp, Parch, Fare, Survived) 

 titan <- model.matrix(-Survived ~., mm)

 #Clean it up
 titan <- titan[,-1] #remove intercept column
 titan <- scale(titan)
 titan[is.na(titan)] <- 0

 #PCA
 titan2 <-prcomp(titan[,-5], center = TRUE, scale. = TRUE)
 titan2


 plot(titan2$x[,1:2],col=mm$Survived)

Did you try ggbiplot(titan2)?

Of course, if you do, you will have to filter down to only the first two columns before running prcomp on it.
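For reference, here is a minimal sketch of what a ggbiplot call on a prcomp object might look like. It uses a small synthetic matrix rather than the Titanic data, and the names `X`, `survived` and `pca` are made up for illustration. It assumes the ggbiplot package is installed and skips the plot otherwise. As far as I know, ggbiplot draws the first two components by default (`choices = 1:2`), so the scores may not need subsetting beforehand.

```r
# Small synthetic numeric matrix and a 0/1 grouping vector (illustrative only)
set.seed(42)
X <- matrix(rnorm(100 * 4), ncol = 4,
            dimnames = list(NULL, c("Pclass", "Age", "SibSp", "Fare")))
survived <- rbinom(100, 1, 0.4)

# Fit the PCA on centered and scaled columns
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# ggbiplot is assumed to be installed; the plot is skipped if it is not
if (requireNamespace("ggbiplot", quietly = TRUE)) {
  library(ggbiplot)
  p <- ggbiplot(pca,
                choices = 1:2,   # PC1 against PC2
                groups  = factor(survived, levels = c(0, 1),
                                 labels = c("Died", "Survived")))
  print(p)
}
```

Passing a factor to `groups` is what colors the points by class. The grouping vector needs one entry per row of the matrix given to prcomp.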

Could you also give an example of how you would like the PCA plot to look?
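In case it helps, a few things in the posted code could explain the symptoms. `model.matrix` silently drops rows with a missing Age, so `mm$Survived` no longer lines up with the PCA scores. `log(Fare)` gives -Inf for any zero fare. `titan[,-5]` removes Fare rather than Survived, because Survived was never a column of the matrix. Finally, color `0` in base graphics is the background color, so passengers with Survived = 0 would be invisible. Below is a minimal sketch of one way around these, not a definitive solution. It runs on a synthetic stand-in for train.csv (the values are simulated, only the column names match), and swapping in `log1p` is my assumption about what is acceptable for the assignment.

```r
# Synthetic stand-in for train.csv: simulated values, same column names
set.seed(1)
n <- 200
titanic_train <- data.frame(
  Survived = rbinom(n, 1, 0.4),
  Pclass   = sample(1:3, n, replace = TRUE),
  Age      = ifelse(runif(n) < 0.2, NA, round(runif(n, 1, 70))),
  SibSp    = rpois(n, 0.5),
  Parch    = rpois(n, 0.4),
  Fare     = c(0, rexp(n - 1, rate = 1 / 30))  # one zero fare on purpose
)

# log1p() instead of log(): log(0) is -Inf, which breaks centering and scaling
titanic_train$Fare <- log1p(titanic_train$Fare)

# Build the model frame once so response and predictors share the same rows;
# the default na.action (na.omit) drops passengers with a missing Age
mf <- model.frame(Survived ~ ., data = titanic_train)
X  <- model.matrix(attr(mf, "terms"), data = mf)[, -1]  # drop the intercept
y  <- model.response(mf)

# Survived is the response, so it is not a column of X and needs no removal
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Map 0/1 to two visible colors instead of passing 0/1 directly to col
cols <- c("firebrick", "steelblue")[y + 1]
plot(pca$x[, 1:2], col = cols, pch = 19, xlab = "PC1", ylab = "PC2")
legend("topright", legend = c("Died", "Survived"),
       col = c("firebrick", "steelblue"), pch = 19)
```

With the real data, the `data.frame(...)` call would be replaced by the `read_csv` and `select` steps from the question. If rows with a missing Age should be kept, Age would have to be imputed before building the model frame instead of relying on `na.omit`.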
