
How can loading factors from PCA be used to calculate an index that can be applied for each individual in a data frame in R?

I am using principal component analysis (PCA) based on ~30 variables to compose an index that classifies individuals in 3 different categories (top, middle, bottom) in R.

I have a dataframe of ~2000 individuals with 28 binary and 2 continuous variables.

Now, I would like to use the loading factors from PC1 to construct an index that classifies my 2000 individuals for these 30 variables in 3 different groups.

Problem: Despite extensive research, I could not find out how to extract the loading factors from PCA_loadings, give each individual a score (based on the loadings of the 30 variables), which would subsequently allow me to rank each individual (for further classification). Does it make sense to display the loading factors in a graph?

  1. I've performed the following steps:

a) Ran a PCA using PCA_outcome <- prcomp(na.omit(df1), scale = T)

b) Extracted the loadings using PCA_loadings <- PCA_outcome$rotation

c) Removed all the variables for which the loading factors were close to 0.

  2. I have considered creating 30 new variables, one for each loading factor, and summing them for every binary variable equal to 1 (though I am not sure how to handle the continuous variables). That would give each individual a score. However, I do not know how to combine the 30 loading values into a single score for each individual.

R code

df1 <- read.table(text=" 
          educ     call      house  merge_id    school  members       
A           1        0          1      12_3        0      0.9
B           0        0          0      13_3        1      0.8
C           1        1          1      14_3        0      1.1
D           0        0          0      15_3        1      0.8 
E           1        1          1      16_3        3      3.2", header=T)


## Run PCA
PCA_outcome <- prcomp(na.omit(df1), scale = T)

## Extract loadings
PCA_loadings <- PCA_outcome$rotation


## Explanation: A-E are 5 of the 2000 individuals and the variables (education, call, house, school, members) represent my 30 variables (binary and continuous).

Expected results:

- Get a rank score for each individual
- Subsequently, assign a category 1-3 to each individual

I'm not 100% sure what you're asking, but here's an answer to the question I think you're asking.

First of all, PC1 of a PCA won't necessarily provide you with an index of socio-economic status. As explained here, PC1 simply "accounts for as much of the variability in the data as possible". PC1 may well work as a good metric for socio-economic status for your data set, but you'll have to critically examine the loadings and see if this makes sense. Depending on the signs of the loadings, it could be that a very negative PC1 corresponds to a very high socio-economic status. As I say: look at the results with a critical eye. An explanation of how PC scores are calculated can be found here. Anyway, that's a discussion that belongs on Cross Validated, so let's get to the code.
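To make the score calculation concrete, here is a minimal sketch on the toy data from the question (with the non-numeric merge_id column left out). It assumes the PCA is run with centering and scaling, and the names toy, pca_toy, w, z and manual_pc1 are made up for this illustration. Each individual's PC1 score is the sum of their standardised variable values, each weighted by that variable's PC1 loading, which is what prcomp already stores in the x element.

```r
# Toy data from the question, without the non-numeric merge_id column
toy <- data.frame(
  educ    = c(1, 0, 1, 0, 1),
  call    = c(0, 0, 1, 0, 1),
  house   = c(1, 0, 1, 0, 1),
  school  = c(0, 1, 0, 1, 3),
  members = c(0.9, 0.8, 1.1, 0.8, 3.2),
  row.names = c("A", "B", "C", "D", "E")
)

pca_toy <- prcomp(toy, center = TRUE, scale. = TRUE)

# Loadings of PC1: one weight per variable
w <- pca_toy$rotation[, 1]

# Standardise the data with the same centers and scales prcomp used
z <- scale(toy, center = pca_toy$center, scale = pca_toy$scale)

# Score = weighted sum of the standardised variables
manual_pc1 <- as.vector(z %*% w)

# Should agree with the scores prcomp computed itself
print(all.equal(manual_pc1, unname(pca_toy$x[, 1])))
```

So there is no need to build 30 helper variables by hand: binary and continuous variables are treated the same way once they are standardised, and the loadings do the weighting.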

It sounds like you want to perform the PCA, pull out PC1, and associate it with your original data frame (and merge_ids). If that's your goal, here's a solution.

# Create data frame
df <- read.table(text = "educ     call      house  merge_id    school  members       
A           1        0          1      12_3        0      0.9
B           0        0          0      13_3        1      0.8
C           1        1          1      14_3        0      1.1
D           0        0          0      15_3        1      0.8 
E           1        1          1      16_3        3      3.2", header = TRUE)

# Perform PCA
PCA <- prcomp(df[, names(df) != "merge_id"], scale = TRUE, center = TRUE)

# Add PC1
df$PC1 <- PCA$x[, 1]

# Look at new data frame
print(df)
#>   educ call house merge_id school members        PC1
#> A    1    0     1     12_3      0     0.9  0.1000145
#> B    0    0     0     13_3      1     0.8  1.6610864
#> C    1    1     1     14_3      0     1.1 -0.8882381
#> D    0    0     0     15_3      1     0.8  1.6610864
#> E    1    1     1     16_3      3     3.2 -2.5339491

Created on 2019-05-30 by the reprex package (v0.2.1.9000)
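For the ranking and the three categories asked for in the question, something along these lines might work. This is only a sketch under two assumptions that are mine, not the asker's: that a higher index should go with educ == 1 (the sign of a principal component is arbitrary, so it has to be oriented somehow), and that the three groups are split at the tertiles of the index. The column names index, rank and group are likewise made up for this illustration.

```r
# Toy data from the question; merge_id is an identifier, not a PCA input
toy <- data.frame(
  educ     = c(1, 0, 1, 0, 1),
  call     = c(0, 0, 1, 0, 1),
  house    = c(1, 0, 1, 0, 1),
  merge_id = c("12_3", "13_3", "14_3", "15_3", "16_3"),
  school   = c(0, 1, 0, 1, 3),
  members  = c(0.9, 0.8, 1.1, 0.8, 3.2),
  row.names = c("A", "B", "C", "D", "E")
)

pca <- prcomp(toy[, names(toy) != "merge_id"], center = TRUE, scale. = TRUE)

# The sign of a principal component is arbitrary. ASSUMPTION: a higher index
# should go with educ == 1, so flip PC1 if educ loads negatively on it.
index <- pca$x[, 1]
if (pca$rotation["educ", 1] < 0) index <- -index
toy$index <- index

# Rank individuals: 1 = highest index, tied individuals share the best rank
toy$rank <- rank(-toy$index, ties.method = "min")

# ASSUMPTION: three groups split at the tertiles of the index
breaks <- quantile(toy$index, probs = c(0, 1/3, 2/3, 1))
toy$group <- cut(toy$index, breaks = breaks,
                 labels = c("bottom", "middle", "top"),
                 include.lowest = TRUE)

print(toy[order(toy$rank), c("merge_id", "index", "rank", "group")])
```

One caveat: with many tied scores (likely when most variables are binary), the tertile break points can coincide, in which case cut() stops with an error about non-unique breaks and the cut points have to be chosen differently.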

Since you say you have to use PCA, I'm assuming this is a homework question, so I'd recommend reading up on PCA to get a feel for what it does and what it's useful for.
