
How to structure a dataset to run a PCA?

Basically, my problem is that I want to run a PCA, but my data is not structured properly. Hopefully this output shows what I mean:

trial.one.two <- na.omit(trial.one.one)
head(trial.one.two)
                 v79             v81                v82 Q.One Q.Two Q.Three
2 Disagrees a little Agrees a little Disagrees a little     3     2       3
3       Agrees a lot    Agrees a lot Disagrees a little     1     1       3
4    Agrees a little Disagrees a lot    Disagrees a lot     2     4       4
5       Agrees a lot    Agrees a lot    Disagrees a lot     1     1       4
6    Agrees a little    Agrees a lot    Agrees a little     2     1       2
8       Agrees a lot Agrees a little       Agrees a lot     1     2       1

The data I'm working with comes from a survey of 5000+ individuals, and I want to know how many answered each option, for example "Agrees a lot": 2253, "Agrees a little": 2005, etc. I need the answers coded as follows:

1 "Agrees a lot"
2 "Agrees a little"
3 "Disagrees a little"
4 "Disagrees a lot"

where 1 is Component 1, 2 is Component 2, and so on. Basically, I want to run a PCA.

Can anyone guide me on what I should do?

----------UPDATE-------------

After implementing the following:

convert.factor <- function(val){
  if(val == "Agrees a lot"){
    return(1)
  } else if(val == "Agrees a little") {
    return(2)
  } else if(val == "Disagrees a little") {
    return(3)
  } else if(val == "Disagrees a lot") {
    return(4)
  }
}

trial.one.two$v79 <- sapply(trial.one.two$v79, convert.factor)
trial.one.two$v81 <- sapply(trial.one.two$v81, convert.factor)
trial.one.two$v82 <- sapply(trial.one.two$v82, convert.factor)

head(trial.one.two)
  v79 v81 v82 Q.One Q.Two Q.Three
2   3   2   3     3     2       3
3   1   1   3     1     1       3
4   2   4   4     2     4       4
5   1   1   4     1     1       4
6   2   1   2     2     1       2
8   1   2   1     1     2       1
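As a minimal sketch of the next step (not part of the original answer), base R's prcomp() can run the PCA once the items are numeric. The data frame below is only the six recoded rows printed above, typed in by hand so the example runs on its own; the name pca is illustrative. Note that in a PCA the components are weighted combinations of the questions (v79, v81, v82), not the four answer categories, and using the 1 to 4 codes as numbers assumes they are roughly equally spaced.

```r
# Illustration only: the six recoded rows shown above,
# not the full 5000+ respondent survey.
trial.one.two <- data.frame(
  v79 = c(3, 1, 2, 1, 2, 1),
  v81 = c(2, 1, 4, 1, 1, 2),
  v82 = c(3, 3, 4, 4, 2, 1)
)

# prcomp() needs numeric columns; scale. = TRUE standardises each item
# so no single question dominates just because its variance is larger.
pca <- prcomp(trial.one.two[, c("v79", "v81", "v82")],
              center = TRUE, scale. = TRUE)

summary(pca)   # proportion of variance explained by each component
pca$rotation   # loadings: weight of each question in each component
head(pca$x)    # component scores for each respondent
```

With scale. = TRUE the component variances (pca$sdev^2) add up to the number of items, so summary() reads directly as shares of the total.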

You can do something along the lines of:

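# Map each text answer to its numeric code
# (1 = "Agrees a lot", ..., 4 = "Disagrees a lot").
convert.factor <- function(val){
  if(val == "Agrees a lot"){
    return(1)
  } else if(val == "Agrees a little") {
    return(2)
  } else if(val == "Disagrees a little") {
    return(3)
  } else if(val == "Disagrees a lot") {
    return(4)
  }
  # Any other answer becomes NA; without this the function returns NULL
  # and sapply() would return a list instead of a numeric vector.
  return(NA_real_)
}

# Apply the conversion element by element to each survey item.
trial.one.two$v79 <- sapply(trial.one.two$v79, convert.factor)
trial.one.two$v81 <- sapply(trial.one.two$v81, convert.factor)
trial.one.two$v82 <- sapply(trial.one.two$v82, convert.factor)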
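A vectorised alternative to the element-by-element sapply() call is a sketch using base R's match(), which returns each answer's position in a lookup vector, i.e. its 1 to 4 code. The names likert.levels and items are hypothetical, and the input is the first six rows from the question typed in by hand.

```r
# Answer labels in the order of the desired codes 1..4 (hypothetical name).
likert.levels <- c("Agrees a lot", "Agrees a little",
                   "Disagrees a little", "Disagrees a lot")

# Example input: the six rows shown in the question.
trial.one.two <- data.frame(
  v79 = c("Disagrees a little", "Agrees a lot", "Agrees a little",
          "Agrees a lot", "Agrees a little", "Agrees a lot"),
  v81 = c("Agrees a little", "Agrees a lot", "Disagrees a lot",
          "Agrees a lot", "Agrees a lot", "Agrees a little"),
  v82 = c("Disagrees a little", "Disagrees a little", "Disagrees a lot",
          "Disagrees a lot", "Agrees a little", "Agrees a lot"),
  stringsAsFactors = FALSE
)

items <- c("v79", "v81", "v82")

# as.character() also handles factor columns; answers not found
# in likert.levels become NA.
trial.one.two[items] <- lapply(trial.one.two[items],
                               function(x) match(as.character(x), likert.levels))

trial.one.two
```

This recodes all three columns in one step and yields the same codes as in the question's updated output.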

Alternatively, if you are just looking for how often people answered each category, you could do something like:

table(trial.one.two$v79)

Note that in that case there is no need to convert the variable first.
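To get those counts for all three questions at once, one possible sketch wraps each column in factor() with a fixed level order, so categories nobody chose still show up as 0, and lets sapply() collect the tables into one matrix. The names likert.levels, items and counts are hypothetical, and the data is again the six rows from the question.

```r
likert.levels <- c("Agrees a lot", "Agrees a little",
                   "Disagrees a little", "Disagrees a lot")

# Example input: the six rows shown in the question.
trial.one.two <- data.frame(
  v79 = c("Disagrees a little", "Agrees a lot", "Agrees a little",
          "Agrees a lot", "Agrees a little", "Agrees a lot"),
  v81 = c("Agrees a little", "Agrees a lot", "Disagrees a lot",
          "Agrees a lot", "Agrees a lot", "Agrees a little"),
  v82 = c("Disagrees a little", "Disagrees a little", "Disagrees a lot",
          "Disagrees a lot", "Agrees a little", "Agrees a lot"),
  stringsAsFactors = FALSE
)

items <- c("v79", "v81", "v82")

# One column per question, one row per answer category.
counts <- sapply(trial.one.two[items],
                 function(x) table(factor(x, levels = likert.levels)))

counts
rowSums(counts)  # totals per category across all three questions
```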
