How to structure a dataset to run a PCA?
Basically, my problem is that I want to run a PCA, but my data is not structured properly. Hopefully this will show what I mean:
trial.one.two <- na.omit(trial.one.one)
head(trial.one.two)
                 v79             v81                v82 Q.One Q.Two Q.Three
2 Disagrees a little Agrees a little Disagrees a little     3     2       3
3       Agrees a lot    Agrees a lot Disagrees a little     1     1       3
4    Agrees a little Disagrees a lot    Disagrees a lot     2     4       4
5       Agrees a lot    Agrees a lot    Disagrees a lot     1     1       4
6    Agrees a little    Agrees a lot    Agrees a little     2     1       2
8       Agrees a lot Agrees a little       Agrees a lot     1     2       1
The data I'm working with comes from a survey of 5000+ individuals, and I want to know how many gave each answer, for example "Agrees a lot": 2253, "Agrees a little": 2005, etc. I need the data coded in the following way:
1 = "Agrees a lot", 2 = "Agrees a little", 3 = "Disagrees a little", 4 = "Disagrees a lot"
Where 1 is Component 1, 2 is Component 2, and so on. Basically, I want to run a PCA.
Can anyone guide me on what I should do?
----------UPDATE-------------
After I implemented:
convert.factor <- function(val){
if(val == "Agrees a lot"){
return(1)
} else if(val == "Agrees a little") {
return(2)
} else if(val == "Disagrees a little") {
return(3)
} else if(val == "Disagrees a lot") {
return(4)
}
}
trial.one.two$v79 <- sapply(trial.one.two$v79, convert.factor)
trial.one.two$v81 <- sapply(trial.one.two$v81, convert.factor)
trial.one.two$v82 <- sapply(trial.one.two$v82, convert.factor)
head(trial.one.two)
  v79 v81 v82 Q.One Q.Two Q.Three
2   3   2   3     3     2       3
3   1   1   3     1     1       3
4   2   4   4     2     4       4
5   1   1   4     1     1       4
6   2   1   2     2     1       2
8   1   2   1     1     2       1
You can do something along the lines of:
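# Map one survey answer to its numeric code (1 = strongest agreement).
convert.factor <- function(val){
if(val == "Agrees a lot"){
return(1)
} else if(val == "Agrees a little") {
return(2)
} else if(val == "Disagrees a little") {
return(3)
} else if(val == "Disagrees a lot") {
return(4)
} else {
# Unrecognized answer: return NA so sapply() still yields a numeric vector
return(NA_real_)
}
}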
trial.one.two$v79 <- sapply(trial.one.two$v79, convert.factor)
trial.one.two$v81 <- sapply(trial.one.two$v81, convert.factor)
trial.one.two$v82 <- sapply(trial.one.two$v82, convert.factor)
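The same recoding can also be done without an element-by-element function, by turning each column into a factor with an explicit level order. The following is a minimal sketch, not the only way to do it: `demo`, `answers`, `to.code` and `cols` are hypothetical names, and `demo` is a three-row stand-in built from the first rows shown in the question.

```r
# Hypothetical sample standing in for trial.one.two (the real data has 5000+ rows).
answers <- c("Agrees a lot", "Agrees a little",
             "Disagrees a little", "Disagrees a lot")
demo <- data.frame(
  v79 = c("Disagrees a little", "Agrees a lot", "Agrees a little"),
  v81 = c("Agrees a little", "Agrees a lot", "Disagrees a lot"),
  v82 = c("Disagrees a little", "Disagrees a little", "Disagrees a lot"),
  stringsAsFactors = TRUE
)

# factor() with explicit levels fixes the order, so as.integer() gives 1..4;
# any value not listed in `answers` becomes NA.
to.code <- function(x) as.integer(factor(as.character(x), levels = answers))

# Recode all three columns in one step.
cols <- c("v79", "v81", "v82")
demo[cols] <- lapply(demo[cols], to.code)
demo
```

Because the level order is spelled out, the codes do not depend on the alphabetical order R would otherwise use for factor levels.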
Alternatively, if you are just looking for how often people gave each answer, you could do something like:
table(trial.one.two$v79)
Note that there is no reason to convert the variable first in that case.
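Once the columns are numeric, the PCA itself could be run with base R's `prcomp()`. This is only a sketch under some assumptions: `demo` is a hypothetical data frame holding the six converted rows shown in the question, and treating the 1 to 4 answer codes as interval-scaled numbers is a simplification for ordinal survey data. Note also that the components are weighted combinations of the questions (`v79`, `v81`, `v82`), not the four answer categories.

```r
# The six converted rows shown in the question; in practice use the full data set.
demo <- data.frame(
  v79 = c(3, 1, 2, 1, 2, 1),
  v81 = c(2, 1, 4, 1, 1, 2),
  v82 = c(3, 3, 4, 4, 2, 1)
)

# Answer counts for every column at once.
lapply(demo, table)

# PCA on the numeric codes; scale. = TRUE standardizes each column first.
pca <- prcomp(demo, center = TRUE, scale. = TRUE)
summary(pca)   # proportion of variance explained by each component
pca$rotation   # loadings: how each question contributes to each component
```

With three input columns, `prcomp()` returns at most three components. `pca$x` holds each respondent's scores on them.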