How to perform a PCA by reading in ONLY the columns of a dataset that have numeric data?

I am trying to do a PCA of monthly temperatures, but the dataset I was given has more columns than just the monthly data. How do I read in only the month columns to perform the PCA? Here is everything I have so far:

dat_TEMP=read.table("TEMPERATURE.csv",header=TRUE, sep=";", dec=",",row.names=1)
attach(dat_TEMP)
df=data.frame(January,February,March,April,May,June,July,August,September,October,November,December)
dat.pca=prcomp(df,dat_TEMP,center=T,scale=T)

But when I run that last line, I get this error: "Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"

Can anyone help me with this? What do I need to do to read in just the month columns?

You need to make sure that, when the data are read in, your numeric columns are not imported as character or factor columns. Once that is the case, you can subset the data to its numeric columns and then run the PCA.
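A quick way to check this is to look at the class of every column right after reading the file. The sketch below uses a small made-up sample in the same format as the question's file (semicolon-separated, decimal commas, first column as row names); the station names, values and the Region column are invented for illustration. It shows that declaring dec = "," gives numeric month columns, that leaving the default decimal mark leaves them as text, and one way to convert such a column afterwards.

```r
# Hypothetical sample in the same layout as TEMPERATURE.csv:
# semicolon-separated, decimal commas, first column used as row names
txt <- "Station;January;February;Region
Alpha;1,5;2,3;North
Beta;-0,4;0,8;South"

# Decimal mark declared: the month columns are parsed as numeric
ok <- read.table(text = txt, header = TRUE, sep = ";", dec = ",",
                 row.names = 1)
print(sapply(ok, class))

# Default dec = ".": "1,5" is not a valid number, so the month columns
# stay character (or factor in R versions before 4.0.0)
bad <- read.table(text = txt, header = TRUE, sep = ";", row.names = 1)
print(sapply(bad, class))

# Repairing one such column afterwards: swap the comma for a period,
# then convert explicitly
bad$January <- as.numeric(sub(",", ".", bad$January, fixed = TRUE))
print(class(bad$January))
```

If any month column still shows up as "character" or "factor", prcomp() will fail with the same colMeans() error, so this check is worth doing before any subsetting.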

There are several ways to subset the data to only its numeric columns.

Using select_if() from dplyr (in dplyr 1.0.0 and later, select_if() is superseded; select(data, where(is.numeric)) is the recommended equivalent):

library("dplyr")
data.numeric=select_if(data, is.numeric)

Using sapply() from base R:

colnums <- sapply(data, is.numeric)
data[ , colnums]
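Putting the pieces together, a minimal base R sketch might look like this. The data frame dat_TEMP is simulated here (random temperatures, a made-up Region text column and a made-up Altitude column), so the numbers mean nothing; only the workflow matters. Two details are worth noting. First, prcomp() should receive the data as its only positional argument: the extra dat_TEMP in the question's call would be matched to prcomp()'s retx parameter, not treated as data. Second, if the file also contains non-month numeric columns (Altitude here is a hypothetical example), selecting by is.numeric() keeps them too; selecting by name with R's built-in month.name constant avoids that, assuming the columns are named January to December as in the question.

```r
set.seed(1)

# Simulated stand-in for dat_TEMP: 30 hypothetical stations, 12 month columns
# named like R's built-in month.name constant, plus two non-month columns
dat_TEMP <- as.data.frame(matrix(round(rnorm(30 * 12, mean = 10, sd = 5), 1),
                                 nrow = 30,
                                 dimnames = list(paste0("station", 1:30),
                                                 month.name)))
dat_TEMP$Region <- rep(c("North", "South"), each = 15)  # character column
dat_TEMP$Altitude <- seq(100, 3000, by = 100)           # numeric, not a month

# Option 1: keep every numeric column (note: this also keeps Altitude)
num_cols <- sapply(dat_TEMP, is.numeric)
pca_all <- prcomp(dat_TEMP[, num_cols], center = TRUE, scale. = TRUE)

# Option 2: keep only the month columns, selected by name
pca_months <- prcomp(dat_TEMP[, month.name], center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
summary(pca_months)
```

The argument is written scale. (with a trailing period) because that is its documented name in prcomp(); scale = T in the question also works, but only through R's partial argument matching.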

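Alternatively, compare each column's class:

# Caution: integer columns have class "integer", not "numeric",
# so this keeps double columns only; is.numeric() keeps both
data[, sapply(data, class) == "numeric"]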
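A small demonstration of that difference, using a made-up three-column data frame (the column names temp, count and name are invented for illustration). drop = FALSE is added so that a single matching column still comes back as a data frame rather than being simplified to a plain vector.

```r
# Made-up data frame: one double column, one integer column, one text column
data <- data.frame(temp = c(1.5, 2.5, 3.5),
                   count = c(1L, 2L, 3L),
                   name = c("a", "b", "c"),
                   stringsAsFactors = FALSE)

print(sapply(data, class))
# temp: "numeric", count: "integer", name: "character"

# Class comparison: the integer column is dropped
by_class <- data[, sapply(data, class) == "numeric", drop = FALSE]
# is.numeric(): TRUE for both double and integer columns
by_test <- data[, sapply(data, is.numeric), drop = FALSE]

print(names(by_class))  # "temp"
print(names(by_test))   # "temp" "count"
```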

