
Add sample info to dataset in PCA (R)

I'm a biologist, not a programmer, so please be gentle.

I have a dataset that looks like this:

Genes  Patient1  Patient2  Patient3
A      324       433       343
B      431       342       124
Z      232       234       267

Then I have a sample sheet that contains sample info like:

Patient1 - Healthy
Patient2 - Disease
Patient3 - Healthy

I am using:

library(ggfortify)
df <- dataset
pca_res <- prcomp(df, scale. = TRUE)

autoplot(pca_res)

Then I want to do:

autoplot(pca_res, data = ?, colour = '?')

I would like to use the info from the sample sheet to color my PCA plot by state (healthy/disease) using the autoplot function. Is there a way to do this?

First, I would create a complete data.frame with all the available information.

For example, you will need to create this kind of data.frame, with patients as rows and the genes plus the status as columns:

df=structure(list(A = c(324, 433, 343), B = c(431, 342, 124), Z = c(232, 
234, 267), Status = c("Healthy", "Disease", "Healthy")), row.names = c("Patient1", 
"Patient2", "Patient3"), class = "data.frame")

After that, you can use the factoextra package, which is very handy for plotting PCA results:

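# prcomp() needs numeric input, so leave out the Status column
pca_res <- prcomp(df[, c("A", "B", "Z")], scale. = TRUE)
library(factoextra)
# habillage colours the individuals by group
fviz_pca_ind(pca_res, habillage=df$Status)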

You can check the fviz_pca_ind documentation to change the colors afterwards.
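Since the question asks about autoplot specifically, here is a minimal sketch of the same plot with ggfortify. It assumes the combined data.frame from above (patients in rows, a Status column) and passes it through the data argument, so that colour can refer to one of its columns by name. The plotting call is guarded so the snippet still runs where ggfortify is not installed.

```r
# Same example data as above: patients in rows, genes plus Status in columns
df <- data.frame(
  A = c(324, 433, 343),
  B = c(431, 342, 124),
  Z = c(232, 234, 267),
  Status = c("Healthy", "Disease", "Healthy"),
  row.names = c("Patient1", "Patient2", "Patient3")
)

# PCA on the numeric gene columns only
pca_res <- prcomp(df[, c("A", "B", "Z")], scale. = TRUE)

# Plot only if ggfortify is available
if (requireNamespace("ggfortify", quietly = TRUE)) {
  library(ggfortify)
  # data supplies the extra columns; colour names the column to colour by
  print(autoplot(pca_res, data = df, colour = "Status"))
}
```

The rows of the data.frame given to data must be in the same order as the rows that went into prcomp, which holds here because both come from df.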

Edit:

To create the whole data frame from your two datasets:

1) Take your first data frame and set its first column as the row names:

rownames(df)=df$Genes
df=df[,-1] #remove the gene column in order to keep only the values

2) Format your second data frame. It should have the same columns as df (Patient1, Patient2, ...), each holding that patient's disease status. Call it df2:

rownames(df2)=c("Status")
df2

        Patient1  Patient2  Patient3
Status  Healthy   Disease   Healthy

We don't know your data, so you will have to do this step yourself.
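As a rough illustration of step 2, suppose the sample sheet has been read into a data.frame in long form, one row per patient. That layout and the column names Patient and Status are assumptions, since the question only shows "Patient1 - Healthy" pairs. Under those assumptions, df2 could be built like this:

```r
# Hypothetical sample sheet in long form (column names are assumptions)
sample_sheet <- data.frame(
  Patient = c("Patient1", "Patient2", "Patient3"),
  Status  = c("Healthy", "Disease", "Healthy"),
  stringsAsFactors = FALSE
)

# t() on a vector gives a one-row matrix: one column per patient
df2 <- as.data.frame(t(sample_sheet$Status), stringsAsFactors = FALSE)
colnames(df2) <- sample_sheet$Patient
rownames(df2) <- "Status"
df2
```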

3) Then rbind df and df2, and transpose the result so that patients become rows:

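df3=rbind(df,df2)
df3=data.frame(t(df3), stringsAsFactors=FALSE) # transpose: patients become rows
# rbind() and t() turn the values into text, so convert the gene columns back to numbers
df3[rownames(df)]=lapply(df3[rownames(df)], as.numeric)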

and then perform the PCA on the gene columns of df3, using its Status column for the coloring.
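As an alternative to the rbind route, the following sketch transposes the expression table first and then looks up each patient's status by name. This keeps the gene values numeric throughout and does not depend on the row order of the sample sheet. The object names expr, sample_sheet and samples, and the sample sheet's column names, are made up for the illustration.

```r
# Expression table as in the question: genes in rows, patients in columns
expr <- data.frame(
  Genes    = c("A", "B", "Z"),
  Patient1 = c(324, 431, 232),
  Patient2 = c(433, 342, 234),
  Patient3 = c(343, 124, 267),
  stringsAsFactors = FALSE
)

# Hypothetical sample sheet, deliberately in a different order
sample_sheet <- data.frame(
  Patient = c("Patient3", "Patient1", "Patient2"),
  Status  = c("Healthy", "Healthy", "Disease"),
  stringsAsFactors = FALSE
)

# Genes become column names, patients become rows; values stay numeric
rownames(expr) <- expr$Genes
samples <- as.data.frame(t(expr[, -1]))

# Match each patient row to its entry in the sample sheet by name
samples$Status <- sample_sheet$Status[match(rownames(samples), sample_sheet$Patient)]

# PCA on everything except the Status column
gene_cols <- setdiff(colnames(samples), "Status")
pca_res <- prcomp(samples[, gene_cols], scale. = TRUE)
```

From here, pca_res and samples$Status can be passed to fviz_pca_ind(pca_res, habillage = samples$Status) as in the answer above.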
