Add sample info to dataset in PCA (R)
I'm a biologist, not a programmer, so please be gentle.
So I have a dataset that looks like this:
Genes Patient1 Patient2 Patient3
A 324 433 343
B 431 342 124
Z 232 234 267
Then I have the sample sheet, which contains sample info like:
Patient1 - Healthy
Patient2 - Disease
Patient3 - Healthy
I am using:
library(ggfortify)
df <- dataset
pca_res <- prcomp(df, scale. = TRUE)
autoplot(pca_res)
Then I want to do:
autoplot(pca_res, data = ?, colour = '?')
I wish to use the info from the sample sheet to color my PCA by state (healthy/disease) using the autoplot function. Is there a way to do this?
First, I would create a complete data.frame with all the information available.
For example, you need a data.frame like this one, with patients as rows, one column per gene, and a Status column:
df=structure(list(A = c(324, 433, 343), B = c(431, 342, 124), Z = c(232,
234, 267), Status = c("Healthy", "Disease", "Healthy")), row.names = c("Patient1",
"Patient2", "Patient3"), class = "data.frame")
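If you would rather not type the structure() call by hand, here is a minimal sketch that builds the same data frame from the two tables in the question. The object names `counts` and `sheet` are hypothetical. The sketch also assumes the sample sheet has already been read into a two-column data frame (Patient, Status). Matching by patient ID means the row order of the sample sheet does not matter.

```r
# Toy inputs mirroring the question (object names are hypothetical)
counts <- data.frame(Genes    = c("A", "B", "Z"),
                     Patient1 = c(324, 431, 232),
                     Patient2 = c(433, 342, 234),
                     Patient3 = c(343, 124, 267))
# Assumed layout of the sample sheet: one row per patient
sheet  <- data.frame(Patient = c("Patient1", "Patient2", "Patient3"),
                     Status  = c("Healthy", "Disease", "Healthy"),
                     stringsAsFactors = FALSE)

# Transpose: patients become rows, genes become columns
m <- t(as.matrix(counts[, -1]))   # numeric matrix, row names = patient IDs
colnames(m) <- counts$Genes
df <- as.data.frame(m)

# Attach each patient's status by matching IDs, not by position
df$Status <- sheet$Status[match(rownames(df), sheet$Patient)]
stopifnot(!anyNA(df$Status))      # every patient must appear in the sample sheet
```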
Then you can use the factoextra package, which is very handy for plotting PCA:
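# Use only the numeric gene columns: prcomp() fails on the character Status column
pca_res <- prcomp(df[, names(df) != "Status"], scale. = TRUE)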
library(factoextra)
fviz_pca_ind(pca_res, habillage=df$Status)
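Since the question asked about autoplot specifically, here is a minimal sketch of the same plot with ggfortify instead of factoextra. You pass the full data frame as `data` and the name of the grouping column as `colour`. The rows of `data` must be in the same order as the rows given to prcomp(). The requireNamespace() guard and the base-graphics fallback are additions for illustration, so the sketch still runs when ggfortify is not installed.

```r
# Same toy data as the structure() call above
df <- structure(list(A = c(324, 433, 343), B = c(431, 342, 124),
                     Z = c(232, 234, 267),
                     Status = c("Healthy", "Disease", "Healthy")),
                row.names = c("Patient1", "Patient2", "Patient3"),
                class = "data.frame")

# PCA on the gene columns only
pca_res <- prcomp(df[, names(df) != "Status"], scale. = TRUE)

if (requireNamespace("ggfortify", quietly = TRUE)) {
  library(ggfortify)  # provides autoplot() methods for prcomp results
  # data = data frame whose rows match the PCA rows; colour = column name
  print(autoplot(pca_res, data = df, colour = "Status"))
} else {
  # Fallback with base graphics: PC1 vs PC2, one color per status
  status <- factor(df$Status)
  plot(pca_res$x[, 1:2], col = as.integer(status), pch = 19)
  legend("topright", legend = levels(status),
         col = seq_along(levels(status)), pch = 19)
}
```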
You can check the fviz_pca_ind documentation (?fviz_pca_ind) to change the colors afterwards.
Edit:
To create the whole data frame from your two datasets:
1) Take your first data frame and use the first column (Genes) as row names:
rownames(df)=df$Genes
df=df[,-1] #remove the gene column in order to keep only the values
2) Format your second data frame, which we will call df2. It should have the same columns as df (Patient1, Patient2, ...), each holding that patient's disease status, in a single row named Status:
rownames(df2)=c("Status")
df2
#        Patient1 Patient2 Patient3
# Status  Healthy  Disease  Healthy
We don't know your data, so you will have to do this reshaping yourself.
3) Then rbind df and df2, and transpose the result so that patients are rows:
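df3=rbind(df,df2)
df3=data.frame(t(df3),stringsAsFactors=FALSE) # patients as rows, genes + Status as columns
genes=setdiff(colnames(df3),"Status")
df3[genes]=lapply(df3[genes],as.numeric) # t() turned everything into text; make genes numeric again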
Then perform the PCA on the gene columns of df3 (everything except Status), and use df3$Status for the colors.
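To check the three steps end to end, here is a sketch that runs them on the toy values from the question. Here df2 is typed by hand, since the real sample sheet format is unknown. The `genes` helper and the explicit `stringsAsFactors = FALSE` are additions. That argument matters on R versions before 4.0, where strings would otherwise become factors and as.numeric() would return factor codes.

```r
# Toy expression table from the question: genes as rows, gene names in column 1
df <- data.frame(Genes    = c("A", "B", "Z"),
                 Patient1 = c(324, 431, 232),
                 Patient2 = c(433, 342, 234),
                 Patient3 = c(343, 124, 267))

# Step 1: gene names become row names, then drop the Genes column
rownames(df) <- df$Genes
df <- df[, -1]

# Step 2: one row named "Status", one column per patient (typed by hand here)
df2 <- data.frame(Patient1 = "Healthy", Patient2 = "Disease", Patient3 = "Healthy",
                  row.names = "Status", stringsAsFactors = FALSE)

# Step 3: stack, transpose (patients become rows), then restore numeric genes
df3 <- rbind(df, df2)
df3 <- data.frame(t(df3), stringsAsFactors = FALSE)
genes <- setdiff(colnames(df3), "Status")
df3[genes] <- lapply(df3[genes], as.numeric)

# PCA on the gene columns only; Status stays available for coloring
pca_res <- prcomp(df3[genes], scale. = TRUE)
```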