
Plot of categorical data using R

I have a list of protein names (P1, P2, ..., Pn), each categorized into one of three expression levels, High (H), Medium (M) and Low (L), as measured in three experimental conditions (Exp1, Exp2 and Exp3).

[image]

I wish to make a plot as shown in the bottom part of the figure, with the protein names on the left, the experiment names along the top, and the High, Medium and Low categories indicated by red, blue and green respectively.

I'm new to R, so I would much appreciate any help.

Thanks in advance.

You can create a file with the data formatted like this (tab delimited):

pv   exp  val
1    1    H
2    1    L
3    1    L
4    1    M
1    2    H
2    2    H
3    2    M
4    2    H
1    3    L
2    3    L
3    3    L
4    3    M

Then use the following commands to read and plot the data:

mat <- read.table(file.choose(), header=TRUE, stringsAsFactors=TRUE) # read the file into memory; stringsAsFactors=TRUE keeps val a factor in R >= 4.0

attach(mat) # map the header names to variable names

plot(pv~exp, col=val) # plot the categories against each other and use val (H,M,L) as the color array

Because val is a factor, R will assign its levels to colors from the default palette on its own. You can also create a color array from val to translate (H,M,L) to (Blue,Red,Green)... but there is other documentation out there for that.
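The color translation mentioned above can be sketched with a named character vector used as a lookup table. This is only an illustration: the data frame is a hand-typed stand-in for the tab-delimited file, the names level_colors and point_colors are made up for this example, and the red/blue/green assignment follows the colors requested in the question.

```r
# Stand-in for the tab-delimited file shown above (assumption: same 12 rows)
mat <- data.frame(
  pv  = rep(1:4, times = 3),
  exp = rep(1:3, each = 4),
  val = c("H", "L", "L", "M",  "H", "H", "M", "H",  "L", "L", "L", "M")
)

# Lookup table from expression level to color (colors taken from the question)
level_colors <- c(H = "red", M = "blue", L = "green")

# Index the lookup table by level name to get one color per data point
point_colors <- unname(level_colors[as.character(mat$val)])

# Draw the points without default axes, then label the axes by hand
plot(mat$exp, mat$pv, col = point_colors, pch = 19,
     xaxt = "n", yaxt = "n", xlab = "", ylab = "")
axis(3, at = 1:3, labels = paste0("Exp", 1:3))         # experiments along the top
axis(2, at = 1:4, labels = paste0("P", 1:4), las = 1)  # proteins on the left
legend("bottomright", legend = names(level_colors),
       col = level_colors, pch = 19, title = "Level")
```

Indexing by as.character(mat$val) works whether val was read as a factor or as plain strings, so this version does not depend on the stringsAsFactors setting.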

Here is an approach that uses some of the magic of the ggplot2 and reshape2 packages.

First, recreate the data in the format you described:

df <- data.frame(
    P    = paste("P", 1:4, sep=""),
    Exp1 = c("L", "H", "L", "M"),
    Exp2 = c("M", "M", "L", "H"),
    Exp3 = c("H", "L", "L", "M"))

Next, load the add-on packages:

library(reshape2)
library(ggplot2)

Then, use melt() to convert your data from wide format to long format. The id variable is "P", and the variable.name argument tells the function to name the "variable" column "Exp":

mdf <- melt(df, id.vars="P", variable.name="Exp")
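As a quick check of what the reshape produces, the following sketch rebuilds the same example data and inspects the melted result. It assumes the reshape2 package is installed. Each of the 4 proteins contributes one row per experiment, so the long table has 12 rows.

```r
library(reshape2)

# Same example data as above, in wide format: one row per protein
df <- data.frame(
  P    = paste0("P", 1:4),
  Exp1 = c("L", "H", "L", "M"),
  Exp2 = c("M", "M", "L", "H"),
  Exp3 = c("H", "L", "L", "M")
)

# Long format: one row per (protein, experiment) pair
mdf <- melt(df, id.vars = "P", variable.name = "Exp")

dim(mdf)    # 12 rows, 3 columns
names(mdf)  # "P" "Exp" "value"
head(mdf)   # the first rows hold the Exp1 values for each protein
```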

Because L, M and H have a natural order, we use the ordered argument of factor() to inform R of this order:

mdf$value <- factor(mdf$value, levels=c("H", "M", "L"), ordered=TRUE)
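To see what the ordering does, here is a small base-R sketch on a made-up three-element vector (the name value is only for illustration). Note that levels=c("H", "M", "L") makes "H" the lowest level for comparisons; the same order is also the one in which the levels are listed in a plot legend.

```r
# Illustrative vector, not the real data
value <- factor(c("L", "H", "M"), levels = c("H", "M", "L"), ordered = TRUE)

levels(value)        # "H" "M" "L"
as.integer(value)    # 3 1 2: position of each element within the levels
value[2] < value[1]  # TRUE: "H" comes before "L" in this ordering
```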

Finally, we are ready to plot your data:

ggplot(mdf, aes(x=Exp, y=P, colour=value)) +
    geom_point(size=3) +
    scale_colour_manual(values=c(H="red", M="blue", L="green")) + # colours as requested in the question
    xlab("") +
    ylab("")
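The question also asks for the experiment names along the top and the protein names on the left. One possible variation is sketched below as a self-contained script. It assumes a ggplot2 version in which scale_x_discrete() accepts a position argument (2.2.0 or later). Reversing the protein levels so that P1 is drawn at the top is a layout choice made here, not something stated in the original answer.

```r
library(reshape2)
library(ggplot2)

df <- data.frame(
  P    = paste0("P", 1:4),
  Exp1 = c("L", "H", "L", "M"),
  Exp2 = c("M", "M", "L", "H"),
  Exp3 = c("H", "L", "L", "M")
)

mdf <- melt(df, id.vars = "P", variable.name = "Exp")
mdf$value <- factor(mdf$value, levels = c("H", "M", "L"), ordered = TRUE)

# Reverse the protein order so that P1 ends up at the top of the y axis
mdf$P <- factor(mdf$P, levels = rev(as.character(df$P)))

p <- ggplot(mdf, aes(x = Exp, y = P, colour = value)) +
  geom_point(size = 3) +
  scale_colour_manual(values = c(H = "red", M = "blue", L = "green")) +
  scale_x_discrete(position = "top") +   # experiment names along the top
  labs(x = NULL, y = NULL, colour = "Level")

print(p)
```

Naming the elements of values ties each color to a level explicitly, so the mapping does not silently change if the factor levels are reordered later.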

