R - Creating Scatter Plot from Data Frame

I've got a data frame, all, that looks like this:

http://pastebin.com/Xc1HEYyH

Now I want to create a scatter plot with the column headings on the x-axis and the respective values as the data points. For example:

7|                 x  
6|          x      x  
5|  x       x      x     x    
4|  x       x            x 
3|                             x      x  
2|                             x      x
1|
 ---------------------------------------
    STM    STM    STM   PIC   PIC    PIC
   cold   normal  hot  cold  normal  hot

This should be easy, but I cannot figure out how.

Regards

The basic idea, if you want to plot using Hadley's ggplot2, is to get your data into the form:

        x          y
col_names     values

This can be done with the melt function from Hadley's reshape2 package. Run ?melt to see the possible arguments. Here, since we want to melt the whole data.frame, we just need:

melt(all) 
# this gives the data in format:
#   variable value
# 1 STM_cold   6.0
# 2 STM_cold   6.0
# 3 STM_cold   5.9
# 4 STM_cold   6.1
# 5 STM_cold   5.5
# 6 STM_cold   5.6

Here, x will be the variable column and y will be the corresponding value column.
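
To make the wide-to-long reshaping concrete, here is a minimal sketch using base R's stack() instead of melt(), so it runs without extra packages. The data frame name `wide` is hypothetical and holds only the first three rows of two columns from the question's data; the renaming step is there only to mimic the column names that melt() produces.

```r
# Hypothetical miniature of the question's data frame (first three rows, two columns)
wide <- data.frame(
  STM_cold = c(6.0, 6.0, 5.9),
  PIC_cold = c(0.9, 1.0, 0.3)
)

# stack() returns one row per cell: `values` holds the number,
# `ind` holds the name of the column the number came from
long <- stack(wide)

# Rename and reorder so the result resembles melt()'s variable/value layout
names(long) <- c("value", "variable")
long <- long[, c("variable", "value")]

print(long)
```

Each original column name now appears as a level of `variable`, which is what lets a plotting function place one group per column along the x-axis.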

require(ggplot2)
require(reshape2)
ggplot(data = melt(all), aes(x=variable, y=value)) + 
             geom_point(aes(colour=variable))

If you don't want the colours, just remove aes(colour=variable) inside geom_point so that it becomes geom_point().

Edit: I should probably mention here that you could also replace geom_point with geom_jitter, which will give you, well, jittered points.
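
As a rough sketch of the geom_jitter variant, assuming ggplot2 and reshape2 are installed: the data frame `wide` below is a hypothetical three-row excerpt of the question's data, and the width and height values are illustrative choices, not something the answer specified. Setting height = 0 keeps the jitter horizontal only, so the plotted y values remain the measured ones.

```r
library(ggplot2)
library(reshape2)

# Hypothetical miniature of the question's data frame
wide <- data.frame(
  STM_cold = c(6.0, 6.0, 5.9),
  PIC_cold = c(0.9, 1.0, 0.3)
)

# Long format: one row per value, labelled by its source column
long <- melt(wide)

# Jitter horizontally only; width = 0.15 is an arbitrary illustrative amount
p <- ggplot(long, aes(x = variable, y = value)) +
  geom_jitter(aes(colour = variable), width = 0.15, height = 0)

print(p)
```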

Here are two options to consider. The first uses dotplot from the "lattice" package:

library(lattice)
dotplot(values ~ ind, data = stack(all))

The second uses dotchart from base R's "graphics" package. To use the dotchart function, you need to wrap your data.frame in as.matrix:

dotchart(as.matrix(all), labels = "")

Note that the points in this graphic are not "jittered"; rather, they are presented in the order they were recorded. That is to say, the lowest point is the first record, and the highest point is the last record. If you zoomed into the plot for this example, you would see that you have 16 very faint horizontal lines. Each line represents one row from each column. Thus, if you look at the dots for "STM_cold" or any of the other variables that have NA values, you'll see a few blank lines at the top where there was no data available.

This has its advantages, since it might show a trend over time if the values are recorded chronologically, but it might also be a disadvantage if there are too many rows in your source data frame.

A bit of a manual version using base R graphics, just for fun.

Get the data:

test <- read.table(text="STM_cold STM_normal STM_hot PIC_cold PIC_normal PIC_hot
6.0 6.6 6.3 0.9 1.9 3.2
6.0 6.6 6.5 1.0 2.0 3.2
5.9 6.7 6.5 0.3 1.8 3.2
6.1 6.8 6.6 0.2 1.8 3.8
5.5 6.7 6.2 0.5 1.9 3.3
5.6 6.5 6.5 0.2 1.9 3.5
5.4 6.8 6.5 0.2 1.8 3.7
5.3 6.5 6.2 0.2 2.0 3.5
5.3 6.7 6.5 0.1 1.7 3.6
5.7 6.7 6.5 0.3 1.7 3.6
NA  NA  NA  0.1 1.8 3.8
NA  NA  NA  0.2 2.1 4.1
NA  NA  NA  0.2 1.8 3.3
NA  NA  NA  0.8 1.7 3.5
NA  NA  NA  1.7 1.6 4.0
NA  NA  NA  0.1 1.7 3.7",header=TRUE)

Set up the basic plot:

plot(
     NA,
     ylim=c(0,max(test,na.rm=TRUE)+0.3),
     xlim=c(1-0.1,ncol(test)+0.1),
     xaxt="n",
     ann=FALSE,
     panel.first=grid()
     )

axis(1,at=seq_along(test),labels=names(test),lwd=0,lwd.ticks=1)

Plot some points, with some x-axis jittering so they are not printed on top of one another.

invisible(
  mapply(
        points,
        jitter(rep(seq_along(test),each=nrow(test))),
        unlist(test),
        col=rep(seq_along(test),each=nrow(test)),
        pch=19
        )
)
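
Because points() is itself vectorised over its coordinates and over col, the same drawing step can also be written without mapply. The following is a minimal sketch of that alternative, not the answer's original code; `test_small` is a hypothetical three-row excerpt of the data above, and `x_pos` and `y_val` are names introduced here for illustration.

```r
# Hypothetical miniature of the `test` data frame
test_small <- data.frame(
  STM_cold = c(6.0, 6.0, 5.9),
  PIC_cold = c(0.9, 1.0, 0.3)
)

# x position = column index repeated once per row; y = the values, column by column
x_pos <- rep(seq_along(test_small), each = nrow(test_small))
y_val <- unlist(test_small)

# Empty plot with the same kind of limits as in the answer
plot(
  NA,
  ylim = c(0, max(test_small, na.rm = TRUE) + 0.3),
  xlim = c(1 - 0.1, ncol(test_small) + 0.1),
  xaxt = "n",
  ann = FALSE
)
axis(1, at = seq_along(test_small), labels = names(test_small))

# One vectorised call: jitter the x positions, colour each point by its column index
points(jitter(x_pos), y_val, col = x_pos, pch = 19)
```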

Result: a scatter plot with the points jittered along the x-axis and one colour per column.

Edit

Here's an example using alpha transparency on the points and getting rid of the jitter, as discussed in the comments below with Ananda.

invisible(
  mapply(
        points,
        rep(seq_along(test),each=nrow(test)),
        unlist(test),
        col=rgb(0,0,0,0.1),
        pch=15,
        cex=3
        )
)
