How to plot density curves for each column in R?

I have a data frame w like this:

>head(w,3)
         V1        V2         V3        V4 V5        V6         V7        V8        V9       V10 V11        V12        V13        V14
1 0.2446884 0.3173719 0.74258410 0.0000000  0 0.0000000 0.01962759 0.0000000 0.0000000 0.5995647   0 0.30201691 0.03109935 0.16897571
2 0.0000000 0.0000000 0.08592243 0.2254971  0 0.7381867 0.11936323 0.2076167 0.0000000 1.0587742   0 0.50226734 0.51295661 0.01298853
3 8.4293893 4.9985040 2.22526463 0.0000000  0 3.6600283 0.00000000 0.0000000 0.2573714 0.8069288   0 0.05074886 0.00000000 0.59403855
         V15       V16      V17       V18      V19       V20       V21      V22         V23        V24       V25       V26       V27
1 0.00000000 0.0000000 0.000000 0.1250837 0.000000 0.5468143 0.3503245 0.000000 0.183144204 0.23026538 6.9868429 1.5774150 0.0000000
2 0.01732732 0.8064441 0.000000 0.0000000 0.000000 0.0000000 0.0000000 0.000000 0.015123385 0.07580794 0.6160713 0.7452335 0.0740328
3 2.66846151 0.0000000 1.453987 0.0000000 1.875298 0.0000000 0.0000000 0.893363 0.004249061 0.00000000 1.6185897 0.0000000 0.7792773
        V28 V29     V30       V31        V32        V33       V34       V35 V36        V37        V38       V39        V40    refseq
1 0.5543028   0 0.00000 0.0000000 0.08293075 0.18261450 0.3211127 0.2765295   0 0.04230929 0.05017316 0.3340662 0.00000000 NM_000014
2 0.0000000   0 0.00000 0.0000000 0.00000000 0.03531411 0.0000000 0.4143325   0 0.14894716 0.58056304 0.3310173 0.09162460 NM_000015
3 0.8047882   0 0.88308 0.7207709 0.01574767 0.00000000 0.0000000 0.1183736   0 0.00000000 0.00000000 1.3529881 0.03720155 NM_000016

>dim(w)
[1] 37126    41

I want to plot the density curve of each column (except the last one) on a single page. It seems that ggplot2 can do this.

I tried this, following this post:

ggplot(data=w[,-41], aes_string(x=colnames)) + geom_density()

But it fails with this error:

Error in as.character(x) : 
  cannot coerce type 'closure' to vector of type 'character'

I'm not sure how to convert this data frame into the format ggplot2 accepts. Or is there another way to do this in R?

ggplot needs your data in a long format, like so:

  variable      value
1       V1 0.24468840
2       V1 0.00000000
3       V1 8.42938930
4       V2 0.31737190

Once it's melted into a long data frame, you can group all the density plots by variable. In the snippet below, ggplot uses the w.plot data frame for plotting (which doesn't need to omit the final refseq variable). You can modify it to use facets, different colors, fills, etc.

w <- as.data.frame(cbind(
  c(0.2446884, 0.0000000, 8.4293893), 
  c(0.3173719, 0.0000000, 4.9985040), 
  c(0.74258410, 0.08592243, 2.22526463)))
w$refseq <- c("NM_000014", "NM_000015", "NM_000016")

library(ggplot2)
library(reshape2)
w.plot <- melt(w, id.vars = "refseq")  # long format; refseq stays as the identifier column

p <- ggplot(aes(x=value, colour=variable), data=w.plot)
p + geom_density()

(example plot)

Use melt from the reshape package (you could also use the base reshape function, but it's a more complicated call).

require(reshape)
require(ggplot2)
long = melt(w, id.vars = "refseq")

# colour must be mapped inside aes() so that ggplot looks up `variable` in `long`
ggplot(long, aes(value, color = variable)) +
    geom_density()

# or maybe you wanted separate plots on the same page?

ggplot(long, aes (value)) +
    geom_density() +
    facet_wrap(~variable)

There are lots of other ways to plot this in ggplot: see http://docs.ggplot2.org/0.9.3.1/geom_histogram.html for examples.
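For readers who prefer to avoid a reshaping package, the long format can also be built in base R. The following is only a sketch under stated assumptions: the small w below is a toy stand-in built from the first three columns of head(w, 3), and long_base is a name introduced here for illustration. It uses stack(), not the base reshape function mentioned above.

```r
# Toy stand-in for the question's data frame (values copied from head(w, 3)).
w <- data.frame(
  V1 = c(0.2446884, 0.0000000, 8.4293893),
  V2 = c(0.3173719, 0.0000000, 4.9985040),
  V3 = c(0.74258410, 0.08592243, 2.22526463),
  refseq = c("NM_000014", "NM_000015", "NM_000016"),
  stringsAsFactors = FALSE
)

# stack() concatenates the selected columns into `values` and records
# the source column name in the factor `ind`.
long_base <- stack(w[, names(w) != "refseq"])
names(long_base) <- c("value", "variable")

# Carry the identifier along: stack() works column by column, so the
# refseq vector simply repeats once per stacked column.
long_base$refseq <- rep(w$refseq, times = ncol(w) - 1)

head(long_base, 4)
```

The resulting long_base has the same value and variable columns as the melted data, so it should work with the ggplot calls above in place of long.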

Here's a solution using the plot function and a little loop.

First set up an empty plot (type = "n" draws the axes but no curve):

plot(density(df[,1]), type = "n")

Then run this to add the lines:

n = dim(df)[2] - 1   # number of columns, excluding the last (non-numeric) one
for (i in 1:n) {
    lines(density(df[, i]))   # add one density curve per column
}
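One caveat with the loop above is that the axis limits come from the first column only, so curves from other columns may be clipped. A possible variation, shown here as a sketch on simulated data, computes all densities first and sizes the axes to fit every curve. The data frame contents and the names dens, xlim and ylim are assumptions made for illustration.

```r
# Simulated stand-in for the data frame; the last column is the non-numeric ID.
set.seed(1)
df <- data.frame(
  V1 = rexp(200, rate = 1),
  V2 = rexp(200, rate = 0.5),
  V3 = rexp(200, rate = 2),
  refseq = sprintf("NM_%06d", 1:200)
)

n <- ncol(df) - 1                    # skip the last column
dens <- lapply(df[, 1:n], density)   # one density object per column

# Axis ranges wide enough for every curve.
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- range(sapply(dens, function(d) range(d$y)))

# Empty plot with the shared limits, then one coloured line per column.
plot(NA, xlim = xlim, ylim = ylim, xlab = "value", ylab = "Density")
for (i in seq_len(n)) {
  lines(dens[[i]], col = i)
}
legend("topright", legend = names(dens), col = seq_len(n), lty = 1)
```

With 40 columns the legend gets crowded, so it may be easier to drop it or to split the columns over several panels with par(mfrow = ...).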

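This will make an 8 x 5 grid of density plots, one panel per column, with the lines in each panel coloured by the variable refseq. Colouring by a grouping variable only makes sense when each level has several rows; if refseq is unique per row, as it appears to be in the question, substitute a suitable grouping column or drop the colour mapping.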

library(tidyverse)

w_density <- w[,1:40]  # columns you want densities for
w_density$refseq <- w$refseq  # maybe you have a variable to group by

w_density %>%
    pivot_longer(!refseq, names_to = "variable", values_to = "value") %>%
    ggplot(aes(x = value, colour = refseq)) +
    geom_density(show.legend = TRUE) + 
    facet_wrap(~variable, scales = "free", ncol = 5) + 
    ggtitle("Title goes here")

If the grid is not the right size and you're using Rmd, you can adjust the figure dimensions in the chunk options:

```{r, fig.height=20, fig.width=11}
# plotting code goes here
```
