
R ggplot2 logarithmic cut with negative and positive values on x-axis and mean per bin of y-axis

I am looking for a way to plot the mean of one variable across bins of the log2 values of another variable (which takes both positive and negative values), using the more advanced functions in ggplot2. I think I am overcomplicating this, and ggplot2 probably already has a built-in option for it, but I cannot get it right, so before going back to basics I thought I would try to learn how to apply these functions here.

library(ggplot2)

value <- rnorm(1000,0,20)
dist = c(rep(0, 15), sample(1:490), sample(-1:-495))
data = data.frame(value=value, dist=dist)

data$log=log2(abs(data$dist)+1)
# re-label the x-axis:
data$Labels=2^(abs(data$log))-1

data$bins=cut(data$log, breaks=10)
# Try to recover the negative log after transformation
data$sign=ifelse(data$dist==0, 0, ifelse(data$dist>0, "+", "-"))

# find the mean of value (and the number of observations) in each bin
data = with(data, aggregate(value, by = list(bins, sign), FUN = function(x) c(mn = mean(x), n = length(x))))
data= as.data.frame(as.list(data))
names(data)=c("bins", "sign", "mean", "length")

# I am doing this in a very contorted way to try to achieve what I would like which is something like this:

bin_num = do.call("rbind", lapply(strsplit(sapply(as.character(data$bins), function(x) substr(x, 2, nchar(x)-1)), ","), as.numeric))
data$bin_num=bin_num[,1]
data$bin_num=ifelse(data$sign==0, 0, ifelse(data$sign=="-", -data$bin_num, data$bin_num))
data = data[order(data$bin_num),]

data <- transform(data, x2 = factor(paste(sign, bins)))
data <- transform(data, x2 = reorder(x2, rank(bin_num)))

# Line plot to show the distribution of the means across the bins of log2 of x:
ggplot(data, aes(y = mean, x = bin_num, group=1)) +  geom_point() + geom_line()

# Then I am trying to re-label the log-transformed axis here by adding labels, but of course it is not working:

ggplot(data, aes(y = mean, x = bin_num, group=1)) +  geom_point() + geom_line() + scale_x_discrete(labels=data$dist, breaks=data$bin_num)

I see that ggplot2 can compute the mean directly, so I might not even need the previous commands. I tried:

ggplot(data, aes(x = bins, y = mean)) + stat_summary(fun.y = "mean") +     geom_line() + scale_x_continuous(breaks = labels)

But of course it doesn't work. I also saw that ggplot2 has functions that handle logarithmic labelling automatically, instead of what I used here, but I don't see how to use them when negative values have to be log-transformed. There is a very nice function from another question here that converts the two values, but I don't see how it is useful at this stage. Thanks very much for any suggestions on how to go about this, really appreciated!
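For reference, the sign-preserving transform that the code above builds by hand (log2 of the absolute value plus one, with the sign restored afterwards) can be sketched as a pair of small functions. The names `signed_log2` and `signed_log2_inv` are hypothetical and are not taken from the linked question; this is only a minimal base R illustration of the idea.

```r
# Hypothetical helpers: a sign-preserving log2 transform and its inverse.
# Zero maps to zero, and negative inputs stay negative after the transform.
signed_log2     <- function(x) sign(x) * log2(abs(x) + 1)
signed_log2_inv <- function(y) sign(y) * (2^abs(y) - 1)

dist <- c(-495, -7, -1, 0, 1, 7, 490)
z    <- signed_log2(dist)      # values on the signed log2 scale
back <- signed_log2_inv(z)     # back on the original scale

print(data.frame(dist = dist, signed_log2 = z, recovered = back))
```

The inverse is what would be needed to turn positions on the transformed axis back into labels on the original `dist` scale.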

First version of an answer, using data.table for faster speeds and better readability:

The code reproduces the question with shorter and faster code:

library(data.table)
library(ggplot2)

# function that returns the lower bound of a cut
lower.bound <- function(x, n) {
  c <- cut(x, n)
  tmp <- substr(x = c, start = 2, stop = regexpr(",", c) - 1)
  return(as.numeric(tmp))
}

nbin <- 10
set.seed(123)
dat <- data.table(value = rnorm(1000,0, 20),
                  dist = c(rep(0, 15), sample(1:490), sample(-1:-495)))

dat[, log := log2(abs(dist) + 1)]
dat[, labels := 2^(abs(log)) - 1]  # inverse of log2(abs(dist) + 1), as in the question
dat[, sign := ifelse(dist == 0, 
                     0,
                     ifelse(dist > 0, "+", "-"))]

dat[, bin := ifelse(sign == 0, 
                    0,
                    ifelse(sign == "+", 
                           lower.bound(log, nbin),
                           -lower.bound(log, nbin)))]

sumdat <- dat[, .(mvalue = mean(value),
                  nvalue = .N,
                  ylab = mean(dist)), 
                 by = .(bin, sign)][order(bin)]

ggplot(sumdat, aes(x = ylab, y = mvalue)) + geom_line()
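
As a possible variation on the code above, the bins can be defined with explicit breaks on a signed log2 scale, which avoids parsing the labels returned by `cut()` and lets the axis be re-labelled in the original units of `dist`. This is only a sketch under assumptions: the helpers `signed_log2` and `signed_log2_inv`, the unit-wide bins, and the use of bin midpoints on the x-axis are illustrative choices, not part of the original answer. The summary is computed in base R, and the plot is only drawn if ggplot2 is installed.

```r
set.seed(123)
dat <- data.frame(
  value = rnorm(1000, 0, 20),
  dist  = c(rep(0, 15), sample(1:490), sample(-1:-495))
)

# Hypothetical helpers: sign-preserving log2 transform and its inverse
signed_log2     <- function(x) sign(x) * log2(abs(x) + 1)
signed_log2_inv <- function(y) sign(y) * (2^abs(y) - 1)

dat$slog <- signed_log2(dat$dist)

# Symmetric, unit-wide breaks on the transformed scale (an assumption)
lim    <- ceiling(max(abs(dat$slog)))
breaks <- seq(-lim, lim, by = 1)
dat$bin <- cut(dat$slog, breaks = breaks, include.lowest = TRUE)

# Midpoint of every bin on the transformed scale
mids <- (head(breaks, -1) + tail(breaks, -1)) / 2

# Mean of value and number of observations per bin
sumdat <- data.frame(
  bin        = levels(dat$bin),
  mid        = mids,
  mean_value = as.vector(tapply(dat$value, dat$bin, mean)),
  n          = as.vector(table(dat$bin))
)
sumdat <- sumdat[sumdat$n > 0, ]  # drop empty bins, if any

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sumdat, aes(x = mid, y = mean_value)) +
    geom_point() +
    geom_line() +
    # tick positions are on the transformed scale, labels on the original one
    scale_x_continuous(breaks = breaks,
                       labels = as.character(signed_log2_inv(breaks))) +
    labs(x = "dist (signed log2 scale)", y = "mean of value")
  print(p)
}
```

Because the breaks are chosen up front, the tick labels are just the inverse transform of the break positions, so no string manipulation of the bin names is needed.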
