簡體   English   中英

R中具有“負”對數標度的直方圖

[英]Histogram with “negative” logarithmic scale in R

我有一個包含一些異常值的數據集,如下所示

x <- rnorm(1000,0,20)
x <- c(x, 500, -500)

如果我們在線性x軸刻度上繪制這個,我們看到

histogram(x)

非對數x軸

我使用這個有用的線程編寫了一個很好的方法將它放在對數刻度上: 如何在R中使用直方圖的y軸的對數刻度?

mat <- data.frame(x)
ggplot(ee, aes(x = xx)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10()

記錄x軸

但是,我希望第二個例子中的x軸標簽與第一個例子的x軸標簽相匹配,除了一種“負日志” - 即第一個標記(從中心向左移動)可以是-1,然后是接下來可能是-10,下一個-100,但都是等距的。 那有意義嗎?

我不確定我理解你的目標,但是當你想要一個類似於日志的轉換但是有零或負值時,反雙曲正弦變換asinh()通常是一個不錯的選擇。 對於大值,它是類似日志的,並且是針對所有實際值定義的。 有關討論,詳細信息和其他選項,請參閱有關stats.stackexchange.com的 Rob Hyndman的博客此問題

如果這是一種可接受的方法,您可以為ggplot創建自定義比例。 下面的代碼演示了如何創建和使用自定義比例(使用自定義中斷),以及asinh()轉換的可視化。

library(ggplot2)
library(scales)

limits <- 100
step <- 0.005
demo <- data.frame(x=seq(from=-1*limits,to=limits,by=step))

asinh_trans <- function(){
  trans_new(name = 'asinh', transform = function(x) asinh(x), 
            inverse = function(x) sinh(x))
}

ggplot(demo,aes(x,x))+geom_point(size=2)+
     scale_y_continuous(trans = 'asinh',breaks=c(-100,-50,-10,-1,0,1,10,50,100))+
     theme_bw()

在此輸入圖像描述

ggplot(demo,aes(x,x))+geom_point(size=2)+
     scale_x_continuous(trans = 'asinh',breaks=c(0,1,10,50,100))+
     scale_y_log10(breaks=c(0,1,10,50,100))+ # zero won't plot
     xlab("asinh() scale")+ylab("log10 scale")+
     theme_bw()

在此輸入圖像描述

意識到這個問題相當陳舊,我決定回答它,因為我遇到了完全相同的問題。

我看到上面的一些答案誤解了你原來的問題。 我認為這是一個有效的可視化問題,我在下面概述了我的解決方案,希望對其他人也有用。

我的方法是使用ggplot並為xy軸創建自定義日志轉換(以及自定義中斷生成器)

library(ggplot2)
library(scales)

# Create custom log-style x axis transformer (...,-10,-3,-1,0,1,3,10,...)
custom_log_x_trans <- function()
  trans_new("custom_log_x",
            transform = function (x) ( sign(x)*log(abs(x)+1) ),
            inverse = function (y) ( sign(y)*( exp(abs(y))-1) ),
            domain = c(-Inf,Inf))

# Custom log x breaker (...,-10,-3,-1,0,1,3,10,...)
custom_x_breaks <- function(x)
{ 
  range <- max(abs(x), na.rm=TRUE)

  return (sort( c(0,
                  sapply(0:log10(range), function(z) (10^z) ),
                  sapply(0:log10(range/3), function(z) (3*10^z) ),
                  sapply(0:log10(range), function(z) (-10^z) ),
                  sapply(0:log10(range/3), function(z) (-3*10^z) )
  )))
}

# Create custom log-style y axis transformer (0,1,3,10,...)
custom_log_y_trans <- function()
  trans_new("custom_log_y",
            transform = function (x) ( log(abs(x)+1) ),
            inverse = function (y) ( exp(abs(y))-1 ),
            domain = c(0,Inf))

# Custom log y breaker (0,1,3,10,...)
custom_y_breaks <- function(x)
{ 
  max_y <- length(x)

  range <- max(abs(max_y), na.rm=TRUE)

  return (sort( c(0,
                  sapply(0:log10(range), function(z) (10^z) ),
                  sapply(0:log10(range/3), function(z) (3*10^z) )
  )))
}

ggplot(data=mat) +
  geom_histogram(aes(x=x,fill=..count..), 
                 binwidth = 1, color="black", size=0.1) +
  scale_fill_gradient("Count", low = "steelblue", high = "red") +
  coord_trans(x="custom_log_x",y="custom_log_y") +
  scale_x_continuous(breaks = custom_x_breaks(mat$x)) +
  scale_y_continuous(breaks = custom_y_breaks(mat$x)) +
  theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))  + 
  theme_bw()

這給了我以下情節。

在此輸入圖像描述

注意:

  • 該圖還包括着色方案,以便在視覺上顯示每個條的絕對值。
  • 隨着x增加,箱子變得越來越薄(對數變換的副作用)

在任何一種情況下, 兩個異常值都清晰可見

我找到了欺騙它的方法。 我說“作弊”,因為它實際上分別繪制了數據的負面和正面部分。 因此,您無法比較負面和正面數據。 但只能分別顯示負面和正面部分的分布。

其中一個問題是如果數據中的值為零,則不會在圖中顯示。

reverselog_trans <- function(base = exp(1)) {
  trans <- function(x) -log(x, base)
  inv <- function(x) base^(-x)
  trans_new(paste0("reverselog-", format(base)), trans, inv, 
            log_breaks(base = base), 
            domain = c(1e-100, Inf))
}

quartz();


dist1 <- ggplot(data=df.meltFUAC) +
  geom_point(alpha=1,aes(x=deltaU.deltaUltrasensitivity,y=deltaF.deltaFitness, 
                         colour=deltaF.w_c)) + 
  scale_x_continuous(name = expression(Delta * S[ult]), 
                     limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),
                     labels=c("1e-01","1e-03","1e-05")) + 
  scale_y_continuous(name = expression(paste(Delta, " Fitness")),trans = "log10", 
                     limits = c(1e-7,1), breaks=c(1e-01,1e-03,1e-05),
                     labels=c("1e-01","1e-03","1e-05")) +
  theme_bw() +
  theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),
        panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),plot.background=element_blank(),
        plot.margin=unit(c(0,0,0,-11),"mm"))

dist2 <- ggplot(data=df.meltFUAC, aes(x=-deltaU.deltaUltrasensitivity,y=deltaF.deltaFitness, 
                                      colour=deltaF.w_c)) +
  geom_point(alpha=1) + 
  scale_x_continuous(name = expression(Delta * sqrt(S[ult] %.% S[amp])),limits=c(1,1e-7),
                     trans = reverselog_trans(10),breaks=c(1e-01,1e-03,1e-05),
                     labels=c("-1e-01","-1e-03","-1e-05")) +
  scale_y_continuous(name = expression(paste(Delta, " Fitness")),trans = "log10", 
                     limits = c(1e-7,1), breaks=c(1e-01,1e-03,1e-05),
                     labels=c("1e-01","1e-03","1e-05")) +
  theme_bw() +
  theme(legend.position = "none",strip.background=element_blank(),panel.border=element_rect(colour = "black"),
        axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
        axis.line.y=element_line(colour="black",size=1,linetype="solid"),axis.title.x=element_blank(),
        panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),
        plot.margin=unit(c(0,-8,0,2.5),"mm"))

hist0 <- ggplot(data=df.meltFUAC, aes(deltaF.deltaFitness,fill=deltaF.w_c)) +
  #geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') +
  geom_density(alpha = 0.5, aes(colour=deltaF.w_c)) +
  scale_x_continuous(name = expression(paste(Delta, " Fitness")), 
                     limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),
                     labels=c("1e-01","1e-03","1e-05")) + 
  scale_y_continuous(name = "Density", limits=c(0,0.6)) + 
  theme_bw() +
  theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),
        axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
        axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.x=element_blank(),
        panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),plot.background=element_blank(),
        plot.margin=unit(c(0,5,2.5,-2.5),"mm")) +
  coord_flip()

hist1 <- ggplot(data=df.meltFUAC, aes(deltaU.deltaUltrasensitivity,fill=deltaF.w_c)) +
  #geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') +
  geom_density(alpha = 0.5, aes(colour=deltaF.w_c)) +
  scale_x_continuous(name = expression(Delta * S[ult]), 
                     limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),
                     labels=c("1e-01","1e-03","1e-05")) + 
  scale_y_continuous(name = "Density", limits=c(0,0.6)) + 
  theme_bw() +
  theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),
        axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
        axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.x=element_blank(),
        axis.line.x=element_line(colour="black",size=1,linetype="solid"),
        panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),plot.background=element_blank(),
        plot.margin=unit(c(5,0,-2.5,2),"mm"))

hist2 <- ggplot(data=df.meltFUAC, aes(-deltaU.deltaUltrasensitivity,fill=deltaF.w_c)) +
  #geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') +
  geom_density(alpha = 0.5, aes(colour=deltaF.w_c)) +
  scale_x_continuous(name = expression(Delta * S[ult]),limits=c(1,1e-7),
                     trans = reverselog_trans(10),breaks=c(1e-01,1e-03,1e-05),
                     labels=c("-1e-01","-1e-03","-1e-05")) +
  scale_y_continuous(name = "Density", limits=c(0,0.6)) + 
  theme_bw() +
  theme(legend.position = "none",strip.background=element_blank(),panel.border=element_rect(colour = "black"),
        axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
        axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.x=element_blank(),
        axis.line.y=element_line(colour="black",size=1,linetype="solid"),
        axis.line.x=element_line(colour="black",size=1,linetype="solid"),
        panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),
        plot.margin=unit(c(5,-8,-2.5,2.5),"mm"))



grid.newpage();
pushViewport(viewport(layout = grid.layout(3, 3, widths = unit(c(4,4,2),"null"),
                                           heights=unit(c(2,7.5,0.5),"null"))));
vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y);

print(dist2, vp = vplayout(2, 1));
print(dist1, vp = vplayout(2, 2));
print(hist2, vp = vplayout(1, 1));
print(hist1, vp = vplayout(1, 2));
print(hist0, vp = vplayout(2, 3));
grid.text(expression(Delta * Ultrasensitivity),vp = vplayout(3,1:2),x = unit(0.55, "npc"), 
          y = unit(0.9, "npc"),gp=gpar(fontsize=12, col="black"));

dev.copy2pdf(file=sprintf("%s/_dist/dist_hist_deltaF_deltaU_wc_01vs10.pdf", resultDir));
dev.off();

這是它得到的圖表(但您需要手動設置圖例):

在此輸入圖像描述

或者更簡單的一個:

reverselog_trans <- function(base = exp(1)) {
  trans <- function(x) -log(x, base)
  inv <- function(x) base^(-x)
  trans_new(paste0("reverselog-", format(base)), trans, inv, 
            log_breaks(base = base), 
            domain = c(1e-100, Inf))
}

quartz();

hist1 <- ggplot(deltaF, aes(deltaFitness,fill=w_c)) + guides(fill=guide_legend(title=expression(omega[c]))) + geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') + geom_density(alpha = 0.05, aes(colour=w_c)) + scale_x_continuous(name = expression(paste(Delta, " Fitness")),trans = "log10");
hist1 <- hist1 + scale_y_continuous(name = "Density", limits=c(0,1));
#hist1 <- hist1 + theme(panel.background=element_blank(),panel.border=element_blank(),axis.line.x=element_blank(),axis.line.y=element_line(colour="black",linetype="solid",size=1),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(5,5,0,5),"mm"));
hist1 <- hist1 + theme_bw();
hist1 <- hist1 + theme(strip.background=element_blank(),panel.border=element_rect(colour = "black"),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(5,5,0,5),"mm"));
hist1 <- hist1 + scale_color_discrete(name=expression(omega[c]));# + geom_vline(xintercept=0, colour="grey", size = 1);# + geom_hline(yintercept=0, colour="grey", size = 0.5);

hist2 <- ggplot(deltaU, aes(deltaUltrasensitivity,fill=w_c)) + geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') + geom_density(alpha = 0.05, aes(colour=w_c))  + scale_x_continuous(name = expression(paste(Delta, " Ultrasensitivity")), limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),labels=c("1e-01","1e-03","1e-05"));
hist2 <- hist2 + scale_y_continuous(name = "Density",limits=c(0,1)) ;#+ geom_vline(xintercept=0, colour="grey", size = 1);# + geom_hline(yintercept=0, colour="grey", size = 0.5);
#hist2 <- hist2 + theme(legend.position = "none", axis.title.x=element_blank(),panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,5,0,-7.5),"mm"));
hist2 <- hist2 + theme_bw();
hist2 <- hist2 + theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,5,0,-7.5),"mm"));
# + ggtitle("Positive part")

hist3 <- ggplot(deltaU, aes(-deltaUltrasensitivity,fill=w_c)) + geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') + geom_density(alpha = 0.05, aes(colour=w_c)) + scale_x_continuous(name = expression(paste(Delta, " Ultrasensitivity")), limits=c(1,1e-7),trans = reverselog_trans(10),breaks=c(1e-01,1e-03,1e-05),labels=c("-1e-01","-1e-03","-1e-05"));
hist3 <- hist3 + scale_y_continuous(name = "Density", limits=c(0,1));# + geom_hline(yintercept=0, colour="black", size = 0.5);
#hist3 <- hist3 + theme(legend.position = "none",panel.background=element_blank(),axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),axis.line.y=element_line(colour="black",size=1,linetype="solid"),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,-7.5,0,5),"mm"));
hist3 <- hist3 + theme_bw();
hist3 <- hist3 + theme(legend.position = "none",strip.background=element_blank(),panel.border=element_rect(colour = "black"),axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),axis.line.y=element_line(colour="black",size=1,linetype="solid"),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,-7.5,0,5),"mm"));
# + ggtitle("Negative part")

grid.newpage();
pushViewport(viewport(layout = grid.layout(4, 2, widths = unit(c(5,5),"null"),heights=unit(c(4.6,0.4,4.6,0.4),"null"))));
vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y);
print(hist1, vp = vplayout(1, 1:2));  # key is to define vplayout
grid.text(expression(paste(Delta, " Fitness")),vp = vplayout(2,1:2),x = unit(0.5, "npc"), y = unit(0.9, "npc"),gp=gpar(fontsize=12, col="black"));
print(hist3, vp = vplayout(3, 1));
print(hist2, vp = vplayout(3, 2));
grid.text(expression(paste(Delta, " Ultrasensitivity")),vp = vplayout(4,1:2),x = unit(0.5, "npc"), y = unit(0.9, "npc"),gp=gpar(fontsize=12, col="black"));


dev.copy2pdf(file=sprintf("%s/deltaF_deltaU_wc_01vs10.pdf", resultDir));
dev.off();

這是我得到的圖表:

在此輸入圖像描述

為什么用ggplot2解決方案? 你的第一個圖是用格子histogram函數完成的,這就是你應該留下的地方。 只需在histogram函數中直接應用對數變換,使用nint參數指定直方圖區間的數量,並type參數以在“count”或“density”之間進行選擇。 我認為你得到了你需要的一切,但也許我錯過了你問題的一些重要細節......

library(lattice)
histogram(log10(x), nint=50, type="count")

在此輸入圖像描述

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM