
Scatterplot matrix with logarithmic axes in R

I am trying to create a scatterplot matrix from my dataset so that in the resulting matrix:

  • Points are grouped in two ways:
    • Quarter of the year (shown as the colour of the points)
    • Day type (shown as the shape of the points: weekend, or an ordinary weekday between Monday and Friday)
  • The x and y axes are logarithmically scaled.
  • Axis tick labels are not logarithmic, i.e. the axes should show the original values as integers between 0 and 350, not their log10 counterparts.
  • The upper panel shows the correlation values for each quarter.
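For a single panel, the axis requirement can be sketched in base graphics as follows. This is only a minimal illustration with made-up data (the vectors `x` and `y` are hypothetical, and the range starts at 1 because a log axis cannot include 0); it draws to an off-screen device so it runs without a display.

```r
# Hypothetical toy data in the 1..350 range (assumption, not the real dataset)
set.seed(1)
x <- sample(1:350, 50, replace = TRUE)
y <- sample(1:350, 50, replace = TRUE)

pdf(NULL)                 # off-screen device, no file is written
plot(x, y, log = "xy")    # log10-scaled axes; tick labels stay in data units
is_log <- par("xlog")     # TRUE once a log x axis is in use
ticks  <- axTicks(1)      # x-axis tick positions, reported in data units
invisible(dev.off())
```

The tick positions returned by `axTicks()` are in the original units rather than log10 values, which is the labelling behaviour wanted for every panel of the matrix.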

So far I've tried the following functions:

  1. pairs()
  2. ggpairs() [from the GGally package]
  3. scatterplotMatrix()
  4. splom()

But I haven't been able to get decent results with any of them; each time, one or more of my requirements is missing.

  • With pairs(), I'm able to create the scatterplot matrix, but the parameter log="xy" somehow removes the variable names from the diagonal of the resulting matrix.
  • ggpairs() doesn't support logarithmic scales directly, but I created a function, based on this answer, that goes through the diagonal and lower triangle of the scatterplot matrix. The logarithmic scaling works on the lower triangle, but it messes up the variable labels and the tick values.

The function is created and used as follows:

ggpairs_logarithmize <- function(a) { # parameter a is a ggpairs sp-matrix
        max_limit <- sqrt(length(a$plots))
        for(row in 1:max_limit) { # index 1 is used to go through the diagonal also
                for(col in 1:row) { # col <= row: diagonal and lower triangle only
                        subsp <- getPlot(a,row,col)
                        subspnew <- subsp + scale_y_log10() + scale_x_log10()
                        subspnew$type <- 'logcontinous'
                        subspnew$subType <- 'logpoints'
                        a <- putPlot(a,subspnew,row,col)
                }
        }
        return(a)
}
scatplot <- ggpairs(...)
scatplot_log10 <- ggpairs_logarithmize(scatplot)
scatplot_log10
  • scatterplotMatrix() didn't seem to support two groupings. I was able to plot quarter and day type separately, but I need both groupings in the same plot.
  • splom() somehow converts the axis tick labels to logarithmic values as well, and these should be kept as they are (integers between 0 and 350).

Is there a simple way to create a scatterplot matrix with logarithmic axes that meets these requirements?

EDIT (13.7.2012): Example data and output were requested. Here are some code snippets that produce a demo dataset:

Declare the necessary functions:

logarithmize <- function(a)
{
        max_limit <- sqrt(length(a$plots))
        for(j in 1:max_limit) {
                for(i in j:max_limit) {
                        subsp <- getPlot(a,i,j)
                        subspnew <- subsp + scale_y_log10() + scale_x_log10()
                        subspnew$type <- 'logcontinous'
                        subspnew$subType <- 'logpoints'
                        a <- putPlot(a,subspnew,i,j)
                }
        }
        return(a)
}

add_quarters <- function(a,datecol,targetcol) {
    for(i in 1:nrow(a)) {
        month <- 1+as.POSIXlt(as.Date(a[i,datecol]))$mon
        if ( month <= 3 ) { a[i,targetcol] <- "Q1" }
        else if (month <= 6 && month > 3) { a[i,targetcol] <- "Q2" }
        else if ( month <= 9 && month > 6 ) { a[i,targetcol] <- "Q3" }
        else if ( month > 9 ) { a[i,targetcol] <- "Q4" }
    }
    return(a)
}
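As an aside, the row-by-row loop in add_quarters() could probably be replaced by base R's vectorised `quarters()` function, which returns "Q1" to "Q4" for a Date vector. The sketch below uses a few hypothetical dates (not the demo dataset) and also shows an equivalent derivation from the month number.

```r
# Hypothetical example dates, one per quarter
dates <- as.Date(c("2010-01-15", "2010-04-01", "2010-09-30", "2010-12-31"))

# Vectorised: one call for the whole column instead of a loop over rows
q <- quarters(dates)

# Equivalent from the zero-based month number (0 = January)
q_from_month <- paste0("Q", as.POSIXlt(dates)$mon %/% 3 + 1)
```

With this approach the column could be filled in one step, e.g. `fruitsales$Quarter <- quarters(fruitsales$Date)`, assuming the date column is of class Date as in the demo data.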

Create the dataset:

days <- seq.Date(as.Date("2010-01-01"),as.Date("2012-06-06"),"day")
bananas <- sample(1:350,length(days), replace=TRUE)
apples <- sample(1:350,length(days), replace=TRUE)
oranges <- sample(1:350,length(days), replace=TRUE)
weekdays <- c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")
# stringsAsFactors=TRUE keeps Dayofweek a factor (it is no longer the default since R 4.0.0)
fruitsales <- data.frame(Date=days,Dayofweek=rep(weekdays,length.out=length(days)),Bananas=bananas,Apples=apples,Oranges=oranges,stringsAsFactors=TRUE)
fruitsales[5:6,"Quarter"] <- NA # creates column 6
fruitsales[6:7,"Daytype"] <- NA # creates column 7
fruitsales$Daytype <- fruitsales$Dayofweek
levels(fruitsales$Daytype) # Confirm the day type levels before assigning new levels
# Levels are alphabetical: Friday, Monday, Saturday, Sunday, Thursday, Tuesday, Wednesday
levels(fruitsales$Daytype) <- c("Casual","Casual","Weekend","Weekend","Casual","Casual","Casual")
fruitsales <- add_quarters(fruitsales,1,6)
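Note that the demo data cycles through the weekday names starting from Monday regardless of the actual calendar date. If the day type should instead follow the real calendar, a locale-independent sketch could look like this (the variable names `wday` and `daytype` are illustrative only, and the seven dates are a hypothetical sample):

```r
# One week of dates; 2010-01-01 was a Friday
days <- seq.Date(as.Date("2010-01-01"), as.Date("2010-01-07"), by = "day")

# POSIXlt weekday number: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
wday <- as.POSIXlt(days)$wday

# Saturday and Sunday are "Weekend", everything else "Casual"
daytype <- factor(ifelse(wday %in% c(0, 6), "Weekend", "Casual"))
```

Using the numeric weekday avoids depending on the language in which weekday names are printed.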

Execute (NOTE: Windows/Mac users should replace x11() with the graphics device for their OS):

# install.packages("GGally")
require(GGally)
x11(); ggpairs(fruitsales,columns=3:5,colour="Quarter",shape="Daytype")
x11(); logarithmize(ggpairs(fruitsales,columns=3:5,colour="Quarter",shape="Daytype"))

The problem with pairs stems from the use of user co-ordinates in a log coordinate system. Specifically, when adding the labels on the diagonals, pairs sets

par(usr = c(0, 1, 0, 1))

However, if you specify a log coordinate system via log = "xy", what you need here is

par(usr = c(0, 1, 0, 1), xlog = FALSE, ylog = FALSE) 

See this post on R-help.
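The state involved can be inspected directly. The following is a minimal sketch on an off-screen device, not the code pairs runs internally; the two par() calls are split here only so each step can be examined, and the plotted values are arbitrary.

```r
pdf(NULL)                              # off-screen device, no file is written
plot(1:100, 1:100, log = "xy")         # sets up a log-log coordinate system
was_log <- par("xlog")                 # TRUE: the device is in log mode

par(xlog = FALSE, ylog = FALSE)        # switch back to linear coordinates
par(usr = c(0, 1, 0, 1))               # unit square as user coordinates
now_log <- par("xlog")                 # FALSE after the reset

text(0.5, 0.5, "label")                # placed relative to the unit square
invisible(dev.off())
```

Without the xlog/ylog reset, the device stays in log mode and the limits passed as usr are interpreted on the log10 scale, so a label positioned at 0.5 is no longer in the middle of the panel.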

This suggests the following solution (using the data given in the question):

## adapted from panel.cor in ?pairs
panel.cor <- function(x, y, digits=2, cex.cor, quarter, ...)
{
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1), xlog = FALSE, ylog = FALSE)
  r <- rev(tapply(seq_along(quarter), quarter, function(id) cor(x[id], y[id])))
  txt <- format(c(0.123456789, r), digits=digits)[-1]
  txt <- paste(names(txt), txt)
  if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
  text(0.5, c(0.2, 0.4, 0.6, 0.8), txt)
}

pairs(fruitsales[,3:5], log = "xy", 
      diag.panel = function(x, ...) par(xlog = FALSE, ylog = FALSE),
      label.pos = 0.5,
      col = unclass(factor(fruitsales[,6])), 
      pch = unclass(fruitsales[,7]), upper.panel = panel.cor, 
      quarter = factor(fruitsales[,6]))

This produces the following plot:

[Figure: pairs plot on a log coordinate system]
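The per-quarter correlations in panel.cor come from calling tapply() on the row indices, so that each group's indices can be used to subset both x and y. A small stand-alone sketch with made-up numbers (the vectors below are hypothetical, chosen so the correlations are exactly 1 and -1):

```r
# Hypothetical data: perfectly increasing in Q1, perfectly decreasing in Q2
x <- c(1, 2, 3, 4, 1, 2, 3, 4)
y <- c(2, 4, 6, 8, 4, 3, 2, 1)
quarter <- factor(rep(c("Q1", "Q2"), each = 4))

# Split the row indices by quarter, then correlate x and y within each group
r <- tapply(seq_along(quarter), quarter, function(id) cor(x[id], y[id]))
```

The result is named by the factor levels, which is why panel.cor can paste names(txt) in front of the formatted values.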

