
Scatter plot matrices using pairs() in R

I'm new to R and working on some code that outputs a scatter plot matrix. The data frame is in the following format:

A B C D
2 3 0 5
8 9 5 4
0 0 5 3
7 0 0 0

My data sets can run to hundreds or thousands of rows and tens or hundreds of columns, with values spanning a very wide range (hence log transforming my data).

This bit of code gives me some partial success in enhancing the basic plot (see embedded image):

panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1), xlog = FALSE, ylog = FALSE)
  r <- abs(cor(x, y))
  txt <- format(c(r, 0.123456789), digits = digits)[1]
  txt <- paste(prefix, txt)
  if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor * r)
}

# Add regression line to plots.

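my_line <- function(x, y, ...){
  points(x, y, ...)
  # Regress y on x in log10 space; x and y are the panel's own vectors, so no data argument is needed.
  LR <- lm(log10(y) ~ log10(x))
  # With log = "xy" the panel is drawn in log10 units, so the fitted line is drawn without untransforming.
  abline(LR, col = "red", untf = FALSE)
}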

# Plot scatter plot matrices.

pairs(mydataframe, pch = 20, main = "test",
      cex = 0.125, cex.labels = 1,
      xlim = c(100, 1e9),
      ylim = c(100, 1e9),
      upper.panel = panel.cor,
      lower.panel = my_line,
      log = "xy")

[image: example]

Problem 1 - Instead of correlation values in the upper panel, I get NAs. How can I correct this?
Problem 2 - I'd like to stop the text size of the correlation value being scaled in proportion to the correlation. I know this happens in panel.cor, but I'm not sure which part needs removing or adjusting.

Many thanks in advance

EDIT: 08/06/2016
I have found a workaround which also simplifies the code:

panel.cor <- function(x, y, digits = 2, cex.cor, ...)
{
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1))
  # correlation coefficient
  r <- cor(x, y)
  txt <- format(c(r, 0.123456789), digits = digits)[1]
  txt <- paste("r= ", txt, sep = "")
  text(0.5, 0.6, txt)
}

# add regression line to plots.

my_line <- function(x, y, ...)
{
  points(x, y, ...)
  # Regress the vertical variable (y) on the horizontal one (x); abline() reads the fit as y = a + b*x.
  LR <- lm(y ~ x)
  abline(LR, col = "red")
}

# Plot scatterplot matrices.

pairs(SP, pch = 20, main = "test",
      cex = 0.125, cex.labels = 1,
      upper.panel = panel.cor,
      lower.panel = my_line)

[image: example 2]

The issue appears to be missing values, i.e. zeros. I change these to NAs initially so that I can use a log scale. Because cor() returns NA as soon as either variable contains an NA, this leads to missing correlation values in the upper panel.

Ideally I'd like to have a log scale. Is there a way I can do this without introducing the aforementioned issue?
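
One way to keep the log scale without touching the data, sketched under the assumption that the zeros have already been recoded to NA: let the correlation panel drop incomplete pairs itself. The names panel.cor.na and demo are made up for illustration; use = "complete.obs" is the standard cor() argument for ignoring pairs that contain NA. The text is drawn at a fixed cex, which also covers Problem 2.

```r
# Hypothetical upper-panel function: prints r, ignoring pairs that contain NA.
# digits and cex.cor come after ... so that a cex passed to pairs()
# cannot partially match cex.cor.
panel.cor.na <- function(x, y, ..., digits = 2, cex.cor = 1.25)
{
  # Linear 0-1 coordinates for the text, even when pairs() uses log axes.
  # pairs() starts a fresh plot for every cell, so nothing is restored here.
  par(usr = c(0, 1, 0, 1), xlog = FALSE, ylog = FALSE)
  ok <- is.finite(x) & is.finite(y)
  if (sum(ok) < 3) return(invisible(NULL))   # too few complete pairs to report
  r <- cor(x, y, use = "complete.obs")        # NA pairs are dropped, not propagated
  txt <- paste0("r = ", format(r, digits = digits))
  text(0.5, 0.5, txt, cex = cex.cor)          # fixed size, not scaled by r
}

# Simulated positive data; the NAs stand in for zeros recoded as missing.
set.seed(1)
demo <- data.frame(A = rlnorm(50, meanlog = 8), C = rlnorm(50, meanlog = 8))
demo$B <- demo$A * rlnorm(50, sdlog = 0.3)
demo$A[c(3, 7)] <- NA

pairs(demo, log = "xy", pch = 20, upper.panel = panel.cor.na)
```

Note that r here is computed on the raw values, as in the original panel.cor; computing it on log10(x) and log10(y) instead would be a different (and possibly more appropriate) statistic for data shown on log axes.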

Clarification - I'd like a log (xy) scale in the scatter plots (lower panel) and a log x-axis in the histograms (diagonal panel). I've been playing about with it today but can't quite get it as I want. Perhaps I'm asking too much from pairs. Any help would be appreciated.

Edit: 10/06/2016

Success!....well approximately 99% happy.

I have made changes - added histograms to diagonal panel and p-value to upper panel (the base code in "pairs()" for adding the histogram needed adjustment due to the log scale used on the x-axis). Please feel free to correct my descriptions if they're not accurate or correct:

library(lattice)
DF <- read.csv("File location", header = TRUE)
DF.1 <- DF+1 # Add 1 to every value so that zeros can be plotted on a log scale (log(0) is -Inf, which causes plot errors).

# Function to calculate the correlation coefficient (r) & p-value for upper panels in pairs() - scatterplot matrices.

panel.cor <- function(x, y, digits = 3, cex.cor, ...)
{
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1), xlog = FALSE, ylog = FALSE) # xlog/ylog = FALSE: gives the text a linear 0-1 coordinate system, so r and p-values display when pairs() uses log axes.
  # Calculate correlation coefficient and add to upper panel.
  r <- cor(x, y)
  txt <- format(c(r, 0.123456789), digits = digits)[1]
  txt <- paste("r= ", txt, sep = "")
  text(0.5, 0.7, txt, cex = 1.25) # First 2 arguments determine position of r in upper panel cells.

  # Calculate p-value and add to upper panel.
  p <- cor.test(x, y)$p.value
  txt2 <- format(c(p, 0.123456789), digits = digits)[1]
  txt2 <- paste("p= ", txt2, sep = "")
  if(p<0.01) txt2 <- paste("p= ", "<0.01", sep = "")
  text(0.5, 0.3, txt2, cex = 1.25) # First 2 arguments determine position of p-value in upper panel cells.
}

# Function to calculate frequency distribution and plot histogram in diagonal plot.

panel.hist <- function(x, ...)
{
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(0.5, 1.5, 0, 1.75), xlog = TRUE, ylog = FALSE) # xlog argument allows log x-axis when called in pairs.
  h <- hist(log(x), plot = FALSE, breaks = 20)
  breaks <- h$breaks; nB <- length(breaks)
  y <- h$counts; y <- y/max(y)
  rect(breaks[-nB], 0, breaks[-1], y, col = "cyan")
}

# add regression line to plots.

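my_line <- function(x, y, ...)
{
  points(x, y, ...)
  # Regress y on x in log10 space; x and y are the panel's own vectors, so no data argument is needed.
  LR <- lm(log10(y) ~ log10(x))
  # With log = "xy" the panel is drawn in log10 units, so the fitted line is drawn without untransforming.
  abline(LR, col = "red", untf = FALSE)
}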

# Plot scatterplot matrices.

pairs(DF.1, pch = 20, main = "Chart Title",
      cex = 0.75, cex.labels = 1.5, label.pos = 0.0001,
      upper.panel = panel.cor,
      lower.panel = my_line,
      diag.panel = panel.hist,
      log = "xy",
      xlim = c(5, 1e9),
      ylim = c(5, 1e9))
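
An alternative diagonal panel, offered as a sketch rather than a drop-in replacement: it keeps the x-range that pairs() has already set up, so the bars line up with the log axes of the scatter plots, and it leaves xlog/ylog untouched. The name panel.hist.log and the simulated demo data are made up for illustration, and the function assumes it is only used with log = "xy".

```r
# Hypothetical diagonal panel for pairs(..., log = "xy").
panel.hist.log <- function(x, ...)
{
  usr <- par("usr")                      # on a log axis these limits are in log10 units
  op <- par(usr = c(usr[1:2], 0, 1.5))   # keep the x-range; y now spans 10^0 to 10^1.5
  on.exit(par(op))                       # restore the previous usr on exit
  x <- x[is.finite(x) & x > 0]           # log10 needs positive, non-missing values
  h <- hist(log10(x), plot = FALSE, breaks = 20)
  nB <- length(h$breaks)
  y <- h$counts / max(h$counts)          # bar heights scaled to 0-1
  # Both axes are still logarithmic, so 10^value is drawn at position `value`.
  rect(10^h$breaks[-nB], 1, 10^h$breaks[-1], 10^y, col = "cyan")
}

# Simulated positive data covering several orders of magnitude.
set.seed(42)
demo <- data.frame(A = rlnorm(200, 8, 2),
                   B = rlnorm(200, 10, 1),
                   C = rlnorm(200, 6, 3))

pairs(demo, log = "xy", pch = 20, cex = 0.5, diag.panel = panel.hist.log)
```

Because the log flags are never changed, the variable names should appear with the default label.pos in this version.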

The fly in the ointment:

1 - the text labels in the diagonal panel only partially appear. I used a decreasing value for the "label.pos" argument in "pairs()" which moved the label down until they appeared. However, they won't move anymore no matter how much I decrease that value. I've tried to coerce the position from the histogram function, but that doesn't work. I hope someone can see what I'm missing. Thanks in advance...I've not had any responses yet:(

PS: I tried to link 3rd image with my successful plot but I was foiled by my lack of reputation...groan.

EDIT: 13/06/2016

Solved! I feel a bit foolish. The fix for the positioning of the main title in the diagonal panel was super simple and I spent a long time trying much more complex ways to do this. The "label.pos" argument in pairs should be negative! I used a small value of -0.0675 which placed it near the top of the cell containing the histogram.
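
A possible explanation for why only a negative value works, based on my reading of the pairs() source and so worth checking against your R version: when the y-axis is logarithmic, pairs() appears to place the label at 10^label.pos, and because panel.hist switches ylog off without switching it back on, that value ends up being read on a linear 0-1 scale. The arithmetic below only illustrates that assumption.

```r
# Assumed label height inside a 0-1 cell for three label.pos settings.
label.pos <- c(0.5, 0.0001, -0.0675)
heights <- round(10^label.pos, 3)
print(heights)
# 3.162 1.000 0.856: above the cell, on its top edge, inside near the top
```

Under that reading, any positive label.pos gives a height above 1, so shrinking it towards zero can only bring the label down to the top edge of the cell.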

I hope someone else finds this useful. I'll mark as solved but I'd appreciate any comments regarding my code commenting or if someone sees a way of making the code more efficient. Thanks Alex

Sometimes I feel totally dense. Answering my own question...who would have thought...slaps head. Please see the edits in my post for the fixes I found.


 