

Scatterplot matrix using two dataframes in R

I want to create a scatterplot matrix between a group of variables (not all of them!) in my data frame.

A quick snapshot of my data frame:

V1    V2    V3    V4    V5    V6    V7    R1    R2
.08   .05   .93   .1    .21   .32   .21   .09   .07
.43   .12   .1    .40   .07   .98   .25   .10   .05

The two groups are V1 to V7 and R1 to R2. What I'm trying to achieve is one plot for each pair V1-R1, V1-R2, V2-R1, ..., V7-R2. I do not want to plot V1-V2, V1-V4, etc.
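Enumerating the wanted pairs makes the difference from a full matrix concrete. A minimal sketch, assuming the column names shown in the snapshot; `xvars`, `yvars` and `wanted` are illustrative names, not part of the original question:

```r
# Column groups taken from the snapshot above
xvars <- paste0("V", 1:7)
yvars <- c("R1", "R2")

# Every V variable crossed with every R variable, and nothing else
wanted <- expand.grid(x = xvars, y = yvars, stringsAsFactors = FALSE)

nrow(wanted)  # 7 x 2 = 14 panels, versus 9 x 9 = 81 in a full pairs plot
```

These 14 pairs are exactly the entries of the 7 x 2 table that cor(dataFrame1, dataFrame2) returns.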

I figured an easy way to get there would be to split my data frame into two.

So I split my data frame into two as follows:

dataFrame1 <- dataframe[, 1:7]  # columns V1 to V7

dataFrame2 <- dataframe[, 8:9]  # columns R1 and R2

This works well as far as getting the correlation table out of R is concerned:

cor(dataFrame1,dataFrame2)
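For reference, cor() with two data frames correlates every column of the first with every column of the second, so the result is a 7 x 2 matrix rather than a full 9 x 9 one. A small sketch on simulated data (the values are made up, only the column names follow the snapshot); selecting the columns by name is an optional variation on the positional indexing above:

```r
set.seed(1)
# Simulated stand-in for the real data: 7 "V" columns and 2 "R" columns
dataframe <- as.data.frame(matrix(runif(9 * 20), ncol = 9))
names(dataframe) <- c(paste0("V", 1:7), "R1", "R2")

# Selecting by name is robust to column reordering
dataFrame1 <- dataframe[, paste0("V", 1:7)]
dataFrame2 <- dataframe[, c("R1", "R2")]

cm <- cor(dataFrame1, dataFrame2)
dim(cm)  # one row per V variable, one column per R variable
```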

However, the plotting part is more of a challenge.

So far I have tried ggpairs, car and scatterplotMatrix, and none of them seem to work.

For ggpairs, using the following code:

ggpairs(dataFrame1, dataFrame2)

I get the following error message:

Make sure your 'columns' values are positive.

Of course, the data frame above is just a sample of the entire dataset, which is why you cannot see any negative values in R1 and R2.

I don't want to do it manually in ggplot2 and then merge the grobs into a single plot. I also don't want to plot the full matrix of all the variables, because that is not what I am trying to achieve.

Is there another way to get what I'm after?

Thanks.

Here is a dplyr (plus tidyr) solution. First, subset your original data frame into two data frames; then turn both into long form, which ggplot needs; then join them by row (I added an id variable for that) and plot the result with facet_grid.

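# Simulating data (set.seed makes the example reproducible)
set.seed(123)
df <- data.frame(
  id = 1:100,
  V1 = rnorm(100),
  V2 = rnorm(100),
  V3 = rnorm(100),
  R1 = rnorm(100),
  R2 = rnorm(100),
  R3 = rnorm(100))

library(dplyr)
library(tidyr)
library(ggplot2)  # needed for ggplot(), geom_point() and facet_grid()

# Subset the data.frames, keeping id in both so they can be joined back
df1 <- select(df, id, starts_with("V"))
df2 <- select(df, id, starts_with("R"))

# Turn them both to long form and merge them on id. Every V row of an id
# is paired with every R row of the same id (a many-to-many join), so
# dplyr >= 1.1.0 emits a warning here; it is expected and harmless.
dft <- gather(df1, var, value, -id) %>%
  left_join(gather(df2, var, value, -id), by = "id")

# The shared column names get the suffixes .x (from df1) and .y (from df2)
ggplot(data = dft, aes(x = value.x, y = value.y)) +
  geom_point() +
  facet_grid(var.x ~ var.y)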
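The gather() call above still works, but tidyr now marks gather() as superseded in favour of pivot_longer(). A sketch of the same pipeline with pivot_longer(), assuming tidyr >= 1.0.0 and dplyr >= 1.1.0 (for the `relationship` argument, which declares the many-to-many pairing and silences the warning). The column names var_x, value_x, var_y and value_y are my own choice:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(123)
df <- data.frame(id = 1:100,
                 V1 = rnorm(100), V2 = rnorm(100), V3 = rnorm(100),
                 R1 = rnorm(100), R2 = rnorm(100), R3 = rnorm(100))

# Long form of each group, with distinct names so no .x/.y suffixes appear
long_v <- df %>%
  select(id, starts_with("V")) %>%
  pivot_longer(-id, names_to = "var_x", values_to = "value_x")
long_r <- df %>%
  select(id, starts_with("R")) %>%
  pivot_longer(-id, names_to = "var_y", values_to = "value_y")

# Pair every V value with every R value of the same id
dft <- inner_join(long_v, long_r, by = "id", relationship = "many-to-many")

p <- ggplot(dft, aes(x = value_x, y = value_y)) +
  geom_point() +
  facet_grid(var_x ~ var_y)
print(p)
```

Because every id is present in both halves, inner_join() and left_join() give the same rows here: 100 ids x 3 V variables x 3 R variables.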


On a side note, your code produces this error because ggpairs does not expect two data frames. See ?GGally::ggpairs:

ggpairs(data, columns = 1:ncol(data), ...)

The second argument should be the column indices; you are passing a whole data frame instead. ggpairs doesn't seem to be able to do what you want, but if you just pass it the whole original data frame, ggpairs(dataframe), it will plot every variable against every other one.
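If adding packages is not desirable, base graphics can draw only the cross pairs directly. This is a package-free alternative technique, not part of the answer above: a minimal sketch in which the helper `cross_scatter()` and the simulated `dataframe` are hypothetical:

```r
# Hypothetical helper: one scatterplot per (x, y) pair, with x drawn
# from xvars and y drawn from yvars; extra arguments go to plot()
cross_scatter <- function(data, xvars, yvars, ...) {
  # Grid layout: one row per y variable, one column per x variable
  old <- par(mfrow = c(length(yvars), length(xvars)),
             mar = c(3, 3, 1, 0.5), mgp = c(1.8, 0.6, 0))
  on.exit(par(old))  # restore the caller's graphics settings
  for (y in yvars) {
    for (x in xvars) {
      plot(data[[x]], data[[y]], xlab = x, ylab = y, ...)
    }
  }
  invisible(length(xvars) * length(yvars))  # number of panels drawn
}

# Simulated stand-in for the real data frame
set.seed(7)
dataframe <- as.data.frame(matrix(rnorm(9 * 50), ncol = 9))
names(dataframe) <- c(paste0("V", 1:7), "R1", "R2")

n_panels <- cross_scatter(dataframe, paste0("V", 1:7), c("R1", "R2"), pch = 20)
```

With 14 panels in a 2 x 7 grid, a small plotting window can trigger "figure margins too large"; widening the device (for example png(width = 1400, height = 400)) avoids that.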
