
Average 3D data to plot in ggplot2

Let's say I have 3 vectors:

x=round(runif(1000,1,65))
y=round(runif(1000,1,65))
z=runif(1000,0,1)

These are stored as columns in a data frame df. x and y are integers, and I am looking for a solution specific to integers as well as an extended solution for doubles.

I can make a 2D histogram of this dataset, but I only get the count of x,y pairs in each 2D bin.

ggplot(df,aes(x=x,y=y)) + geom_bin2d() + theme_bw()

I tried geom_tile as well, but it draws the tiles on top of each other. The behavior would be correct if I manually averaged the dataset beforehand, yet I would like a solution that does this either elegantly or directly. I have in mind a plot where the mean, median, or a user-defined summary of z is shown as color in a 2D layout.

The solution should preferably use ggplot2.

The fill argument in aes lets you define the colour when combined with stat = 'identity' in the geom_bin2d call (which takes the z value directly to define the fill):

ggplot(df, aes(x, y, fill = z)) + geom_bin2d(stat = 'identity')
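A quick check of why the raw data may need aggregating first: the following is a minimal base-R sketch (the seed and the names cell_counts, n_cells and n_overlap are assumptions for illustration) that counts how many (x, y) cells hold more than one row. With stat = 'identity', each of those rows would be drawn as its own tile at the same position, so the tiles cover one another rather than being averaged.

```r
set.seed(42)  # assumed seed, only so the run is reproducible
df <- data.frame(x = round(runif(1000, 1, 65)),
                 y = round(runif(1000, 1, 65)),
                 z = runif(1000, 0, 1))

# Number of rows falling into each (x, y) cell
cell_counts <- table(df$x, df$y)

n_cells   <- sum(cell_counts > 0)  # cells containing at least one row
n_overlap <- sum(cell_counts > 1)  # cells where several z values share one tile

n_cells
n_overlap
```

If n_overlap is greater than zero, some tiles in the plot above hide other z values.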


EDIT:

I see that you were asking for the mean, median, or some other calculation to be combined for each of the 65x65 squares. This is not done automatically, but dplyr offers a potential solution:

library(ggplot2)
library(dplyr)

df <- tibble(x=round(runif(1000,1,65)),
             y=round(runif(1000,1,65)),
             z=runif(1000,0,1))

df %>% 
  group_by(x, y) %>%               ## These two lines make a new value from z,
  summarise(fill = mean(z)) %>%       ## as a calculation from combos of x and y
  ggplot(aes(x, y, fill = fill)) + 
  geom_bin2d(stat = 'identity')
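Since the question also mentions the median and user-defined summaries, the same pipeline can be wrapped in a small function that takes the summary as an argument. This is only a sketch: summarise_cells is a hypothetical helper name, the seed is an assumption, and geom_tile() is used here in place of geom_bin2d(stat = 'identity').

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # assumed seed, only so the run is reproducible
df <- tibble(x = round(runif(1000, 1, 65)),
             y = round(runif(1000, 1, 65)),
             z = runif(1000, 0, 1))

# Hypothetical helper: collapse z to one value per (x, y) cell using `fun`
summarise_cells <- function(data, fun = mean) {
  data %>%
    group_by(x, y) %>%
    summarise(fill = fun(z), .groups = "drop")
}

# Median per cell; any function returning a single number could be passed
cells_median <- summarise_cells(df, median)

p_median <- ggplot(cells_median, aes(x, y, fill = fill)) +
  geom_tile()
```

Printing p_median draws the plot; passing, for example, max or function(v) quantile(v, 0.9) instead of median changes the statistic shown as colour.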

EDIT 2:

A follow-up question below asked about a) using a continuous variable and b) adjusting the number of bins:

bins <- 30

df %>% 
  mutate(x1 = as.numeric(cut(x, bins)),
         y1 = as.numeric(cut(y, bins))) %>% 
  group_by(x1, y1) %>%
  summarise(fill = mean(z)) %>%
  ggplot(aes(x1, y1, fill = fill)) + 
  geom_bin2d(stat = 'identity') +
  scale_x_continuous(breaks = c(1,bins), labels = c(1, max(df$x)))+
  scale_y_continuous(breaks = c(1,bins), labels = c(1, max(df$y)))

This produces a graph with an adjustable number of bins. The axis labels are the hardest part to reproduce here, though; at the moment they are only set to label the lowest and highest values. Remove the last two lines and the axes will at least be labelled by bin number (1-30).
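One possible way around the axis-label problem is to place each bin at its midpoint in the original units instead of at its bin number, so the default axes need no relabelling. The sketch below assumes continuous x and y; bin_mid is a hypothetical helper, the seed is an assumption, and geom_tile() is used in place of geom_bin2d(stat = 'identity').

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # assumed seed, only so the run is reproducible
df <- tibble(x = runif(1000, 1, 65),   # continuous x and y this time
             y = runif(1000, 1, 65),
             z = runif(1000, 0, 1))

bins <- 30

# Hypothetical helper: replace each value by the midpoint of its bin
bin_mid <- function(v, bins) {
  breaks <- seq(min(v), max(v), length.out = bins + 1)
  mids   <- (head(breaks, -1) + tail(breaks, -1)) / 2
  mids[cut(v, breaks, include.lowest = TRUE, labels = FALSE)]
}

binned <- df %>%
  mutate(x1 = bin_mid(x, bins),
         y1 = bin_mid(y, bins)) %>%
  group_by(x1, y1) %>%
  summarise(fill = mean(z), .groups = "drop")

p_binned <- ggplot(binned, aes(x1, y1, fill = fill)) +
  geom_tile()
```

Because x1 and y1 are in the same units as x and y, the axis breaks and labels are chosen by ggplot2 as usual.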


This will calculate the average of z for every pair of x/y coordinates and plot it on a color scale:

df = data.frame(x, y, z)

library(dplyr)
library(ggplot2)

df %>% group_by(x, y) %>% summarize(mean_z = mean(z)) %>% 
   ggplot(aes(x = x, y = y, fill = mean_z)) + geom_bin2d(stat = "identity")
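As a possible alternative to aggregating beforehand, ggplot2 also provides stat_summary_2d, which bins x and y and applies a summary function to a z aesthetic inside each bin. It is not used in the answers above, so treat the following as a sketch under that assumption; the seed and the choice of bins = 30 are illustrative.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the run is reproducible
x <- round(runif(1000, 1, 65))
y <- round(runif(1000, 1, 65))
z <- runif(1000, 0, 1)
df <- data.frame(x, y, z)

# z is mapped as an aesthetic; `fun` is applied to the z values in each 2D bin
p_summary <- ggplot(df, aes(x, y, z = z)) +
  stat_summary_2d(fun = mean, bins = 30) +
  theme_bw()
```

Swapping fun = mean for fun = median, or for a user-defined function that returns a single number, changes the statistic without any separate dplyr step.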


 