3D plot of bivariate distribution using R or Matlab

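I would like to know if someone could tell me how to plot something similar to this [image], with histograms of the sample generated by the code below shown under the two curves. Using R or Matlab, but preferably R.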

# bivariate normal with a gibbs sampler...

gibbs<-function (n, rho) 
{
  mat <- matrix(ncol = 2, nrow = n)
  x <- 0
  y <- 0
  mat[1, ] <- c(x, y)
  for (i in 2:n) {
    x <- rnorm(1, rho * y, (1 - rho^2))
    y <- rnorm(1, rho * x,(1 - rho^2))
    mat[i, ] <- c(x, y)
  }
  mat
}



bvn<-gibbs(10000,0.98)
par(mfrow=c(3,2))
plot(bvn,col=1:10000,main="bivariate normal distribution",xlab="X",ylab="Y")
plot(bvn,type="l",main="bivariate normal distribution",xlab="X",ylab="Y")

hist(bvn[,1],40,main="bivariate normal distribution",xlab="X",ylab="")
hist(bvn[,2],40,main="bivariate normal distribution",xlab="Y",ylab="")
par(mfrow=c(1,1))

Thanks in advance.

Best regards,

JC T.

You can do it in Matlab programmatically.

This is the result:

[Figure: Matlab plot]

Code:

% Generate some data.
data = randn(10000, 2);

% Scale and rotate the data (for demonstration purposes).
data(:,1) = data(:,1) * 2;
theta = deg2rad(130);
data = ([cos(theta) -sin(theta); sin(theta) cos(theta)] * data')';

% Get some info.
m = mean(data);
s = std(data);
axisMin = m - 4 * s;
axisMax = m + 4 * s;

% Plot data points on (X=data(x), Y=data(y), Z=0)
plot3(data(:,1), data(:,2), zeros(size(data,1),1), 'k.', 'MarkerSize', 1);

% Turn on hold to allow subsequent plots.
hold on

% Plot the ellipse using Eigenvectors and Eigenvalues.
data_zeroMean = bsxfun(@minus, data, m);
[V,D] = eig(data_zeroMean' * data_zeroMean / (size(data_zeroMean, 1)));
[D, order] = sort(diag(D), 'descend');
D = diag(D);
V = V(:, order);
V = V * sqrt(D);
t = linspace(0, 2 * pi);
e = bsxfun(@plus, 2*V * [cos(t); sin(t)], m');
plot3(...
    e(1,:), e(2,:), ...
    zeros(1, numel(t)), 'g-', 'LineWidth', 2);  % one z value per ellipse point

maxP = 0;
for side = 1:2
    % Calculate the histogram.
    p = [0 hist(data(:,side), 20) 0];
    p = p / sum(p);
    maxP = max([maxP p]);
    dx = (axisMax(side) - axisMin(side)) / numel(p) / 2.3;
    p2 = [zeros(1,numel(p)); p; p; zeros(1,numel(p))]; p2 = p2(:);
    x = linspace(axisMin(side), axisMax(side), numel(p));
    x2 = [x-dx; x-dx; x+dx; x+dx]; x2 = max(min(x2(:), axisMax(side)), axisMin(side));

    % Calculate the curve.
    nPtsCurve = numel(p) * 10;
    xx = linspace(axisMin(side), axisMax(side), nPtsCurve);

    % Plot the curve and the histogram.
    if side == 1
        plot3(xx, ones(1, nPtsCurve) * axisMax(3 - side), spline(x,p,xx), 'r-', 'LineWidth', 2);
        plot3(x2, ones(numel(p2), 1) * axisMax(3 - side), p2, 'k-', 'LineWidth', 1);
    else
        plot3(ones(1, nPtsCurve) * axisMax(3 - side), xx, spline(x,p,xx), 'b-', 'LineWidth', 2);
        plot3(ones(numel(p2), 1) * axisMax(3 - side), x2, p2, 'k-', 'LineWidth', 1);
    end

end

% Turn off hold.
hold off

% Axis labels.
xlabel('x');
ylabel('y');
zlabel('p(.)');

axis([axisMin(1) axisMax(1) axisMin(2) axisMax(2) 0 maxP * 1.05]);
grid on;

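I have to admit, I took this as a challenge, because I was looking for different ways to show other datasets. I have normally done something along the lines of the scatterhist 2D plots shown in other answers, but I have wanted to try my hand at rgl for a while.

I use your function to generate the data: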

gibbs<-function (n, rho) {
    mat <- matrix(ncol = 2, nrow = n)
    x <- 0
    y <- 0
    mat[1, ] <- c(x, y)
    for (i in 2:n) {
        x <- rnorm(1, rho * y, (1 - rho^2))
        y <- rnorm(1, rho * x, (1 - rho^2))
        mat[i, ] <- c(x, y)
    }
    mat
}
bvn <- gibbs(10000, 0.98)
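
A side note on the sampler, offered as a hedged observation rather than a change to the code above: rnorm() takes a standard deviation as its third argument, whereas the full conditionals of a unit-variance bivariate normal with correlation rho have variance 1 - rho^2. As written, gibbs() keeps the correlation at rho but produces marginals with a standard deviation of about sqrt(1 - rho^2) instead of 1. The sketch below assumes unit marginal variances were intended; gibbs_sd is a hypothetical name that does not appear in the original post, and a smaller rho is used only so that the check mixes quickly.

```r
# Gibbs sampler for a standard bivariate normal with correlation rho.
# Assumption: unit marginal variances are intended, so each full
# conditional is N(rho * other, 1 - rho^2) and rnorm() needs the
# standard deviation sqrt(1 - rho^2).
gibbs_sd <- function(n, rho) {
  mat <- matrix(0, nrow = n, ncol = 2)   # first row is the start (0, 0)
  x <- 0
  y <- 0
  cond_sd <- sqrt(1 - rho^2)
  for (i in seq_len(n)[-1]) {
    x <- rnorm(1, mean = rho * y, sd = cond_sd)
    y <- rnorm(1, mean = rho * x, sd = cond_sd)
    mat[i, ] <- c(x, y)
  }
  mat
}

set.seed(42)
draws <- gibbs_sd(20000, 0.5)
apply(draws, 2, sd)          # both should be close to 1
cor(draws[, 1], draws[, 2])  # should be close to 0.5
```

If this variant is used for the plots, axis limits chosen for the narrower original sample would need to be widened accordingly.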

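Setup

I use rgl for the hard work, but I don't know how to get the confidence ellipse without going to car. I'm guessing there are other ways to attack this.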

library(rgl) # plot3d, quads3d, lines3d, grid3d, par3d, axes3d, box3d, mtext3d
library(car) # dataEllipse

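Processing the data

I compute the histogram data without plotting it, then extract the density and normalize it into probabilities. The *max variables are there to simplify the plotting later on.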

hx <- hist(bvn[,2], plot=FALSE)
hxs <- hx$density / sum(hx$density)
hy <- hist(bvn[,1], plot=FALSE)
hys <- hy$density / sum(hy$density)

## [xy]max: so that there's no overlap in the adjoining corner
xmax <- tail(hx$breaks, n=1) + diff(tail(hx$breaks, n=2))
ymax <- tail(hy$breaks, n=1) + diff(tail(hy$breaks, n=2))
zmax <- max(hxs, hys)
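
As a small sanity check of the normalization above (a sketch on stand-in data, not on the bvn sample): with the equally spaced breaks that hist() chooses by default, dividing the density by its sum gives the proportion of points per bin, and the scaling factor 1 / sum(density) is simply the bin width. That is also why the kernel density curves later on are divided by the same sum.

```r
set.seed(1)
z <- rnorm(5000)                     # stand-in data, not the bvn sample

h <- hist(z, plot = FALSE)           # default breaks are equally spaced
probs <- h$density / sum(h$density)  # normalization used in the answer

bin_width <- diff(h$breaks)[1]
all.equal(probs, h$counts / sum(h$counts))  # same as the bin proportions
all.equal(1 / sum(h$density), bin_width)    # scaling factor is the bin width
sum(probs)                                  # the probabilities add up to 1
```

With unequal bin widths these identities no longer hold, so the shortcut should only be relied on for equally spaced breaks.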

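Basic scatterplot on the floor

The scale should be set to whatever is appropriate for the distributions. Admittedly, the X and Y labels are not placed beautifully, but repositioning them based on the data should not be hard.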

## the base scatterplot
plot3d(bvn[,2], bvn[,1], 0, zlim=c(0, zmax), pch='.',
       xlab='X', ylab='Y', zlab='', axes=FALSE)
par3d(scale=c(1,1,3))

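Histograms on the back walls

I couldn't figure out how to have them plotted automatically on a plane in the overall 3D render, so I had to build each rectangle manually.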

## manually create each histogram
for (ii in seq_along(hx$counts)) {
    quads3d(hx$breaks[ii]*c(.9,.9,.1,.1) + hx$breaks[ii+1]*c(.1,.1,.9,.9),
            rep(ymax, 4),
            hxs[ii]*c(0,1,1,0), color='gray80')
}
for (ii in seq_along(hy$counts)) {
    quads3d(rep(xmax, 4),
            hy$breaks[ii]*c(.9,.9,.1,.1) + hy$breaks[ii+1]*c(.1,.1,.9,.9),
            hys[ii]*c(0,1,1,0), color='gray80')
}
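
The weights c(.9,.9,.1,.1) and c(.1,.1,.9,.9) in the loops above shrink each bar by 10% of the bin width on either side, so neighbouring bars do not touch. A minimal base-R sketch of that corner arithmetic follows; bar_corners and its shrink argument are hypothetical names used only for illustration, and no rgl call is involved.

```r
# Corner coordinates of one histogram bar drawn as a quad on a wall.
# shrink = 0.1 reproduces the c(.9,.9,.1,.1) / c(.1,.1,.9,.9) weights.
bar_corners <- function(lo, hi, height, shrink = 0.1) {
  along <- lo * c(1 - shrink, 1 - shrink, shrink, shrink) +
           hi * c(shrink, shrink, 1 - shrink, 1 - shrink)
  up <- height * c(0, 1, 1, 0)       # bottom, top, top, bottom
  cbind(along = along, up = up)
}

# A bin from 0 to 1 with height 0.5 spans 0.1 to 0.9 along the wall.
corners <- bar_corners(lo = 0, hi = 1, height = 0.5)
corners
```

The four rows are ordered so that they trace the rectangle's outline, which is the vertex order quads3d() is given in the loops above.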

Summary lines

## I use these to ensure the lines are plotted "in front of" the
## respective dot/hist
bb <- par3d('bbox')
inset <- 0.02 # percent off of the floor/wall for lines
x1 <- bb[1] + (1-inset)*diff(bb[1:2])
y1 <- bb[3] + (1-inset)*diff(bb[3:4])
z1 <- bb[5] + inset*diff(bb[5:6])

## even with draw=FALSE, dataEllipse still pops up a dev, so I create
## a dummy dev and destroy it ... better way to do this?
dev.new()
de <- dataEllipse(bvn[,1], bvn[,2], draw=FALSE, levels=0.95)
dev.off()

## the ellipse
lines3d(de[,2], de[,1], z1, color='green', lwd=3)

## the two density curves, probability-style
denx <- density(bvn[,2])
lines3d(denx$x, rep(y1, length(denx$x)), denx$y / sum(hx$density), col='red', lwd=3)
deny <- density(bvn[,1])
lines3d(rep(x1, length(deny$x)), deny$x, deny$y / sum(hy$density), col='blue', lwd=3)
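
On the open question of getting an ellipse without car (and without the dummy device): one possible approach, sketched here under a normal-theory assumption, is to compute the ellipse directly from the sample mean and covariance. ellipse_points is a hypothetical helper, and its radius comes from a chi-square quantile, so the outline may differ slightly from what dataEllipse() returns.

```r
# Points on the level-`level` ellipse of a bivariate sample, assuming
# approximate normality. Returns an n x 2 matrix of (x, y) coordinates.
ellipse_points <- function(xy, level = 0.95, n = 100) {
  mu <- colMeans(xy)
  S <- cov(xy)
  r <- sqrt(qchisq(level, df = 2))   # radius in Mahalanobis units
  theta <- seq(0, 2 * pi, length.out = n)
  circle <- rbind(cos(theta), sin(theta))
  L <- t(chol(S))                    # lower factor, S = L %*% t(L)
  t(mu + r * (L %*% circle))         # mu is recycled down each column
}

set.seed(1)
xy <- cbind(rnorm(500), rnorm(500))  # stand-in data, not the bvn sample
pts <- ellipse_points(xy)

# Every point sits at the same Mahalanobis distance from the mean.
range(mahalanobis(pts, colMeans(xy), cov(xy)))
qchisq(0.95, df = 2)
```

The resulting matrix could be passed to lines3d() in the same way as the de object above, with the columns swapped if the axes are swapped.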

Beautifications

grid3d(c('x+', 'y+', 'z-'), n=10)
box3d()
axes3d(edges=c('x-', 'y-', 'z+'))
outset <- 1.2 # place text outside of bbox *this* percentage
mtext3d('P(X)', edge='x+', pos=c(0, ymax, outset * zmax))
mtext3d('P(Y)', edge='y+', pos=c(xmax, 0, outset * zmax))

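Final product

One benefit of using rgl is that you can spin the plot around with your mouse and find the best perspective. Short of making an animation for this SO page, running all of the above lets you play with it yourself. (If you spin it, you will see that the lines sit slightly in front of the histograms and slightly above the scatterplot; without that offset I got intersections, so the lines looked discontinuous in places.)

[Figure: 3D bivariate scatterplot with marginal histograms]

In the end, I find this a bit distracting (the 2D variants sufficed): showing a z-axis implies that the data has a third dimension, which Tufte specifically discourages (Tufte, "Envisioning Information", 1990). With higher-dimensional data, however, this technique of using RGL can give significant perspective on patterns.

(For the record: tested on Win7 x64 with 32-bit and 64-bit R-3.0.3, rgl v0.93.996, car v2.0-19.)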

With the data created as bvn <- as.data.frame(gibbs(10000,0.98)), here are several 2D solutions in R:


1: a quick and dirty solution with the psych package:

library(psych)
scatter.hist(x=bvn$V1, y=bvn$V2, density=TRUE, ellipse=TRUE)

The result:

[Figure: output of psych's scatter.hist]


2: a nice and pretty solution with ggplot2:

library(ggplot2)
library(gridExtra)
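# stat_ellipse() (used for the 95% confidence ellipse) ships with current ggplot2,
# so the external script below is only needed with very old ggplot2 versions:
# library(devtools)
# source_url("https://raw.github.com/low-decarie/FAAV/master/r/stat-ellipse.R")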

htop <- ggplot(data=bvn, aes(x=V1)) + 
  geom_histogram(aes(y=..density..), fill = "white", color = "black", binwidth = 2) + 
  stat_density(colour = "blue", geom="line", size = 1.5, position="identity", show.legend=FALSE) +
  scale_x_continuous("V1", limits = c(-40,40), breaks = c(-40,-20,0,20,40)) + 
  scale_y_continuous("Count", breaks=c(0.0,0.01,0.02,0.03,0.04), labels=c(0,100,200,300,400)) + 
  theme_bw() + theme(axis.title.x = element_blank())

blank <- ggplot() + geom_point(aes(1,1), colour="white") +
  theme(axis.ticks=element_blank(), panel.background=element_blank(), panel.grid=element_blank(),
        axis.text.x=element_blank(), axis.text.y=element_blank(), axis.title.x=element_blank(), axis.title.y=element_blank())

scatter <- ggplot(data=bvn, aes(x=V1, y=V2)) + 
  geom_point(size = 0.6) + stat_ellipse(level = 0.95, size = 1, color="green") +
  scale_x_continuous("label V1", limits = c(-40,40), breaks = c(-40,-20,0,20,40)) + 
  scale_y_continuous("label V2", limits = c(-20,20), breaks = c(-20,-10,0,10,20)) + 
  theme_bw()

hright <- ggplot(data=bvn, aes(x=V2)) + 
  geom_histogram(aes(y=..density..), fill = "white", color = "black", binwidth = 1) + 
  stat_density(colour = "red", geom="line", size = 1, position="identity", show.legend=FALSE) +
  scale_x_continuous("V2", limits = c(-20,20), breaks = c(-20,-10,0,10,20)) + 
  scale_y_continuous("Count", breaks=c(0.0,0.02,0.04,0.06,0.08), labels=c(0,200,400,600,800)) + 
  coord_flip() + theme_bw() + theme(axis.title.y = element_blank())

grid.arrange(htop, blank, scatter, hright, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))

The result:

[Figure: ggplot2 scatterplot with marginal histograms]


3: a compact solution with ggplot2:

library(ggplot2)
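# stat_ellipse() (used for the 95% confidence ellipse) ships with current ggplot2,
# so the external script below is only needed with very old ggplot2 versions:
# library(devtools)
# source_url("https://raw.github.com/low-decarie/FAAV/master/r/stat-ellipse.R")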

ggplot(data=bvn, aes(x=V1, y=V2)) + 
  geom_point(size = 0.6) + 
  geom_rug(sides="t", size=0.05, col=rgb(.8,0,0,alpha=.3)) + 
  geom_rug(sides="r", size=0.05, col=rgb(0,0,.8,alpha=.3)) + 
  stat_ellipse(level = 0.95, size = 1, color="green") +
  scale_x_continuous("label V1", limits = c(-40,40), breaks = c(-40,-20,0,20,40)) + 
  scale_y_continuous("label V2", limits = c(-20,20), breaks = c(-20,-10,0,10,20)) + 
  theme_bw()

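The result:

[Figure: ggplot2 scatterplot with marginal rugs]

The Matlab implementation is called scatterhist and requires the Statistics Toolbox. Unfortunately it is not 3D; it is an extended 2D plot.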

% some example data
x = randn(1000,1);
y = randn(1000,1);

h = scatterhist(x,y,'Location','SouthEast',...
                'Direction','out',...
                'Color','k',...
                'Marker','o',...
                'MarkerSize',4);

legend('data')
legend boxoff
grid on

[Figure: scatterhist output]

It also allows grouping of datasets:

load fisheriris.mat;
x = meas(:,1);        %// x-data
y = meas(:,2);        %// y-data
gnames = species;     %// assigning of names to certain elements of x and y


scatterhist(x,y,'Group',gnames,'Location','SouthEast',...
            'Direction','out',...
            'Color','kbr',...
            'LineStyle',{'-','-.',':'},...
            'LineWidth',[2,2,2],...
            'Marker','+od',...
            'MarkerSize',[4,5,6]);

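[Figure: grouped scatterhist output]

R implementation

Load the library "car". We only use its dataEllipse function, which draws an ellipse based on a percentage of the data (0.95 means the ellipse is meant to contain about 95% of the data).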

library("car")

gibbs<-function (n, rho) 
 {
   mat <- matrix(ncol = 2, nrow = n)
   x <- 0
   y <- 0
   mat[1, ] <- c(x, y)
   for (i in 2:n) {
   x <- rnorm(1, rho * y, (1 - rho^2))
   y <- rnorm(1, rho * x,(1 - rho^2))
   mat[i, ] <- c(x, y)
   }
   mat
 }

bvn<-gibbs(10000,0.98)

Open a PDF device:

OUTFILE <- "bivar_dist.pdf"

pdf(OUTFILE)

Set up the layout first

layout(matrix(c(2,0,1,3),2,2,byrow=TRUE), widths=c(3,1), heights=c(1,3), TRUE)

Make the scatterplot

par(mar=c(5.1,4.1,0.1,0))

The commented-out lines can be used to draw the scatterplot without the dataEllipse function from the "car" package

# plot(bvn[,2], bvn[,1], 
#      pch=".",cex = 1, col=1:length(bvn[,2]),
#      xlim=c(-0.6, 0.6),
#      ylim=c(-0.6,0.6),
#      xlab="X",
#      ylab="Y")
# 
# grid(NULL, NULL, lwd = 2)


dataEllipse(bvn[,2], bvn[,1],
        levels = c(0.95),
        pch=".",
        col=1:length(bvn[,2]),
        xlim=c(-0.6, 0.6),
        ylim=c(-0.6,0.6),
        xlab="X",
        ylab="Y",
        center.cex = 1
        )

Histogram of the X variable in the top row

     par(mar=c(0,4.1,3,0))

     hist(bvn[,2],
          ann=FALSE,axes=FALSE,
          col="light blue",border="black",
          )
     title(main = "Bivariate Normal Distribution")

Histogram of the Y variable to the right of the scatterplot

     yhist <- hist(bvn[,1],
                   plot=FALSE
                  )

     par(mar=c(5.1,0,0.1,1))

     barplot(yhist$density,
             horiz=TRUE,
             space=0,
             axes=FALSE,
             col="light blue",
             border="black"
             )

 dev.off(which = dev.cur())

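The image output is shown below.

[Figure: scatterplot with data ellipse and marginal histograms]

Selecting 50% and 95% of the data with two ellipses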

      dataEllipse(bvn[,2], bvn[,1],
                  levels = c(0.5, 0.95),
                  pch=".",
                  col= 1:length(bvn[,2]),
                  xlim=c(-0.6, 0.6),
                  ylim=c(-0.6,0.6),
                  xlab="X",
                  ylab="Y",
                  center.cex = 1
                 )
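
To make the meaning of the levels concrete, the fraction of points inside a normal-theory ellipse can be checked with the Mahalanobis distance. This is a sketch on stand-in data and assumes approximate bivariate normality; it does not call dataEllipse(), whose exact radius calculation may differ slightly.

```r
set.seed(7)
xy <- cbind(rnorm(5000), rnorm(5000))   # stand-in for the bvn sample

# Squared Mahalanobis distance of every point from the sample mean.
d2 <- mahalanobis(xy, center = colMeans(xy), cov = cov(xy))

# A point lies inside the level-p ellipse when d2 <= qchisq(p, df = 2).
inside <- sapply(c(0.5, 0.95), function(level) mean(d2 <= qchisq(level, df = 2)))
inside   # fractions of points inside the 50% and 95% ellipses
```

For strongly autocorrelated or non-normal samples the observed fractions can drift away from the nominal levels.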

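I took @jaap's code above and turned it into a slightly more general function. The code can be sourced here. Note: I am not adding anything new to @jaap's code; I only made a few minor changes and wrapped it in a function. Hope it is helpful.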

density.hist <- function(df, x=NULL, y=NULL) {

require(ggplot2)
require(gridExtra)
require(devtools)

htop <- ggplot(data=df, aes_string(x=x)) + 
  geom_histogram(aes(y=..density..), fill = "white", color = "black", bins=100) + 
  stat_density(colour = "blue", geom="line", size = 1, position="identity", show.legend=FALSE) +
  theme_bw() + theme(axis.title.x = element_blank())

blank <- ggplot() + geom_point(aes(1,1), colour="white") +
  theme(axis.ticks=element_blank(), panel.background=element_blank(), panel.grid=element_blank(),
  axis.text.x=element_blank(), axis.text.y=element_blank(), axis.title.x=element_blank(), 
  axis.title.y=element_blank())

scatter <- ggplot(data=df, aes_string(x=x, y=y)) + 
  geom_point(size = 0.6) + stat_ellipse(type = "norm", linetype = 2, color="green",size=1) +
  stat_ellipse(type = "t",color="green",size=1) +
  theme_bw() + labs(x=x, y=y)

hright <- ggplot(data=df, aes_string(x=y)) +  # right-hand histogram shows the y variable
  geom_histogram(aes(y=..density..), fill = "white", color = "black", bins=100) + 
  stat_density(colour = "red", geom="line", size = 1, position="identity", show.legend=FALSE) +
  coord_flip() + theme_bw() + theme(axis.title.y = element_blank())

grid.arrange(htop, blank, scatter, hright, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))

}

[Figure: output of the density.hist function]
