
Density histogram in R

I'm new to R and to probability and statistics. I have a question about histograms.

hist(rbinom(10000, 10, 0.1), freq=FALSE)

It produces the following histogram, which is not clear to me:

[image: histogram produced by the call above]

If the y-axis is density, the total should be 100%, or am I wrong?
But in the histogram, the bar heights add up to more than 100%.

The area under the histogram should be 1, not the sum of the bar heights. Each bar's area is its height times its width, and since your bars appear to have width 1/2, the heights should sum to 2. To make the plot easier to read, use the breaks argument of hist:

hist(rbinom(10000, 10, 0.1), freq=FALSE, breaks = 5)

Or, perhaps even better, use unit-width bins centered on the integers:

# breaks must span every possible value (0 to 10), otherwise hist() stops with an error
hist(rbinom(10000, 10, 0.1), freq=FALSE, breaks=seq(-0.5,10.5,1))

[image: histogram with unit-width bins]
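With unit-width bins centered on the integers, each bar's height equals the relative frequency of that value, so the heights themselves sum to 1. A minimal sketch to check this, assuming an arbitrary seed and the illustrative names x and h1 (neither is from the original answer):

```r
set.seed(1234)                      # arbitrary seed, only for reproducibility
x <- rbinom(10000, 10, 0.1)

# Unit-width bins centered on 0, 1, ..., 10; plot = FALSE returns the object only
h1 <- hist(x, breaks = seq(-0.5, 10.5, 1), plot = FALSE)

# Relative frequency of each value 0..10, including values that never occur
rel_freq <- as.vector(table(factor(x, levels = 0:10))) / length(x)

sum(h1$density)                     # 1
all.equal(h1$density, rel_freq)     # TRUE
```

With the default half-width bins the same data gives heights that sum to 2, which is what the question observed.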

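You can sum the probability mass function estimated from your sample, which is the discrete analogue of integrating a density. The result is 1 (up to floating-point rounding), so there is no contradiction.

set.seed(444)

s <- rbinom(10000, 10, 0.1)
dens_s <- table(s)/sum(table(s))   # relative frequency of each observed value
sum(dens_s)                        # total probability mass
# Note: sum(as.numeric(names(dens_s))*dens_s) is the sample mean instead;
# it is also close to 1 here, but only because n*p = 10*0.1 = 1.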
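As a further check, the relative frequencies can be compared with the theoretical binomial probabilities from dbinom. This is only a sketch; the names emp and theo are illustrative and not part of the original answer.

```r
set.seed(444)
s <- rbinom(10000, 10, 0.1)

# Empirical relative frequencies for every possible value 0..10
emp  <- as.vector(table(factor(s, levels = 0:10))) / length(s)
# Theoretical probabilities of a Binomial(10, 0.1) distribution
theo <- dbinom(0:10, size = 10, prob = 0.1)

round(cbind(value = 0:10, empirical = emp, theoretical = theo), 4)

sum(emp)    # total empirical mass
sum(theo)   # total theoretical mass
```

Both totals come out as 1, and the two columns should agree closely for a sample of this size.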

The function hist returns a list object with all the information needed to answer the question.

I will set the RNG seed to make the example reproducible.

set.seed(1234)
h <- hist(rbinom(10000, 10, 0.1), freq=FALSE)

str(h)
#List of 6
# $ breaks  : num [1:11] 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 ...
# $ counts  : int [1:10] 3448 3930 0 1910 0 588 0 112 0 12
# $ density : num [1:10] 0.69 0.786 0 0.382 0 ...
# $ mids    : num [1:10] 0.25 0.75 1.25 1.75 2.25 2.75 3.25 3.75 4.25 4.75
# $ xname   : chr "rbinom(10000, 10, 0.1)"
# $ equidist: logi TRUE
# - attr(*, "class")= chr "histogram"

The relevant list members are breaks and density.

  1. breaks is a vector of length 11, so there are 10 bins.
  2. density is a vector of length 10, with one value per bin.

Now compute the area of each bar by multiplying the bin widths by the respective densities.

diff(h$breaks)    # bin widths
# [1] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
h$density*diff(h$breaks)
# [1] 0.3448 0.3930 0.0000 0.1910 0.0000 0.0588 0.0000 0.0112 0.0000 0.0012

Total area:

sum(h$density*diff(h$breaks))
#[1] 1
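The same calculation holds for any choice of breaks, including unequal bin widths. The helper below, hist_area, is a hypothetical name introduced only for this sketch; it wraps the computation above so it can be tried with different breaks.

```r
set.seed(1234)
x <- rbinom(10000, 10, 0.1)

# Hypothetical helper: total bar area of the density histogram of x.
# Extra arguments (e.g. breaks) are passed on to hist(); nothing is plotted.
hist_area <- function(x, ...) {
  h <- hist(x, plot = FALSE, ...)
  sum(h$density * diff(h$breaks))
}

hist_area(x)                                    # default breaks: 1
hist_area(x, breaks = seq(-0.5, 10.5, 1))       # unit-width bins: 1
hist_area(x, breaks = c(-0.5, 0.5, 2.5, 10.5))  # unequal widths: 1
```

Only the bar heights change with the bin widths; the total area stays at 1.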
