Density histogram in R
I'm new to R and to probability and statistics. I have a question regarding histograms.
hist(rbinom(10000, 10, 0.1), freq=FALSE)
It produces a histogram that is not clear to me.
If the y-axis is density, the total should be 100%, am I wrong?
But in the histogram, I can see that the bars add up to more than 100%.
The total area of the bars should be 1. Since your bars appear to have width 1/2, the sum of the heights should be 2. To make the plot easier to read, use the breaks parameter of hist:
hist(rbinom(10000, 10, 0.1), freq=FALSE, breaks = 5)
Or maybe even better:
hist(rbinom(10000, 10, 0.1), freq=FALSE, breaks=seq(-0.5, 10.5, 1))  # breaks must span every possible value, 0 to 10
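As a quick check of the reasoning above, here is a minimal sketch (the names `x`, `h_default` and `h_unit` are my own, and `plot = FALSE` is used only to skip the drawing). It should confirm that height times width sums to 1 with the default breaks, and that with unit-width bins centered on the integers the heights are simply the relative frequencies, which should be close to the theoretical probabilities from `dbinom`.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- rbinom(10000, 10, 0.1)

# Default breaks: the density component is computed even when nothing is drawn
h_default <- hist(x, plot = FALSE)
widths <- diff(h_default$breaks)
sum(h_default$density * widths)   # total area of the bars

# Unit-width bins centered on 0, 1, ..., 10
h_unit <- hist(x, breaks = seq(-0.5, 10.5, 1), plot = FALSE)
sum(h_unit$density)               # widths are 1, so the heights sum to the area

# Heights versus the theoretical Binomial(10, 0.1) probabilities
round(cbind(value = 0:10,
            observed = h_unit$density,
            theoretical = dbinom(0:10, 10, 0.1)), 4)
```

With unit-width bins the y-axis can be read directly as a probability, which is probably what the question expected to see.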
You can sum the probability mass function estimated from your sample. The total is 1, so there is no contradiction.
set.seed(444)
s <- rbinom(10000, 10, 0.1)
dens_s <- table(s)/sum(table(s))  # relative frequency of each observed value
sum(dens_s)                       # total probability mass
Function hist returns a list object with all the information necessary to answer the question.
I will set the RNG seed to make the example reproducible.
set.seed(1234)
h <- hist(rbinom(10000, 10, 0.1), freq=FALSE)
str(h)
#List of 6
# $ breaks : num [1:11] 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 ...
# $ counts : int [1:10] 3448 3930 0 1910 0 588 0 112 0 12
# $ density : num [1:10] 0.69 0.786 0 0.382 0 ...
# $ mids : num [1:10] 0.25 0.75 1.25 1.75 2.25 2.75 3.25 3.75 4.25 4.75
# $ xname : chr "rbinom(10000, 10, 0.1)"
# $ equidist: logi TRUE
# - attr(*, "class")= chr "histogram"
The relevant list members are breaks and density.
breaks is a vector of length 11, so there are 10 bins. density is a vector of length 10, each element corresponding to one of the bins.
Now compute the area of each bar by multiplying the bin widths by the respective densities.
diff(h$breaks) # bin widths
# [1] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
h$density*diff(h$breaks)
# [1] 0.3448 0.3930 0.0000 0.1910 0.0000 0.0588 0.0000 0.0112 0.0000 0.0012
Total area:
sum(h$density*diff(h$breaks))
#[1] 1
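The same check works when the bins do not all have the same width, which is the case where density and relative frequency really differ. A small sketch, with break points chosen arbitrarily for illustration (the names `x`, `h2` and `areas` are not from the answer above):

```r
set.seed(1234)
x <- rbinom(10000, 10, 0.1)

# Unequal bin widths: 1, 1, 2 and 7
h2 <- hist(x, breaks = c(-0.5, 0.5, 1.5, 3.5, 10.5), plot = FALSE)

h2$equidist                       # FALSE: the bins differ in width
areas <- h2$density * diff(h2$breaks)
areas                             # each bar's area is its relative frequency
all.equal(areas, h2$counts / sum(h2$counts))
sum(areas)                        # total area
```

Here the bar heights no longer sum to anything meaningful on their own; only the areas do.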