Variables that affect a histogram plotted with the hist() function in R

Question

In R, you can plot a histogram and save its properties to a variable:

> h1=hist(c(1,1,2,3,4,5,5), breaks=0.5:5.5)

Those properties can then be read:

> h1
$breaks
[1] 0.5 1.5 2.5 3.5 4.5 5.5

$counts
[1] 2 1 1 1 2

$density
[1] 0.2857143 0.1428571 0.1428571 0.1428571 0.2857143

$mids
[1] 1 2 3 4 5

$xname
[1] "c(1, 1, 2, 3, 4, 5, 5)"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

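How do these properties affect the histogram? So far I have worked out the following:

The relationship between $breaks and $counts. $breaks gives the boundaries of the intervals that the plotted data can fall into, and $counts gives the number of data points that fell into each interval, for example:

[] denotes a closed interval (endpoints included)

() denotes an open interval (endpoints excluded)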

BREAKS    : COUNTS
[0.5-1.5] : 2 # two 1s fall into this interval
(1.5-2.5] : 1 # one 2 falls into this interval
(2.5-3.5] : 1 # one 3 falls into this interval
(3.5-4.5] : 1 # one 4 falls into this interval
(4.5-5.5] : 2 # two 5s fall into this interval
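
The interval table above can be cross-checked with cut() and table(). This is only a minimal sketch: the names x, h and bins are chosen here for illustration, and plot = FALSE is used so that nothing is drawn.

```r
x <- c(1, 1, 2, 3, 4, 5, 5)

# Compute the histogram without drawing it
h <- hist(x, breaks = 0.5:5.5, plot = FALSE)

# cut() with include.lowest = TRUE follows the same convention as hist()'s
# defaults: right-closed intervals, with the first interval also closed on the left
bins <- table(cut(x, breaks = h$breaks, include.lowest = TRUE))

print(bins)       # interval labels with their counts
print(h$counts)   # the same counts as stored by hist()
```

The labels printed by table() show which end of each interval is closed, which makes the bracket notation in the table explicit.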

The relationship between $breaks and $density seems to be basically the same as above, only expressed as a percentage, for example:

BREAKS  : DENSITY
[0.5-1.5] : 0.2857143 # this interval covers approx. 28% of the plot
(1.5-2.5] : 0.1428571 # this interval covers approx. 14% of the plot
(2.5-3.5] : 0.1428571 # this interval covers approx. 14% of the plot
(3.5-4.5] : 0.1428571 # this interval covers approx. 14% of the plot
(4.5-5.5] : 0.2857143 # this interval covers approx. 28% of the plot

Of course, when you add all these values up, you get 1:

> sum(h1$density)
[1] 1

The following holds the name shown on the x axis:

$xname
[1] "c(1, 1, 2, 3, 4, 5, 5)"

But what do the remaining components do, especially $mids?

$mids
[1] 1 2 3 4 5

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

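Also, help(hist) describes many other things; shouldn't they be listed in the output above as well? As the following article explains:

By default, bin counts include values less than or equal to the bin's right break point and strictly greater than the bin's left break point, except for the leftmost bin, which includes its left break point.

So the following: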

h1=hist(c(1,1,2,3,4,5,5,1.5), breaks=0.5:5.5)

returns a histogram in which 1.5 falls into the 0.5-1.5 interval. One "workaround" is to make the intervals narrower, e.g.

h1=hist(c(1,1,2,3,4,5,5,1.5), breaks=seq(0.5,5.5,0.1))

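But this seems impractical to me, and it also adds a bunch of zeros to $counts and $density. Is there a better, automatic way?

Besides that, it has a side effect that I cannot explain: why does the sum for the last example return 10 instead of 1?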

> sum(h1$density)
[1] 10
> h1$density[h1$density>0]
[1] 2.50 1.25 1.25 1.25 1.25 2.50

Answer 1

Q1: what $mids and $equidist mean. From the help file:

mids: the n cell midpoints.

equidist: logical, indicating if the distances between breaks are all the same.
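
To make the two definitions concrete, here is a small sketch that recomputes the midpoints by hand and compares an equally spaced set of breaks with an unequal one. The names manual_mids and h_uneven, and the unequal break vector, are made up for this illustration.

```r
x <- c(1, 1, 2, 3, 4, 5, 5)
h <- hist(x, breaks = 0.5:5.5, plot = FALSE)

# Each midpoint is the average of the two breaks that bound the cell
b <- h$breaks
manual_mids <- (head(b, -1) + tail(b, -1)) / 2
print(all.equal(h$mids, manual_mids))

# Equally spaced breaks versus breaks of different widths
h_uneven <- hist(x, breaks = c(0.5, 2.5, 3.5, 5.5), plot = FALSE)
print(h$equidist)         # all cells have width 1
print(h_uneven$equidist)  # cell widths are 2, 1 and 2
```

In other words, $mids marks where each bar is centered on the x axis, and $equidist records whether all bars have the same width.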

Q2: Yes, with h1=hist(c(1,1,2,3,4,5,5,1.5), breaks=0.5:5.5) the value 1.5 falls into the 0.5-1.5 category. If you want it to fall into the 1.5-2.5 category instead, you should use:

h1=hist(c(1,1,2,3,4,5,5,1.5), breaks=0.49:5.49)

or, more neatly:

h1=hist(c(1,1,2,3,4,5,5,1.5), breaks=0.5:5.5, right=FALSE)
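
The effect of right=FALSE can be seen by comparing the counts directly. This is a minimal sketch; h_right and h_left are names invented here, and plot = FALSE only suppresses drawing.

```r
x <- c(1, 1, 2, 3, 4, 5, 5, 1.5)

# Default (right = TRUE): cells are right-closed, so 1.5 lands in 0.5-1.5
h_right <- hist(x, breaks = 0.5:5.5, plot = FALSE)

# right = FALSE: cells are left-closed, so 1.5 lands in 1.5-2.5
h_left <- hist(x, breaks = 0.5:5.5, right = FALSE, plot = FALSE)

print(h_right$counts)  # 3 1 1 1 2
print(h_left$counts)   # 2 2 1 1 2
```

The breaks stay the same in both calls; only the side on which each cell is closed changes, so no extra empty cells are introduced.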

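I am not sure what you are trying to automate here, but hopefully the above answers your question. If not, please make your question clearer.

Q3: as for the density summing to 10 instead of 1, that is because density is not frequency. From the help file:

density: values f^(x[i]), as estimated density values. If all(diff(breaks) == 1), they are the relative frequencies counts/n and in general satisfy sum[i; f^(x[i]) (b[i+1]-b[i])] = 1, where b[i] = breaks[i].

So if the distances between your breaks are not equal to 1, the densities will not sum to 1.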
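
The identity quoted from the help file can be checked on the 0.1-wide example from the question. The sketch below assumes nothing beyond base R; widths and rel_freq are names chosen for illustration.

```r
x <- c(1, 1, 2, 3, 4, 5, 5, 1.5)
h <- hist(x, breaks = seq(0.5, 5.5, 0.1), plot = FALSE)

widths <- diff(h$breaks)  # every cell is about 0.1 wide

# Densities are counts / (n * width), so their plain sum is about 1 / 0.1 = 10
print(sum(h$density))

# Weighting each density by its cell width gives the total area, which is 1
print(sum(h$density * widths))

# Relative frequencies, if a quantity that sums to 1 is what is wanted
rel_freq <- h$counts / sum(h$counts)
print(sum(rel_freq))
```

With unit-width cells the two quantities coincide, which is why the first example in the question happened to sum to 1.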

Answer 1 | 2 | accepted | 2015-06-21 15:17:27
