Getting frequency values from histogram in R

I know how to draw histograms and other frequency/percentage tables. But now I want to know how I can get those frequency values into a table so I can use them afterwards.

I have a massive dataset, and I draw a histogram with a set binwidth. I want to extract the frequency value (i.e., the value on the y-axis) that corresponds to each bin and save it somewhere.

Can someone please help me with this? Thank you!

The hist function has a return value (an object of class histogram):

R> res <- hist(rnorm(100))
R> res
$breaks
[1] -4 -3 -2 -1  0  1  2  3  4

$counts
[1]  1  2 17 27 34 16  2  1

$intensities
[1] 0.01 0.02 0.17 0.27 0.34 0.16 0.02 0.01

$density
[1] 0.01 0.02 0.17 0.27 0.34 0.16 0.02 0.01

$mids
[1] -3.5 -2.5 -1.5 -0.5  0.5  1.5  2.5  3.5

$xname
[1] "rnorm(100)"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
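Since the question asks for the counts at a chosen bin width and wants to save them, here is a minimal sketch built only on the components shown above. It assumes a numeric vector `x` (simulated here) and a bin width of 0.5; `plot = FALSE` makes hist() compute the bins without drawing anything, and the CSV file name is purely illustrative.

```r
# Sketch: per-bin frequency table at a fixed bin width, without drawing.
set.seed(42)
x <- rnorm(1000)   # stand-in for the real dataset (assumption)
binwidth <- 0.5    # assumed bin width; change to taste

# Breaks aligned to multiples of binwidth that span the full range of x
brks <- seq(floor(min(x) / binwidth) * binwidth,
            ceiling(max(x) / binwidth) * binwidth,
            by = binwidth)

h <- hist(x, breaks = brks, plot = FALSE)  # compute the bins only

freq_table <- data.frame(
  lower = head(h$breaks, -1),  # left edge of each bin
  upper = tail(h$breaks, -1),  # right edge of each bin
  mid   = h$mids,              # bin midpoint
  count = h$counts             # frequency, i.e. the y-axis value
)
head(freq_table)

# Save for later use; "hist_counts.csv" is a hypothetical file name
# write.csv(freq_table, "hist_counts.csv", row.names = FALSE)
```

The breaks are built explicitly rather than passed as a single number, because a single number is only a suggestion to hist(), while a vector of breaks fixes the bin width exactly.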

From ?hist, section Value:

An object of class "histogram", which is a list with components:

  • breaks: the n+1 cell boundaries (= breaks if that was a vector). These are the nominal breaks, not with the boundary fuzz.
  • counts: n integers; for each cell, the number of x[] inside.
  • density: values $\hat{f}(x_i)$, as estimated density values. If all(diff(breaks) == 1), they are the relative frequencies counts/n, and in general they satisfy $\sum_i \hat{f}(x_i)\,(b_{i+1} - b_i) = 1$, where $b_i$ = breaks[i].
  • intensities: same as density. Deprecated, but retained for compatibility.
  • mids: the n cell midpoints.
  • xname: a character string with the actual x argument name.
  • equidist: logical, indicating whether the distances between breaks are all the same.
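The statements about density can be checked numerically. A small sketch on simulated data (the seed and sample size are arbitrary): with unit-width bins, density equals counts/n, and with any bins, density times bin width sums to 1.

```r
set.seed(1)
x <- rnorm(500)   # arbitrary example data

# Unit-width bins: density should equal the relative frequencies counts/n
h1 <- hist(x, breaks = seq(floor(min(x)), ceiling(max(x)), by = 1), plot = FALSE)
all.equal(h1$density, h1$counts / length(x))

# Default (Sturges) bins: density integrates to 1 over the breaks
h2 <- hist(x, plot = FALSE)
all.equal(sum(h2$density * diff(h2$breaks)), 1)

# One more break than there are cells, and each mid sits halfway between breaks
length(h2$breaks) == length(h2$counts) + 1
all.equal(h2$mids, (head(h2$breaks, -1) + tail(h2$breaks, -1)) / 2)
```

Each check should print TRUE.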

breaks and density provide just about all you need (use counts instead of density if you want the raw frequencies):

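x <- rnorm(1000)   # example data; substitute your own vector
histrv <- hist(x)  # draws the plot and returns the bin summary
histrv$breaks      # bin boundaries
histrv$density     # estimated density for each bin
histrv$counts      # frequency for each bin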
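If only breaks and density have been kept, the raw frequencies can still be recovered, because hist() computes density as counts / (n * bin width). A minimal sketch, assuming `x` is the data vector that was passed to hist() (simulated here), with deliberately unequal bin widths to show the general case:

```r
set.seed(7)
x <- rexp(300)   # example data (assumption); any numeric vector works

# Unequal bin widths; the last break is chosen to cover max(x)
histrv <- hist(x, breaks = c(0, 0.25, 0.5, 1, 2, ceiling(max(x)) + 1), plot = FALSE)

n <- length(x)
# Undo the density scaling, then round away floating-point noise
recovered <- round(histrv$density * n * diff(histrv$breaks))

identical(as.integer(recovered), histrv$counts)
```

This prints TRUE. Note that n (the number of observations) must be stored alongside breaks and density for this to work.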

Just in case someone hits this question with ggplot2's geom_histogram in mind, note that there is a way to extract the computed bin data from a ggplot object, via ggplot_build().

The following convenience function outputs a data frame with the lower limit of each bin (xmin), the upper limit of each bin (xmax), the midpoint of each bin (x), and the frequency value (y).

library(ggplot2)

## Convenience function: pull the computed bin data out of a ggplot object
get_hist <- function(p) {
    d <- ggplot_build(p)$data[[1]]   # computed data of the first layer
    data.frame(x = d$x, xmin = d$xmin, xmax = d$xmax, y = d$y)
}

# make a data frame for ggplot
set.seed(1)
x <- runif(100, 0, 10)
y <- cumsum(x)
df <- data.frame(x = sort(x), y = y)

# make geom_histogram; after_stat() (ggplot2 >= 3.3.0) replaces the
# deprecated ..count.. notation
p <- ggplot(data = df, aes(x = x)) +
    geom_histogram(aes(y = after_stat(cumsum(count))), binwidth = 1, boundary = 0,
                   color = "black", fill = "white")

Illustration:

hist_df <- get_hist(p)
head(hist_df$x)
## [1] 0.5 1.5 2.5 3.5 4.5 5.5
head(hist_df$y)
## [1]  7 13 24 38 52 57
head(hist_df$xmax)
## [1] 1 2 3 4 5 6
head(hist_df$xmin)
## [1] 0 1 2 3 4 5
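For readers not using ggplot2, the same kind of cumulative table can be computed in base R. This sketch reuses the simulated data from the ggplot example and assumes unit-width bins starting at 0. Since both hist() and geom_histogram() default to right-closed bins, the y column should line up with the y values printed above, although that agreement is an expectation rather than something guaranteed for every binning setting.

```r
set.seed(1)
x <- runif(100, 0, 10)   # same simulated data as the ggplot example

# Unit-width bins on [0, 10]; plot = FALSE only computes the bins
h <- hist(x, breaks = 0:10, plot = FALSE)

cum_tab <- data.frame(
  xmin = head(h$breaks, -1),   # lower limit of each bin
  xmax = tail(h$breaks, -1),   # upper limit of each bin
  x    = h$mids,               # midpoint of each bin
  y    = cumsum(h$counts)      # cumulative frequency, like cumsum(count)
)
head(cum_tab)
```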

A related question I answered: Cumulative histogram with ggplot2.

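Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.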
