Generating a histogram and density plot from binned data

I've binned some data and currently have a data frame with two columns, one that specifies the bin range and another that specifies the frequency, like this:

> head(data)
      binRange Frequency
1    (0,0.025]        88
2 (0.025,0.05]        72
3 (0.05,0.075]        92
4  (0.075,0.1]        38
5  (0.1,0.125]        20
6 (0.125,0.15]        16

I want to plot a histogram and a density plot from this, but I can't find a way of doing so without having to generate new bins etc. Using this solution here, I tried the following:

p <- ggplot(data, aes(x= binRange, y=Frequency)) + geom_histogram(stat="identity")

but it crashes. Does anyone know how to deal with this?

Thank you

The problem is that ggplot doesn't understand the data the way you supplied it: binRange is text, not numbers. You need to reshape it like so (I am not a regex master, so surely there are better ways to do this):

df <- read.table(header = TRUE, text = "
                 binRange Frequency
1    (0,0.025]        88
2 (0.025,0.05]        72
3 (0.05,0.075]        92
4  (0.075,0.1]        38
5  (0.1,0.125]        20
6 (0.125,0.15]        16")

library(stringr)
library(splitstackshape)
library(ggplot2)
# extract the numbers, e.g. "(0,0.025]" becomes "0,0.025"
df$binRange <- str_extract(df$binRange, "[0-9].*[0-9]+")

# split on the comma into two columns:
# binRange_1 (start point) and binRange_2 (end point)
df <- cSplit(df, "binRange")

# plot it: centre each bar on its bin midpoint and make it one bin (0.025) wide
ggplot(df, aes(x = (binRange_1 + binRange_2) / 2, y = Frequency)) +
    geom_col(width = 0.025) +
    xlab("binRange")
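As for the "better ways" mentioned above, here is a minimal base R sketch of the same parsing step without stringr or splitstackshape. It assumes every label has exactly the form `(lower,upper]`; the column names `lower` and `upper` are made up for illustration and are not part of the original answer.

```r
# Same sample data as above; keep binRange as plain text.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
      binRange Frequency
1    (0,0.025]        88
2 (0.025,0.05]        72
3 (0.05,0.075]        92
4  (0.075,0.1]        38
5  (0.1,0.125]        20
6 (0.125,0.15]        16")

# Drop the first and last character ("(" and "]"), then split on the comma.
inner <- substr(df$binRange, 2, nchar(df$binRange) - 1)
edges <- strsplit(inner, ",", fixed = TRUE)

# Hypothetical column names: lower and upper bin edges as numbers.
df$lower <- as.numeric(sapply(edges, `[`, 1))
df$upper <- as.numeric(sapply(edges, `[`, 2))
```

With numeric edges in place, `df$lower` and `df$upper` can be used for plotting in the same way as `binRange_1` and `binRange_2`.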

Or, if you don't want the bins to be interpreted numerically, you can simply do the following:

df <- read.table(header = TRUE, text = "
                 binRange Frequency
1    (0,0.025]        88
2 (0.025,0.05]        72
3 (0.05,0.075]        92
4  (0.075,0.1]        38
5  (0.1,0.125]        20
6 (0.125,0.15]        16")

library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_bar(stat = "identity")

You won't be able to draw a density plot from the data as it stands, because binRange is categorical rather than continuous, which is why I actually prefer the second way of showing it.
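That said, if the bin edges are available as numbers, a histogram on the density scale can still be approximated straight from the counts by dividing each frequency by the total count times the bin width. The sketch below is only an illustration under stated assumptions: the `lower` and `upper` columns are hypothetical names typed in by hand, and only the six rows shown in the question are used, so the densities are relative to those six bins only.

```r
# Bin edges and counts from the six rows shown in the question.
# lower/upper are hypothetical column names, entered by hand here.
df <- data.frame(
  lower     = c(0, 0.025, 0.05, 0.075, 0.1, 0.125),
  upper     = c(0.025, 0.05, 0.075, 0.1, 0.125, 0.15),
  Frequency = c(88, 72, 92, 38, 20, 16)
)

# Density per bin: count / (total count * bin width),
# so the bar areas sum to 1.
df$width   <- df$upper - df$lower
df$density <- df$Frequency / (sum(df$Frequency) * df$width)

# Draw one rectangle per bin; skipped if ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df) +
    geom_rect(aes(xmin = lower, xmax = upper, ymin = 0, ymax = density)) +
    labs(x = "binRange", y = "Density")
  print(p)
}
```

This gives a step-shaped estimate rather than a smooth curve. A smooth kernel density estimate would need the raw observations, which the binned table no longer contains.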

You can try

library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_col()
