What is this type of graph, and how do you draw it in R?

http://imgur.com/IfVyu6f

I thought it would be called something like a cumulative graph, and I found the cumulative frequency graph and the cumulative flow diagram. However, I don't think either of them is the graph in the image, because cumulative graphs start from 0 and my variables do not. A density plot sounds the closest, but it is a distribution whose area sums to 1, whereas I want to show frequencies.

Basically, the variables are sub-parts of a main variable, and I want to show when these sub-variables converge to create a peak. In essence, the variables sum to show a cumulative bound.

With ggplot2, you can use the geom_area() function, which draws a stacked area chart:

library(ggplot2)
library(gcookbook) # For the data set

ggplot(uspopage, aes(x=Year, y=Thousands, fill=AgeGroup)) + geom_area()
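If gcookbook is not installed, the same idea can be sketched with a small made-up data set. The data frame `parts` and its values below are hypothetical. The point is only that geom_area() stacks the sub-variables, so the top edge of the plot is their sum at each x.

```r
library(ggplot2)

# Hypothetical data: three sub-variables measured at the same time points
parts <- data.frame(
  time  = rep(1:5, times = 3),
  part  = rep(c("a", "b", "c"), each = 5),
  value = c(2, 4, 7, 4, 2,
            1, 3, 6, 3, 1,
            3, 3, 5, 3, 3)
)

# geom_area() stacks the groups by default (position = "stack"),
# so the upper boundary is the total of all parts at each time point
p_parts <- ggplot(parts, aes(x = time, y = value, fill = part)) +
  geom_area()

# The totals that the upper boundary traces out
totals <- tapply(parts$value, parts$time, sum)
```

Printing `p_parts` draws the chart. `totals` peaks at time 3, where all three parts peak together.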

Thanks for sharing a little more about what your data look like.

Let's use the publicly available crime stats data from the Houston Police Department as an example. In this case, we're using the data set for January 2015.

library(ggplot2)

crime <- gdata::read.xls('http://www.houstontx.gov/police/cs/xls/jan15.xls')

# One record has the offense type '1', which is not a meaningful category,
# so we remove it.
crime <- crime[crime$Offense.Type != '1', ]
# factor() drops the now-unused level and also works if the column was read as character
crime$Offense.Type <- factor(crime$Offense.Type)

There are 10 columns, but the ones we're interested in look like this:

# Hour Offense.Type
# 8   Auto Theft
# 13  Theft
# 5   Auto Theft
# 13  Theft
# 18  Theft
# 18  Theft

As you mentioned, the problem is that each row is a single incident. We need a way to get per-hour frequencies to pass to geom_area().

The first way is to let ggplot2 handle it, so there is no need to preformat the data.

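p <- ggplot(crime, aes(x = Hour, fill = Offense.Type))
# after_stat(count) replaces the older ..count.. notation (ggplot2 >= 3.3.0).
# stat = 'density' gives a kernel-smoothed estimate scaled to counts.
p + geom_area(aes(y = after_stat(count)), stat = 'density')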

(Figure: ggplot density method)
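Note that stat = 'density' draws a kernel-smoothed curve scaled to counts, so the heights are estimates rather than exact tallies. If exact per-hour counts are preferred, one option is stat = "bin" with a bin width of one hour. The sketch below uses a small made-up `incidents` data frame (hypothetical values, not the Houston data) so that it runs on its own.

```r
library(ggplot2)

# Hypothetical incident-level data: one row per incident
incidents <- data.frame(
  Hour = c(8, 13, 5, 13, 18, 18, 8, 5, 13, 18),
  Offense.Type = c("Auto Theft", "Theft", "Auto Theft", "Theft", "Theft",
                   "Theft", "Theft", "Theft", "Auto Theft", "Auto Theft")
)

# stat = "bin" counts rows per one-hour bin for each offense type;
# every group shares the same bins, so the areas stack cleanly
p_bin <- ggplot(incidents, aes(x = Hour, fill = Offense.Type)) +
  geom_area(stat = "bin", binwidth = 1)

# Inspect the computed counts without drawing the plot
binned <- layer_data(p_bin)
```

Printing `p_bin` draws the chart. The `count` column of `binned` holds the per-bin tallies for each offense type.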

The other way is to preformat the frequency table, using R's table() and reshape2's melt():

library(reshape2)

# Cross-tabulate incidents by hour and offense type
crime.counts <- table(crime$Hour, crime$Offense.Type)

# Convert to long format; melt() on a table takes varnames, not id.vars
crime.counts.l <- melt(crime.counts,
                        varnames = c("Hour", "Offense.Type"),
                        value.name = "numberOfCrimes")

p <- ggplot(crime.counts.l, aes(x = Hour,
                                 y = numberOfCrimes,
                                 fill = Offense.Type))
p + geom_area()

(Figure: preformatted table method)
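A long-format frequency table can also be built without reshape2. This is a sketch using base R's as.data.frame() on a table, with a hypothetical `incidents` data frame standing in for the real data. Note that the hour comes back as a factor and has to be converted to a number before geom_area() can treat it as a continuous axis.

```r
# Hypothetical incident-level data: one row per incident
incidents <- data.frame(
  Hour = c(8, 13, 5, 13, 18, 18),
  Offense.Type = c("Auto Theft", "Theft", "Auto Theft",
                   "Theft", "Theft", "Theft")
)

# Cross-tabulate, then flatten to one row per (Hour, Offense.Type) pair;
# combinations with no incidents are kept with a count of 0
counts <- as.data.frame(
  table(Hour = incidents$Hour, Offense.Type = incidents$Offense.Type),
  responseName = "numberOfCrimes"
)

# table() turns Hour into a factor; convert it back to a number
counts$Hour <- as.integer(as.character(counts$Hour))
```

The resulting `counts` can be passed to ggplot() with aes(x = Hour, y = numberOfCrimes, fill = Offense.Type) and drawn with geom_area().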
