
What is this type of graph, and how do you draw it in R?

http://imgur.com/IfVyu6f

I thought it might be some kind of cumulative graph, and I found the cumulative frequency graph and the cumulative flow diagram. However, I don't think either is the graph in the image, because cumulative graphs start from 0 and my variables do not. A density plot sounds closest, but it is a distribution whose area is 1, and I want to show frequencies.

Basically, the variables are sub-parts of a main variable, and I want to show where these sub-variables converge to create a peak. In essence, the variables sum to form a cumulative bound.

With ggplot2, you can draw this kind of stacked area chart with the geom_area() function:

library(ggplot2)
library(gcookbook) # For the data set

ggplot(uspopage, aes(x=Year, y=Thousands, fill=AgeGroup)) + geom_area()
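If gcookbook is not installed, the same idea can be tried on a small made-up data frame. This is only a sketch: toy, its columns, and its values are invented for illustration. It shows that geom_area() stacks the groups by default, so the top edge of the chart is the per-x total of the sub-variables.

```r
library(ggplot2)

# Made-up data: three sub-variables measured at the same x positions.
toy <- data.frame(
  x     = rep(1:5, times = 3),
  group = rep(c("a", "b", "c"), each = 5),
  value = c(1, 2, 4, 2, 1,
            2, 3, 5, 3, 2,
            1, 1, 3, 1, 1)
)

# geom_area() uses position = "stack" by default, so each group's area
# sits on top of the previous one.
p_toy <- ggplot(toy, aes(x = x, y = value, fill = group)) + geom_area()
print(p_toy)

# The per-x totals that the top of the stack reaches.
totals <- tapply(toy$value, toy$x, sum)
```

Here the peak at x = 3 is where all three made-up groups are at their largest.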

Thanks for sharing a little more about what your data look like.

Let's use the publicly available crime stats data from the Houston Police Department as an example. In this case, we're using the data set for the month of January, 2015.

library(ggplot2)

crime <- gdata::read.xls('http://www.houstontx.gov/police/cs/xls/jan15.xls')

# There's a single case where the offense type is recorded as '1'.
# That doesn't make sense, so we'll remove it.
crime <- crime[crime$Offense.Type != '1', ]
# factor() first, so droplevels() also works if the column was read as character
crime$Offense.Type <- droplevels(factor(crime$Offense.Type))

There are 10 columns, but the ones we're interested in look like this:

# Hour Offense.Type
# 8   Auto Theft
# 13  Theft
# 5   Auto Theft
# 13  Theft
# 18  Theft
# 18  Theft

As you mentioned, the problem is that each row is a single incident. We need a way to get frequencies on a per-hour basis to pass to geom_area().

The first way is to let ggplot2 handle it, with no need to preformat the data. Note that stat='density' draws a smoothed estimate scaled to counts, not exact per-hour tallies.

p <- ggplot(crime, aes(x=Hour, fill=Offense.Type)) 
# after_stat(count) replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
p + geom_area(aes(y = after_stat(count)), stat = 'density')

The ggplot density method
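The density approach gives a smoothed curve. If exact per-hour tallies are wanted straight from incident-level rows, one possible variant is stat = "bin" with binwidth = 1. The following is a sketch on made-up data (toy_crime and its values are invented and are not the Houston file), and it assumes the hours are whole numbers from 0 to 23.

```r
library(ggplot2)

# Made-up incident-level data: one row per incident.
set.seed(1)
toy_crime <- data.frame(
  Hour = sample(0:23, 200, replace = TRUE),
  Offense.Type = sample(c("Theft", "Burglary"), 200, replace = TRUE)
)

# stat = "bin" with binwidth = 1 counts incidents per hour for each offense
# type. All groups share the same bins, so the areas can be stacked.
p_bin <- ggplot(toy_crime, aes(x = Hour, fill = Offense.Type)) +
  geom_area(stat = "bin", binwidth = 1)
print(p_bin)

# The computed layer data holds the per-bin counts behind the plot.
binned <- layer_data(p_bin, 1)
```

The count column of binned should add up to the number of incidents, which is a quick way to check that no rows fell outside the bins.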

The other way is to preformat the frequency table, using R's table() and reshape2's melt():

library(reshape2)
crime.counts <- table(crime$Hour, crime$Offense.Type)
# Melt the table into long format, naming the columns directly.
# (melt() on a table takes varnames, not id.vars.)
crime.counts.l <- melt(crime.counts,
                        varnames = c("Hour", "Offense.Type"),
                        value.name = "numberOfCrimes")
p <- ggplot(crime.counts.l, aes(x = Hour,
                                 y = numberOfCrimes,
                                 fill = Offense.Type))
p + geom_area()

The preformatted table method
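If reshape2 is not available, base R's as.data.frame() on a table gives a similar long format. A minimal sketch on the six sample rows shown earlier, typed in by hand (the names incidents, counts_long, and n are made up for illustration):

```r
# The six sample rows, entered manually.
incidents <- data.frame(
  Hour = c(8, 13, 5, 13, 18, 18),
  Offense.Type = c("Auto Theft", "Theft", "Auto Theft",
                   "Theft", "Theft", "Theft")
)

# Cross-tabulate: one row per hour, one column per offense type.
counts <- table(incidents$Hour, incidents$Offense.Type)

# Long format: one row per (hour, offense type) pair, zero counts included.
counts_long <- as.data.frame(counts, stringsAsFactors = FALSE)
names(counts_long) <- c("Hour", "Offense.Type", "n")

# The table dimension names come back as text, so convert hours to integers.
counts_long$Hour <- as.integer(counts_long$Hour)
print(counts_long)
```

counts_long can then be passed to ggplot(counts_long, aes(x = Hour, y = n, fill = Offense.Type)) + geom_area() in the same way as crime.counts.l.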

