
barplot - Grouping x-axis labels without manipulating accompanying bars

I'm doing some basic data analysis on this dataset: https://www.kaggle.com/murderaccountability/homicide-reports

I'm generating a basic barplot with the State names as the x-axis values. The y-axis value is each state's share of nationwide homicides (the number of entries for that state divided by the total number of entries in the dataset).

barplot(prop.table(table(homicideData.raw$State)),
    main = "Nationwide Homicide % per State",
    ylab = "Accounting % of Nation-wide Homicides",
    las=2)


This is very messy. Is there a way to group, say, 5 states together under a single x-axis label, without changing the bars?

For example:

x-axis labels: "Alaska - California", "Colorado - Florida", ... (and so on). Each label should then have 5 bars above it.

Here's a solution with ggplot. It's not the simplest, as it involves some data manipulation.

(1) Read in the dataset and extract the homicide count and proportion by state:

df <- read.csv("homicide.csv")

library(dplyr)
freq <- with(df, table(State)) %>% data.frame
freq <- freq %>% mutate(prop = Freq/sum(Freq))
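To see what step (1) produces without downloading the CSV, here is a minimal base-R sketch on a made-up vector of state names. The toy values are an assumption for illustration only and are not taken from the dataset.

```r
# Toy stand-in for df$State; the values are made up for illustration.
state <- c("Alabama", "Alaska", "Alaska", "Texas", "Texas", "Texas")

# table() counts rows per state; as.data.frame() turns the counts into
# a data frame with the columns State and Freq.
toy <- as.data.frame(table(State = state))

# Share of all rows that belong to each state (the shares sum to 1).
toy$prop <- toy$Freq / sum(toy$Freq)
print(toy)
```

The resulting data frame has one row per state, which is the shape the later steps rely on.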

(2) Find the first and last row index of each group of 5 states:

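# start index of every block of 5 rows: 1, 6, 11, ...
hd <- seq(1, nrow(freq), by=5)
# drop the last start, so the final group runs from the last remaining start to the end of the table
hd <- hd[-length(hd)]
# end index of each group: one before the next start, and the last row for the final group
td <- c((hd-1)[-1], nrow(freq))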
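The index arithmetic in step (2) is easier to follow with a concrete row count. The sketch below assumes 51 rows (the value of `n` is an assumption, not read from the data) and also shows a different way to get a group index per row, which is not part of the answer above.

```r
n <- 51  # assumed number of rows in freq

hd <- seq(1, n, by = 5)    # 1, 6, ..., 46, 51
hd <- hd[-length(hd)]      # 1, 6, ..., 46
td <- c((hd - 1)[-1], n)   # 5, 10, ..., 45, 51
print(data.frame(first = hd, last = td, size = td - hd + 1))

# Alternative (not from the answer above): one group index per row.
# Unlike hd/td, this leaves row 51 in a group of its own.
grp <- ceiling(seq_len(n) / 5)
print(table(grp))
```

With 51 rows the hd/td approach yields ten groups, the last one holding 6 states. If the row count were an exact multiple of 5 (say 50), the same code would merge the final two blocks into one group of 10, so the group index alternative may be the safer choice for other datasets.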

(3) Define helper functions that build the label for each group (e.g. Alb - Clf) and compute the size of each group:

abbrevFn <- function(head, tail, state, ...) paste(abbreviate(state[c(head,tail)], ...), collapse = " - ")

intervalFn <- function(head, tail) diff(c(head, tail)) + 1

(4) Assign each state to its group by repeating the group's label once for every state in that group:

freq$group <- lapply(seq_along(hd), function(x) {
  rep(abbrevFn(hd[x], td[x], freq$State, minlength=3),
      intervalFn(hd[x], td[x]))
}) %>% unlist
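As a quick check of the two helpers, the sketch below applies them to the first five entries of R's built-in `state.name` vector. Using `state.name` instead of `freq$State` is an assumption made so the example runs without the CSV.

```r
# Same helpers as in step (3), repeated here so the block runs on its own.
abbrevFn <- function(head, tail, state, ...) {
  paste(abbreviate(state[c(head, tail)], ...), collapse = " - ")
}
intervalFn <- function(head, tail) diff(c(head, tail)) + 1

# state.name ships with base R (50 states, alphabetical order).
label <- abbrevFn(1, 5, state.name, minlength = 3)  # Alabama .. California
size  <- intervalFn(1, 5)                           # number of states in the group

# One label per state in the group, as used to fill freq$group.
print(rep(label, size))
```

Every state in a group receives the same label string, which is what lets ggplot place the group's bars at a single x position.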

(5) Plot geom_bar with the custom group on the x-axis and dodge the bars by state. The dotted vertical lines at xint mark the boundaries between groups:

xint <- c((1:length(hd) - .5), (1:length(hd) + .5)) %>% unique

library(ggplot2)
ggplot(freq, aes(group, prop, fill=State)) + 
  geom_bar(stat="identity", position="dodge", width=1) + 
  scale_fill_manual(values=rep("gray80", nrow(freq))) +
  ylab("Accounting % of Nation-wide Homicides") +
  xlab("States") +
  geom_vline(xintercept=xint, linetype="dotted") +
  guides(fill="none") +  # hide the legend; fill=FALSE is deprecated in recent ggplot2
  theme_bw()
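If staying with base graphics is preferred, a possible alternative is to draw the bars without their names and then add one axis label per group at the mean of the bar midpoints that `barplot()` returns. This is only a sketch on made-up values: `vals`, `size`, and the 12 letter categories are assumptions, and it is not the approach of the answer above.

```r
# Made-up shares for 12 hypothetical categories named A..L.
vals <- setNames((12:1) / sum(1:12), LETTERS[1:12])

size <- 5                                  # bars per label
grp  <- ceiling(seq_along(vals) / size)    # group index of every bar

# barplot() returns the x midpoint of each bar; axisnames = FALSE
# suppresses the per-bar labels so the bars themselves are unchanged.
mids <- barplot(vals, axisnames = FALSE,
                ylab = "Share of total")

# One label per group: centred under its bars, "first - last" as text.
at  <- sapply(split(mids, grp), mean)
lab <- sapply(split(names(vals), grp),
              function(s) paste(s[1], s[length(s)], sep = " - "))
axis(1, at = at, labels = lab, tick = FALSE)
```

The last group simply holds whatever is left over (two bars here), so no special handling is needed when the number of bars is not a multiple of the group size.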

