
How to split x-axis as decile in R and make ggplot

Hi, I am wondering how to split the x-axis into deciles in R and plot the result with ggplot.

I currently have age-range data and NO2 pollution data. The two datasets share the same geographic reference, named ward. I wish to plot my demographic data in quantiles that each contain an equal number of wards (298 wards in total).

[Image: my table]

I tried quantile regression in R, using the following:

library(SparseM) 
library(quantreg)
mydata<- read.csv("M:/Desktop10/Test2.csv") 
attach(mydata) 
Y <- cbind(NO2.value)
X <- cbind(age.0.to.4, age..5.to.9, age.10.to.14, age.15.to.19, age.20.to.24, age.25.to.29, age.30.to.44, age.45.to.59, age.60.to.64, age.65.to.74, age.75.to.84, age.85.to.89, age.above.90) 
quantreg.all <- rq(Y ~ X, tau = seq(0.05, 0.95, by = 0.05), data=mydata) 
quantreg.plot <- summary(quantreg.all) 
plot(quantreg.plot) 

But what I get is not what I expected, as the y-axis is not the NO2 data.

The ideal plot is attached:

[Image: ideal plot]

Many thanks for your help and suggestions.

If I understand your question, I think the cut function combined with the quantile function will create the deciles. Here's an example with fake data.

In the code below, we use the cut function to split the data into deciles, and we use the quantile function to set the breaks argument for cut. This tells cut to group the data into 10 groups of equal size, from the smallest values of NO2 to the largest. With labels=10:1, the lowest-NO2 group is labelled 10 and the highest is labelled 1; fct_rev then reverses the level order so that the x-axis runs from 1 to 10.

group_by(age) means we create the deciles separately for each age group. As a result, there are equal numbers of subjects within each decile in a given age group, but the NO2 cutoff values for each decile differ between age groups. To create deciles over the data as a whole, just remove group_by(age). This will result in the same NO2 cutoff values for each decile across all age groups, but within a given age group the number of subjects will not be the same in each decile.

library(tidyverse)

# Fake data
set.seed(2)
dat = data.frame(NO2=c(runif(600, 0, 10), runif(400, 1, 11)), 
                 age=rep(c("0-10","11-20"), c(600,400)))

# Create decile groups
dat = dat %>% 
  group_by(age) %>% 
  mutate(decile = cut(NO2, breaks=quantile(NO2, probs=seq(0,1,0.1)), 
                      labels=10:1, include.lowest=TRUE),
         decile = fct_rev(decile))
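
The difference between per-group and overall deciles described above can be checked directly. The following is a minimal sketch on the same fake data; the object names by_age and overall are made up for illustration.

```r
library(dplyr)

# Fake data, built the same way as in the example above
set.seed(2)
dat <- data.frame(NO2 = c(runif(600, 0, 10), runif(400, 1, 11)),
                  age = rep(c("0-10", "11-20"), c(600, 400)))

# Deciles computed within each age group:
# equal counts per decile inside a group, different NO2 cutoffs per group
by_age <- dat %>%
  group_by(age) %>%
  mutate(decile = cut(NO2, breaks = quantile(NO2, probs = seq(0, 1, 0.1)),
                      labels = 10:1, include.lowest = TRUE)) %>%
  ungroup()

# Deciles computed over the whole data (no group_by):
# one set of NO2 cutoffs, unequal counts per decile inside a group
overall <- dat %>%
  mutate(decile = cut(NO2, breaks = quantile(NO2, probs = seq(0, 1, 0.1)),
                      labels = 10:1, include.lowest = TRUE))

# Rows are age groups, columns are decile labels
table(by_age$age, by_age$decile)
table(overall$age, overall$decile)
```

In the first table every cell in a row holds the same count. In the second, only the column totals are equal. Note that cut stops with an error if the quantile breaks are not unique, which can happen with heavily tied data.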

Now we plot using ggplot2. The stat_summary function computes the mean NO2 for each decile in each age group.

ggplot(dat, aes(decile, NO2, colour=age, group=age)) +
  # `fun` replaces the `fun.y` argument, which was deprecated in ggplot2 3.3.0
  stat_summary(fun=mean, geom="line") +
  stat_summary(fun=mean, geom="point") +
  expand_limits(y=0) +
  theme_bw()
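
The same idea might be applied to ward-level data like that in the question, with wards ranked into deciles of a demographic variable and mean NO2 plotted per decile. This is only a sketch under assumptions: the data are simulated, the table is assumed to have one row per ward, and the column names age.0.to.4 and NO2.value are borrowed from the question's code. dplyr's ntile is swapped in for cut plus quantile here because it assigns buckets by rank and so does not fail on tied values.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in for the ward-level table (298 wards); all values are made up
set.seed(1)
wards <- data.frame(ward = sprintf("W%03d", 1:298),
                    age.0.to.4 = runif(298, 2, 12),   # hypothetical % of residents aged 0-4
                    NO2.value = runif(298, 15, 60))   # hypothetical NO2 concentration

# Rank wards into 10 roughly equal-sized buckets by the demographic variable,
# then summarise NO2 within each bucket
ward_means <- wards %>%
  mutate(decile = factor(ntile(age.0.to.4, 10))) %>%
  group_by(decile) %>%
  summarise(n_wards = n(), mean_NO2 = mean(NO2.value))

# Decile on the x-axis, mean NO2 on the y-axis
ggplot(ward_means, aes(decile, mean_NO2, group = 1)) +
  geom_line() +
  geom_point() +
  labs(x = "Decile of age.0.to.4", y = "Mean NO2") +
  theme_bw()
```

Because 298 is not divisible by 10, the buckets contain 29 or 30 wards each. Repeating the mutate and summarise steps for other age columns, and stacking the results, would give one line per age band.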

