How to split the x-axis into deciles in R and make a ggplot
Hi, I am wondering how to split the x-axis into deciles in R and plot the result with ggplot.
I currently have age range data and NO2 pollution data. The two datasets share the same geographic reference, named ward. I wish to plot my demographic data in quantiles containing equal numbers of wards (298 wards in total).
I tried quantile regression in R, using the following:
library(SparseM)
library(quantreg)
mydata<- read.csv("M:/Desktop10/Test2.csv")
attach(mydata)
Y <- cbind(NO2.value)
X <- cbind(age.0.to.4, age..5.to.9, age.10.to.14, age.15.to.19, age.20.to.24, age.25.to.29, age.30.to.44, age.45.to.59, age.60.to.64, age.65.to.74, age.75.to.84, age.85.to.89, age.above.90)
quantreg.all <- rq(Y ~ X, tau = seq(0.05, 0.95, by = 0.05), data=mydata)
quantreg.plot <- summary(quantreg.all)
plot(quantreg.plot)
But what I get is not what I expected, as the y-axis is not the NO2 data.
The ideal plot is attached:
Many thanks for your help and suggestions.
If I understand your question, I think the cut function combined with the quantile function will create the deciles. Here's an example with fake data.
In the code below, we use the cut function to split the data into deciles, and we use the quantile function to set the breaks argument for cut. This tells cut to group the data into 10 groups of equal size, from the smallest values of NO2 to the largest.
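As a minimal base-R sketch of that idea, assuming made-up uniform data (the vector x and the 1:10 labels are illustrative choices, not part of the original answer), the following shows how quantile supplies the break points and cut assigns each value to a decile:

```r
# Toy data (hypothetical values, not the asker's data)
set.seed(1)
x <- runif(1000, 0, 10)

# Eleven break points (0%, 10%, ..., 100%) define ten intervals
breaks <- quantile(x, probs = seq(0, 1, 0.1))

# include.lowest = TRUE keeps the minimum value in the first interval;
# without it the smallest observation would become NA
decile <- cut(x, breaks = breaks, labels = 1:10, include.lowest = TRUE)

# With continuous data and no ties, the ten counts should be equal
table(decile)
```

If the data contain many tied values, two quantiles can coincide and cut will stop with a "breaks are not unique" error, so this approach is best suited to continuous measurements.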
group_by(age) means we create the deciles separately for each age group. This means that there are equal numbers of subjects within each decile in a given age group, but the NO2 cutoff values for each decile are different for different age groups. To create deciles over the data as a whole, just remove group_by(age). This will result in the same NO2 cutoff values for each decile across all age groups, but within a given age group, the number of subjects will not be the same in each decile.
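library(tidyverse)

# Fake data
set.seed(2)
dat = data.frame(NO2=c(runif(600, 0, 10), runif(400, 1, 11)),
                 age=rep(c("0-10","11-20"), c(600,400)))

# Create decile groups, separately within each age group.
# Labels run from 10 (lowest NO2) to 1 (highest NO2); fct_rev() then
# reverses the level order so the x-axis reads 1, 2, ..., 10.
dat = dat %>%
  group_by(age) %>%
  mutate(decile = cut(NO2, breaks=quantile(NO2, probs=seq(0,1,0.1)),
                      labels=10:1, include.lowest=TRUE),
         decile = fct_rev(decile))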
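To make the difference between grouped and overall deciles concrete, here is a hedged sketch that computes both on the same fake data and tabulates the counts. The helper to_decile, the data frame name dat2, and the 1-to-10 labelling (1 = lowest NO2) are assumptions introduced for this illustration; they differ from the 10:1 labelling used in the code above.

```r
library(dplyr)

# Same fake data as in the answer, stored under a different name
set.seed(2)
dat2 <- data.frame(NO2 = c(runif(600, 0, 10), runif(400, 1, 11)),
                   age = rep(c("0-10", "11-20"), c(600, 400)))

# Hypothetical helper: decile 1 holds the lowest values, decile 10 the highest
to_decile <- function(x) {
  cut(x, breaks = quantile(x, probs = seq(0, 1, 0.1)),
      labels = 1:10, include.lowest = TRUE)
}

dat2 <- dat2 %>%
  mutate(decile_overall = to_decile(NO2)) %>%  # one set of cutoffs for all rows
  group_by(age) %>%
  mutate(decile_by_age = to_decile(NO2)) %>%   # separate cutoffs per age group
  ungroup()

# Grouped deciles: equal counts within each age group
counts_by_age <- table(dat2$age, dat2$decile_by_age)
# Overall deciles: equal counts overall, but generally not within an age group
counts_overall <- table(dat2$age, dat2$decile_overall)

counts_by_age
counts_overall
```

The first table should show a constant count along each row, while the second should only add up to a constant count per column.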
Now we plot using ggplot2. The stat_summary function returns the mean for each decile in each age group.
# Note: fun replaces the fun.y argument, which is deprecated since ggplot2 3.3.0
ggplot(dat, aes(decile, NO2, colour=age, group=age)) +
  stat_summary(fun=mean, geom="line") +
  stat_summary(fun=mean, geom="point") +
  expand_limits(y=0) +
  theme_bw()
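The same idea could be adapted to ward-level data roughly as follows. This is only a sketch under several assumptions: the wards data frame is simulated, the column names NO2.value and age.0.to.4 are borrowed from the question, and the reading of the desired plot (wards grouped into NO2 deciles on the x-axis, mean of one age column on the y-axis) is a guess, since the attached image is not available. It uses dplyr's ntile, which splits on ranks and so copes with a ward count such as 298 that is not divisible by 10.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in for the ward data; column names follow the question,
# the values are made up
set.seed(3)
wards <- data.frame(ward = paste0("W", 1:298),
                    NO2.value = runif(298, 10, 60),
                    age.0.to.4 = runif(298, 2, 12))

# ntile() ranks the wards by NO2 and splits them into 10 near-equal groups
wards <- wards %>%
  mutate(NO2.decile = ntile(NO2.value, 10))

# Number of wards and mean of the age column within each NO2 decile
ward_summary <- wards %>%
  group_by(NO2.decile) %>%
  summarise(n.wards = n(),
            mean.age.0.to.4 = mean(age.0.to.4))

# Deciles on the x-axis, one point per decile
p <- ggplot(ward_summary,
            aes(factor(NO2.decile), mean.age.0.to.4, group = 1)) +
  geom_line() +
  geom_point() +
  labs(x = "NO2 decile (1 = lowest)", y = "Mean of age.0.to.4") +
  theme_bw()
p
```

To show several age bands at once, the age columns could be reshaped to long format first and mapped to colour, as in the answer's plot.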