Need help in grouping rows by year and differentiating months
I have a dataframe that looks like this:
dataframe:
Date Revenue
2009 15
dec 15
2010 450
jan 13
feb 14
mar 14
apr 10
may 10
jun 31
jul 99
aug 43
sep 87
oct 32
nov 54
dec 43
2011 67
And it continues for several years in the same pattern until 2019. The row that contains the year represents the aggregate revenue for that year. 2009 is the only year that contains only one data point (December).
The dataframe is from a pivot table imported from Excel that had months subgrouped under each year.
Each month is in the same column as the year, and months from different years are not differentiated. I need to plot a line graph with monthly revenue for each year (that is, several lines for different years that show the revenue month by month), but I can't do that because I can't tell months from different years apart.
How can I make subgroups of months by year? Or assign a new column with years for determined intervals (that is, every 12 rows), but excluding the year rows?
Thank you!
I would suggest the following approach: reformat your data and fill in the year for every row. Your data (I have stored the sample you included as df) has a Date column that mixes numeric values (years) and character values (months). The code below creates a new variable that holds the year wherever Date is numeric and is missing otherwise. The missing values are then filled downward so that every month row is tagged with its year. Finally, the year rows are dropped and the plot is drawn. You only have one value for 2009, so no line can be drawn for it, and for 2011 the sample contains only the annual total. With your entire data you will get the complete picture for all years. Here is a tidyverse approach:
library(tidyverse)
#Data
df <- structure(list(Date = c("2009", "dec", "2010", "jan", "feb",
"mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov",
"dec", "2011"), Revenue = c(15L, 15L, 450L, 13L, 14L, 14L, 10L,
10L, 31L, 99L, 43L, 87L, 32L, 54L, 43L, 67L)), class = "data.frame", row.names = c(NA,
-16L))
The code:
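#Code
df %>%
  # Year rows parse as numbers; month rows become NA (coercion warning suppressed)
  mutate(Var = suppressWarnings(as.numeric(Date))) %>%
  # Carry each year down onto the month rows that follow it
  fill(Var) %>%
  # Drop the year rows so the annual totals are not plotted
  filter(is.na(suppressWarnings(as.numeric(Date)))) %>%
  # Order the months chronologically instead of alphabetically
  mutate(Date = factor(Date, levels = c("jan", "feb", "mar", "apr", "may",
                                        "jun", "jul", "aug", "sep", "oct",
                                        "nov", "dec"), ordered = TRUE)) %>%
  # Finally plot one line per year
  ggplot(aes(x = Date, y = Revenue, group = factor(Var), color = factor(Var))) +
  geom_line() +
  theme_bw()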
Output: [line plot of monthly revenue, one colored line per year]
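If you want to check the grouping before plotting, the same idea can be sketched in base R without any packages. This is only an illustration, not part of the original answer: the names is_year, Year and monthly are made up here, and it assumes that the first row of the data is a year row and that every year label is exactly four digits.

# Same sample data as above
df <- data.frame(
  Date = c("2009", "dec", "2010", "jan", "feb", "mar", "apr", "may",
           "jun", "jul", "aug", "sep", "oct", "nov", "dec", "2011"),
  Revenue = c(15L, 15L, 450L, 13L, 14L, 14L, 10L, 10L,
              31L, 99L, 43L, 87L, 32L, 54L, 43L, 67L),
  stringsAsFactors = FALSE
)

# Assumption: a row is a year row when Date is exactly four digits
is_year <- grepl("^[0-9]{4}$", df$Date)

# cumsum(is_year) counts the year rows seen so far, so indexing the
# year labels with it repeats each year over the rows that follow it
df$Year <- as.integer(df$Date[is_year])[cumsum(is_year)]

# Keep only the month rows; the year rows hold the annual totals
monthly <- df[!is_year, ]

# Order the months chronologically (month.abb is "Jan", "Feb", ...)
monthly$Date <- factor(monthly$Date, levels = tolower(month.abb), ordered = TRUE)

print(monthly)

Unlike assigning a year to every block of 12 rows, this does not depend on each year having the same number of months, so the single December row for 2009 is handled correctly. As a sanity check, the 2010 month rows sum to 450, which matches the total in the 2010 year row.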