
Need help in grouping rows by year and differentiating months

I have a dataframe that looks like this:

dataframe:

Date    Revenue   
2009      15       
dec       15       
2010      450       
jan       13       
feb       14       
mar       14       
apr       10       
may       10       
jun       31       
jul       99    
aug       43  
sep       87 
oct       32  
nov       54     
dec       43
2011      67

It continues in the same pattern for several years, until 2019. Each row that contains a year holds the aggregate revenue for that year. 2009 is the only year with a single data point (December).

The dataframe comes from a pivot table imported from Excel, in which the months were subgrouped under each year.

Each month is in the same column as the year, and months from different years are not differentiated. I need to plot a line graph of monthly revenue for each year (that is, one line per year showing the revenue month by month), but I can't do that because I can't tell which year each month belongs to.

How can I make subgroups of months by year? Or how can I assign a new column holding the year for fixed intervals (that is, every 12 rows) while excluding the year rows?

Thank you!

I would suggest the following approach: reformat your data and fill in the year for every row. In your data (I have defined the output you included as df), the Date variable mixes numeric and character values. The code I added creates a new variable that holds the year wherever Date is numeric and NA otherwise. The missing values are then filled downward so that every month row carries its year. Finally, the plot is drawn. 2009 has only one value, so no line can be drawn for it, and for 2011 the sample contains only the yearly total. With your entire data you will get the complete picture for all years. Here is a tidyverse approach:

library(tidyverse)
#Data
df <- structure(list(Date = c("2009", "dec", "2010", "jan", "feb", 
"mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", 
"dec", "2011"), Revenue = c(15L, 15L, 450L, 13L, 14L, 14L, 10L, 
10L, 31L, 99L, 43L, 87L, 32L, 54L, 43L, 67L)), class = "data.frame", row.names = c(NA, 
-16L))

The code:

#Code
df %>%
  #Year rows convert to a number; month rows become NA (coercion warning silenced)
  mutate(Var = suppressWarnings(as.numeric(Date))) %>%
  #Carry each year down to the month rows below it
  fill(Var) %>%
  #Drop the year rows to exclude the yearly totals
  filter(!grepl("^[0-9]+$", Date)) %>%
  #Order the month levels chronologically
  mutate(Date = factor(Date, levels = c("jan", "feb", "mar", "apr", "may",
                                        "jun", "jul", "aug", "sep", "oct",
                                        "nov", "dec"), ordered = TRUE)) %>%
  #Finally plot, one line per year
  ggplot(aes(x = Date, y = Revenue, group = factor(Var), color = factor(Var))) +
  geom_line() +
  theme_bw()

Output:

[Image: line plot of Revenue by month, one colored line per year]
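
The year-assignment step on its own can also be sketched in base R, in case you want the year column without plotting. This is only a minimal sketch, assuming that the first row is a year row and that year rows are exactly four digits. The names is_year, year_values, Year and monthly are made up for illustration and are not part of the code above.

```r
# Same sample data as above
df <- data.frame(
  Date = c("2009", "dec", "2010", "jan", "feb", "mar", "apr", "may",
           "jun", "jul", "aug", "sep", "oct", "nov", "dec", "2011"),
  Revenue = c(15L, 15L, 450L, 13L, 14L, 14L, 10L, 10L,
              31L, 99L, 43L, 87L, 32L, 54L, 43L, 67L),
  stringsAsFactors = FALSE
)

# TRUE for rows whose Date is a four-digit year (the yearly totals)
# Assumption: year rows look like "2009"; month rows never match this pattern
is_year <- grepl("^[0-9]{4}$", df$Date)

# cumsum() goes up by 1 at every year row, so it indexes the most recent year seen
# Assumption: the first row is a year row (otherwise the index would start at 0)
year_values <- as.integer(df$Date[is_year])
df$Year <- year_values[cumsum(is_year)]

# Keep only the monthly rows; each one now carries its year
monthly <- df[!is_year, ]
rownames(monthly) <- NULL
print(monthly)
```

Because the year is taken from the nearest year row above, this does not depend on every year having exactly 12 month rows, which matters here since 2009 has only December.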
