This is first time I am trying to work with time-series plots. I have a dataset with approximately 50k rows with similar structure as below for multiple years;
Year expense_1 expense_2 expense_3 expense_4
1999 5 NA NA 31.82
2000 2 NA NA 4.75
1999 10.49 NA NA NA
2000 39.69 NA NA NA
2000 NA NA 10.61 NA
1999 8.08 NA NA NA
2000 16 NA NA NA
1999 9.32 NA NA NA
1999 9.35 NA NA NA
Now, I want to plot a time-series with Year
on the X-axis , Expense
on the Y-axis with different lines each for expense_1
, expense_2
, expense_3
, expense_4
. The expense for each category should be summed-up year-wise and NA
should be removed.
You can calculate the sum
using summarise_all
then convert your data to long format so that it's easier to plot using ggplot
library(tidyverse)
library(scales)
df <- read.table(text = "Year expense_1 expense_2 expense_3 expense_4
1999 5 NA NA 31.82
2000 2 NA NA 4.75
1999 10.49 NA NA NA
2000 39.69 NA NA NA
2000 NA NA 10.61 NA
1999 8.08 NA NA NA
2000 16 NA NA NA
1999 9.32 NA NA NA
1999 9.35 NA NA NA",
header = TRUE, stringsAsFactors = FALSE)
# define summation function that returns NA if all values are NA
# By default, R returns 0 if all values are NA
sum_NA <- function(x) {
if(all(is.na(x))) NA_integer_ else sum(x, na.rm = TRUE)
}
df_long <- df %>%
group_by(Year) %>%
summarise_all(funs(sum_NA(.))) %>%
gather(key = "type", value = "expense", -Year)
df_long
#> # A tibble: 8 x 3
#> Year type expense
#> <int> <chr> <dbl>
#> 1 1999 expense_1 42.2
#> 2 2000 expense_1 57.7
#> 3 1999 expense_2 NA
#> 4 2000 expense_2 NA
#> 5 1999 expense_3 NA
#> 6 2000 expense_3 10.6
#> 7 1999 expense_4 31.8
#> 8 2000 expense_4 4.75
ggplot(df_long, aes(x = Year, y = expense, color = type, group = type)) +
geom_point() +
geom_line() +
scale_x_continuous(breaks = scales::pretty_breaks(n = 1)) +
theme_bw()
Created on 2018-05-21 by the reprex package (v0.2.0).
You can let ggplot
do most of the work for you - just gather
, then start plotting:
df %>%
gather(expense, value, -Year) %>%
ggplot(aes(x=Year, y=value, color=expense)) +
geom_line(stat="summary", fun.y="sum")
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.