R: plot a time series of multiple variables summed by year
This is my first attempt at a time-series plot. I have a dataset of about 50,000 rows spanning several years, structured as follows.
Year expense_1 expense_2 expense_3 expense_4
1999 5 NA NA 31.82
2000 2 NA NA 4.75
1999 10.49 NA NA NA
2000 39.69 NA NA NA
2000 NA NA 10.61 NA
1999 8.08 NA NA NA
2000 16 NA NA NA
1999 9.32 NA NA NA
1999 9.35 NA NA NA
Now I want to plot a time series with Year on the x-axis and Expense on the y-axis, with a separate line for each of expense_1, expense_2, expense_3, and expense_4. The expenses in each category should be summed per year, with NA values excluded.
You can compute the sums with summarise_all, then convert your data to long format, which makes it easier to plot with ggplot.
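One detail is worth checking before summing by year. Base R's `sum()` with `na.rm = TRUE` returns 0 for a vector that is entirely NA, and that value would then be plotted as a real zero. A minimal illustration, using a made-up vector `x` that is not part of the question's data:

```r
# Hypothetical all-NA vector, standing in for a category with no data in a year
x <- c(NA_real_, NA_real_)

# Without na.rm, missing values propagate
default_sum <- sum(x)

# With na.rm = TRUE, every value is dropped and the sum of nothing is 0
dropped_sum <- sum(x, na.rm = TRUE)

print(default_sum)  # NA
print(dropped_sum)  # 0
```

Whether a year with no recorded expenses should show as 0 or as a gap depends on what the plot is meant to convey.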
library(tidyverse)
library(scales)
df <- read.table(text = "Year expense_1 expense_2 expense_3 expense_4
1999 5 NA NA 31.82
2000 2 NA NA 4.75
1999 10.49 NA NA NA
2000 39.69 NA NA NA
2000 NA NA 10.61 NA
1999 8.08 NA NA NA
2000 16 NA NA NA
1999 9.32 NA NA NA
1999 9.35 NA NA NA",
header = TRUE, stringsAsFactors = FALSE)
# Define a summation function that returns NA if all values are NA.
# By default, sum(x, na.rm = TRUE) returns 0 when every value is NA.
sum_NA <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}
df_long <- df %>%
  group_by(Year) %>%
  # funs() is deprecated since dplyr 0.8.0, so pass the function directly
  summarise_all(sum_NA) %>%
  gather(key = "type", value = "expense", -Year)
df_long
#> # A tibble: 8 x 3
#>    Year type      expense
#>   <int> <chr>       <dbl>
#> 1  1999 expense_1   42.2
#> 2  2000 expense_1   57.7
#> 3  1999 expense_2   NA
#> 4  2000 expense_2   NA
#> 5  1999 expense_3   NA
#> 6  2000 expense_3   10.6
#> 7  1999 expense_4   31.8
#> 8  2000 expense_4    4.75
ggplot(df_long, aes(x = Year, y = expense, color = type, group = type)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 1)) +
  theme_bw()
Created on 2018-05-21 by the reprex package (v0.2.0).
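For readers on a newer tidyverse, the same summarise-then-reshape step can be sketched with `across()` and `pivot_longer()`. This is a variant under stated assumptions, not part of the original answer: it assumes dplyr >= 1.0.0 and tidyr >= 1.0.0, and `sum_or_na` and `df_long2` are hypothetical names chosen for illustration.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "Year expense_1 expense_2 expense_3 expense_4
1999 5 NA NA 31.82
2000 2 NA NA 4.75
1999 10.49 NA NA NA
2000 39.69 NA NA NA
2000 NA NA 10.61 NA
1999 8.08 NA NA NA
2000 16 NA NA NA
1999 9.32 NA NA NA
1999 9.35 NA NA NA",
header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: NA when every value is missing, otherwise the NA-free sum
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

df_long2 <- df %>%
  group_by(Year) %>%
  # across(everything()) covers all non-grouping columns
  summarise(across(everything(), sum_or_na)) %>%
  # one row per Year/category pair
  pivot_longer(-Year, names_to = "type", values_to = "expense")

print(df_long2)
```

The rows come out ordered by Year rather than by category, which does not affect the plot because ggplot groups by `type`.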
You can let ggplot do most of the work for you: just gather, then start plotting:
df %>%
  gather(expense, value, -Year) %>%
  ggplot(aes(x = Year, y = value, color = expense)) +
  # ggplot2 >= 3.3.0 uses `fun`; older versions use `fun.y`
  geom_line(stat = "summary", fun = "sum")
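To inspect the numbers behind a plot like this, the per-year sums can also be computed by hand. The sketch below drops missing values before summing, which is how the summary stat is expected to treat them. It assumes dplyr >= 1.0.0 and tidyr >= 1.0.0, and `sums_by_year` is a hypothetical name.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "Year expense_1 expense_2 expense_3 expense_4
1999 5 NA NA 31.82
2000 2 NA NA 4.75
1999 10.49 NA NA NA
2000 39.69 NA NA NA
2000 NA NA 10.61 NA
1999 8.08 NA NA NA
2000 16 NA NA NA
1999 9.32 NA NA NA
1999 9.35 NA NA NA",
header = TRUE, stringsAsFactors = FALSE)

sums_by_year <- df %>%
  pivot_longer(-Year, names_to = "expense", values_to = "value") %>%
  # drop missing values first, so all-NA combinations vanish instead of becoming 0
  filter(!is.na(value)) %>%
  group_by(Year, expense) %>%
  summarise(value = sum(value), .groups = "drop")

print(sums_by_year)
```

With the sample data, expense_2 has no rows at all and expense_3 has a single year, so neither has a line segment to draw. Adding `geom_point()` is one way to keep single-year categories visible.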