R - How to create a seasonal plot - Different lines for years

I asked this same question yesterday but have not received any suggestions so far, so I decided to delete the old one and ask again with additional information.

So here it is again:

I have a dataframe like this:

Link to the original dataframe: https://megastore.uni-augsburg.de/get/JVu_V51GvQ/

      Date   DENI011
1 1993-01-01   9.946
2 1993-01-02  13.663
3 1993-01-03   6.502
4 1993-01-04   6.031
5 1993-01-05  15.241
6 1993-01-06   6.561
     ....
     ....
6569 2010-12-26  44.113
6570 2010-12-27  34.764
6571 2010-12-28  51.659
6572 2010-12-29  28.259
6573 2010-12-30  19.512
6574 2010-12-31  30.231

I want to create a plot that lets me compare the monthly values of DENI011 across the years. So I want something like this:

http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Seasonal%20Plot

Jan-Dec on the x-axis, values on the y-axis, and the years shown as differently colored lines.

I found several similar questions here, but nothing works for me. I tried to follow the instructions from the website with the example, but the problem is that I can't create a ts object.

Then I tried it this way:

Ref_Data$MonthN <- as.numeric(format(as.Date(Ref_Data$Date),"%m")) # Month's number
Ref_Data$YearN <- as.numeric(format(as.Date(Ref_Data$Date),"%Y"))
Ref_Data$Month  <- months(as.Date(Ref_Data$Date), abbreviate=TRUE) # Month's abbr.

g <- ggplot(data = Ref_Data, aes(x = MonthN, y = DENI011, group = YearN, colour=YearN)) + 
  geom_line() +
  scale_x_discrete(breaks = Ref_Data$MonthN, labels = Ref_Data$Month)

That didn't work either; the plot looks horrible. I don't need to put all the years from 1993-2010 in one plot. Just a few years would be fine, for example 1998-2006.

Any suggestions on how to solve this?

As others have noted, in order to create a plot such as the one you used as an example, you'll have to aggregate your data first. However, it's also possible to retain daily data in a similar plot.

reprex::reprex_info()
#> Created by the reprex package v0.1.1.9000 on 2018-02-11

library(tidyverse)
library(lubridate)

# Import the data
url <- "https://megastore.uni-augsburg.de/get/JVu_V51GvQ/"
raw <- read.table(url, stringsAsFactors = FALSE)

# Parse the dates, and use lower case names
df <- as_tibble(raw) %>% 
  rename_all(tolower) %>% 
  mutate(date = ymd(date))

One trick to achieve this would be to set the year component in your date variable to a constant, effectively collapsing the dates to a single year, and then controlling the axis labelling so that you don't include the constant year in the plot.

# Define the plot
p <- df %>% 
  mutate(
    year = factor(year(date)),     # use year to define separate curves
    date = update(date, year = 1)  # use a constant year for the x-axis
  ) %>% 
  ggplot(aes(date, deni011, color = year)) +
    scale_x_date(date_breaks = "1 month", date_labels = "%b")

# Raw daily data
p + geom_line()
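
The year-collapsing trick can also be sketched in base R, without lubridate. This is only an illustration under stated assumptions: the data frame `toy`, its `value` column, and the helper column `day_of_year` are made up for the example, and the constant year 2000 is chosen here (instead of year 1 as above) because it is a leap year, so 29 February remains a valid date.

```r
# Synthetic daily data standing in for the real file (values are made up)
set.seed(1)
dates <- seq(as.Date("1998-01-01"), as.Date("2000-12-31"), by = "day")
toy <- data.frame(date = dates, value = runif(length(dates)))

# Keep the real year as a factor so each year can get its own line
toy$year <- factor(format(toy$date, "%Y"))

# Rebuild every date with the constant year 2000 (an assumption of this
# sketch); month and day are kept, only the year is overwritten
toy$day_of_year <- as.Date(paste0("2000-", format(toy$date, "%m-%d")))
```

Plotting `day_of_year` on the x-axis with `scale_x_date(date_labels = "%b")` and `year` as the colour should then give one line per year on a shared Jan-Dec axis, as in the ggplot2 code above.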

In this case though, your daily data are quite variable, so this is a bit of a mess. You could hone in on a single year to see the daily variation a bit better.

# Hone in on a single year
p + geom_line(aes(group = year), color = "black", alpha = 0.1) +
  geom_line(data = function(x) filter(x, year == 2010), size = 1)

But ultimately, if you want to look at several years at a time, it's probably a good idea to present smoothed lines rather than raw daily values. Or, indeed, some monthly aggregate.

# Smoothed version
p + geom_smooth(se = FALSE)
#> `geom_smooth()` using method = 'loess'
#> Warning: Removed 117 rows containing non-finite values (stat_smooth).
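
If a loess fit is more than you need, a centered moving average is another way to smooth daily values before plotting. The sketch below is not what `geom_smooth()` does; it swaps in a plain 31-day running mean via `stats::filter()`. The data frame `daily`, its columns, and the window length are assumptions made for illustration.

```r
# Synthetic daily series for one year (values are made up)
set.seed(42)
dates <- seq(as.Date("2010-01-01"), as.Date("2010-12-31"), by = "day")
daily <- data.frame(date = dates, value = 20 + rnorm(length(dates), sd = 10))

# Centered 31-day moving average; the window length is an arbitrary choice.
# stats::filter is called explicitly because dplyr masks filter().
window <- 31
daily$smooth <- as.numeric(
  stats::filter(daily$value, rep(1 / window, window), sides = 2)
)
```

The first and last 15 days have no complete window and come out as `NA`, so a line drawn from `smooth` starts and ends slightly inside the year.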

There are multiple values within each month, so when you plot your original data you get multiple points per month. That is why the line looks strange.

If you want to create something similar to the example you provided, you have to summarize your data by year and month. Below I calculated the mean for each year and month in your data. In addition, you need to convert your year and month to factors if you want to plot them as discrete variables.

library(dplyr)
Ref_Data2 <- Ref_Data %>%
  group_by(MonthN, YearN, Month) %>%
  summarize(DENI011 = mean(DENI011)) %>%
  ungroup() %>%
  # Convert the Month column to factor variable with levels from Jan to Dec
  # Convert the YearN column to factor
  mutate(Month = factor(Month, levels = unique(Month)),
         YearN = as.factor(YearN))

g <- ggplot(data = Ref_Data2, 
            aes(x = Month, y = DENI011, group = YearN, colour = YearN)) + 
  geom_line() 
g


If you don't want to add library(dplyr), this is the base R code. It uses exactly the same strategy and gives the same results as www's answer.

dat <- read.delim("~/Downloads/df1.dat", sep = " ")

dat$Date <- as.Date(dat$Date)

dat$month <- factor(months(dat$Date, TRUE), levels = month.abb)
dat$year <- gsub("-.*", "", dat$Date)

month_summary <- aggregate(DENI011 ~ month + year, data = dat, mean)

# ggplot2 is still needed for the plot itself
library(ggplot2)

ggplot(month_summary, aes(month, DENI011, color = year, group = year)) +
    geom_path()
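
Since the question mentions not being able to create a ts object, here is a minimal sketch of how the monthly means could be turned into one with base R. It runs on synthetic data: the data frame `toy`, the helper column `ym`, and the 1998-2006 range are assumptions for illustration, not the original file.

```r
# Synthetic daily data with the same column names as in the question
set.seed(7)
dates <- seq(as.Date("1998-01-01"), as.Date("2006-12-31"), by = "day")
toy <- data.frame(Date = dates, DENI011 = runif(length(dates), 5, 50))

# One mean per calendar month; "YYYY-MM" strings sort chronologically
toy$ym <- format(toy$Date, "%Y-%m")
monthly <- aggregate(DENI011 ~ ym, data = toy, FUN = mean)
monthly <- monthly[order(monthly$ym), ]

# Monthly time series starting in January 1998, 12 observations per year
monthly_ts <- ts(monthly$DENI011, start = c(1998, 1), frequency = 12)
```

An object like `monthly_ts` is the kind of input that seasonal-plot helpers such as `forecast::ggseasonplot()` expect. Note that this only lines up correctly if no month is missing from the aggregated data.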
