How to accumulate multiple columns of a data.frame in R?

I am trying to find the accumulated (cumulative) values of variables A to Z in myData for each year. I have tried a few things but didn't succeed. Once I do that, I then need to compute the maximum, minimum, median, and upper and lower quartiles across all those years. Here is my laborious code so far, but I have no idea how to proceed further. In fact, the current code is not giving me what I am after either.

library(tidyverse)

mydate <- as.data.frame(seq(as.Date("2000-01-01"), to= as.Date("2019-12-31"), by="day"))
colnames(mydate) <- "Date"
Data <- data.frame(A = runif(7305,0,10), 
                   J = runif(7305,0,8), 
                   X = runif(7305,0,12), 
                   Z = runif(7305,0,10))
DF <- data.frame(mydate, Data)

myData <- DF %>% separate(Date, into = c("Year","Month","Day")) %>% 
   sapply(as.numeric) %>% 
   as.data.frame() %>% 
   mutate(Date = DF$Date) %>% 
   filter(Month > 4 & Month < 11) %>% 
   mutate(DOY = format(Date, "%j")) %>% 
   group_by(Year) %>% 
   mutate(cumulativeSum = accumulate(DOY))

I am trying to get a figure like the one below for A, J, X and Z. Any help would be appreciated.

Update (EDIT)

My question is pretty confusing, so I decided to break it down into steps using Excel. Here I am using only one variable, A (note: in my actual question I have multiple variables). In the first step, I accumulate the data from May to October of each year, which is reflected in the column cumulative sum. In the second step (Step-2), I rearrange each year's data by day of the year (May to October). In step-3, I compute the statistics I mentioned earlier across all the years for every day of the year. I have tried to clarify as much as I could, but this is probably a somewhat strange question.

[Image: Excel screenshot of the steps described above]
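The three Excel steps can be sketched in base R on a tiny, hypothetical data set (two years with three season days each; the numbers are invented for illustration and are not the asker's data). It assumes rows are already in date order within each year, and it uses `ave()` for the per-year running total, `tapply()` to lay the years side by side, and `apply()` for the across-year statistics.

```r
# Hypothetical toy data (not the asker's): two years, three season days each
toy <- data.frame(
  Year = c(2000, 2000, 2000, 2001, 2001, 2001),
  Day  = c(1, 2, 3, 1, 2, 3),   # day number within the May-October season
  A    = c(1, 2, 3, 4, 5, 6)
)

# Step 1: running total of A within each year (rows assumed in date order)
toy$Cumulative <- ave(toy$A, toy$Year, FUN = cumsum)

# Step 2: one row per season day, one column per year
# (each Day/Year cell holds exactly one value, so sum() just returns it)
wide <- tapply(toy$Cumulative, list(toy$Day, toy$Year), sum)

# Step 3: statistics across years for every season day
season_stats <- data.frame(
  Day     = as.numeric(rownames(wide)),
  Minimum = apply(wide, 1, min),
  Median  = apply(wide, 1, median),
  Maximum = apply(wide, 1, max)
)
season_stats
```

With these numbers the running totals are 1, 3, 6 for 2000 and 4, 9, 15 for 2001, so the per-day medians across the two years are 2.5, 6 and 10.5.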

Ultimate figure: here is an example figure that I would like to derive as a result of this exercise.

[Image: example of the desired figure]

So, if I understand correctly, you are trying to plot descriptive statistics of the cumulative values of each variable between May and October, for the years 2000 to 2019.

So here is a possible solution that first calculates the descriptive statistics of each variable (using the dplyr, lubridate and tidyr packages). I encourage you to break this code into several parts in order to understand all the steps.

Basically, I extract the month and year from the date, pivot the data frame into a longer format, filter to keep only the values in the period of interest (May to October), and calculate the cumulative sum of values grouped by variable and year. Then, I create a fake date (by pasting a constant year onto the real month and day) in order to calculate the descriptive statistics as a function of this date and the variable.
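The "fake date" trick can be illustrated in base R as a minimal sketch: formatting each date with a literal year keeps the real month and day, so the same calendar day from different years collapses onto one value that can be grouped on. The dates below are illustrative only, and `format()`/`as.Date()` stand in for the answer's `ymd(paste("2020", ...))` call.

```r
# Illustrative dates from different years (not taken from the asker's data)
dates <- as.Date(c("2003-05-01", "2017-05-01", "2011-10-31"))

# Keep the real month and day but force the common year 2020, which mirrors
# ymd(paste("2020", Month, day(Date), sep = "-")) in the answer's code
new_date <- as.Date(format(dates, "2020-%m-%d"))
new_date
```

Both May 1st dates become 2020-05-01 and the last one becomes 2020-10-31. Because 2020 is a leap year, a 29 February would also still parse, although that only matters if the period of interest includes February.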

Altogether, it looks like this:

library(lubridate)
library(dplyr)
library(tidyr)

mydata <- DF %>%
  mutate(Year = year(Date), Month = month(Date)) %>%
  # one row per (Date, variable) pair
  pivot_longer(-c(Date, Year, Month), names_to = "variable", values_to = "values") %>%
  # keep May to October only
  filter(between(Month, 5, 10)) %>%
  # running total within each year and variable
  group_by(Year, variable) %>%
  mutate(Cumulative = cumsum(values)) %>%
  # fake date in a common year so that the same day lines up across years
  mutate(NewDate = ymd(paste("2020", Month, day(Date), sep = "-"))) %>%
  ungroup() %>%
  # statistics across years for each variable and day
  group_by(variable, NewDate) %>%
  summarise(Median = median(Cumulative),
            Maximum = max(Cumulative),
            Minimum = min(Cumulative),
            Upper = quantile(Cumulative, 0.75),
            Lower = quantile(Cumulative, 0.25),
            .groups = "drop")
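As a side note, the five statistics in `summarise()` are exactly the default (type 7) quantiles at probabilities 0, 0.25, 0.5, 0.75 and 1, so a single `quantile()` call with a vector of `probs` returns all of them. The sketch below checks this on random illustrative values, not on the asker's data.

```r
set.seed(1)  # reproducible illustrative values, not the asker's data
x <- runif(20, 0, 100)

# The five statistics as computed in summarise() ...
by_hand <- c(Minimum = min(x),
             Lower   = unname(quantile(x, 0.25)),
             Median  = median(x),
             Upper   = unname(quantile(x, 0.75)),
             Maximum = max(x))

# ... match the default (type 7) quantiles at these probabilities
via_quantile <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))

all.equal(unname(by_hand), unname(via_quantile))
```

Note that `fivenum()` is not a drop-in replacement here: it uses Tukey's hinges, which can differ from `quantile(x, 0.25)` and `quantile(x, 0.75)`.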

Then, you can get a plot similar to your example by doing:

library(ggplot2)
ggplot(mydata, aes(x = NewDate))+
  geom_ribbon(aes(ymin = Lower, ymax = Upper), color = "grey", alpha =0.5)+
  geom_line(aes(y = Median), color = "darkblue")+
  # linewidth replaces the deprecated size aesthetic for lines in ggplot2 >= 3.4.0
  geom_line(aes(y = Maximum), color = "red", linetype = "dashed", linewidth = 1.5)+
  geom_line(aes(y = Minimum), color = "red", linetype = "dashed", linewidth = 1.5)+
  facet_wrap(~variable, scales = "free")+
  scale_x_date(date_labels = "%b", date_breaks = "month", name = "Month")+
  ylab("Daily Cumulative Precipitation (mm)")

[Image: resulting plot]

Does it look like what you are trying to achieve?


EDIT: Adding legends

Adding a legend here is not easy, as you are using different geoms (ribbon, line) with different colors, shapes, ...

So, one way is to regroup the statistics that can be plotted with the same geom and do:

mydata %>% pivot_longer(cols = c(Median, Minimum,Maximum), names_to = "Statistic",values_to = "Value") %>%
  ggplot(aes(x = NewDate))+
  geom_ribbon(aes(ymin = Lower, ymax = Upper, fill = "Upper / Lower"), alpha =0.5)+
  geom_line(aes(y = Value, color = Statistic, linetype = Statistic, linewidth = Statistic))+
  facet_wrap(~variable, scales = "free")+
  scale_x_date(date_labels = "%b", date_breaks = "month", name = "Month")+
  ylab("Daily Cumulative Precipitation (mm)")+
  scale_linewidth_manual(values = c(1.5,1,1.5))+
  scale_linetype_manual(values = c("dashed","solid","dashed"))+
  scale_color_manual(values = c("red","darkblue","red"))+
  scale_fill_manual(values = "grey", name = "")

[Image: plot with a combined legend]

So, it looks good, but as you can see, it's a little bit weird, as the Upper / Lower entry sits slightly apart from the main legend.
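Why the unnamed vectors in the `scale_*_manual()` calls of this legend version line up correctly: the `Statistic` column created by `pivot_longer()` is character, and a discrete ggplot2 scale orders character values alphabetically, so the values are read as Maximum, Median, Minimum. The base R sketch below only reproduces that ordering; naming the values explicitly is a safer equivalent if the statistics ever change.

```r
# The Statistic column is character; a discrete ggplot2 scale uses its
# sorted unique values as the legend order
stat_levels <- sort(unique(c("Median", "Minimum", "Maximum")))
stat_levels

# Unnamed manual values are therefore matched in this order ...
widths <- setNames(c(1.5, 1, 1.5), stat_levels)
widths["Median"]

# ... so naming them explicitly is a safer equivalent, e.g.
# scale_color_manual(values = c(Maximum = "red", Median = "darkblue", Minimum = "red"))
```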

Another solution is to add the legend as labels at the last date. For that, you can create a second data frame containing only the last date of the first data frame:

mydata_label <- mydata %>% filter(NewDate == max(NewDate)) %>% 
  pivot_longer(cols = Median:Lower, names_to = "Stat",values_to = "val")
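For readers without tidyr, the same label table (last date only, one row per statistic) can be sketched in base R. The small summary table below is hypothetical, with invented values but the same columns as `mydata`; it keeps the rows at the maximum `NewDate` and stacks the five statistic columns into `Stat`/`val` pairs.

```r
# Hypothetical summary table with the same columns as mydata (values invented)
summ <- data.frame(
  variable = rep(c("A", "J"), each = 2),
  NewDate  = as.Date(c("2020-10-30", "2020-10-31", "2020-10-30", "2020-10-31")),
  Median = c(10, 11, 20, 21), Maximum = c(15, 16, 25, 26),
  Minimum = c(5, 6, 15, 16), Upper = c(12, 13, 22, 23), Lower = c(8, 9, 18, 19)
)

# Keep only the last date ...
last <- summ[summ$NewDate == max(summ$NewDate), ]

# ... and stack the five statistic columns into (Stat, val) pairs;
# unlist() concatenates the columns one after another
stat_cols <- c("Median", "Maximum", "Minimum", "Upper", "Lower")
label_df <- data.frame(
  variable = rep(last$variable, times = length(stat_cols)),
  NewDate  = rep(last$NewDate, times = length(stat_cols)),
  Stat     = rep(stat_cols, each = nrow(last)),
  val      = unlist(last[stat_cols], use.names = FALSE)
)
label_df
```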

Then, without changing much of the plotting part, you can do:

ggplot(mydata, aes(x = NewDate))+
  geom_ribbon(aes(ymin = Lower, ymax = Upper), alpha =0.5)+
  geom_line(aes(y = Median), color = "darkblue")+
  # linewidth replaces the deprecated size aesthetic for lines in ggplot2 >= 3.4.0
  geom_line(aes(y = Maximum), color = "red", linetype = "dashed", linewidth = 1.5)+
  geom_line(aes(y = Minimum), color = "red", linetype = "dashed", linewidth = 1.5)+
  facet_wrap(~variable, scales = "free")+
  scale_x_date(date_labels = "%b", date_breaks = "month", name = "Month", limits = c(min(mydata$NewDate),max(mydata$NewDate)+25))+
  ylab("Daily Cumulative Precipitation (mm)")+
  geom_text(data = mydata_label, 
            aes(x = NewDate+5, y = val, label = Stat, color = Stat), size = 2, hjust = 0, show.legend = FALSE)+
  scale_color_manual(values = c("Median" = "darkblue","Maximum" = "red","Minimum" = "red","Upper" = "black", "Lower" = "black"))

[Image: plot with statistics labeled at the last date]

I reduced the size of the text labels on purpose because of space constraints, so that you can see all of them. But based on the figure you attached to your question, you should have plenty of space to make it work.
