Find min and max value for each year with date

I have a huge data set, and for each year I want the minimum and maximum value of each column together with the date on which it occurs.

My data frame df looks like this:

Date       a  b  
01/20/2015 20 50 
05/13/2015 60 70
10/18/2015 22 45
04/22/2016 15 40
04/25/2016 20 30
06/28/2016 33 45
01/01/2018 90 20
04/25/2018 50 30
10/19/2018 45 55

And I want my output to look like this:

Date       min.a max.a min.b max.b
01/20/2015 20
05/13/2015       60
10/18/2015             45
05/13/2015                   70

and similarly for the other years.

I was using the following code, but I was not able to get the date for each year. I have extracted the year from the date column.

df$year<-format(df$date,"%y")
df%>%
group_by(a,b)%>%summarize(min(a),max(a),min(b),max(b))

but I do not get my desired output. I want the min and max value for each year, with the date.

The following code works and does everything for the maximum. I think it should be easy to adapt to the minimum (just repeat the code accordingly).


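library(dplyr)

df %>%
  # assumes df already has a `year` column derived from Date
  group_by(year) %>%
  # yearly maximum of each column, repeated on every row of that year
  mutate(max.a = max(a), max.b = max(b)) %>%
  ungroup() %>%
  # keep the maximum only on the row where it occurs, NA elsewhere
  # (NA_real_ matches numeric, i.e. double, columns)
  mutate(max.a = case_when(a == max.a ~ max.a, TRUE ~ NA_real_),
         max.b = case_when(b == max.b ~ max.b, TRUE ~ NA_real_)) %>%
  # drop rows that hold neither maximum
  filter(!is.na(max.a) | !is.na(max.b)) %>%
  select(-a, -b)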
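
As a rough sketch of how the adaptation to the minimum might look, the snippet below handles both extremes in one pass. It rebuilds the sample data from the question with a and b as doubles, derives the year with format(Date, "%Y"), and uses keep_if, a hypothetical helper introduced only for this illustration. Like the code above, it returns one row per date rather than one row per extreme.

```r
library(dplyr)

# Sample data from the question; Date is parsed from month/day/year strings
df <- data.frame(
  Date = as.Date(c("01/20/2015", "05/13/2015", "10/18/2015",
                   "04/22/2016", "04/25/2016", "06/28/2016",
                   "01/01/2018", "04/25/2018", "10/19/2018"),
                 format = "%m/%d/%Y"),
  a = c(20, 60, 22, 15, 20, 33, 90, 50, 45),
  b = c(50, 70, 45, 40, 30, 45, 20, 30, 55)
)

# Hypothetical helper: keep x where it equals `target`, NA elsewhere
keep_if <- function(x, target) ifelse(x == target, x, NA_real_)

res <- df %>%
  mutate(year = format(Date, "%Y")) %>%   # "%Y" is the four-digit year
  group_by(year) %>%
  mutate(min.a = keep_if(a, min(a)), max.a = keep_if(a, max(a)),
         min.b = keep_if(b, min(b)), max.b = keep_if(b, max(b))) %>%
  ungroup() %>%
  # drop rows that hold none of the four extremes
  filter(!is.na(min.a) | !is.na(max.a) | !is.na(min.b) | !is.na(max.b)) %>%
  select(-a, -b, -year)

res
```

Grouping by the derived year rather than by a and b is the key difference from the attempt in the question, where each (a, b) pair formed its own group.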

This should do the trick; I've updated the code so that the result is presented exactly as you indicated in the question.


library(tibble)
library(lubridate)
library(tidyr)
library(dplyr)
library(stringr)

data <- 
  tribble(
    ~Date, ~a, ~b,
    "01/20/2015", 20, 50,
    "05/13/2015", 60, 70, 
    "10/18/2015", 22, 45, 
    "04/22/2016", 15, 40, 
    "04/25/2016", 20, 30, 
    "06/28/2016", 33, 45,
    "01/01/2018", 90, 20, 
    "04/25/2018", 50, 30,
    "10/19/2018" ,45, 55)


anal <- 
  data %>% 
  # here's how to manage the date bit
  mutate(Date = mdy(Date),
         yr = year(Date)) %>%
  # then as Fnguyen's answer
  group_by(yr) %>% 
  mutate(min_a = min(a),
         max_a = max(a),
         min_b = min(b),
         max_b = max(b)) %>% 
  ungroup() %>%
  mutate(min_a = case_when(a == min_a ~ min_a,
                           TRUE ~ NA_real_),
         max_a = case_when(a == max_a ~ max_a,
                           TRUE ~ NA_real_),
         min_b = case_when(b == min_b ~ min_b,
                           TRUE ~ NA_real_),
         max_b = case_when(b == max_b ~ max_b,
                           TRUE ~ NA_real_))%>%
  filter(!is.na(min_a) | !is.na(max_a) | !is.na(min_b) | !is.na(max_b)) %>%
  select(-c(a, b)) %>% 
  pivot_longer(cols = min_a:max_b, names_to = "metric", values_to = "val") %>% 
  na.omit() %>%
  mutate(metric = factor(metric, levels = c("min_a", "max_a", "min_b", "max_b"), ordered = TRUE)) %>% 
  arrange(yr, metric) %>% 
  rowid_to_column() %>% 
  pivot_wider(names_from = metric, values_from = val) %>% 
  select(-c(rowid, yr))

anal

Which gives you:

[image of the resulting table; not reproduced here]
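
The date step at the top of that pipeline (mdy() followed by year()) can also be sketched in base R, which may help explain why the attempt in the question did not yield usable years. The variable names below are illustrative only.

```r
# Dates as they appear in the question (month/day/year strings)
dates_chr <- c("01/20/2015", "04/22/2016", "10/19/2018")

# Parse the strings into Date objects; the format must match the input layout
dates <- as.Date(dates_chr, format = "%m/%d/%Y")

# "%Y" yields the four-digit year, "%y" only the last two digits
yr_full  <- format(dates, "%Y")
yr_short <- format(dates, "%y")

yr_full
yr_short
```

Either form works as a grouping key, but the four-digit year is unambiguous across centuries.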

Here is a base R solution:

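f <- function(v) {
  # dates on which min(a), max(a), min(b), max(b) occur within one year
  Date <- v[c(which.min(v$a),
              which.max(v$a),
              which.min(v$b),
              which.max(v$b)), "Date"]
  # 4x4 diagonal matrix: each extreme sits in its own column, zeros elsewhere
  q <- setNames(data.frame(diag(c(range(v$a), range(v$b)))),
                c("min.a", "max.a", "min.b", "max.b"))
  cbind(Date, q)
}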

dfout <- do.call(rbind,
                 c(make.row.names = FALSE,
                   lapply(split(df,format(df$Date,"%Y")),f)))

such that

> dfout
         Date min.a max.a min.b max.b
1  2015-01-20    20     0     0     0
2  2015-05-13     0    60     0     0
3  2015-10-18     0     0    45     0
4  2015-05-13     0     0     0    70
5  2016-04-22    15     0     0     0
6  2016-06-28     0    33     0     0
7  2016-04-25     0     0    30     0
8  2016-06-28     0     0     0    45
9  2018-10-19    45     0     0     0
10 2018-01-01     0    90     0     0
11 2018-01-01     0     0    20     0
12 2018-10-19     0     0     0    55
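
The zeros in this output come from the diagonal matrix and could be mistaken for real values if a column ever contained 0. A possible variant, sketched here under the assumption that blank cells should be NA, fills an NA matrix and sets only its diagonal. The names f_na and dfout_na are hypothetical and not part of the answer above; the data is rebuilt so the snippet runs on its own.

```r
# Sample data from the question
df <- data.frame(
  Date = as.Date(c("2015-01-20", "2015-05-13", "2015-10-18",
                   "2016-04-22", "2016-04-25", "2016-06-28",
                   "2018-01-01", "2018-04-25", "2018-10-19")),
  a = c(20, 60, 22, 15, 20, 33, 90, 50, 45),
  b = c(50, 70, 45, 40, 30, 45, 20, 30, 55)
)

# Hypothetical variant of f: NA instead of 0 off the diagonal
f_na <- function(v) {
  # row positions of the four extremes within this year's subset
  idx <- c(which.min(v$a), which.max(v$a), which.min(v$b), which.max(v$b))
  vals <- c(min(v$a), max(v$a), min(v$b), max(v$b))
  m <- matrix(NA_real_, nrow = 4, ncol = 4,
              dimnames = list(NULL, c("min.a", "max.a", "min.b", "max.b")))
  diag(m) <- vals
  data.frame(Date = v$Date[idx], m)
}

# One 4-row block per year, stacked in year order
dfout_na <- do.call(rbind, lapply(split(df, format(df$Date, "%Y")), f_na))
rownames(dfout_na) <- NULL

dfout_na
```

As with which.min and which.max in the original, ties are resolved by taking the first occurrence within each year.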

DATA

df <- structure(list(Date = structure(c(16455, 16568, 16726, 16913, 
16916, 16980, 17532, 17646, 17823), class = "Date"), a = c(20L, 
60L, 22L, 15L, 20L, 33L, 90L, 50L, 45L), b = c(50L, 70L, 45L, 
40L, 30L, 45L, 20L, 30L, 55L)), row.names = c(NA, -9L), class = "data.frame")
