Find min and max value for each year with date
I have a huge data set and I want the minimum and maximum value for each year, together with the date on which each occurs.
My dataframe df looks like this:
Date a b
01/20/2015 20 50
05/13/2015 60 70
10/18/2015 22 45
04/22/2016 15 40
04/25/2016 20 30
06/28/2016 33 45
01/01/2018 90 20
04/25/2018 50 30
10/19/2018 45 55
And I want my output to look like this:
Date        min.a  max.a  min.b  max.b
01/20/2015  20
05/13/2015         60
10/18/2015                45
05/13/2015                       70
and similarly for the other years.
I was using the following code, but I was not able to extract the date for each year. I have extracted the year from the date column.
df$year<-format(df$date,"%y")
df%>%
group_by(a,b)%>%summarize(min(a),max(a),min(b),max(b))
but I do not get my desired output. I want the min and max value for each year, with the date.
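For reference, a minimal sketch of where the attempt above seems to go wrong, assuming the dates are stored as "mm/dd/yyyy" strings: the column is called `Date` rather than `date`, `"%y"` returns a two-digit year, and `group_by(a, b)` groups on the values themselves instead of on the year. The `*.date` column names below are made up for illustration, and the result is one wide row per year rather than the layout asked for above.

```r
library(dplyr)

df <- data.frame(
  Date = c("01/20/2015", "05/13/2015", "10/18/2015",
           "04/22/2016", "04/25/2016", "06/28/2016",
           "01/01/2018", "04/25/2018", "10/19/2018"),
  a = c(20, 60, 22, 15, 20, 33, 90, 50, 45),
  b = c(50, 70, 45, 40, 30, 45, 20, 30, 55)
)

# Parse the strings into Date objects (the column is `Date`, not `date`)
df$Date <- as.Date(df$Date, format = "%m/%d/%Y")
# "%Y" gives the four-digit year; "%y" would give "15", "16", "18"
df$year <- format(df$Date, "%Y")

res <- df %>%
  group_by(year) %>%  # group by year, not by the value columns
  summarise(
    min.a = min(a), min.a.date = Date[which.min(a)],  # hypothetical column names
    max.a = max(a), max.a.date = Date[which.max(a)],
    min.b = min(b), min.b.date = Date[which.min(b)],
    max.b = max(b), max.b.date = Date[which.max(b)]
  )
res
```

`which.min()` and `which.max()` return only the first matching position, so ties within a year are reported once.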
The following code works and does everything for the maximum. It should be easy to adapt to the minimum (just repeat the code accordingly).
library(dplyr)
# assumes df already has a `year` column
df %>%
  group_by(year) %>%
  mutate(max.a = max(a), max.b = max(b)) %>%
  ungroup() %>%
  # keep the maximum only on the row where it occurs, NA elsewhere
  mutate(max.a = case_when(a == max.a ~ max.a, TRUE ~ NA_real_),
         max.b = case_when(b == max.b ~ max.b, TRUE ~ NA_real_)) %>%
  filter(!is.na(max.a) | !is.na(max.b)) %>%
  select(-a, -b)
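If repeating the code for the minimum feels clumsy, one possible alternative is to reshape to long format first so that min and max are handled in a single pass. This is only a sketch, not the approach used above: it assumes `Date` is already a Date column, and the names `variable`, `value` and `stat` are invented for the example.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Date = as.Date(c("2015-01-20", "2015-05-13", "2015-10-18",
                   "2016-04-22", "2016-04-25", "2016-06-28",
                   "2018-01-01", "2018-04-25", "2018-10-19")),
  a = c(20, 60, 22, 15, 20, 33, 90, 50, 45),
  b = c(50, 70, 45, 40, 30, 45, 20, 30, 55)
)

long <- df %>%
  mutate(year = format(Date, "%Y")) %>%
  # one row per (Date, variable) pair
  pivot_longer(c(a, b), names_to = "variable", values_to = "value") %>%
  group_by(year, variable) %>%
  # keep the rows holding the group's minimum or maximum (all ties are kept)
  filter(value == min(value) | value == max(value)) %>%
  # label each remaining row; a group with a single value is labelled "min"
  mutate(stat = if_else(value == min(value), "min", "max")) %>%
  ungroup() %>%
  arrange(year, variable, value)
long
```

Each remaining row carries its own date, so no separate lookup is needed. Unlike `which.min()`, the equality filter keeps every tied row.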
This should do the trick; I've updated the code to produce the layout exactly as you indicated in the question.
library(tibble)
library(lubridate)
library(tidyr)
library(dplyr)
library(stringr)
data <-
tribble(
~Date, ~a, ~b,
"01/20/2015", 20, 50,
"05/13/2015", 60, 70,
"10/18/2015", 22, 45,
"04/22/2016", 15, 40,
"04/25/2016", 20, 30,
"06/28/2016", 33, 45,
"01/01/2018", 90, 20,
"04/25/2018", 50, 30,
"10/19/2018" ,45, 55)
anal <-
data %>%
#here's how to manage the date bit
mutate(Date = mdy(Date),
yr = year(Date)) %>%
# then as Fnguyen's answer
group_by(yr) %>%
mutate(min_a = min(a),
max_a = max(a),
min_b = min(b),
max_b = max(b))%>%
ungroup() %>%
mutate(min_a = case_when(a == min_a ~ min_a,
TRUE ~ NA_real_),
max_a = case_when(a == max_a ~ max_a,
TRUE ~ NA_real_),
min_b = case_when(b == min_b ~ min_b,
TRUE ~ NA_real_),
max_b = case_when(b == max_b ~ max_b,
TRUE ~ NA_real_))%>%
filter(!is.na(min_a) | !is.na(max_a) | !is.na(min_b) | !is.na(max_b)) %>%
select(-c(a, b)) %>%
pivot_longer(cols = min_a:max_b, names_to = "metric", values_to = "val") %>%
na.omit() %>%
mutate(metric = factor(metric, levels = c("min_a", "max_a", "min_b", "max_b"), ordered = TRUE)) %>%
arrange(yr, metric) %>%
rowid_to_column() %>%
pivot_wider(names_from = metric, values_from = val) %>%
select(-c(rowid, yr))
anal
Which gives you:
Here is a base R solution:
f <- function(v) {
Date <- (v[c(which.min(v$a),
which.max(v$a),
which.min(v$b),
which.max(v$b)),"Date"])
q <- setNames(data.frame(diag(c(range(v$a),range(v$b)))),c("min.a","max.a","min.b","max.b"))
cbind(Date,q)
}
dfout <- do.call(rbind,
c(make.row.names = FALSE,
lapply(split(df,format(df$Date,"%Y")),f)))
which gives
> dfout
Date min.a max.a min.b max.b
1 2015-01-20 20 0 0 0
2 2015-05-13 0 60 0 0
3 2015-10-18 0 0 45 0
4 2015-05-13 0 0 0 70
5 2016-04-22 15 0 0 0
6 2016-06-28 0 33 0 0
7 2016-04-25 0 0 30 0
8 2016-06-28 0 0 0 45
9 2018-10-19 45 0 0 0
10 2018-01-01 0 90 0 0
11 2018-01-01 0 0 20 0
12 2018-10-19 0 0 0 55
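Note that the zeros in this output are padding from `diag()`, not observed values. If blanks are preferred, as in the layout requested in the question, a variant along these lines might work. It is a sketch only: `f_na` and `dfout_na` are hypothetical names, and the data frame is rebuilt here just to keep the example self-contained.

```r
df <- data.frame(
  Date = as.Date(c("2015-01-20", "2015-05-13", "2015-10-18",
                   "2016-04-22", "2016-04-25", "2016-06-28",
                   "2018-01-01", "2018-04-25", "2018-10-19")),
  a = c(20, 60, 22, 15, 20, 33, 90, 50, 45),
  b = c(50, 70, 45, 40, 30, 45, 20, 30, 55)
)

f_na <- function(v) {
  # row positions of min(a), max(a), min(b), max(b) within this year
  idx <- c(which.min(v$a), which.max(v$a), which.min(v$b), which.max(v$b))
  vals <- c(v$a[idx[1:2]], v$b[idx[3:4]])
  # start from an all-NA matrix and fill only the diagonal
  m <- matrix(NA_real_, nrow = 4, ncol = 4,
              dimnames = list(NULL, c("min.a", "max.a", "min.b", "max.b")))
  diag(m) <- vals
  cbind(data.frame(Date = v$Date[idx]), as.data.frame(m))
}

pieces <- lapply(split(df, format(df$Date, "%Y")), f_na)
dfout_na <- do.call(rbind, c(pieces, make.row.names = FALSE))
dfout_na
```

Printing with `print(dfout_na, na.print = "")` should show the `NA` cells as blanks.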
DATA
df <- structure(list(Date = structure(c(16455, 16568, 16726, 16913,
16916, 16980, 17532, 17646, 17823), class = "Date"), a = c(20L,
60L, 22L, 15L, 20L, 33L, 90L, 50L, 45L), b = c(50L, 70L, 45L,
40L, 30L, 45L, 20L, 30L, 55L)), row.names = c(NA, -9L), class = "data.frame")