
How to select the window of rows which sum to a minimum value in each group and plot in R?

I want to split a data frame into groups and, within each group, run something like rollsum() over every 36 consecutive rows of a column (Online_h), then select the rows of the window whose sum is the smallest in that group. In other words, for each group I should get the 36 consecutive rows whose sum is lower than that of any other 36-row window in the same group.

My data frame has three columns: "Date", "Online_h" and "week". The "week" column is used for grouping. The rolling sums over 36 consecutive rows should be computed on the values in "Online_h".

The df looks like this: [image: the dataframe]

My current code looks like this:

df %>%
  group_by(week) %>%
  mutate(df$SumsofOnline <- rollapply(Online_h, width = 36, sum)) %>%
  select(min(SumsofOnline))

This code groups the data correctly by the labels in "week", but it fails to return the rows after rollapply. I think the reason is that rollapply returns only the sums themselves, whereas I need the 36 rows of Online_h whose sum is the minimum within each group (group_by(week)).

Once those rows are selected, I need to draw a bar plot with one facet per group and highlight the Dates of the 36 consecutive Online_h values whose sum is the minimum compared with the other rolling sums. For the plot I have been using the code below, but it is unfinished because the selection does not work yet.

df %>%
  ggplot(aes(x = Date, y = Online_h)) +
  geom_bar(stat = "identity") +
  facet_grid(rows = vars(week))

For highlighting, I am thinking of using gghighlight().

Your help is highly appreciated.

I believe the following solves the problem in the question. For each week it returns the row at which the window with the minimum rolling sum ends.

library(dplyr)

window <- 10  # test value
df %>%
  group_by(week) %>%
  mutate(Sums = zoo::rollapplyr(Online_h, width = window, sum, fill = NA)) %>%
  filter(Sums == min(Sums, na.rm = TRUE))
## A tibble: 3 x 3
## Groups:   week [3]
#  Online_h  week  Sums
#     <int> <int> <int>
#1       13    50   162
#2        6    51   184
#3       12    52   158
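
Note that the filter() above keeps only one row per week: the row where the minimal window ends. The question asks for every row of that window, so here is a minimal sketch of one way to flag them. The column names end_row and in_min_window are my own (hypothetical), ties are resolved by taking the first minimal window, and every group is assumed to have at least `window` rows.

```r
library(dplyr)
library(zoo)

# Same construction as the answer's test data
set.seed(2021)
week <- rep(50:52, sample(150:152))
df <- data.frame(Online_h = sample(50, length(week), TRUE), week)

window <- 10  # test value; the question uses 36

flagged <- df %>%
  group_by(week) %>%
  mutate(
    # right-aligned rolling sum: the value at row i covers rows (i - window + 1)..i
    Sums = zoo::rollapplyr(Online_h, width = window, FUN = sum, fill = NA),
    # which.min() skips NA and returns the first position of the minimum
    end_row = which.min(Sums),
    # TRUE for the `window` rows that make up the minimal window
    in_min_window = row_number() > end_row - window & row_number() <= end_row
  ) %>%
  ungroup()

# All rows of the minimal window, `window` rows per week
min_rows <- filter(flagged, in_min_window)
```

Keeping the full data frame with a logical flag, rather than filtering straight away, leaves the other rows available for plotting.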

Test data

set.seed(2021)
week = rep(50:52, sample(150:152))
n <- length(week)
df <- data.frame(
  Online_h = sample(50, n, TRUE),
  week
)
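
For the plotting part of the question, the sketch below is one possible approach rather than a tested recipe for the real data. It swaps gghighlight() for a plain ggplot2 fill mapping on a logical flag. The test data has no Date column, so an hourly Date is made up here purely for illustration. The names end_row and in_min_window are likewise hypothetical.

```r
library(dplyr)
library(zoo)
library(ggplot2)

set.seed(2021)
week <- rep(50:52, sample(150:152))
n <- length(week)
df <- data.frame(
  # Assumption: hourly timestamps, invented only so there is something on the x axis
  Date = seq(as.POSIXct("2021-12-13 00:00:00", tz = "UTC"), by = "hour", length.out = n),
  Online_h = sample(50, n, TRUE),
  week
)

window <- 10  # test value; the question uses 36

flagged <- df %>%
  group_by(week) %>%
  mutate(
    Sums = zoo::rollapplyr(Online_h, width = window, FUN = sum, fill = NA),
    end_row = which.min(Sums),  # end of the first minimal window in the group
    in_min_window = row_number() > end_row - window & row_number() <= end_row
  ) %>%
  ungroup()

# One panel per week; bars belonging to the minimal window get a different fill
p <- ggplot(flagged, aes(x = Date, y = Online_h, fill = in_min_window)) +
  geom_col() +
  scale_fill_manual(
    values = c(`FALSE` = "grey70", `TRUE` = "firebrick"),
    name = "In minimum window"
  ) +
  facet_wrap(vars(week), ncol = 1, scales = "free_x")

print(p)
```

facet_wrap() with scales = "free_x" is used so that each week gets its own date range. With the question's facet_grid(rows = vars(week)) all panels would share one x axis.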
