I have a dataset of over 300K rows spanning more than 20 years. I'm trying to create a Load Duration Curve for each year for XX years, i.e. the MW used in every hour of the year (8,760 hours, or 8,784 in a leap year), sorted from highest to lowest. Currently I create a new dataframe by filtering on year, sort it by MW in descending order (descending order for the curve), and then add another column holding the row order so that I can use it as a placeholder for the x-axis. This seems pretty inefficient and could be difficult to update if needed (see the playground section for what I've been doing). I also don't want to use facet_wrap() because the graphs are too small for what is needed.
Dummy_file (hrxhr is the running count of hours within a given year):
YEAR | MONTH | DAY | HOUR OF DAY | MW | Month_num | Date | Date1 | hrxhr |
---|---|---|---|---|---|---|---|---|
2023 | Dec | 31 | 22 | 2416 | 12 | 2023-12-31 | 365 | 8758 |
2023 | Dec | 31 | 23 | 2412 | 12 | 2023-12-31 | 365 | 8759 |
2023 | Dec | 31 | 24 | 2400 | 12 | 2023-12-31 | 365 | 8760 |
2024 | Jan | 01 | 1 | 2271 | 1 | 2024-01-01 | 1 | 1 |
2024 | Jan | 01 | 2 | 2264 | 1 | 2024-01-01 | 1 | 2 |
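### ------------ Load packages ------------- ###
library(readr)        # read_csv()
library(plotly)       # plot_ly(), layout(), %>%
library(htmlwidgets)  # saveWidget()
### ------------ Load in source ------------ ###
dummy_file <- 'Dummydata.csv'
forecast_df <- read_csv(dummy_file)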
### ---- Order df by MW (load) and YEAR ---- ###
ordered_df <- forecast_df[order(forecast_df$MW, decreasing = TRUE), ]
ordered_df <- ordered_df[order(ordered_df$YEAR, decreasing = FALSE), ]
### -------------- Playground -------------- ###
## Create a dataframe for the forecast for calendar year 2023
cy23_df <- ordered_df[ordered_df$YEAR == 2023,]
## Add placeholder column for graphing purposes (add order number)
cy23_df$placeholder <- row.names(cy23_df)
## Check df structure and change columns as needed
str(cy23_df)
# Change placeholder column from character to numeric for graphing purposes
cy23_df$placeholder <- as.numeric(cy23_df$placeholder)
# Check if changed correctly
class(cy23_df$placeholder) #YES
## Load duration curve - Interactive
LF_cy23_LDC <- plot_ly(cy23_df,
  x = ~placeholder,
  y = ~MW,
  type = 'scatter',
  mode = 'lines',
  hoverinfo = 'text',
  text = paste("Megawatts: ", cy23_df$MW,
               "Date: ", cy23_df$MONTH, cy23_df$DAY,
               "Hour: ", cy23_df$hrxhr)) %>%
  layout(title = 'CY2023 Load Forecast - LDC')
# "Hour: ", orderby_MW$yrhour))
saveWidget(LF_cy23_LDC, "cy23_LDC.html")
Current output for CY2023: the y-axis is megawatts used (MW) and the x-axis is the placeholder column (placeholder). I then just repeat the playground code for the rest of the years, changing 2023 to 2024, then 2025, etc.
Sorry if this is a long post, tmi, or not enough information. I'm fairly new to R and this community. Many thanks for your help!
Simply generalize your playground process into a user-defined function, then iterate through the years with lapply.
# USER-DEFINED FUNCTION TO RUN A SINGLE YEAR
build_year_plot <- function(year) {
  ### -------------- Playground -------------- ###
  ## Create a dataframe for the forecast for calendar year
  cy_df <- ordered_df[ordered_df$YEAR == year, ]
  ## Add placeholder column for graphing purposes (add order number)
  cy_df$placeholder <- row.names(cy_df)
  ## Check df structure and change columns as needed
  str(cy_df)
  # Change placeholder column from character to numeric for graphing purposes
  cy_df$placeholder <- as.numeric(cy_df$placeholder)
  # Check if changed correctly
  class(cy_df$placeholder) #YES
  ## Load duration curve - Interactive
  LF_cy_LDC <- plot_ly(
    cy_df, x = ~placeholder, y = ~MW, type = 'scatter',
    mode = 'lines', hoverinfo = 'text',
    text = paste(
      "Megawatts: ", cy_df$MW,
      "Date: ", cy_df$MONTH, cy_df$DAY,
      "Hour: ", cy_df$hrxhr
    )
  ) %>% layout( # magrittr pipe (re-exported by plotly); the base R 4.1.0+ pipe |> also works here
    title = paste0('CY', year, ' Load Forecast - LDC')
  )
  saveWidget(LF_cy_LDC, paste0("cy", year - 2000, "_LDC.html"))
  return(LF_cy_LDC)
}
# CALLER TO RUN THROUGH SEVERAL YEARS
LF_cy_plots <- lapply(2023:2025, build_year_plot)
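The same filter, sort, rank pattern can be tried on toy data before wiring in plotly. The sketch below is base R only; `toy_df` and `rank_year` are hypothetical names, not part of the original code. It builds the x-axis rank with `seq_len()` instead of `row.names()`, because row names are only renumbered 1..n after subsetting when the data is a tibble (as returned by `read_csv`); a plain data.frame keeps its original row names.

```r
# Toy data: 2 years x 4 "hours" (hypothetical stand-in for forecast_df)
toy_df <- data.frame(
  YEAR = rep(c(2023, 2024), each = 4),
  MW   = c(2400, 2416, 2412, 2390, 2271, 2264, 2300, 2250)
)

# Hypothetical helper: subset one year, sort by load, add an explicit rank
rank_year <- function(year, df) {
  cy_df <- df[df$YEAR == year, ]
  cy_df <- cy_df[order(cy_df$MW, decreasing = TRUE), ]
  cy_df$placeholder <- seq_len(nrow(cy_df))  # 1 = highest-load hour
  cy_df
}

# Iterate over whichever years are present instead of hard-coding them
years  <- sort(unique(toy_df$YEAR))
ranked <- lapply(years, rank_year, df = toy_df)
names(ranked) <- years  # allows lookup by year, e.g. ranked[["2023"]]
```

Naming the list by year makes it easy to pull out a single year's result later, and the same naming step should work on the list of plots returned by the caller above.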
Consider even by (an object-oriented wrapper to tapply, roughly equivalent to split + lapply) to avoid the year indexing. Notice the input parameter change below and the variables used in the title and filename:
# USER-DEFINED FUNCTION TO RUN A SINGLE DATA FRAME
build_year_plot <- function(cy_df) {
  ### -------------- Playground -------------- ###
  ## Add placeholder column for graphing purposes (add order number)
  cy_df$placeholder <- row.names(cy_df)
  # ...SAME AS ABOVE...
  ) %>% layout(
    title = paste0('CY', cy_df$YEAR[1], ' Load Forecast - LDC')
  )
  saveWidget(LF_cy_LDC, paste0("cy", cy_df$YEAR[1] - 2000, "_LDC.html"))
  return(LF_cy_LDC)
}
# CALLER TO RUN THROUGH SEVERAL YEARS
LF_cy_plots <- by(ordered_df, ordered_df$YEAR, build_year_plot)
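To make the "roughly equivalent to split + lapply" remark concrete, here is a minimal base R sketch on toy data. `toy_df` and `peak_mw` are hypothetical, and the per-group function returns a single number rather than a plot so that the two results are easy to compare.

```r
# Toy data (hypothetical), already sorted by descending MW within each year
toy_df <- data.frame(
  YEAR = rep(c(2023, 2024), each = 3),
  MW   = c(2416, 2412, 2400, 2300, 2271, 2264)
)

# Hypothetical per-group function: peak load of one year's data frame
peak_mw <- function(cy_df) max(cy_df$MW)

# by() splits the data frame by YEAR and applies the function to each piece
res_by <- by(toy_df, toy_df$YEAR, peak_mw)

# The explicit two-step version returns a plain named list
res_split <- lapply(split(toy_df, toy_df$YEAR), peak_mw)

# Same values and same group labels; only the container class differs
same_values <- identical(as.numeric(res_by), unname(unlist(res_split)))
same_labels <- identical(dimnames(res_by)[[1]], names(res_split))
```

In both versions the function receives one year's data frame at a time, which is why the plotting function above can read the year from `cy_df$YEAR[1]`.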
Counterparts in tidyverse would be purrr::map:
# FUNCTION RECEIVES YEAR (lapply counterpart)
LF_cy_plots <- purrr::map(2023:2025, build_year_plot)
# FUNCTION RECEIVES DATA FRAME (by counterpart)
LF_cy_plots <- ordered_df %>%
  split(.$YEAR) %>%
  purrr::map(build_year_plot)
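If the placeholder column is only there to give each hour its rank within the year, it could also be computed once for the whole data set rather than per year. The following is a sketch under that assumption, using base R's `ave()` on hypothetical toy data (`toy_df` is not from the original post); ties in MW keep their original row order.

```r
# Toy data (hypothetical): unsorted loads for two years
toy_df <- data.frame(
  YEAR = c(2023, 2023, 2023, 2024, 2024, 2024),
  MW   = c(2400, 2416, 2412, 2264, 2300, 2271)
)

# Rank hours within each year; negating MW makes rank 1 the highest load
toy_df$placeholder <- ave(
  -toy_df$MW, toy_df$YEAR,
  FUN = function(x) rank(x, ties.method = "first")
)

# Optional: sort so each year's rows run from highest to lowest load
sorted_df <- toy_df[order(toy_df$YEAR, toy_df$placeholder), ]
```

With the rank already in place, a per-year plotting function only needs to filter or split by YEAR and plot placeholder against MW.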