I have a set of freeway contour data for different days. I need to calculate the monthly average speed and, eventually, the yearly average speed.
I read all the .csv files (5 files) in one directory and created a list of 5 data frames using the script below.
# Read the speed data
temp = list.files(pattern = "\\.csv$") # put the names of all csv files in the directory into a vector
list2env(
  lapply(setNames(temp, make.names(sub("\\.csv$", "", temp))),
         read.csv), envir = .GlobalEnv)
# get all the data frames in the current workspace into a list
# (only data frames are kept, so other objects such as `temp` are skipped):
rm(cbinded_df)
df_list = Filter(is.data.frame, mget(ls()))
The result is as follows.
All five data frames have 40752 records and 6 fields.
> head(export_route_contours_2019.01.01_2019.01.03)
  vehicle_type              time    link_id link_name offset_mi length_mi source_id avg_speed units
1          all Jan 01 2019 00:00 1049879368 I-405 (N)     0.000     0.042 106+04417      68.1   mph
2          all Jan 01 2019 00:00 1049879363 I-405 (N)     0.042     0.044 106+04417      67.8   mph
3          all Jan 01 2019 00:00 1049879364 I-405 (N)     0.086     0.145 106+04417      67.8   mph
4          all Jan 01 2019 00:00   28428436 I-405 (N)     0.231     0.021 106+04418      71.3   mph
5          all Jan 01 2019 00:00  835598783 I-405 (N)     0.252     0.015 106+04418      71.3   mph
6          all Jan 01 2019 00:00  835598784 I-405 (N)     0.267     0.052 106+04418      71.3   mph
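As a side note on the reading step above, the files could also be read straight into a named list, without going through the global environment. The following is only a base R sketch: the helper name `read_csv_dir`, the temporary directory, and the two tiny files are made up for illustration and are not part of the real data.

```r
# Hypothetical helper: read every .csv file in `dir` into a named list of data frames.
read_csv_dir <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  # Name each list element after its file, minus the extension
  names(files) <- make.names(sub("\\.csv$", "", basename(files)))
  lapply(files, read.csv)
}

# Toy demonstration with two small made-up files in a temporary directory
demo_dir <- file.path(tempdir(), "speed_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(link_id = 1:3, avg_speed = c(60, 65, 70)),
          file.path(demo_dir, "day1.csv"), row.names = FALSE)
write.csv(data.frame(link_id = 1:3, avg_speed = c(62, 63, 68)),
          file.path(demo_dir, "day2.csv"), row.names = FALSE)

df_list <- read_csv_dir(demo_dir)
names(df_list)
```

Because the list only ever contains what `read.csv` returned, there is no need to clean other objects out of the workspace before collecting the data frames.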
My question is: how can I create a merged data frame that contains 40752 records holding the average of 'avg_speed' across the 5 data frames? I could write a clunky script listing all the data frame names. However, I have to repeat this process for 12 months and for more corridors.
All 5 data frames should be joined by either the record identifier or the combination of 'link_id' and 'length_mi'. I think there should be a more efficient way to do the job in R using loops and functions. It is challenging, and I really want to learn how to do this with an R script. I would greatly appreciate any help with this.
You can use purrr::reduce() to iterate joins over a list, and follow this with rowMeans() to get the overall mean:
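library(tidyverse)
# column(s) that identify a record across the data frames
byvars = "link_id"
# keep only the key column(s) and avg_speed from each data frame
lapply(df_list, \(df) df %>% select(all_of(c(byvars, "avg_speed")))) %>%
  # join the data frames one after another on the key column(s)
  reduce(left_join, by = byvars) %>%
  # average all non-key columns, i.e. the avg_speed column of each data frame
  mutate(overall_mean = rowMeans(.[, !colnames(.) %in% byvars]))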
Adjust byvars to something else that defines the linkage (for example, byvars = c("link_id", "length_mi")), if necessary.
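For comparison, the same overall mean can probably also be reached without joins, by stacking the data frames and averaging within each key combination. This is a base R sketch under assumptions, not the method above: the three small data frames and their values are invented, and `byvars` is assumed to identify each record uniquely within a data frame.

```r
# Toy stand-ins for the real data frames (all values are made up)
df_list <- list(
  data.frame(link_id = c(1, 2, 3), length_mi = c(0.042, 0.044, 0.145),
             avg_speed = c(60, 65, 70)),
  data.frame(link_id = c(1, 2, 3), length_mi = c(0.042, 0.044, 0.145),
             avg_speed = c(62, 63, 68)),
  data.frame(link_id = c(1, 2, 3), length_mi = c(0.042, 0.044, 0.145),
             avg_speed = c(64, 67, 72))
)

# Assumed key columns; adjust to whatever defines the linkage
byvars <- c("link_id", "length_mi")

# Stack all data frames, keeping only the key columns and avg_speed
stacked <- do.call(rbind, lapply(df_list, function(df) df[, c(byvars, "avg_speed")]))

# Average avg_speed within each combination of the key columns
overall <- aggregate(stacked["avg_speed"], by = stacked[byvars], FUN = mean)
overall
```

One design difference worth noting: stacking yields a single avg_speed column holding the mean, rather than one suffixed avg_speed column per input data frame, so the result does not grow wider as more days are added.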