
How can I create a new data frame which contains the average of rows across multiple data frames in a list?

I have a set of freeway contour data for different days. I need to calculate the monthly average speed and, eventually, the yearly average speed.

I read in all the .csv files (5 files) in one directory and created a list of 5 data frames using the script below.

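# Read the speed data
# list.files() takes a regular expression, not a glob, so match a literal ".csv" at the end
temp = list.files(pattern = "\\.csv$")   # all csv files in the working directory
# Read each file and assign it in the global environment, named after the file
list2env(
  lapply(setNames(temp, make.names(gsub("\\.csv$", "", temp))), 
         read.csv), envir = .GlobalEnv)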

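# Get all the data frames in the current workspace into a list
rm(cbinded_df)   # drop this object so it is not picked up by ls()
df_list = Filter(is.data.frame, mget(ls()))   # skips non-data-frames such as temp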

The result is shown below (screenshot of the Environment pane).

All five data frames have 40752 records and 6 fields.

> head(export_route_contours_2019.01.01_2019.01.03)
  vehicle_type              time    link_id link_name offset_mi length_mi source_id avg_speed units
1          all Jan 01 2019 00:00 1049879368 I-405 (N)     0.000     0.042 106+04417      68.1   mph
2          all Jan 01 2019 00:00 1049879363 I-405 (N)     0.042     0.044 106+04417      67.8   mph
3          all Jan 01 2019 00:00 1049879364 I-405 (N)     0.086     0.145 106+04417      67.8   mph
4          all Jan 01 2019 00:00   28428436 I-405 (N)     0.231     0.021 106+04418      71.3   mph
5          all Jan 01 2019 00:00  835598783 I-405 (N)     0.252     0.015 106+04418      71.3   mph
6          all Jan 01 2019 00:00  835598784 I-405 (N)     0.267     0.052 106+04418      71.3   mph

My question is: how can I create a merged data frame with 40752 records, each holding the average of 'avg_speed' across the 5 data frames? I could write a clunky script listing all the data frame names, but I have to repeat this process for 12 months and for more corridors.
The 5 data frames should be joined either by the record identifier or by the combination of 'link_id' and 'length_mi'. I think there should be a more efficient way to do this with loops and functions in R. It is challenging, and I really want to learn how to do it in an R script. I would greatly appreciate any help.

You can use purrr::reduce() to join the data frames in the list one after another, and follow this with rowMeans() to get the overall mean:

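library(tidyverse)

# Column(s) that identify matching records across the data frames
byvars = "link_id"

# Keep only the key and avg_speed from each data frame
# (the \(df) lambda shorthand needs R 4.1 or later)
lapply(df_list, \(df) df %>% select(all_of(c(byvars, "avg_speed")))) %>% 
  # join the data frames pairwise on the key; clashing avg_speed columns get suffixes
  reduce(left_join, by = byvars) %>%
  # average every non-key column row by row
  mutate(overall_mean = rowMeans(.[,!colnames(.) %in% byvars]))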

If necessary, set byvars to whichever columns define the linkage, for example byvars = c("link_id", "length_mi").
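
To see the join-then-average idea end to end without any packages, here is a minimal base R sketch. It swaps in Reduce() and merge() for purrr::reduce() and left_join(), and runs on three made-up toy data frames rather than the real files. The helper make_day(), the day1/day2/day3 names and all values are hypothetical, and the two-column key is only an assumption about what identifies a record. Note that merge() does an inner join by default, which matches left_join() only when every data frame contains the same keys, and that the key has to be unique within each data frame, otherwise the join multiplies rows.

```r
# Toy stand-ins for the daily data frames (all values are made up).
make_day <- function(speeds) {
  data.frame(
    link_id   = c(101, 102, 103),
    length_mi = c(0.042, 0.044, 0.145),
    avg_speed = speeds
  )
}
df_list <- list(
  day1 = make_day(c(60, 70, 50)),
  day2 = make_day(c(62, 68, 54)),
  day3 = make_day(c(64, 72, 52))
)

# Assumed key: must identify each row uniquely within every data frame.
byvars <- c("link_id", "length_mi")

# Keep only the key and the speed, and give each speed column a unique name
# so that repeated merges do not produce clashing column names.
trimmed <- Map(function(df, nm) {
  out <- df[, c(byvars, "avg_speed")]
  names(out)[names(out) == "avg_speed"] <- paste0("avg_speed_", nm)
  out
}, df_list, names(df_list))

# Join all data frames on the key (inner join), then average row by row.
merged <- Reduce(function(x, y) merge(x, y, by = byvars), trimmed)
speed_cols <- setdiff(names(merged), byvars)
merged$overall_mean <- rowMeans(merged[, speed_cols], na.rm = TRUE)

print(merged)
```

Renaming the speed columns up front keeps the day each value came from visible in the merged result, which can help when checking the averages by hand.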
