
Combine and average each 15 data.frames from a list

My dataset is a list of 1000 data.frames ("sportdata"). Each data.frame represents one minute of data and has exactly the same number and names of columns. Each data.frame has at most 45 IDs (i.e. 45 rows), but in some minutes one or more IDs are missing, so it could have e.g. 35 rows. I want to combine and average the complete data set per block of 15 data.frames, collect the result in one data.frame, and transpose it so that the IDs are the columns and the average SpeedKph per 15 minutes are the rows.

My list of data.frames looks like this:

head(sportdata)
        [[1]]
                ID  Distance SpeedKph
         1:     1     2247       73
         2:     2     2247       73
         3:     3     1970       73
         4:     4     1964       74 
         5:     5     1971       73 
        [[2]]
                ID  Distance SpeedKph
         1:     1     2247       73
         2:     2     2247       75
         3:     3     1970       73
         4:     4     1964       74 
         5:     5     1971       73 
        [[3]]
                ID  Distance SpeedKph
         1:     1     2247       73
         2:     2     2247       80
         3:     3     1970       73
         4:     4     1964       74 
         5:     5     1971       56 

I have the code below to combine and average all the data.frames from my list, but I haven't found a way to combine and average the list per 15 elements (ie 15 minutes) and add this in one data.frame.

dfTotal <- rbindlist(sportdata)[,lapply(.SD,mean), list(ID)]    

I want my ideal output data.frame to look like:

   #ofData.Frames |   1   |  2  |  3  |...etc.
         01-15:      73     74    74
         16-30:      75     77    74
         31-45:      74     74    79
         46-60:      78     72    74
         ...etc.

Thanks in advance for your help!

UPDATE: Sorry for not providing this right away; here is my reproducible example.

my.df1 <- data.frame(ID = c(1:5),
                    Distance = c(2247,2247,1970,1964,1971),
                    SpeedKph = c(73,73,74,73,75))
my.df2 <- data.frame(ID = c(1:5),
                     Distance = c(2247,2247,1970,1964,1971),
                     SpeedKph = c(73,73,74,73,75))
my.df3 <- data.frame(ID = c(1:5),
                     Distance = c(2247,2247,1970,1964,1971),
                     SpeedKph = c(75,70,80,71,83))

my.list <- list(list1 = my.df1, list2 = my.df2, list3 = my.df3) 

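A possible solution with data.table (which you are already using), shown on the extended example data given further below and grouping per 3 data.frames instead of 15:

library(data.table)

# bind the list into one data.table; 'id' holds the position of each data.frame in the list
# (the list must be unnamed, otherwise 'id' contains the list names instead of numbers)
DT <- rbindlist(my.list, idcol = 'id')

# group every 3 consecutive data.frames (replace 3 with 15 for 15-minute blocks),
# average per group and ID, then reshape to wide format
DT[, grp := (id - 1) %/% 3
   ][, c(frames = toString(id), lapply(.SD, mean)), by = .(grp, ID), .SDcols = 3:4
     ][, dcast(.SD, frames ~ ID, value.var = c('Distance','SpeedKph'))]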

which gives:

    frames Distance_1 Distance_2 Distance_3 Distance_4 Distance_5 SpeedKph_1 SpeedKph_2 SpeedKph_3 SpeedKph_4 SpeedKph_5
1: 1, 2, 3       2247   2247.000   1970.000   1964.000       1971   73.66667   72.00000   76.00000   72.33333   77.66667
2: 4, 5, 6       2229   2410.333   1962.667   1964.333       1966   74.66667   73.66667   77.33333   72.33333   77.66667
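
To make the grouping step easier to follow, here is a minimal base R sketch of the integer-division trick used for grp. The names frame_id and block_size are only illustrative and do not appear in the code above; with 15-minute blocks you would presumably set block_size to 15.

```r
# Positions of the data.frames in the list (7 frames for illustration)
frame_id <- 1:7

# Block size: 3 matches the example above; 15L would give 15-minute blocks
block_size <- 3L

# Integer division maps frames 1-3 to group 0, 4-6 to group 1, 7 to group 2
grp <- (frame_id - 1L) %/% block_size
print(grp)
# [1] 0 0 0 1 1 1 2
```

A last block that is not full (here frame 7 on its own) still gets its own group, so its averages are based on fewer frames.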

Extended example data:

my.df1 <- data.frame(ID = c(1:5), Distance = c(2247,2247,1970,1964,1971), SpeedKph = c(73,73,74,73,75))
my.df2 <- data.frame(ID = c(1:5), Distance = c(2247,2247,1970,1964,1971), SpeedKph = c(73,73,74,73,75))
my.df3 <- data.frame(ID = c(1:5), Distance = c(2247,2247,1970,1964,1971), SpeedKph = c(75,70,80,71,83))
my.df4 <- data.frame(ID = c(1:5), Distance = c(2247,2137,1948,1965,1971), SpeedKph = c(73,78,74,73,71))
my.df5 <- data.frame(ID = c(1:5), Distance = c(2223,2247,1970,1964,1971), SpeedKph = c(76,73,74,73,79))
my.df6 <- data.frame(ID = c(1:5), Distance = c(2217,2847,1970,1964,1956), SpeedKph = c(75,70,84,71,83))

my.list <- list(my.df1, my.df2, my.df3, my.df4, my.df5, my.df6) 


In response to the comment:

# create some extra example data
my.df4a <- my.df4[-4,]
my.df5a <- my.df5[-c(4,5),]
my.df6a <- my.df6[-c(3,4),]
my.df7 <- my.df4[-c(4:6),]
my.df8 <- my.df5[-c(4:6),]
my.df9 <- my.df6[-c(4:6),]

# make another list of 9 dataframes
my.list2 <- list(my.df1, my.df2, my.df3, my.df4a, my.df5a, my.df6a, my.df7, my.df8, my.df9) 

# bind that list together in one data.table
DT2 <- rbindlist(my.list2, idcol = 'dfid')

# do an 'expand join' with 'CJ' and add the original transformation
DT2[CJ(dfid = dfid, ID = ID, unique = TRUE), on = .(dfid, ID)
    ][, grp := (dfid - 1) %/% 3
      ][, c(frames = toString(dfid), lapply(.SD, mean, na.rm = TRUE)), by = .(grp, ID), .SDcols = 3:4
        ][, dcast(.SD, frames ~ ID, value.var = c('Distance','SpeedKph'))]

this gives:

    frames Distance_1 Distance_2 Distance_3 Distance_4 Distance_5 SpeedKph_1 SpeedKph_2 SpeedKph_3 SpeedKph_4 SpeedKph_5
1: 1, 2, 3       2247   2247.000   1970.000       1964     1971.0   73.66667   72.00000   76.00000   72.33333   77.66667
2: 4, 5, 6       2229   2410.333   1959.000        NaN     1963.5   74.66667   73.66667   74.00000        NaN   77.00000
3: 7, 8, 9       2229   2410.333   1962.667        NaN        NaN   74.66667   73.66667   77.33333        NaN        NaN
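
The 'expand join' is what makes the missing IDs show up as NaN instead of silently shifting the frames labels. A small sketch of that step in isolation, on a made-up three-row table (the object names DTsmall and full are hypothetical):

```r
library(data.table)

# Toy data: frame 2 is missing ID 2
DTsmall <- data.table(dfid     = c(1L, 1L, 2L),
                      ID       = c(1L, 2L, 1L),
                      SpeedKph = c(73, 75, 74))

# CJ() builds every dfid/ID combination; joining on it adds the missing row with NA
full <- DTsmall[CJ(dfid = dfid, ID = ID, unique = TRUE), on = .(dfid, ID)]
print(full)
```

After the join every frame has a row for every ID, so toString(dfid) yields the same label for all IDs in a group, and mean(..., na.rm = TRUE) skips the filled-in NA values.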


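With regard to row order: frames is a character column, so with ten or more data.frames the rows are sorted alphabetically rather than numerically ("10, 11, 12" comes before "4, 5, 6"). Including grp in the dcast formula keeps the rows in group order: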

my.df10 <- my.df4
my.df11 <- my.df5
my.df12 <- my.df6

my.list3 <- list(my.df1, my.df2, my.df3, my.df4a, my.df5a, my.df6a, my.df7, my.df8, my.df9, my.df10, my.df11, my.df12) 

DT3 <- rbindlist(my.list3, idcol = 'dfid')

DT3[CJ(dfid = dfid, ID = ID, unique = TRUE), on = .(dfid, ID)
    ][, grp := (dfid - 1) %/% 3
      ][, c(frames = toString(dfid), lapply(.SD, mean, na.rm = TRUE)), by = .(grp, ID), .SDcols = 3:4
        ][, dcast(.SD, grp + frames ~ ID, value.var = c('Distance','SpeedKph'))]

this gives:

   grp     frames Distance_1 Distance_2 Distance_3 Distance_4 Distance_5 SpeedKph_1 SpeedKph_2 SpeedKph_3 SpeedKph_4 SpeedKph_5
1:   0    1, 2, 3       2247   2247.000   1970.000   1964.000     1971.0   73.66667   72.00000   76.00000   72.33333   77.66667
2:   1    4, 5, 6       2229   2410.333   1959.000        NaN     1963.5   74.66667   73.66667   74.00000        NaN   77.00000
3:   2    7, 8, 9       2229   2410.333   1962.667        NaN        NaN   74.66667   73.66667   77.33333        NaN        NaN
4:   3 10, 11, 12       2229   2410.333   1962.667   1964.333     1966.0   74.66667   73.66667   77.33333   72.33333   77.66667
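
A quick base R illustration of why the numeric grp column is useful for ordering. It sorts the four frames labels as plain strings; method = "radix" is used here only to get a locale-independent (C-locale) ordering, which as far as I know is also how data.table orders character columns.

```r
frames <- c("1, 2, 3", "4, 5, 6", "7, 8, 9", "10, 11, 12")

# Character sorting compares the strings character by character, not as numbers
sorted <- sort(frames, method = "radix")
print(sorted)
# [1] "1, 2, 3"    "10, 11, 12" "4, 5, 6"    "7, 8, 9"
```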

Once you have your full dataset, try the following:

Cut the data frame into blocks of 15

First add an index column of 1:nrow(df); we'll use 1:1000 for this example.

library(tidyverse)

# example data: 1000 rows of mean speeds
DF <- data.frame(mean_speed = sample(40:100, 1000, replace = TRUE))

DF2 <- DF %>%
   # unique() avoids duplicate breaks when nrow is an exact multiple of 15
   mutate(index = 1:nrow(.),
   group = cut(index, unique(c(seq(0, nrow(.), 15), nrow(.))))) %>%
   group_by(group) %>%
   mutate(row_num = row_number()) %>%
   select(-index) %>%
   spread(row_num, mean_speed)

This cuts the row index into consecutive bins of 15, groups by bin, and numbers the rows within each bin (1 to 15). Dropping the index column leaves only the group, the row number and the mean speed. Finally, spread reshapes the data to wide format, with one row per group.
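
For reference, a small base R sketch of what the cut() call produces, using 40 rows instead of 1000 so the result is easy to check by hand. The variable names are only illustrative.

```r
index <- 1:40

# Breaks at 0, 15, 30 plus the last row; unique() guards against a repeated final break
breaks <- unique(c(seq(0, length(index), 15), length(index)))

# Each index falls into a half-open interval (lower, upper]
group <- cut(index, breaks)
print(table(group))
```

This gives two full bins of 15 rows, (0,15] and (15,30], and a final partial bin (30,40] with the remaining 10 rows.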

EDIT: given your updated info. I would try the following:

DF2 <- dfTotal %>%
  # unique() avoids duplicate breaks when nrow is an exact multiple of 15
  mutate(group = cut(ID, unique(c(seq(0, nrow(.), 15), nrow(.))))) %>%
  group_by(group) %>%
  select(-Distance) %>%
  spread(ID, SpeedKph)

The one thing I'm not sure about is whether ID runs from 1 to 1000 in your larger data frame, or from 1 to 15. If you could provide 50 rows of your dataset, that would help. If ID is 1:15, you should be able to use the code above. If it's 1:1000, you would need to add the mutate(row_num = row_number()) step.
