Combine and average each 15 data.frames from a list

My dataset is a list of 1000 data.frames ("sportdata"). Each data.frame in the list represents one minute of data and has exactly the same number and names of columns. Each data.frame has at most 45 IDs (i.e. 45 rows), but in some minutes one or more IDs are missing, so a data.frame could have, e.g., 35 rows. I want to combine and average the complete data set per 15 data.frames, collect the results in one data.frame, and transpose it so that the IDs are the columns and the average SpeedKph per 15 minutes are the rows.

My list of data.frames looks like this:

head(sportdata)
        [[1]]
                ID  Distance SpeedKph
         1:     1     2247       73
         2:     2     2247       73
         3:     3     1970       73
         4:     4     1964       74 
         5:     5     1971       73 
        [[2]]
                ID  Distance SpeedKph
         1:     1     2247       73
         2:     2     2247       75
         3:     3     1970       73
         4:     4     1964       74 
         5:     5     1971       73 
        [[3]]
                ID  Distance SpeedKph
         1:     1     2247       73
         2:     2     2247       80
         3:     3     1970       73
         4:     4     1964       74 
         5:     5     1971       56 

The code below combines and averages all the data.frames in my list, but I haven't found a way to combine and average the list per 15 elements (i.e. 15 minutes) and put the result in one data.frame.

dfTotal <- rbindlist(sportdata)[,lapply(.SD,mean), list(ID)]    

I want my ideal output data.frame to look like:

   #ofData.Frames |   1   |  2  |  3  |...etc.
         01-15:      73     74    74
         16-30:      75     77    74
         31-45:      74     74    79
         46-60:      78     72    74
         ...etc.

Thanks in advance for your help!

UPDATE: Sorry for not including this from the start; here is my reproducible example.

my.df1 <- data.frame(ID = c(1:5),
                    Distance = c(2247,2247,1970,1964,1971),
                    SpeedKph = c(73,73,74,73,75))
my.df2 <- data.frame(ID = c(1:5),
                     Distance = c(2247,2247,1970,1964,1971),
                     SpeedKph = c(73,73,74,73,75))
my.df3 <- data.frame(ID = c(1:5),
                     Distance = c(2247,2247,1970,1964,1971),
                     SpeedKph = c(75,70,80,71,83))

my.list <- list(list1 = my.df1, list2 = my.df2, list3 = my.df3) 

A possible solution with data.table (which you are already using):

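library(data.table)

# stack all data.frames; 'id' is each frame's position in the unnamed list
# (my.list here is the six-element list from the extended example data below)
DT <- rbindlist(my.list, idcol = 'id')

# group every 3 frames, average per group and ID, then reshape to wide
DT[, grp := (id - 1) %/% 3
   ][, c(frames = toString(id), lapply(.SD, mean)), by = .(grp, ID), .SDcols = 3:4
     ][, dcast(.SD, frames ~ ID, value.var = c('Distance','SpeedKph'))]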

which gives:

    frames Distance_1 Distance_2 Distance_3 Distance_4 Distance_5 SpeedKph_1 SpeedKph_2 SpeedKph_3 SpeedKph_4 SpeedKph_5
1: 1, 2, 3       2247   2247.000   1970.000   1964.000       1971   73.66667   72.00000   76.00000   72.33333   77.66667
2: 4, 5, 6       2229   2410.333   1962.667   1964.333       1966   74.66667   73.66667   77.33333   72.33333   77.66667
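The grouping step rests on integer division: `(id - 1) %/% 3` sends frames 1 to 3 to group 0, frames 4 to 6 to group 1, and so on. For the real data the divisor would presumably be 15 instead of 3. A minimal base R sketch of that mapping, where `frame_id`, `grp` and `labels` are names made up for illustration:

```r
# Frame indices for 1000 one-minute data.frames
frame_id <- 1:1000

# Integer division puts every run of 15 consecutive frames in one group
grp <- (frame_id - 1) %/% 15

# First and last frame of each group, pasted together as a row label
labels <- tapply(frame_id, grp, function(i) paste(min(i), max(i), sep = "-"))

head(as.vector(labels), 3)
table(grp)[c("0", "66")]  # frame counts in the first and the last group
```

Because 1000 is not a multiple of 15, the last group is shorter than the others, which is worth keeping in mind when comparing its average with the rest.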

Extended example data:

my.df1 <- data.frame(ID = c(1:5), Distance = c(2247,2247,1970,1964,1971), SpeedKph = c(73,73,74,73,75))
my.df2 <- data.frame(ID = c(1:5), Distance = c(2247,2247,1970,1964,1971), SpeedKph = c(73,73,74,73,75))
my.df3 <- data.frame(ID = c(1:5), Distance = c(2247,2247,1970,1964,1971), SpeedKph = c(75,70,80,71,83))
my.df4 <- data.frame(ID = c(1:5), Distance = c(2247,2137,1948,1965,1971), SpeedKph = c(73,78,74,73,71))
my.df5 <- data.frame(ID = c(1:5), Distance = c(2223,2247,1970,1964,1971), SpeedKph = c(76,73,74,73,79))
my.df6 <- data.frame(ID = c(1:5), Distance = c(2217,2847,1970,1964,1956), SpeedKph = c(75,70,84,71,83))

my.list <- list(my.df1, my.df2, my.df3, my.df4, my.df5, my.df6) 


In response to the comment:

# create some extra example data
my.df4a <- my.df4[-4,]
my.df5a <- my.df5[-c(4,5),]
my.df6a <- my.df6[-c(3,4),]
my.df7 <- my.df4[-c(4:6),]
my.df8 <- my.df5[-c(4:6),]
my.df9 <- my.df6[-c(4:6),]

# make another list of 9 dataframes
my.list2 <- list(my.df1, my.df2, my.df3, my.df4a, my.df5a, my.df6a, my.df7, my.df8, my.df9) 

# bind that list together in one data.table
DT2 <- rbindlist(my.list2, idcol = 'dfid')

# do an 'expand join' with 'CJ' and add the original transformation
DT2[CJ(dfid = dfid, ID = ID, unique = TRUE), on = .(dfid, ID)
    ][, grp := (dfid - 1) %/% 3
      ][, c(frames = toString(dfid), lapply(.SD, mean, na.rm = TRUE)), by = .(grp, ID), .SDcols = 3:4
        ][, dcast(.SD, frames ~ ID, value.var = c('Distance','SpeedKph'))]

this gives:

    frames Distance_1 Distance_2 Distance_3 Distance_4 Distance_5 SpeedKph_1 SpeedKph_2 SpeedKph_3 SpeedKph_4 SpeedKph_5
1: 1, 2, 3       2247   2247.000   1970.000       1964     1971.0   73.66667   72.00000   76.00000   72.33333   77.66667
2: 4, 5, 6       2229   2410.333   1959.000        NaN     1963.5   74.66667   73.66667   74.00000        NaN   77.00000
3: 7, 8, 9       2229   2410.333   1962.667        NaN        NaN   74.66667   73.66667   77.33333        NaN        NaN
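The `NaN` cells mark IDs that are absent from every frame in a group: after the cross join those rows contain only `NA`, and `mean(..., na.rm = TRUE)` over nothing returns `NaN`. A small sketch of both effects, using a made-up table `toy` that is not part of the original data:

```r
library(data.table)

# Two frames; ID 2 is missing from frame 2 (hypothetical data)
toy <- data.table(dfid = c(1L, 1L, 2L),
                  ID = c(1L, 2L, 1L),
                  SpeedKph = c(70, 80, 72))

# CJ() builds every dfid/ID combination; joining on it adds an NA row for the gap
full <- toy[CJ(dfid = dfid, ID = ID, unique = TRUE), on = .(dfid, ID)]
full

# A partially missing ID is averaged over the frames that do exist ...
mean(c(80, NA), na.rm = TRUE)
# ... but an ID missing from all frames of a group gives NaN
mean(c(NA_real_, NA_real_), na.rm = TRUE)
```

If only the averages matter, the cross join could arguably be skipped, since missing rows simply do not contribute to the mean; it is mainly useful when every group/ID combination should appear in the result.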


With regard to row order:

my.df10 <- my.df4
my.df11 <- my.df5
my.df12 <- my.df6

my.list3 <- list(my.df1, my.df2, my.df3, my.df4a, my.df5a, my.df6a, my.df7, my.df8, my.df9, my.df10, my.df11, my.df12) 

DT3 <- rbindlist(my.list3, idcol = 'dfid')

DT3[CJ(dfid = dfid, ID = ID, unique = TRUE), on = .(dfid, ID)
    ][, grp := (dfid - 1) %/% 3
      ][, c(frames = toString(dfid), lapply(.SD, mean, na.rm = TRUE)), by = .(grp, ID), .SDcols = 3:4
        ][, dcast(.SD, grp + frames ~ ID, value.var = c('Distance','SpeedKph'))]

this gives:

   grp     frames Distance_1 Distance_2 Distance_3 Distance_4 Distance_5 SpeedKph_1 SpeedKph_2 SpeedKph_3 SpeedKph_4 SpeedKph_5
1:   0    1, 2, 3       2247   2247.000   1970.000   1964.000     1971.0   73.66667   72.00000   76.00000   72.33333   77.66667
2:   1    4, 5, 6       2229   2410.333   1959.000        NaN     1963.5   74.66667   73.66667   74.00000        NaN   77.00000
3:   2    7, 8, 9       2229   2410.333   1962.667        NaN        NaN   74.66667   73.66667   77.33333        NaN        NaN
4:   3 10, 11, 12       2229   2410.333   1962.667   1964.333     1966.0   74.66667   73.66667   77.33333   72.33333   77.66667
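Putting `grp` in front of `frames` in the `dcast` formula presumably matters because `frames` is a character column, so sorting on it alone is lexicographic rather than numeric. A base R sketch of the difference (the vectors below are illustrative and not taken from the answer's objects):

```r
frames <- c("1, 2, 3", "4, 5, 6", "7, 8, 9", "10, 11, 12")
grp <- 0:3

# Character sort: "10, 11, 12" ends up ahead of "4, 5, 6"
by_label <- sort(frames, method = "radix")
by_label

# Ordering by the numeric group index keeps the chronological order
by_group <- frames[order(grp)]
by_group
```

With 1000 frames in groups of 15 the labels reach four digits, so relying on the numeric `grp` column for ordering is the safer choice.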

Once you have your full dataset, try the following:

cut dataframe by 15s

First add a column of 1:nrow(df); we'll use 1:1000 for this example.

library(tidyverse)

DF <- data.frame(mean_speed = sample(40:100, 1000, replace = TRUE))

DF2 <- DF %>%
   mutate(index = 1:nrow(.),
          # unique() avoids duplicated breaks when nrow is a multiple of 15
          group = cut(index, unique(c(seq(0, nrow(.), 15), nrow(.))))) %>%
   group_by(group) %>%
   mutate(row_num = row_number()) %>%
   select(-index) %>%
   spread(row_num, mean_speed)

We cut the row index into consecutive bins of 15 and group by that bin. Within each group, row_number() numbers the rows 1:15. We then drop the index column so that only the group, the row number, and the mean speed remain. Lastly, spread() reshapes the data to wide format.
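To see what `cut()` does with those breaks, here is a small base R sketch on 40 rows rather than 1000. The names `index`, `breaks` and `group` are illustrative, and the `unique()` call is an added safeguard: `cut()` rejects duplicated breaks, which would occur whenever the row count is an exact multiple of 15.

```r
index <- 1:40

# Breaks at 0, 15, 30 plus the final row count (40)
breaks <- unique(c(seq(0, length(index), 15), length(index)))

# Each index falls into a right-closed interval such as (0,15]
group <- cut(index, breaks)

levels(group)
table(group)
```

The last interval is shorter than the others because 40 is not a multiple of 15; the same happens with 1000 rows.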

EDIT: given your updated info, I would try the following:

DF2 <- dfTotal %>%
  # unique() avoids duplicated breaks when nrow is a multiple of 15
  mutate(group = cut(ID, unique(c(seq(0, nrow(.), 15), nrow(.))))) %>%
  group_by(group) %>%
  select(-Distance) %>%
  spread(ID, SpeedKph)

The one thing I'm not sure about is whether ID is 1:1000 in your larger data frame, or whether it's 1:15. If you could provide 50 rows of your dataset, that would help. If ID is 1:15, you should be able to use the code above. If it's 1:1000, then you would need to add the mutate(row_num = row_number()) step.
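Since the question starts from a list of per-minute data.frames rather than from `dfTotal`, a tidyverse version could also work on the list directly. The following is only a sketch under assumptions: `sim` is simulated data standing in for `sportdata`, the list is unnamed so that `bind_rows()` uses positions as frame ids, and `pivot_wider()` is used in place of `spread()`.

```r
library(dplyr)
library(tidyr)

set.seed(1)
# Simulated stand-in for `sportdata`: 45 one-minute frames with 5 IDs each
sim <- lapply(1:45, function(i) {
  data.frame(ID = 1:5, Distance = 2000, SpeedKph = sample(60:90, 5))
})

wide <- bind_rows(sim, .id = "frame") %>%
  # frame is the list position (as text); 15 frames form one group
  mutate(frame = as.integer(frame), grp = (frame - 1) %/% 15) %>%
  group_by(grp, ID) %>%
  summarise(SpeedKph = mean(SpeedKph, na.rm = TRUE), .groups = "drop") %>%
  # one row per 15-minute group, one column per ID
  pivot_wider(names_from = ID, values_from = SpeedKph)

wide
```

An ID that is missing from some minutes simply contributes fewer rows to its group mean, so no padding step is needed for this variant.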
