Computing average over different columns/rows in a list of data.frames

I have a list ('my.list') of 140 data.frames. I would like to compute 350 averages, each taken over a specific range of rows in a specific column of a specific data.frame. So, 350 different averages like:

  • For data.frame #1, the average of column 'Measure1', rows 1:5;
  • For data.frame #2, the average of column 'Measure3', rows 1:4, and so on.

I have another data.frame ('my.dfAverage') that specifies, for each average, which data.frame, column and rows to use. I want to write the 350 averages and standard deviations into this data.frame, so its columns are 'average_id', 'dataframe_number', 'column_name', 'row_numbers', 'average' and 'st_dev'. Some value ranges contain NAs; these values can be dropped when computing the average.

What is the best way to automatically compute the 350 averages and standard deviations from the list of data.frames, based on the information in 'my.dfAverage'? I thought of writing a for-loop (or maybe using lapply?), but I'm quite new to these functions, so I'm not sure which way to go.

A small reproducible example of my list of data.frames:

my.df1 <- data.frame(ID = c(1:5),
                    Measure1 = c(2247,2247,1970,1964,1971),
                    Measure2 = c(2247,2247,NA,1964,1971))
my.df2 <- data.frame(ID = c(1:4),
                    Measure3 = c(2247,NA,1970,1964),
                    Measure5 = c(2247,2247,NA,1964))
my.df3 <- data.frame(ID = c(1:4),
                    Measure6 = c(2247,600,1970,1964),
                    Measure8 = c(2247,2247,NA,1964))

my.list <- list(list1 = my.df1, list2 = my.df2, list3 = my.df3)

Desired output table for the averages and standard deviations:

my.dfAverage <- data.frame(average_id = c(1:3),
                           dataframe_number = c(1,2,3),
                           column_name = c('Measure1','Measure3','Measure6'),
                           row_numbers = c('1:3','1:4','1:2'),
                           average = (NA),
                           st_dev = (NA))

A solution using the tidyverse.

First, expand my.dfAverage based on row_numbers, so that each requested row number gets its own row.

library(tidyverse)

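my.dfAverage2 <- my.dfAverage %>%
  # split "1:3" into integer columns start = 1 and end = 3
  separate(row_numbers, into = c("start", "end"), convert = TRUE) %>%
  # build the full integer sequence for each range
  mutate(row_numbers = map2(start, end, `:`)) %>%
  # one row per row number; naming the column avoids the tidyr >= 1.0 warning
  unnest(row_numbers) %>%
  select(-start, -end) %>%
  mutate(row_numbers = as.integer(row_numbers),
         dataframe_number = as.integer(dataframe_number))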
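To make the expansion step concrete, here is a minimal base R sketch of the same idea, with no tidyverse dependency. The names spec, bounds and expanded are made up for illustration and are not part of the answer's code; only the two columns needed to show the expansion are included.

```r
# Illustrative subset of my.dfAverage (names here are assumptions for the sketch).
spec <- data.frame(average_id = 1:3,
                   row_numbers = c("1:3", "1:4", "1:2"),
                   stringsAsFactors = FALSE)

# Split each "start:end" string into its two integer endpoints.
bounds <- lapply(strsplit(spec$row_numbers, ":", fixed = TRUE), as.integer)

# Build one output row per requested row number, then stack the pieces.
expanded <- do.call(rbind, Map(function(id, b) {
  data.frame(average_id = id, row_numbers = seq(b[1], b[2]))
}, spec$average_id, bounds))

nrow(expanded)  # 3 + 4 + 2 = 9 rows
```

Each average_id is repeated once per row it covers, which is what lets the later join pick up exactly the requested values.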

Second, reshape every data frame in my.list to long format and combine them into a single data frame.

my.list.df <- my.list %>%
  # name the list elements "1", "2", ... so .id yields the data frame number
  setNames(seq_along(.)) %>%
  # long format: one row per (ID, column_name, value)
  map_dfr(function(x) gather(x, column_name, value, -ID),
          .id = "dataframe_number") %>%
  mutate(ID = as.integer(ID), dataframe_number = as.integer(dataframe_number)) %>%
  rename(row_numbers = ID)
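For readers unfamiliar with the long format produced by this step, the following base R sketch reshapes a single data frame by hand. It uses stack() in place of gather(), so it is an illustration of the resulting shape rather than the answer's method; measure_cols, stacked and long are names invented for the sketch.

```r
my.df2 <- data.frame(ID = 1:4,
                     Measure3 = c(2247, NA, 1970, 1964),
                     Measure5 = c(2247, 2247, NA, 1964))

# Every column except ID holds measurements.
measure_cols <- setdiff(names(my.df2), "ID")

# stack() puts all measurement values in one column ("values") and
# records the source column name in a factor ("ind").
stacked <- stack(my.df2[measure_cols])

# ID is repeated once per measurement column and plays the role of row_numbers.
long <- data.frame(row_numbers = rep(my.df2$ID, times = length(measure_cols)),
                   column_name = as.character(stacked$ind),
                   value       = stacked$values,
                   stringsAsFactors = FALSE)
long
```

Note that this relies on ID matching the row position, as it does in the example data.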

Third, join my.dfAverage2 with my.list.df and calculate the mean and standard deviation per average_id. my.dfAverage3 is the final output.

my.dfAverage3 <- my.dfAverage2 %>%
  left_join(my.list.df, by = c("dataframe_number", "column_name", "row_numbers")) %>%
  group_by(average_id, dataframe_number, column_name) %>%
  summarise(row_numbers = paste(min(row_numbers), max(row_numbers), sep = ":"),
            average = mean(value, na.rm = TRUE),
            st_dev = sd(value, na.rm = TRUE)) %>%
  ungroup()
my.dfAverage3
# A tibble: 3 x 6
#   average_id dataframe_number column_name row_numbers average st_dev
#        <int>            <int> <chr>       <chr>         <dbl>  <dbl>
# 1          1                1 Measure1    1:3            2155    160
# 2          2                2 Measure3    1:4            2060    162
# 3          3                3 Measure6    1:2            1424   1165

DATA

my.list is the same as the OP's my.list.

my.dfAverage <- data.frame(average_id = c(1:3),
                           dataframe_number = c(1,2,3),
                           column_name = c('Measure1','Measure3','Measure6'),
                           row_numbers = c('1:3','1:4','1:2'))

This is a different approach from the one given above, using only base R functions. One point to note: make sure the data is created with stringsAsFactors = FALSE.

Write a function that indexes my.list correctly, then applies a summary function f to the selected values, i.e. f(..., na.rm = TRUE). Using mapply:

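  fun1 <- function(f) {
    with(my.dfAverage,
         # x: data.frame, y: row range such as "1:3", z: column name
         mapply(function(x, y, z) f(x[eval(parse(text = y)), z], na.rm = TRUE),
                # pick the data.frame each row refers to, not just list order
                my.list[dataframe_number], row_numbers, column_name))
  }

  transform(my.dfAverage, average = fun1(mean), st_dev = fun1(sd))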

  average_id dataframe_number column_name row_numbers  average    st_dev
1          1                1    Measure1         1:3 2154.667  159.9260
2          2                2    Measure3         1:4 2060.333  161.6859
3          3                3    Measure6         1:2 1423.500 1164.6049

Data used:

my.dfAverage <- data.frame(average_id = c(1:3),
                           dataframe_number = c(1,2,3),
                           column_name = c('Measure1','Measure3','Measure6'),
                           row_numbers = c('1:3','1:4','1:2'),
                           average = (NA),
                           st_dev = (NA), stringsAsFactors = FALSE)
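
Since the question mentions a for-loop, here is a minimal sketch of that route. It also parses the row range with strsplit() instead of eval(parse(...)). The helper parse_rows() is hypothetical, introduced only for this illustration, and the data is a trimmed copy of the example containing just the columns that are used.

```r
my.list <- list(
  list1 = data.frame(ID = 1:5, Measure1 = c(2247, 2247, 1970, 1964, 1971)),
  list2 = data.frame(ID = 1:4, Measure3 = c(2247, NA, 1970, 1964)),
  list3 = data.frame(ID = 1:4, Measure6 = c(2247, 600, 1970, 1964))
)

my.dfAverage <- data.frame(average_id = 1:3,
                           dataframe_number = c(1, 2, 3),
                           column_name = c("Measure1", "Measure3", "Measure6"),
                           row_numbers = c("1:3", "1:4", "1:2"),
                           average = NA_real_,
                           st_dev = NA_real_,
                           stringsAsFactors = FALSE)

# Hypothetical helper: turns "1:3" into c(1L, 2L, 3L) without eval(parse()).
parse_rows <- function(spec) {
  b <- as.integer(strsplit(spec, ":", fixed = TRUE)[[1]])
  seq(b[1], b[2])
}

for (i in seq_len(nrow(my.dfAverage))) {
  # Look up the data.frame this row refers to.
  df <- my.list[[my.dfAverage$dataframe_number[i]]]
  # Extract the requested rows of the requested column as a plain vector.
  vals <- df[parse_rows(my.dfAverage$row_numbers[i]), my.dfAverage$column_name[i]]
  my.dfAverage$average[i] <- mean(vals, na.rm = TRUE)
  my.dfAverage$st_dev[i]  <- sd(vals, na.rm = TRUE)
}

my.dfAverage
```

The loop is more verbose than mapply, but each step is visible, which may be easier to debug when some of the 350 specifications point at a missing column or an out-of-range row.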
