Computing average over different columns/rows in a list of data.frames
I have a list of 140 data.frames ('my.list'). I would like to compute 350 averages, each over a certain range of values in a certain column for a certain set of rows in a certain data.frame (this is a bit cryptic); that is, 350 different averages.
I have another data.frame ('my.dfAverage') which indicates for which data.frame, column and rows an average is needed. I want to write the 350 different averages and standard deviations to this data.frame (so with the columns 'average_id', 'dataframe_number', 'column_name', 'row_numbers', 'average' and 'st_dev'). Some value ranges contain NAs; these values can be dropped when computing the average.
What is the best way to automatically compute the 350 averages and standard deviations from the list of data.frames based on the info in this data.frame? I thought of writing a for-loop (or maybe using lapply?), but I'm quite new to these functions, so I'm not sure what the way to go is here.
Small reproducible example of my list of data.frames:
my.df1 <- data.frame(ID = c(1:5),
                     Measure1 = c(2247,2247,1970,1964,1971),
                     Measure2 = c(2247,2247,NA,1964,1971))
my.df2 <- data.frame(ID = c(1:4),
                     Measure3 = c(2247,NA,1970,1964),
                     Measure5 = c(2247,2247,NA,1964))
my.df3 <- data.frame(ID = c(1:4),
                     Measure6 = c(2247,600,1970,1964),
                     Measure8 = c(2247,2247,NA,1964))
my.list <- list(list1 = my.df1, list2 = my.df2, list3 = my.df3)
Desired output table for the averages and standard deviations:
my.dfAverage <- data.frame(average_id = c(1:3),
                           dataframe_number = c(1,2,3),
                           column_name = c('Measure1','Measure3','Measure6'),
                           row_numbers = c('1:3','1:4','1:2'),
                           average = (NA),
                           st_dev = (NA))
A solution using tidyverse.
First, expand my.dfAverage based on row_numbers.
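library(tidyverse)
my.dfAverage2 <- my.dfAverage %>%
  # split "1:3" into start = 1 and end = 3
  separate(row_numbers, into = c("start", "end"), convert = TRUE) %>%
  # rebuild the full sequence of row numbers as a list column
  mutate(row_numbers = map2(start, end, `:`)) %>%
  # one row per requested row number (name the column to avoid the unnest() warning)
  unnest(row_numbers) %>%
  select(-start, -end) %>%
  mutate(row_numbers = as.integer(row_numbers),
         dataframe_number = as.integer(dataframe_number))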
Second, transform all data frames in my.list to long format and combine them into a single data frame.
my.list.df <- my.list %>%
  setNames(1:length(.)) %>%
  map_dfr(function(x){
    x2 <- x %>%
      gather(column_name, value, -ID)
    return(x2)
  }, .id = "dataframe_number") %>%
  mutate(ID = as.integer(ID), dataframe_number = as.integer(dataframe_number)) %>%
  rename(row_numbers = ID)
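As a side note, gather() is marked as superseded in current tidyr releases. The following is a minimal sketch of the same reshaping step written with pivot_longer() and bind_rows(), assuming dplyr and tidyr are installed; the intermediate name long.list is introduced here only for illustration and is not part of the original answer. The row order differs from the gather() version, which should not matter for the join in the next step.

```r
library(dplyr)
library(tidyr)

# Toy data copied from the question
my.df1 <- data.frame(ID = 1:5,
                     Measure1 = c(2247, 2247, 1970, 1964, 1971),
                     Measure2 = c(2247, 2247, NA, 1964, 1971))
my.df2 <- data.frame(ID = 1:4,
                     Measure3 = c(2247, NA, 1970, 1964),
                     Measure5 = c(2247, 2247, NA, 1964))
my.df3 <- data.frame(ID = 1:4,
                     Measure6 = c(2247, 600, 1970, 1964),
                     Measure8 = c(2247, 2247, NA, 1964))
my.list <- list(list1 = my.df1, list2 = my.df2, list3 = my.df3)

# Reshape every data frame to long format: one row per (ID, column) pair.
# long.list is a hypothetical helper name used only in this sketch.
long.list <- lapply(my.list, function(x) {
  pivot_longer(x, cols = -ID, names_to = "column_name", values_to = "value")
})

# Use the position in the list as the data frame number
names(long.list) <- seq_along(long.list)

# Stack the pieces; .id stores the list names in a new column
my.list.df <- bind_rows(long.list, .id = "dataframe_number") %>%
  mutate(dataframe_number = as.integer(dataframe_number),
         row_numbers = as.integer(ID)) %>%
  select(dataframe_number, row_numbers, column_name, value)
```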
Third, merge my.dfAverage2 and my.list.df and calculate the mean and standard deviation. my.dfAverage3 is the final output.
my.dfAverage3 <- my.dfAverage2 %>%
  left_join(my.list.df, by = c("dataframe_number", "column_name", "row_numbers")) %>%
  group_by(average_id, dataframe_number, column_name) %>%
  summarise(row_numbers = paste(min(row_numbers), max(row_numbers), sep = ":"),
            average = mean(value, na.rm = TRUE),
            st_dev = sd(value, na.rm = TRUE)) %>%
  ungroup()
my.dfAverage3
# A tibble: 3 x 6
# average_id dataframe_number column_name row_numbers average st_dev
# <int> <int> <chr> <chr> <dbl> <dbl>
# 1 1 1 Measure1 1:3 2155 160
# 2 2 2 Measure3 1:4 2060 162
# 3 3 3 Measure6 1:2 1424 1165
DATA
my.list is the same as OP's my.list.
my.dfAverage <- data.frame(average_id = c(1:3),
                           dataframe_number = c(1,2,3),
                           column_name = c('Measure1','Measure3','Measure6'),
                           row_numbers = c('1:3','1:4','1:2'))
This is a different approach from the one given above: I will use only base R functions. One point to note: make sure the data is created with stringsAsFactors = FALSE.
Write a function, making sure you index my.list correctly, and then apply the summary function to the selected values, i.e. f(..., na.rm = TRUE). To write the function using mapply:
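fun1 <- function(f) {
  with(my.dfAverage,
       # index my.list by dataframe_number so each row uses its own data.frame
       mapply(function(x, y, z) f(x[eval(parse(text = y)), z], na.rm = TRUE),
              my.list[dataframe_number], row_numbers, column_name))
}
transform(my.dfAverage, average = fun1(mean), st_dev = fun1(sd))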
average_id dataframe_number column_name row_numbers average st_dev
1 1 1 Measure1 1:3 2154.667 159.9260
2 2 2 Measure3 1:4 2060.333 161.6859
3 3 3 Measure6 1:2 1423.500 1164.6049
Data Used:
my.dfAverage <- data.frame(average_id = c(1:3),
                           dataframe_number = c(1,2,3),
                           column_name = c('Measure1','Measure3','Measure6'),
                           row_numbers = c('1:3','1:4','1:2'),
                           average = (NA),
                           st_dev = (NA), stringsAsFactors = FALSE)
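If evaluating strings with eval(parse()) feels uncomfortable, the row ranges can also be parsed by hand. The following is a minimal base R sketch, not part of the original answer: parse_range and summarise_range are hypothetical helper names introduced here, and the sketch assumes every entry of row_numbers has the form 'start:end'. It looks up each data.frame through dataframe_number, so it does not rely on my.dfAverage having one row per list element in the same order.

```r
# Toy data copied from the question
my.df1 <- data.frame(ID = 1:5,
                     Measure1 = c(2247, 2247, 1970, 1964, 1971),
                     Measure2 = c(2247, 2247, NA, 1964, 1971))
my.df2 <- data.frame(ID = 1:4,
                     Measure3 = c(2247, NA, 1970, 1964),
                     Measure5 = c(2247, 2247, NA, 1964))
my.df3 <- data.frame(ID = 1:4,
                     Measure6 = c(2247, 600, 1970, 1964),
                     Measure8 = c(2247, 2247, NA, 1964))
my.list <- list(list1 = my.df1, list2 = my.df2, list3 = my.df3)

my.dfAverage <- data.frame(average_id = 1:3,
                           dataframe_number = c(1, 2, 3),
                           column_name = c('Measure1', 'Measure3', 'Measure6'),
                           row_numbers = c('1:3', '1:4', '1:2'),
                           stringsAsFactors = FALSE)

# Hypothetical helper: turn a string such as "1:3" into the integers 1, 2, 3
# (assumes the string always has the form "start:end")
parse_range <- function(s) {
  bounds <- as.integer(strsplit(s, ":", fixed = TRUE)[[1]])
  seq(bounds[1], bounds[2])
}

# Hypothetical helper: apply f (e.g. mean or sd) to the values requested
# by each row of my.dfAverage, dropping NAs
summarise_range <- function(f) {
  vapply(seq_len(nrow(my.dfAverage)), function(i) {
    df   <- my.list[[my.dfAverage$dataframe_number[i]]]
    rows <- parse_range(my.dfAverage$row_numbers[i])
    f(df[rows, my.dfAverage$column_name[i]], na.rm = TRUE)
  }, numeric(1))
}

my.dfAverage$average <- summarise_range(mean)
my.dfAverage$st_dev  <- summarise_range(sd)
my.dfAverage
```

On the toy data this should reproduce the averages and standard deviations shown in the output table above.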