How to perform functions on a list of dataframes

Question

I have a list of dataframes as follows (the dput output is far too big even with head = 1, so I have mocked it up here with str(df_list)):

 $ OC_AH_026C  :'data.frame':   13081 obs. of  3 variables:
  ..$ chr    : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
  ..$ Means  : num [1:13081] 45.183 111.038 162.785 -0.712 83.473 ...
 $ OC_AH_026C.1:'data.frame':   13081 obs. of  3 variables:
  ..$ chr    : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
  ..$ Means  : num [1:13081] 69.6 125.1 156.4 12.8 97.4 ...
 $ OC_AH_026T  :'data.frame':   13081 obs. of  3 variables:
  ..$ chr    : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
  ..$ Means  : num [1:13081] 13 12.5 103.1 56.7 145.4 ...
 $ OC_AH_058T  :'data.frame':   13081 obs. of  3 variables:
  ..$ chr    : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
  ..$ Means  : num [1:13081] 87.114 118.963 184.31 -0.173 171.733 ...
 $ OC_AH_084T  :'data.frame':   13081 obs. of  3 variables:
  ..$ chr    : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
  ..$ Means  : num [1:13081] 29.111 103.142 57.476 -0.712 50.156 ...
 $ OC_AH_086T  :'data.frame':   13081 obs. of  3 variables:
  ..$ chr    : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
  ..$ Means  : num [1:13081] 49.8 81 111.5 47 98.8 ...
 $ OC_AH_088T  :'data.frame':   13081 obs. of  3 variables:
  ..$ chr    : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
  ..$ Means  : num [1:13081] 117 152 224 121 196 ...
 $ OC_AH_096T  :'data.frame':   13081 obs. of  3 variables:
  ..$ chr    : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
  ..$ Means  : num [1:13081] 49.5 102.8 93.6 15.2 103.2 ...

I am trying to score the third column of each dataframe (Means, averaged into bins with dplyr): a value that is significantly elevated gets 1, one that is significantly depressed gets -1, and anything else gets 0, stored in a new column of each dataframe.

To do the grouping I have done the following, which works fine:

library(dplyr)

# average Means within bins of leftPos, per chromosome
CLL <- function(col) {
  col %>%
    group_by(chr, binnum = leftPos %/% 500000) %>%
    summarise(Means = mean(Means)) %>%
    mutate(leftPos = (binnum + 1) * 120000) %>%
    select(leftPos, Means)
}

CML <- lapply(df_list, CLL)
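
To make the binning step concrete, here is a minimal base-R sketch of what the integer division does. It uses the first five leftPos and Means values of OC_AH_026C as they appear (rounded) in the str() output; `bin_width` is just a name introduced for this illustration.

```r
# First five rows of OC_AH_026C, copied from the rounded str() output above
leftPos <- c(736092, 818159, 4105086, 4140849, 4464314)
Means   <- c(45.183, 111.038, 162.785, -0.712, 83.473)

bin_width <- 500000              # same width as in CLL
binnum <- leftPos %/% bin_width  # integer division assigns each position to a bin

# Average Means within each bin (base-R counterpart of group_by + summarise)
bin_means <- tapply(Means, binnum, mean)
```

Positions that share a bin number are averaged together, so the first two rows collapse into one bin and the remaining three into another.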

I am stuck on calculating the upper and lower limits for the Means column of each dataframe. I think this is because I do not know how to reference the column when it sits inside a list of dataframes. For a single dataframe (not in a list) I use:

UL = median(col2, na.rm = TRUE) + alpha*IQR(col2, na.rm = TRUE)
LL = median(col2, na.rm = TRUE) - alpha*IQR(col2, na.rm = TRUE)
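
As a quick sanity check of these two formulas, here is a minimal sketch on a made-up vector. Both `x` and `alpha = 0.95` are assumptions chosen for illustration, not values from the question.

```r
# Hypothetical data: three ordinary values, one low and one high outlier
x <- c(1, 2, 3, 4, 100)
alpha <- 0.95  # assumed multiplier

# Limits are the median plus/minus alpha times the interquartile range
UL <- median(x, na.rm = TRUE) + alpha * IQR(x, na.rm = TRUE)
LL <- median(x, na.rm = TRUE) - alpha * IQR(x, na.rm = TRUE)

# 1 above UL, -1 below LL, 0 otherwise
flag <- (x > UL) - (x < LL)
```

Subtracting the two logical vectors yields the three-way code directly, because at most one of the two comparisons can be TRUE for any value.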

I have tried to reference the third column of each dataframe as follows:

tre<-lapply(CML, "[[", 3)

but of course this extracts the third column and puts it in 'tre', whereas I want to alter the dataframes in the list so that the third column keeps its relationship with the other two columns.
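
The difference between extracting a column and altering the dataframes in place can be sketched as follows. `demo_list` and the `med` column are hypothetical stand-ins, not part of the original data.

```r
# Hypothetical two-element list standing in for CML
demo_list <- list(a = data.frame(leftPos = 1:3, Means = c(10, 20, 30)),
                  b = data.frame(leftPos = 1:3, Means = c(5, 50, 500)))

# Extracting returns bare vectors; the other columns are gone
extracted <- lapply(demo_list, "[[", "Means")

# Returning the modified data frame keeps every row aligned with its columns
augmented <- lapply(demo_list, function(df) {
  df$med <- median(df$Means, na.rm = TRUE)  # new column alongside the old ones
  df
})
```

The key point is that the function passed to lapply receives a whole dataframe and returns a whole dataframe, so the result is still a list of dataframes.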

So: a) how do I reference the Means column and get the upper and lower limits for each dataframe, and then b) depending on whether each row of the Means column is above the upper limit or below the lower limit, assign 1, -1 or 0 in a new column?

Answer 1

This is what you can do, which is similar to @Roland's answer.

Say that you have data that looks like this (a simplified version of the data you showed):

df_list <- list(OC_AH_026C = data.frame(chr = 1, 
                                        leftPos= c(73, 81, 41, 44),
                                        Means = c(111, 111, 162, -0.7)),
                OC_AH_026C.1 = data.frame(chr = 1,
                                          leftPos = c(73, 81, 41, 44),
                                          Means = c(69, 125, 156, 12)))

You can use lapply to "loop" over the elements of the list. The function below calculates the LL and UL of a chosen column (passed as col2, here "Means") and adds a column res that indicates whether each value falls below the lower limit (-1), above the upper limit (1), or in between (0):

df_list2 <- lapply(df_list, function(df, alpha, col2) {

  # df[[col2]] returns a plain vector for data.frames and tibbles alike
  x <- df[[col2]]

  # limits: median -/+ alpha * interquartile range
  df$LL <- median(x, na.rm = TRUE) - alpha * IQR(x, na.rm = TRUE)
  df$UL <- median(x, na.rm = TRUE) + alpha * IQR(x, na.rm = TRUE)

  # -1 if value < LL,
  #  1 if value > UL,
  #  0 otherwise; nest the operators
  # if you wish to calculate more complex conditions
  df$res <- 0 + ((x < df$LL) * (-1)) + ((x > df$UL) * 1)

  return(df)
}, alpha = 0.95, col2 = "Means")

df_list2
# $OC_AH_026C
#   chr leftPos Means       LL       UL res
# 1   1      73 111.0 72.35875 149.6412   0
# 2   1      81 111.0 72.35875 149.6412   0
# 3   1      41 162.0 72.35875 149.6412   1
# 4   1      44  -0.7 72.35875 149.6412  -1
# 
# $OC_AH_026C.1
#   chr leftPos Means   LL    UL res
# 1   1      73    69 22.9 171.1   0
# 2   1      81   125 22.9 171.1   0
# 3   1      41   156 22.9 171.1   0
# 4   1      44    12 22.9 171.1  -1

(I hope I understood what you need; otherwise let me know and I will correct the answer.)
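
If the same scoring is needed in several places, the anonymous function above could be given a name and default arguments. The sketch below is one possible way to do that; the name `flag_outliers`, its defaults, and the per-dataframe count summary are assumptions added for illustration.

```r
# Hypothetical helper with the same logic as the anonymous function above
flag_outliers <- function(df, alpha = 0.95, col = "Means") {
  x <- df[[col]]                           # plain vector of the chosen column
  spread <- alpha * IQR(x, na.rm = TRUE)
  df$LL <- median(x, na.rm = TRUE) - spread
  df$UL <- median(x, na.rm = TRUE) + spread
  df$res <- (x > df$UL) - (x < df$LL)      # 1 above UL, -1 below LL, 0 otherwise
  df
}

# Same simplified data as in the answer
df_list <- list(
  OC_AH_026C   = data.frame(chr = 1, leftPos = c(73, 81, 41, 44),
                            Means = c(111, 111, 162, -0.7)),
  OC_AH_026C.1 = data.frame(chr = 1, leftPos = c(73, 81, 41, 44),
                            Means = c(69, 125, 156, 12))
)

flagged <- lapply(df_list, flag_outliers)

# Count how many rows fall in each class, one column per data frame
counts <- sapply(flagged, function(df) table(factor(df$res, levels = c(-1, 0, 1))))
```

Fixing the factor levels to -1, 0 and 1 keeps the count table the same shape for every dataframe, even when a class does not occur.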

data.table way

For the sake of completeness, I include a data.table way, which is faster (but gets rid of the list structure). The approach looks like this:

library(data.table)

# combine all listed data.frames into one data.table;
# idcol adds a "name" column holding the list element names
dt <- rbindlist(df_list, idcol = "name")

# calculate the limits per original data.frame
alpha <- 0.95
dt[, LL := median(Means, na.rm = TRUE) - alpha * IQR(Means, na.rm = TRUE), by = name]
dt[, UL := median(Means, na.rm = TRUE) + alpha * IQR(Means, na.rm = TRUE), by = name]

# -1 below LL, 1 above UL, 0 otherwise
# (columns are referenced directly inside dt[...], not via df$)
dt[, res := 0 + ((Means < LL) * (-1)) + ((Means > UL) * 1)]
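
If the list structure is needed again afterwards, the combined table can be split back by name. This is a minimal sketch assuming the data.table package is installed; `dt_demo` is a made-up stand-in for the combined table.

```r
library(data.table)

# Hypothetical combined table, standing in for dt after the steps above
dt_demo <- data.table(name  = rep(c("s1", "s2"), each = 2),
                      Means = c(1, 2, 3, 4),
                      res   = c(0, 1, -1, 0))

# split() on a data.table accepts `by` and returns a named list of data.tables
dt_list <- split(dt_demo, by = "name")

# Convert back to plain data.frames if downstream code expects them
df_back <- lapply(dt_list, as.data.frame)
```

Each list element keeps the name column, so it can be dropped afterwards if it is no longer wanted.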


Answer 1: score 2, accepted, 2015-11-17 09:46:50
