
Extract rows from a list of data frames and bind the filename to the qualifying rows

Given a list of data frames as input, extract the rows where the p.value column is below 0.005 and output a single data frame that has the filename as column 1, followed by the extracted rows.

Input: a list of files read in as data frames A, B, C, etc.

A.
col1, col2, col3, p.value
X     X      X      0.05
X     X      X      0.001

B.
col1, col2, col3, p.value
X     X      X      0.03
X     X      X      0.01

C. 
col1, col2, col3, p.value
X     X      X      0.1
X     X      X      0.0005

output.
Name, col1, col2, col3, p.value
A      X     X     X     0.001
C      X     X     X     0.0005

files = list.files(".", pattern="\\.assoc$")
data1=lapply(files, read.table, header=FALSE, sep=",")
data2 <- lapply(data1, function(x) {i <- which(x$p.value<0.005)
if (length(i) > 0) x[i, ] else NA })

for (i in 1:length(data2)){
data2[[i]]<-cbind(data2[[i]],files[i])}
data_rbind <- do.call("rbind", data2) 
colnames(data_rbind)[c(1:5)]<-c("Name", "Col1", "Col2", "Col3", "p.value")

The problem occurs in the following lines: every element of the resulting list is NA, even though some of the files contain rows that should qualify.

  data2 <- lapply(data1, function(x) {i <- which(x$p.value<0.005)
  if (length(i) > 0) x[i, ] else NA })
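
One likely cause, sketched below, is the header = FALSE argument in the read.table call. Without a header, read.table names the columns V1, V2, ..., so x$p.value is NULL, which() returns integer(0), and the else branch yields NA for every file. The text in txt is a hypothetical stand-in for one .assoc file and assumes the file starts with a header line; if the real files have no header line, passing col.names to read.table would be the corresponding fix.

```r
# Hypothetical contents of one .assoc file (assumes a header line is present)
txt <- "col1,col2,col3,p.value
X,X,X,0.05
X,X,X,0.001"

# As in the question: header = FALSE, so columns are auto-named V1..V4
no_header <- read.table(text = txt, header = FALSE, sep = ",")
names(no_header)                        # no column is called p.value
is.null(no_header$p.value)              # the column lookup returns NULL
idx_bad <- which(no_header$p.value < 0.005)
length(idx_bad)                         # zero, so the else branch returns NA

# With header = TRUE the first line supplies the column names
with_header <- read.table(text = txt, header = TRUE, sep = ",")
idx_ok <- which(with_header$p.value < 0.005)
with_header[idx_ok, ]                   # the qualifying row
```

Note that with header = FALSE the header line is also read as a data row, so the last column is no longer numeric.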

We loop over the named list with lapply and subset the rows based on the condition on the 'p.value' column. Filter then drops the list elements that have 0 rows. Finally, Map adds a 'Name' column taken from the names of the filtered list ('tmp'), and rbind combines the list elements into a single dataset.

tmp <- Filter(nrow, lapply(data1, subset, subset = p.value < 0.005))
do.call(rbind, unname(Map(cbind,  Name = names(tmp), tmp)))

-output

#    Name col1 col2 col3 p.value
#2      A    X    X    X  0.0010
#21     C    X    X    X  0.0005
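
Both approaches rely on data1 being a named list, but lapply(files, read.table, ...) as written in the question returns an unnamed one. The following is a minimal sketch of one way to attach the names when reading from disk. The temporary directory and the two small files are hypothetical stand-ins for the real .assoc files, and the sketch assumes they contain a header line. It uses an explicit anonymous function for the row filter rather than subset, purely as a stylistic choice.

```r
# Hypothetical stand-in files written to a temporary directory
dir <- tempfile("assoc_")
dir.create(dir)
writeLines(c("col1,col2,col3,p.value", "X,X,X,0.05", "X,X,X,0.001"),
           file.path(dir, "A.assoc"))
writeLines(c("col1,col2,col3,p.value", "X,X,X,0.03", "X,X,X,0.01"),
           file.path(dir, "B.assoc"))

files <- list.files(dir, pattern = "\\.assoc$", full.names = TRUE)
data1 <- lapply(files, read.table, header = TRUE, sep = ",")

# Name each list element after its file, without directory or extension
names(data1) <- tools::file_path_sans_ext(basename(files))

# Keep qualifying rows, drop empty elements, then prepend the name and bind
kept <- lapply(data1, function(d) d[d$p.value < 0.005, , drop = FALSE])
tmp <- Filter(nrow, kept)
result <- do.call(rbind, unname(Map(cbind, Name = names(tmp), tmp)))
result
```

If the full filename is wanted in the 'Name' column instead, names(data1) <- basename(files) would keep the extension.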

Alternatively, use map_dfr from purrr to loop over the list and filter the rows where p.value is less than 0.005. Setting .id creates a new column 'Name'; because the list is named, that column is filled with the list names. The _dfr suffix row-binds the results into a single data.frame.

library(dplyr)
library(purrr)
map_dfr(data1, ~ .x %>% 
          filter(p.value < 0.005), .id = 'Name')

-output

#  Name col1 col2 col3 p.value
#1    A    X    X    X  0.0010
#2    C    X    X    X  0.0005
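
A variation that needs only dplyr is to bind first and filter afterwards. This is a sketch rather than part of the original answer: bind_rows with .id stacks the named list into one data frame with a 'Name' column, and a single filter call then keeps the qualifying rows. The small data1 list is rebuilt inline so the block runs on its own.

```r
library(dplyr)

# Same toy data as in the 'data' section, rebuilt inline
data1 <- list(
  A = data.frame(col1 = c("X", "X"), col2 = c("X", "X"),
                 col3 = c("X", "X"), p.value = c(0.05, 0.001)),
  B = data.frame(col1 = c("X", "X"), col2 = c("X", "X"),
                 col3 = c("X", "X"), p.value = c(0.03, 0.01)),
  C = data.frame(col1 = c("X", "X"), col2 = c("X", "X"),
                 col3 = c("X", "X"), p.value = c(0.1, 5e-04))
)

# Stack all data frames, recording the list name in 'Name', then filter once
out <- bind_rows(data1, .id = "Name") %>%
  filter(p.value < 0.005)
out
```

Binding first avoids having to drop empty list elements, at the cost of holding all rows in memory before filtering.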

data

data1 <- list(A = structure(list(col1 = c("X", "X"), col2 = c("X", "X"
), col3 = c("X", "X"), p.value = c(0.05, 0.001)), class = "data.frame", row.names = c(NA, 
-2L)), B = structure(list(col1 = c("X", "X"), col2 = c("X", "X"
), col3 = c("X", "X"), p.value = c(0.03, 0.01)), class = "data.frame", row.names = c(NA, 
-2L)), C = structure(list(col1 = c("X", "X"), col2 = c("X", "X"
), col3 = c("X", "X"), p.value = c(0.1, 5e-04)), class = "data.frame", row.names = c(NA, 
-2L)))
