
R: for-loop solution to deleting columns from multiple data frames

My question is probably quite simple, but I think my code could definitely be improved. Right now it uses two for-loops. I'm sure there's a way to do what I need in a single loop, but for the life of me I can't see what it is.

Having searched Stack, I found this excellent answer from Ananda where he was able to extract and keep columns within a range using lapply and for-loop methods. The structure of my data gets in the way, however, as I want to be able to pick specific columns to delete. My data structure looks like this:

1   AAAT_1  1   GROUP   ****    1   -13.70  0
2   AAAT_2  51  GROUP   ****    1   -9.21   0
3   AAAT_3  101 GROUP   ****    1   -7.60   0
4   AAAT_4  151 GROUP   ****    1   -6.28   0

It's an extract from some docking software, and the only columns I want to keep are 2 (e.g. AAAT_1) and 7 (e.g. -13.70). The code I've used to do it uses two for-loops:

for (i in 1:length(temp)) {
  assign(temp[i], get(temp[i])[2:7])
}

....to keep the data from columns 2-7, followed by:

for (i in 1:length(temp)) {
  assign(temp[i], get(temp[i])[-2:-5])
}

....to delete the rest of the columns I didn't need, where temp is just a list of the names of the data frames the loop is acting on.

So, as you can see, it's just two loops doing similar actions. Surely there's a way to pick specific columns to keep/delete and do it all in one loop/lapply statement? Trying things like [2,7] in the get statement doesn't work: it appears to keep only column 7 and turns each data frame into 'Values' instead. I'm not sure what's going on, so any insight there would be wonderful. Either way, if anyone can turn this two-loop solution into one, it would be really appreciated. I definitely feel like I'm missing something really simple/obvious.

Cheers.

EDIT: I have taken into account the vectorised solutions from below and now do the following instead. The names of the raw imported data start with stuff like F0001, F0002, etc., hence the pattern used to make the initial list.

lst <- mget(ls(pattern='^F\\d+')) 

lst <- lapply(lst, "[", TRUE, c("V2","V7") )

lst <- lapply(seq_along(lst), 
             function(i,x) {assign(paste0(temp[i]),x[[i]], envir=.GlobalEnv)},
             x=lst)

I know loops get a bad rap in R. They were a natural solution to me as a CPP programmer but meh, this was far quicker. Initially, the only downside from the other example was that the assign command pasted a letter to each of the created tables in sequence 1,2,3,....,n, whereas the raw imported data files weren't entirely in numerical order (i.e. 1,2,3,5,6,10,...etc.), so this didn't preserve that order. So I had to use a list of the files (our old friend temp) to name them correctly. It's a minor thing and the code isn't much shorter than two loops, but it's most certainly faster.

So, in short, the above three lines add all the imported raw data to a list, keep only the columns I need then split the list up into separate dataframes whilst preserving the correct names. Cheers for the help!
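A possible shortcut for the last of those three lines, offered as a sketch rather than a definitive fix: mget() returns a list named after the objects it collects, so the original names travel with the data, and list2env() can write the trimmed data frames back without a separate vector of names. The toy data frame, the object names F0001 and F0003, and the scratch environment env (used here so the example does not touch the global environment) are all assumptions made up for illustration.

```r
# Scratch environment standing in for the workspace (illustrative only).
env <- new.env()

# Toy data frame shaped like the docking output; values taken from the question.
raw <- data.frame(V1 = 1:2, V2 = c("AAAT_1", "AAAT_2"), V3 = c(1, 51),
                  V4 = "GROUP", V5 = "****", V6 = 1,
                  V7 = c(-13.70, -9.21), V8 = 0)

# Deliberately non-consecutive names, as described in the question.
assign("F0001", raw, envir = env)
assign("F0003", raw, envir = env)

# Collect the data frames into a list named after the objects.
lst <- mget(ls(env, pattern = "^F\\d+"), envir = env)

# Keep only the two wanted columns in each data frame.
lst <- lapply(lst, "[", TRUE, c("V2", "V7"))

# Write each list element back under its own name.
invisible(list2env(lst, envir = env))
```

With real data in the workspace, the same idea would use envir = .GlobalEnv in place of env.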

If you have a data frame, you index rows and columns with

data.frame[row, column]

So, data.frame[2,7] will give you the value in the 2nd row of the 7th column. I guess you were looking for

temp <- temp[, c(2,7)]

or, if temp is a list of data frames

temp <- lapply(temp, function(x) x[, c(2,7)])

So, if you want to use a vector of numbers as column or row indices, create this vector with c(...). If I understand your example right, you don't need any loop command if you use lapply.
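To make the difference concrete, here is a minimal sketch using a small made-up data frame shaped like the one in the question. The default-style column names V1 to V8 and the list names F0001 and F0002 are assumptions for illustration.

```r
# Toy data frame mimicking the docking output (values copied from the question).
df <- data.frame(
  V1 = 1:4,
  V2 = c("AAAT_1", "AAAT_2", "AAAT_3", "AAAT_4"),
  V3 = c(1, 51, 101, 151),
  V4 = "GROUP",
  V5 = "****",
  V6 = 1,
  V7 = c(-13.70, -9.21, -7.60, -6.28),
  V8 = 0
)

single_value <- df[2, 7]        # row 2, column 7: a single number, not a data frame
two_columns  <- df[, c(2, 7)]   # all rows, columns 2 and 7: still a data frame

# The same column selection applied to every data frame in a list.
frames <- list(F0001 = df, F0002 = df)
frames <- lapply(frames, function(x) x[, c(2, 7)])
```

Negative indices drop columns instead of keeping them, so df[, -c(1, 3:6, 8)] should give the same two columns.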

A for loop? Maybe I'm missing something, but why not just use the solution proposed by @Daniel, or a dplyr approach like this?

data
  V1     V2  V3    V4   V5 V6     V7 V8
1  1 AAAT_1   1 GROUP ****  1 -13.70  0
2  2 AAAT_2  51 GROUP ****  1  -9.21  0
3  3 AAAT_3 101 GROUP ****  1  -7.60  0
4  4 AAAT_4 151 GROUP ****  1  -6.28  0

and here is the code:

library(dplyr)
data <- select(data, V2, V7)
data
      V2     V7
1 AAAT_1 -13.70
2 AAAT_2  -9.21
3 AAAT_3  -7.60
4 AAAT_4  -6.28
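Since the question involves several data frames rather than one, the same select() call can be combined with lapply. This is a sketch only: the list frames and its element names are hypothetical, and the data are a shortened copy of the example above.

```r
library(dplyr)

# Shortened copy of the example data.
data <- data.frame(V1 = 1:2, V2 = c("AAAT_1", "AAAT_2"), V3 = c(1, 51),
                   V4 = "GROUP", V5 = "****", V6 = 1,
                   V7 = c(-13.70, -9.21), V8 = 0)

# Hypothetical list of data frames; the names are made up.
frames <- list(F0001 = data, F0002 = data)

# Keep V2 and V7 in every element of the list.
kept <- lapply(frames, function(x) select(x, V2, V7))

# Same columns, obtained by naming the ones to drop instead.
dropped <- lapply(frames, function(x) select(x, -c(V1, V3:V6, V8)))
```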

