My question is probably quite simple but I think my code could definitely be improved. Right now it's two for-loops but I'm sure there's a way to do what I need in a single loop, for the life of me I can't see what it is.
Having searched Stack, I found this excellent answer from Ananda where he was able to extract and keep columns within a range using lapply and for-loop methods. The structure of my data gets in the way, however, as I want to be able to pick specific columns to delete. My data structure looks like this:
1 AAAT_1 1 GROUP **** 1 -13.70 0
2 AAAT_2 51 GROUP **** 1 -9.21 0
3 AAAT_3 101 GROUP **** 1 -7.60 0
4 AAAT_4 151 GROUP **** 1 -6.28 0
It's extract from some docking software and the only columns I want to keep are 2 (eg AAAT_1) and 7 (eg -13.70). The code I've used to do it, two for-loops:
for (i in 1:length(temp)) {
assign(temp[i], get(temp[i])[2:7])
}
....to keep the data from columns 2-7, followed by:
for (i in 1:length(temp)) {
assign(temp[i], get(temp[i])[-2:-5])
}
....to delete the rest of the columns I didn't need, where temp[i] is just a list of data frames the loop is acting on.
So, as you can see, it's just two loops doing similar actions. Surely there's a way to be able to pick specific columns to keep/delete and do it all in one loop/lapply statement? Trying things like [2,7]
in the get
statement doesn't work, appears to keep only column 7 and turns each data frame into 'Values' instead. I'm not sure what's going so any insight there would be wonderful but, either way, if anyone can turn this two-loop solution into one, would be really appreciated. Definitely feel like I'm missing something really simple/obvious.
Cheers.
EDIT: Have taken into account the vectorised solutions from below to do the following instead. The names of raw imported data start with stuff like F0001, F0002, etc. hence the pattern to make the initial list
.
lst <- mget(ls(pattern='^F\\d+'))
lst <- lapply(lst, "[", TRUE, c("V2","V7") )
lst <- lapply(seq_along(lst),
function(i,x) {assign(paste0(temp[i]),x[[i]], envir=.GlobalEnv)},
x=lst)
I know loops get a bad rap in R, was a natural solution to me as a CPP programmer but meh, this was far quicker. Initially, the only downside from the other example was that the assign
command pasted a letter to each of the created tables in sequence 1,2,3,....,n when the list of raw imported data files weren't entirely in numerical order (ie 1,2,3,5,6,10,...etc.) so this didn't preserve that order. So I had to use a list of the files (our old friend temp
) to name them correctly. Minor thing and the code isn't much shorter than two loops but it's most certainly faster.
So, in short, the above three lines add all the imported raw data to a list, keep only the columns I need then split the list up into separate dataframes whilst preserving the correct names. Cheers for the help!
If you have a data frame, you index rows and columns with
data.frame[row, column]
So, data.frame[2,7])
will give you the value of the 2nd row in the 7th column. I guess you were looking for
temp <- temp[, c(2,7)]
or, if temp
is a list of data frames
temp <- lapply(temp, function(x) x[, c(2,7)])
So, if you want to use a vector of numbers as column- or row-indices, create this vector with c(...)
. If I understand your example right, you don't need any loop-command, if you use lapply
.
A for
loop? Maybe I'am missing something but just why do not use the solution proposed by @Daniel or a dplyr
approach like this.
data
V1 V2 V3 V4 V5 V6 V7 V8
1 1 AAAT_1 1 GROUP **** 1 -13.70 0
2 2 AAAT_2 51 GROUP **** 1 -9.21 0
3 3 AAAT_3 101 GROUP **** 1 -7.60 0
4 4 AAAT_4 151 GROUP **** 1 -6.28 0
and here the code:
library(dplyr)
data <- select(data, V2, V7)
data
V2 V7
1 AAAT_1 -13.70
2 AAAT_2 -9.21
3 AAAT_3 -7.60
4 AAAT_4 -6.28
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.