
Factor analysis using R over sequential groups of columns in df

I have a df with 10,000 columns (SNP frequencies). I need to carry out a simulation (factor analysis) with non-repeating vectors. To do this, I need to run factor analysis on subsets of columns, divided into groups of 10: for example, cols 1:10, 11:20, 21:30, and so on. Since specifying this manually would take ages, I need a simple script that does it. I wrote the following, but it does not seem to work. I cannot figure out how to tell R where to start and stop each iteration.

ind=seq(1,(ncol(df)-10),by=10)

for (i in ind) { start=i;end=i+9; rez = factanal(df,factors=1, start:end)  }

Just a small pointer:

 groups <- seq(from=1, to=10000, by=10)

This may be useful for splitting up your columns into groups of 10. Each element of groups is the first column of a group; adding 0:9 to it gives the indices of all ten columns in that group. See

> 1 + 0:9
 [1]  1  2  3  4  5  6  7  8  9 10

This can be used in subsetting your dataframe.

For instance,

for(i in groups){
  your_function( dat[, i + 0:9] )
}

will execute your function with the corresponding data. Make sure to store the output of the function appropriately. It may be useful to wrap it into an lapply call, as in

 lapply(groups, function(x) your_function(dat[, x + 0:9]))

to save the output in a list.
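As a minimal sketch of how the lapply approach could look with factanal() itself, the example below runs a one-factor analysis on each block of 10 columns. The data are simulated only so the code runs on its own, and the names n_obs, n_cols, block and results are assumptions for illustration, not part of the original question.

```r
set.seed(1)
n_obs  <- 200   # number of rows (assumed)
n_cols <- 50    # small stand-in for the 10,000 SNP columns
block  <- 10    # columns per group

# Simulated data: each block of 10 columns shares one latent factor,
# so factanal() has a structure to recover.
dat <- as.data.frame(do.call(cbind, lapply(seq_len(n_cols / block), function(b) {
  f <- rnorm(n_obs)
  sapply(seq_len(block), function(j) 0.7 * f + rnorm(n_obs, sd = 0.5))
})))

# First column index of every group: 1, 11, 21, ...
groups <- seq(from = 1, to = ncol(dat), by = block)

# Fit a one-factor model to each block and keep the fits in a list
results <- lapply(groups, function(start) {
  factanal(dat[, start + 0:(block - 1)], factors = 1)
})
names(results) <- paste0("cols_", groups, "_", groups + block - 1)

# Loadings for the first block of columns
results[[1]]$loadings
```

Note that this sketch assumes the number of columns is a multiple of 10; otherwise the last group would index columns that do not exist and the subsetting would fail.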

While this may answer your question, let me add what I would do, since I think it may help you more in the long run. Instead of looping over columns, I would melt the dataframe into long format, create an index indicating groups of 10 as a new variable, and then use that variable as the grouping variable with dplyr's group_by() operations for grouped analysis.
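A rough sketch of that long-format idea follows, assuming the dplyr and tidyr packages are installed. The data are simulated, and the names row_id, snp, freq, col_index, group and fits are made up for illustration. Because factanal() expects one column per variable, each group is reshaped back to wide format before fitting; this is one possible way to combine the two, not necessarily the most efficient one for 10,000 columns.

```r
library(dplyr)
library(tidyr)

set.seed(1)
n_obs <- 200

# Simulated stand-in data: 30 columns, one latent factor per block of 10
dat <- as.data.frame(do.call(cbind, lapply(1:3, function(b) {
  f <- rnorm(n_obs)
  sapply(1:10, function(j) 0.7 * f + rnorm(n_obs, sd = 0.5))
})))

# Melt to long format and add an index for groups of 10 columns
long <- dat %>%
  mutate(row_id = row_number()) %>%
  pivot_longer(-row_id, names_to = "snp", values_to = "freq") %>%
  mutate(col_index = match(snp, names(dat)),
         group = (col_index - 1) %/% 10 + 1)

# Grouped analysis: reshape each group back to wide and fit one factor
fits <- long %>%
  group_by(group) %>%
  group_map(~ {
    wide <- pivot_wider(.x, id_cols = row_id,
                        names_from = snp, values_from = freq)
    factanal(select(wide, -row_id), factors = 1)
  })

# One fitted model per group of 10 columns
length(fits)
```

The same group variable can also be used with summarise() for simpler per-group statistics that do not need the wide layout.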
