Factor analysis using R on consecutive groups of columns in a df

Question

I have a df with 10,000 columns (SNP frequencies). I need to carry out a simulation (factor analysis) with non-repeating vectors. To do this, I need to run factor analysis on subsets of columns, divided into groups of 10: for example, cols 1:10, 11:20, 21:30. Since specifying this manually would take ages, I need a simple script that does it. I wrote this, but it does not seem to work. I cannot figure out how to tell R when to start and stop each iteration.

ind=seq(1,(ncol(df)-10),by=10)

for (i in ind) { start=i;end=i+9; rez = factanal(df,factors=1, start:end)  }

Answer 1

Just a small pointer:

 groups <- seq(from=1, to=10000, by=10)

This may be useful for splitting up your columns into groups of 10. Then, to each element of groups, you can add something like 0:9. See

> 1 + 0:9
 [1]  1  2  3  4  5  6  7  8  9 10

This can be used to subset your dataframe.
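To make the indexing concrete, here is a small sketch on a toy data frame. The 4 x 30 `dat` is made up purely for illustration; it stands in for the real SNP data.

```r
# Toy stand-in for the real data: 4 rows, 30 columns named V1..V30
dat <- as.data.frame(matrix(seq_len(4 * 30), nrow = 4))

i <- 11                   # start of the second group
block <- dat[, i + 0:9]   # columns 11 to 20

names(block)
# "V11" "V12" "V13" "V14" "V15" "V16" "V17" "V18" "V19" "V20"
```

Note that `i + 0:9` assumes the number of columns is a multiple of 10; otherwise the last group would ask for columns that do not exist.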

For instance,

for(i in groups){
  your_function( dat[, i + 0:9] )
}

will execute your function with the corresponding data. Make sure to store the output of the function appropriately. It may be useful to wrap it in an lapply call, as in

 lapply(groups, function(x) your_function(dat[, x + 0:9]))

to save the output in a list.
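Putting the pieces together with `factanal` might look like the sketch below. The data are simulated here (3 blocks of 10 columns, each driven by one latent factor) only so that the example runs on its own; the column names `snp1`, `snp2`, ... and the list names are assumptions, not part of the original question.

```r
set.seed(1)
n <- 200

# Simulated stand-in for the real data: each block of 10 columns
# shares one latent factor plus independent noise
dat <- as.data.frame(do.call(cbind, lapply(1:3, function(g) {
  f <- rnorm(n)
  sapply(1:10, function(j) 0.7 * f + rnorm(n, sd = 0.5))
})))
names(dat) <- paste0("snp", seq_along(dat))

# Start index of every group of 10, derived from the data
# rather than hard-coded
groups <- seq(from = 1, to = ncol(dat), by = 10)

# One single-factor model per group, stored in a list
fits <- lapply(groups, function(x) factanal(dat[, x + 0:9], factors = 1))
names(fits) <- paste0("cols_", groups, "_", groups + 9)

# Collect the loadings: one column per group, one row per variable in the group
loadings_mat <- sapply(fits, function(fit) unname(fit$loadings[, 1]))
dim(loadings_mat)
```

Each element of `fits` is an ordinary `factanal` object, so `fits[[2]]$uniquenesses` or `print(fits[[2]])` work as usual.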

While this may answer your question, let me nevertheless add what I would do, since I think it may help you more in the long run: instead of looping over columns, I would melt the dataframe into long format, create an index indicating groups of 10 as a new variable, and then use that variable as the grouping variable with dplyr's group_by() operations for grouped analysis.
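A rough sketch of that long-format idea follows, assuming the dplyr and tidyr packages are installed. It uses `tidyr::pivot_longer` in place of `melt`. Because `factanal` needs the variables side by side, each group is pivoted back to wide form inside `group_map`; that step is my own adaptation. The simulated `dat` and the names `obs`, `snp`, `freq` and `grp` are illustrative assumptions.

```r
library(dplyr)
library(tidyr)

set.seed(1)
n <- 200

# Simulated stand-in for the real data: 3 blocks of 10 columns,
# one latent factor per block
dat <- as.data.frame(do.call(cbind, lapply(1:3, function(g) {
  f <- rnorm(n)
  sapply(1:10, function(j) 0.7 * f + rnorm(n, sd = 0.5))
})))
names(dat) <- paste0("snp", seq_along(dat))

# Long format: one row per (observation, SNP) pair, plus a group index
# that maps column positions 1-10 to group 1, 11-20 to group 2, and so on
long <- dat %>%
  mutate(obs = row_number()) %>%
  pivot_longer(-obs, names_to = "snp", values_to = "freq") %>%
  mutate(grp = (match(snp, names(dat)) - 1) %/% 10 + 1)

# Grouped analysis: rebuild each group's 10 columns and fit one factor
results <- long %>%
  group_by(grp) %>%
  group_map(~ {
    wide <- pivot_wider(.x, id_cols = obs, names_from = snp, values_from = freq)
    factanal(select(wide, -obs), factors = 1)
  })

length(results)   # one fitted model per group
```

For a plain column-block analysis like this one, the `lapply` version is shorter; the long format pays off mainly when the same grouping variable is reused for summaries or plots.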


Answer 1 (accepted) 2016-05-20 09:19:13
