Resampling every two columns together in an R data frame

Question

I have a very large data frame that contains 100 rows and 400000 columns.

To sample each column, I can simply do:

df <- apply(df, 2, sample)

But I want every two columns to be sampled together. For example, if originally col1 is c(1,2,3,4,5) and col2 is c(6,7,8,9,10), and after resampling col1 becomes c(1,3,2,4,5), I want col2 to be c(6,8,7,9,10), following the same resampling pattern as col1. The same goes for col3 & col4, col5 & col6, etc.
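To make the requirement concrete, here is a minimal sketch of the example above: a single index vector is applied to both columns, so they stay aligned. The names idx, new1 and new2 are only illustrative, and in practice idx would come from sample() rather than being written out by hand.

```r
col1 <- c(1, 2, 3, 4, 5)
col2 <- c(6, 7, 8, 9, 10)

# The permutation from the example; normally: idx <- sample(length(col1))
idx <- c(1, 3, 2, 4, 5)

# Indexing both columns with the same idx keeps their rows paired
new1 <- col1[idx]   # 1 3 2 4 5
new2 <- col2[idx]   # 6 8 7 9 10
```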

I wrote a for loop to do this, but it takes forever. Is there a better way? Thanks!

Answer 1

You might try this: split the data frame every two columns with split.default, sample the rows of each sub data frame, and then bind the pieces back together:

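df <- data.frame(col1 = 1:5, col2 = 6:10, col3 = 11:15)

# Row positions to permute
index <- seq_len(nrow(df))
cbind.data.frame(
    # Drop the list names so the original column names are kept
    setNames(lapply(
        # Group the columns in consecutive pairs: 0, 0, 1, 1, 2, ...
        split.default(df, (seq_along(df) - 1) %/% 2),
        # Apply one random row permutation to each pair of columns
        function(sdf) sdf[sample(index), , drop = FALSE]),
    NULL)
)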

# One possible result (the permutation is random, so output varies):
#  col1 col2 col3
#5    5   10   12
#4    4    9   11
#1    1    6   15
#2    2    7   14
#3    3    8   13
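With as many columns as in the question, splitting into a very large number of small data frames may itself be slow. A possible alternative, offered only as a sketch, is to work on a matrix and index it with one permutation per column pair. This assumes all columns share one type, so that as.matrix() does not coerce anything unexpectedly. The names m, perms, pair_id, row_idx and res are illustrative and not part of the answer above.

```r
# Small stand-in for the real data; assumes all columns have the same type
df <- data.frame(col1 = 1:5, col2 = 6:10, col3 = 11:15, col4 = 16:20)
m <- as.matrix(df)
n <- nrow(m)

# One random row permutation per pair of columns (n x n_pairs matrix)
n_pairs <- ceiling(ncol(m) / 2)
perms <- replicate(n_pairs, sample(n))

# Map each column to its pair: 1, 1, 2, 2, 3, ...
pair_id <- (seq_len(ncol(m)) - 1) %/% 2 + 1

# Row index to read for every cell; both columns of a pair get the same one
row_idx <- perms[, pair_id, drop = FALSE]

# Matrix indexing with (row, column) pairs, filled back in column-major order
res <- matrix(m[cbind(as.vector(row_idx), as.vector(col(m)))],
              nrow = n, dimnames = list(NULL, colnames(m)))
```

In the toy data col2 is always col1 + 5, so checking that res[, "col2"] - res[, "col1"] is still 5 in every row confirms that the pair was shuffled together. If a data frame is needed afterwards, as.data.frame(res) converts the result back.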

Answer 1
1 accepted 2017-05-26 03:24:59