
Resample every two columns together in a data frame in R

I have a very large data frame that contains 100 rows and 400000 columns.

To shuffle each column independently, I can simply do:

df <- apply(df, 2, sample)

But I want every pair of columns to be resampled together. For example, if col1 is originally c(1,2,3,4,5) and col2 is c(6,7,8,9,10), and after resampling col1 becomes c(1,3,2,4,5), then I want col2 to become c(6,8,7,9,10), following the same resampling pattern as col1. The same goes for col3 & col4, col5 & col6, etc.

I wrote a for loop to do this, but it takes forever. Is there a better way? Thanks!
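To make the requirement concrete, here is a minimal sketch in which one row permutation is applied to both columns. The name `idx` is only illustrative, and its value is fixed here to match the example above:

```r
col1 <- c(1, 2, 3, 4, 5)
col2 <- c(6, 7, 8, 9, 10)

# One shared permutation of the row positions.
# In practice this would be random: idx <- sample(length(col1))
idx <- c(1, 3, 2, 4, 5)

col1[idx]  # 1 3 2 4 5
col2[idx]  # 6 8 7 9 10
```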

You might try this: split the data frame into two-column chunks with split.default, shuffle the rows of each chunk with a single permutation, and then bind the chunks back together:

df <- data.frame(col1 = 1:5, col2 = 6:10, col3 = 11:15)

index <- seq_len(nrow(df))
cbind.data.frame(
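    setNames(lapply(
        # group the columns in pairs: (1, 2), (3, 4), ...
        split.default(df, (seq_along(df) - 1) %/% 2),
        # shuffle the rows of each pair with one shared permutation
        function(sdf) sdf[sample(index), , drop = FALSE]),
    NULL)  # drop the group names so the column names are not prefixed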
)

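# Example output (random, so it differs between runs; col3 has no partner and is shuffled on its own):
#  col1 col2 col3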
#5    5   10   12
#4    4    9   11
#1    1    6   15
#2    2    7   14
#3    3    8   13
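
With 400000 columns, splitting into 200000 small data frames may itself be slow. As a possible alternative, here is a sketch that does the same pairwise shuffle on a matrix, where column indexing is usually much cheaper than on a data frame. It assumes all columns have the same type (for example, all numeric), since as.matrix would otherwise coerce them to character. The helper name `resample_pairs` is hypothetical, and this has not been benchmarked against the data in the question:

```r
# Shuffle the rows of each consecutive pair of columns with one shared
# permutation per pair. Assumes all columns of df share one type.
resample_pairs <- function(df) {
  m <- as.matrix(df)
  n <- nrow(m)
  p <- ncol(m)
  # First column of each pair: 1, 3, 5, ...
  starts <- seq_len(ceiling(p / 2)) * 2 - 1
  for (j in starts) {
    # A trailing unpaired column is shuffled on its own.
    cols <- j:min(j + 1, p)
    m[, cols] <- m[sample.int(n), cols]
  }
  as.data.frame(m)
}

df <- data.frame(col1 = 1:5, col2 = 6:10, col3 = 11:15, col4 = 16:20)
out <- resample_pairs(df)
# Within each pair the rows stay aligned, so these differences are all 5.
out$col2 - out$col1
out$col4 - out$col3
```

If the data can stay a matrix for the rest of the analysis, skipping the final as.data.frame call avoids converting back and forth.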
