Resample every two columns together in a data frame in R
I have a very large data frame with 100 rows and 400000 columns.
To sample each column independently, I can simply do:
df <- apply(df, 2, sample)
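One detail about that line: `apply()` returns a matrix, not a data frame. A minimal sketch that shuffles each column independently and keeps the data frame, assuming a small toy `df` in place of the real one:

```r
# Toy data frame standing in for the real 100 x 400000 one (assumption)
df <- data.frame(col1 = 1:5, col2 = 6:10, col3 = 11:15)

# Shuffle every column independently; assigning into df[] keeps the
# data.frame class and column names (apply() would return a matrix)
set.seed(42)
df[] <- lapply(df, sample)
```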
But I want every two columns to be sampled together. For example, if col1 is originally c(1,2,3,4,5) and col2 is c(6,7,8,9,10), and after resampling col1 becomes c(1,3,2,4,5), then I want col2 to become c(6,8,7,9,10), following the resampling pattern of col1. The same goes for col3 & col4, col5 & col6, and so on.
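In other words, each pair of columns should share one random permutation of the row positions. A minimal sketch for a single pair, using the two example vectors from the question:

```r
col1 <- c(1, 2, 3, 4, 5)
col2 <- c(6, 7, 8, 9, 10)

# Draw one permutation of the row positions ...
idx <- sample(seq_along(col1))
# ... and apply that same permutation to both columns of the pair
col1_new <- col1[idx]
col2_new <- col2[idx]
```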
I wrote a for loop to do this, but it takes forever. Is there a better way? Thanks!
You might try this: split the data frame every two columns with split.default, sample the rows of each sub data frame, and then bind the pieces back together:
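df <- data.frame(col1 = 1:5, col2 = 6:10, col3 = 11:15)
index <- seq_len(nrow(df))
cbind.data.frame(
  setNames(lapply(
    # group columns into pairs: ids 0,0,1,1,2,... -> (col1, col2), (col3), ...
    split.default(df, (seq_along(df) - 1) %/% 2),
    # shuffle the rows of each pair with a single shared permutation
    function(sdf) sdf[sample(index), , drop = FALSE]),
    NULL)  # drop the list names so the columns keep their original names
)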
# col1 col2 col3
#5 5 10 12
#4 4 9 11
#1 1 6 15
#2 2 7 14
#3 3 8 13
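With 400000 columns, building 200000 small data frames may itself be slow. A possible alternative, sketched here under the assumption that every column has the same atomic type (e.g. all numeric, so `as.matrix()` does not coerce to character), works on a matrix and does all the shuffling with one linear-index lookup. `shuffle_pairs()` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: shuffle the rows of each consecutive column pair
# (1&2, 3&4, ...) using one shared permutation per pair.
# Assumes all columns share one atomic type, since the data frame is
# coerced to a matrix for speed.
shuffle_pairs <- function(df) {
  m <- as.matrix(df)
  n <- nrow(m)
  p <- ncol(m)
  pair_id <- (seq_len(p) - 1L) %/% 2L + 1L   # 1,1,2,2,3,3,...
  npairs <- max(pair_id)
  # One random permutation of the row indices per pair (n x npairs)
  perms <- matrix(replicate(npairs, sample.int(n)), nrow = n)
  # Give every column the permutation belonging to its pair
  row_idx <- perms[, pair_id, drop = FALSE]
  # Convert (row, column) positions to linear indices (column-major)
  lin_idx <- as.vector(row_idx) + rep((seq_len(p) - 1L) * n, each = n)
  out <- matrix(m[lin_idx], nrow = n, ncol = p,
                dimnames = list(NULL, colnames(m)))
  as.data.frame(out)
}

set.seed(1)
res <- shuffle_pairs(data.frame(col1 = 1:5, col2 = 6:10,
                                col3 = 11:15, col4 = 16:20))
```

With an odd number of columns, the last column is shuffled on its own, matching the `%/% 2` grouping used in the answer.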