dplyr: split a data_frame randomly into two

Question

How can I split a data_frame randomly into two without creating an index? sample_n works for me to get one part of it, but how can I collect the other part?

Answer 1

You can do an anti_join with the original as the x data frame and the extracted part as the y data frame. This returns the rows of the original that have no match in the extracted part. A small example:

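library(dplyr)

# Example data; x is a unique key (data_frame() is deprecated in newer versions in favor of tibble())
df <- data_frame(x = 1:20, y = runif(20))
# Draw 10 rows at random without replacement
dfy <- df %>% sample_n(10, replace = FALSE)
# Keep the rows of df whose x does not appear in dfy
dfx <- anti_join(df, dfy, by = "x")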

This results in the following data frames:

> df
Source: local data frame [20 x 2]

    x          y
1   1 0.64147504
2   2 0.35766839
3   3 0.44875782
4   4 0.01905876
5   5 0.85655599
6   6 0.88191481
7   7 0.46532067
8   8 0.09831802
9   9 0.31158184
10 10 0.39504048
11 11 0.81358862
12 12 0.41702158
13 13 0.80441008
14 14 0.69928890
15 15 0.19040897
16 16 0.94120853
17 17 0.65289448
18 18 0.46844427
19 19 0.63177479
20 20 0.58288923

One half:

> dfx
Source: local data frame [10 x 2]

    x         y
1  19 0.6317748
2  17 0.6528945
3  16 0.9412085
4  15 0.1904090
5  14 0.6992889
6  11 0.8135886
7   7 0.4653207
8   6 0.8819148
9   5 0.8565560
10  3 0.4487578

The other half:

> dfy
Source: local data frame [10 x 2]

    x          y
1  18 0.46844427
2   8 0.09831802
3  12 0.41702158
4   4 0.01905876
5   2 0.35766839
6  10 0.39504048
7  13 0.80441008
8   9 0.31158184
9   1 0.64147504
10 20 0.58288923
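
Note that the anti_join above works because x uniquely identifies each row. If no column is a unique key, one possible workaround is a temporary row id that is dropped again after the split. The following is only a sketch under that assumption: the column name .row_id, the example data, and the fixed seed are all made up for illustration and are not part of the original answer.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed only so the sketch is reproducible

# Example data with repeated values in g, so no existing column is a unique key
df <- tibble(g = rep(c("a", "b"), each = 10), y = runif(20))

# Add a temporary row id (.row_id is a hypothetical helper name)
df_id <- df %>% mutate(.row_id = row_number())

# Draw one half at random, then take every row not drawn as the other half
part1 <- df_id %>% sample_n(10, replace = FALSE)
part2 <- anti_join(df_id, part1, by = ".row_id")

# Drop the helper column so both parts have the original columns only
part1 <- part1 %>% select(-.row_id)
part2 <- part2 %>% select(-.row_id)
```

Together, part1 and part2 contain every row of df exactly once. The helper column exists only during the split, so the resulting data frames carry no index.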
