dplyr: Randomly split a data_frame into two
How can I randomly split a data_frame into two parts without creating an index?
sample_n works for me to get one part of it, but how can I collect the other part?
You can do an anti_join with the extracted part as the y data frame and the original as the x data frame. This works as long as the by column (x here) uniquely identifies each row. A small example:
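library(dplyr)
# data_frame() is deprecated in newer tibble versions; tibble() is the replacement
df <- data_frame(x=1:20,y=runif(20))
# draw 10 rows at random, without replacement
dfy <- df %>% sample_n(10, replace=FALSE)
# keep the rows of df whose x does not occur in dfy
dfx <- anti_join(df, dfy, by="x")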
This results in the following data frames:
> df
Source: local data frame [20 x 2]
x y
1 1 0.64147504
2 2 0.35766839
3 3 0.44875782
4 4 0.01905876
5 5 0.85655599
6 6 0.88191481
7 7 0.46532067
8 8 0.09831802
9 9 0.31158184
10 10 0.39504048
11 11 0.81358862
12 12 0.41702158
13 13 0.80441008
14 14 0.69928890
15 15 0.19040897
16 16 0.94120853
17 17 0.65289448
18 18 0.46844427
19 19 0.63177479
20 20 0.58288923
One half (dfx):
> dfx
Source: local data frame [10 x 2]
x y
1 19 0.6317748
2 17 0.6528945
3 16 0.9412085
4 15 0.1904090
5 14 0.6992889
6 11 0.8135886
7 7 0.4653207
8 6 0.8819148
9 5 0.8565560
10 3 0.4487578
The other half (dfy):
> dfy
Source: local data frame [10 x 2]
x y
1 18 0.46844427
2 8 0.09831802
3 12 0.41702158
4 4 0.01905876
5 2 0.35766839
6 10 0.39504048
7 13 0.80441008
8 9 0.31158184
9 1 0.64147504
10 20 0.58288923
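
The anti_join above matches rows on x, so it relies on x having no repeated values. If the data has no such column, one possible workaround is sketched below. It does add a temporary row id, which the question wanted to avoid, but the id is dropped again once the split is done. The column name row_id, the toy data, and the seed are assumptions made for illustration and are not part of the original answer.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed, only to make the split reproducible

# Hypothetical data in which x contains duplicates,
# so anti_join(by = "x") alone would remove too many rows
df <- tibble(x = c(1, 1, 2, 2, 3, 3), y = runif(6))

# Add a temporary row id (row_id is an illustrative name)
df_id <- df %>% mutate(row_id = row_number())

# Sample one part, then take everything else by the temporary id
dfy <- df_id %>% sample_n(3, replace = FALSE)
dfx <- anti_join(df_id, dfy, by = "row_id")

# Drop the temporary id from both parts
dfy <- dfy %>% select(-row_id)
dfx <- dfx %>% select(-row_id)
```

Both parts together contain every original row exactly once, and neither keeps the helper column.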