How to split a dataset using R such that the sum of values in a column is roughly identical across the subsets?
I have a dataset in R as follows:
x <- structure(list(value = c(7.496, 11.073, 11.329, 9.282, 8.748, 12.515, 7.46, 9.189, 9.62, 5.815, 5.945,
7.778, 10.077, 15.311, 8.591, 6.048, 7.568, 6.14, 6.591, 5.376,
8.038, 7.496, 7.983, 6.591, 6.591, 7.44, 6.453, 11.589, 5.751,
8.464, 7.577, 6.014, 12.733, 7.108, 14.857, 15.503, 12.468, 13.39,
10.796, 10.923, 7.215, 13.72, 7.574, 11.77, 10.409, 7.591, 6.174,
6.748, 10.091, 9.8, 6.527, 9.251, 6.622, 13.742, 4.454, 8.331,
7.702, 7.197, 9.629, 9.76, 3.663, 19.55, 8.107, 9.637, 10.146,
9.564, 6.947, 14.45, 10.266, 5.457, 10.629, 6.275, 2.48, 4.513,
6.755, 2.885, 5.773, 2.855, 2.429, 2.955, 2.486, 3.239, 4.29,
3.043, 3.501, 3.276, 4.018, 2.727, 5.199, 2.371, 3.732, 2.533,
4.482, 3.215, 7.782, 3.435, 4.201, 3.074, 3.475, 2.923, 3.025,
4.308, 3.932, 2.923, 3.491, 2.852, 3.916), id = 1:107), row.names = c(NA,
-107L), class = "data.frame")
What I would like to do is split the dataset into two subsets such that the sums of the value column are approximately equal. That is, the sum of x$value is 776.8, so ideally, for the two subsets (let's call them x1 and x2), the sums of x1$value and x2$value would each be as close to 776.8/2 = 388.4 as possible.
Is there any way that this can be done in R? I have searched other posts on Stack Overflow, but to no avail.
Just use cumsum to get the cumulative sum of the "value" column. Compare the result with the desired splitting value ( sum(x$value)/2 ) and use the resulting logical vector to split the dataset.
split(x, cumsum(x$value) <= sum(x$value)/2)
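Note that the cumsum split keeps rows in their existing order and cuts at a single point, so how close the two sums end up depends on where the running total happens to cross the halfway mark. The sketch below checks the sums produced by that split and, as a possible alternative that is not part of the answer above, tries a simple greedy assignment (largest value first, always into the currently lighter group). The toy data frame and the helper name balanced_split are assumptions made up for illustration; the greedy approach is a heuristic and is not guaranteed to find the best possible split.

```r
# Toy data standing in for the question's `x` (hypothetical values)
x <- data.frame(value = c(8, 7, 6, 5, 4, 3, 2, 1), id = 1:8)

# The answer's approach: cut where the running total passes half the sum.
# The result is a list with elements named "FALSE" and "TRUE".
parts <- split(x, cumsum(x$value) <= sum(x$value) / 2)
cumsum_sums <- sapply(parts, function(d) sum(d$value))

# Hypothetical helper: greedy two-way split.
# Visit rows from largest to smallest value and put each row
# into whichever group currently has the smaller total.
balanced_split <- function(df, col = "value") {
  ord <- order(df[[col]], decreasing = TRUE)
  totals <- c(0, 0)
  group <- integer(nrow(df))
  for (i in ord) {
    g <- which.min(totals)          # lighter group (first one on ties)
    group[i] <- g
    totals[g] <- totals[g] + df[[col]][i]
  }
  split(df, group)                  # list with elements named "1" and "2"
}

greedy <- balanced_split(x)
greedy_sums <- sapply(greedy, function(d) sum(d$value))

# Compare how far apart the two group sums are under each approach
print(cumsum_sums)
print(greedy_sums)
```

With the question's data, the same comparison can be run by calling balanced_split(x) on the original x and inspecting the two sums against 388.4.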