How to split a dataset using R such that the sum of values in a column is roughly identical across the subsets?

I have a dataset in R as follows:

```r
x <- structure(list(value = c(7.496, 11.073, 11.329, 9.282, 8.748, 12.515, 7.46, 9.189, 9.62, 5.815, 5.945,
                              7.778, 10.077, 15.311, 8.591, 6.048, 7.568, 6.14, 6.591, 5.376,
                              8.038, 7.496, 7.983, 6.591, 6.591, 7.44, 6.453, 11.589, 5.751,
                              8.464, 7.577, 6.014, 12.733, 7.108, 14.857, 15.503, 12.468, 13.39,
                              10.796, 10.923, 7.215, 13.72, 7.574, 11.77, 10.409, 7.591, 6.174,
                              6.748, 10.091, 9.8, 6.527, 9.251, 6.622, 13.742, 4.454, 8.331,
                              7.702, 7.197, 9.629, 9.76, 3.663, 19.55, 8.107, 9.637, 10.146,
                              9.564, 6.947, 14.45, 10.266, 5.457, 10.629, 6.275, 2.48, 4.513,
                              6.755, 2.885, 5.773, 2.855, 2.429, 2.955, 2.486, 3.239, 4.29,
                              3.043, 3.501, 3.276, 4.018, 2.727, 5.199, 2.371, 3.732, 2.533,
                              4.482, 3.215, 7.782, 3.435, 4.201, 3.074, 3.475, 2.923, 3.025,
                              4.308, 3.932, 2.923, 3.491, 2.852, 3.916),
                    id = 1:107),
               row.names = c(NA, -107L), class = "data.frame")
```

What I would like to do is split the dataset into two subsets (let's call them x1 and x2) such that the sum of the value column is approximately equal in each. The sum of x$value is 776.8, so ideally sum(x1$value) and sum(x2$value) would both be as close to 776.8/2 = 388.4 as possible.
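Stated formally, this is the two-way number partitioning problem. Writing v_i for the value in row i, n for the number of rows and T for the total (notation introduced here for illustration), the goal is a set S of row indices for x1 whose sum is as close as possible to T/2; x2 then holds the remaining rows. For the question's data, n = 107 and T is about 776.8. The second line shows that this is the same as minimizing the difference between the two subset sums. Finding the exact optimum is NP-hard in general, which is why an approximate, "roughly equal" split is the usual practical target.

```latex
T = \sum_{i=1}^{n} v_i, \qquad
S^{*} = \operatorname*{arg\,min}_{S \subseteq \{1, \dots, n\}}
        \left| \sum_{i \in S} v_i - \frac{T}{2} \right|

\left| \sum_{i \in S} v_i - \sum_{i \notin S} v_i \right|
  = \left| 2 \sum_{i \in S} v_i - T \right|
  = 2 \left| \sum_{i \in S} v_i - \frac{T}{2} \right|
```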

Is there a way to do this in R? I have searched other posts on Stack Overflow, but to no avail.

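Just use cumsum to get the cumulative sum of the "value" column, compare it with the desired splitting value (sum(x$value)/2), and split the dataset on the resulting logical vector. Rows whose running total is at most half of the overall total go into one subset, and the remaining rows go into the other. Because all the values are positive, the running total only increases, so each subset is a contiguous block of rows, and the two sums differ by less than twice the value of the first row in the second subset.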

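```r
# split() on a logical vector returns a list with elements "FALSE" (the later rows)
# and "TRUE" (the leading rows whose running total is at most half of the total)
split(x, cumsum(x$value) <= sum(x$value)/2)
```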
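A minimal sketch, assuming non-negative values, that wraps the one-liner above in a reusable function and, for comparison, adds a greedy split that ignores row order: rows are visited from the largest value to the smallest, and each goes to whichever subset currently has the smaller total. The names split_half_sum(), split_greedy() and subset_sums() are hypothetical helpers, not part of base R or any package, and neither approach is guaranteed to find the best possible split.

```r
# Sketch only: split_half_sum(), split_greedy() and subset_sums() are
# hypothetical helper names for illustration, not base R or package functions.

# Contiguous split (the cumsum() approach, wrapped in a function).
# Rows keep their original order; x1 takes the leading rows whose running
# total stays at or below half of the overall total, x2 takes the rest.
# Assumes non-negative values, so cumsum() never decreases.
split_half_sum <- function(df, col = "value") {
  v <- df[[col]]
  first <- cumsum(v) <= sum(v) / 2
  list(x1 = df[first, , drop = FALSE], x2 = df[!first, , drop = FALSE])
}

# Non-contiguous alternative (greedy number partitioning): visit rows from
# the largest value to the smallest and assign each one to whichever subset
# currently has the smaller total. Row order is ignored, so each subset can
# contain rows from anywhere in the data frame.
split_greedy <- function(df, col = "value") {
  v <- df[[col]]
  group <- integer(length(v))
  totals <- c(0, 0)
  for (i in order(v, decreasing = TRUE)) {
    g <- which.min(totals)  # on a tie, which.min() picks subset 1
    group[i] <- g
    totals[g] <- totals[g] + v[i]
  }
  list(x1 = df[group == 1, , drop = FALSE], x2 = df[group == 2, , drop = FALSE])
}

# Total of `col` in each subset, to compare how balanced a split is
subset_sums <- function(parts, col = "value") {
  vapply(parts, function(d) sum(d[[col]]), numeric(1))
}
```

With the question's data, subset_sums(split_half_sum(x)) and subset_sums(split_greedy(x)) give the two totals for each approach. By hand arithmetic, the contiguous split puts id 1 to 43 in x1 (about 388.2) and id 44 to 107 in x2 (about 388.6), so for this dataset the simple cumsum() rule already lands very close to 388.4. The greedy version is only worth reaching for when the subsets do not need to be contiguous blocks of rows.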

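Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.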
