Creating random groups of 3 rows in R

Question

I'm trying to create as many random groups from a dataset as possible. My data is complicated to explain, so I'll use iris as my example.

In iris, the Species variable contains 3 unique values: setosa, versicolor, and virginica.

I want to randomize and group the dataset into groups of 3 rows, with each group containing unique Species only (e.g. 1 of each Species).
Each group must have a cumsum(Sepal.Width >= 10).
Create a new ID that identifies each group.

So far I've tried the dplyr functions group_by() and sample_n(), and also split() and sample(), but I can't seem to get the desired result.

I think using split() might be the wrong way to do it. I was trying to make it work along these lines, with no luck:

split(unique(iris), sample(1:nrow(iris) %/% 3))

Answer 1

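Try something like this, which draws a random sample of rows and counts, per species, how many have Sepal.Width at or above a threshold: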

#the sample
N=dim(iris)[1]
n=50 #sample size
set.seed(123)
si=iris[sample(N,n),c("Species","Sepal.Width")]

#the "cumsum"
lim=2.8 #for the conditional sum
Sepal.Width=sapply(split(si,si$Species),function(x)
   sum(x$Sepal.Width >= lim))
sol=data.frame(Species=names(Sepal.Width),Sepal.Width)
sol$ID=1:length(sol[,1])
sol
#               Species Sepal.Width ID
# setosa         setosa          18  1
# versicolor versicolor           8  2
# virginica   virginica          14  3

Answer 2

I think I understood the problem. Here's how you could do it using dplyr.

First, load some packages and add a unique ID for each row in the iris data.frame.

library(dplyr)
library(tidyr)
iris = iris %>% mutate(Row.ID=1:n())

Then, let's split the Row.IDs according to species, and get a data.frame with all possible combinations of one row from each species:

iris_split = split(iris$Row.ID, iris$Species)
combinations = do.call(expand.grid, iris_split)
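To see what this step produces, here is a small sketch on a toy list. The name toy_split and its row IDs are made up for illustration and stand in for iris_split:

```r
# Toy stand-in for iris_split: two row IDs per species (hypothetical values)
toy_split <- list(setosa = c(1, 2), versicolor = c(51, 52), virginica = c(101, 102))

# One row per combination; the first column varies fastest
toy_combinations <- do.call(expand.grid, toy_split)

nrow(toy_combinations)   # 2 * 2 * 2 combinations
toy_combinations[1:3, ]

# For the full iris data the same call yields 50 * 50 * 50 rows
prod(table(iris$Species))
```

Since every combination is enumerated, the same row appears in many groups, and the table grows quickly with the number of rows per species.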

Now, it's dplyr and tidyr time. Let's gather those combinations in a variable called tmp, join tmp with the rest of the iris data.frame and then filter according to the criteria.

tmp = combinations %>%
    mutate(Group.ID=1:n()) %>%
    gather(Var, Row.ID, -Group.ID) %>%
    select(-Var)
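The reshape above turns one row per group into one row per (group, row ID) pair. A minimal sketch of the same idea on a made-up two-group table, using pivot_longer(), which newer tidyr versions offer as the successor to gather(); toy_wide and toy_long are hypothetical names:

```r
library(dplyr)
library(tidyr)

# Hypothetical two-group example in the same wide layout as `combinations`
toy_wide <- data.frame(setosa = c(1, 2), versicolor = c(51, 51), virginica = c(101, 101))

toy_long <- toy_wide %>%
    mutate(Group.ID = row_number()) %>%
    # stack the three species columns into a single Row.ID column
    pivot_longer(-Group.ID, names_to = "Var", values_to = "Row.ID") %>%
    select(-Var)

toy_long   # each Group.ID now appears once per species column
```

In this long layout, Row.ID can serve as the join key against the iris data.frame.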

result = iris %>%
    inner_join(tmp, by = "Row.ID") %>%
    group_by(Group.ID) %>%
    # keep groups whose total Sepal.Width is at least 10, as the question asks
    filter(sum(Sepal.Width) >= 10) %>%
    arrange(Group.ID)

The result data.frame should be what you're looking for.
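If the goal is instead random groups in which each row is used at most once, a base R sketch along the following lines might work. It is not the approach above: it assumes the criterion means that the total Sepal.Width of a group is at least 10, and the names shuffled, group_total and random_groups are made up for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Shuffle all rows, then number the rows within each species 1..50.
# Rows sharing a number form one group with one row from each species.
shuffled <- iris[sample(nrow(iris)), ]
shuffled$Group.ID <- ave(seq_len(nrow(shuffled)), shuffled$Species, FUN = seq_along)

# Total Sepal.Width per group, then keep the groups meeting the threshold
# (assumed reading of the question's criterion)
group_total <- tapply(shuffled$Sepal.Width, shuffled$Group.ID, sum)
keep <- names(group_total)[group_total >= 10]
random_groups <- shuffled[shuffled$Group.ID %in% keep, ]
random_groups <- random_groups[order(random_groups$Group.ID), ]
```

This yields at most 50 groups, and how many survive the filter depends on the shuffle, so a different seed can give a different number of groups.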

Answer 1
0 2015-10-07 03:00:45

Answer 2
0 2015-10-09 16:16:36