All possible combinations in R

Question

I have the following data frame:

> my.df
          x         y
1 0.4597406 0.8439140
2 0.4579697 0.7461805
3 0.5593259 0.6646701
4 0.3607346 0.7792931
5 0.8377520 1.0445919
6 0.5597406 1.0445919

I want to create all possible combinations of x and y:

> my.df
          x         y
1 0.4597406 0.8439140
2 0.4597406 0.7461805
3 0.4597406 0.6646701
4 0.4597406 0.7792931
5 0.4597406 1.0445919
6 0.4597406 1.0445919
7 0.4579697 0.8439140
8 0.4579697 0.7461805
9 0.4579697 0.6646701
... 
(Not all the combinations are shown here; this only illustrates the format I would like for the resulting data frame.)

Using the following function does not quite give me these combinations:

expand.grid(my.df)

What is the best way to generate all possible combinations?

Answer 1

Perhaps we can use expand.grid like this:

expand.grid(x = my.df$x, y = my.df$y)
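Note that expand.grid() varies its first argument fastest, so the call above cycles through x while y stays fixed, whereas the format shown in the question keeps x fixed and cycles through y. A minimal sketch of one way to get that ordering, assuming the my.df from the question (the name res is only illustrative): pass y first, then reorder the columns.

```r
# Data from the question
my.df <- data.frame(
  x = c(0.4597406, 0.4579697, 0.5593259, 0.3607346, 0.8377520, 0.5597406),
  y = c(0.8439140, 0.7461805, 0.6646701, 0.7792931, 1.0445919, 1.0445919)
)

# expand.grid() varies the first factor fastest, so list y first ...
res <- expand.grid(y = my.df$y, x = my.df$x)

# ... then put the columns back in x, y order
res <- res[, c("x", "y")]

head(res, 7)
nrow(res)  # 6 * 6 = 36 rows, duplicates included
```

The first six rows then share the first x value and run through every y, as in the requested format.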

Answer 2

We can simply use expand.grid:

res <- expand.grid(my.df)
dim(res)
#[1] 36  2

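Or with data.table:

library(data.table)
# setDT() converts my.df to a data.table by reference; CJ() builds the cross join
# (note that CJ() sorts its result by default)
setDT(my.df)[, CJ(x, y)]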
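In base R, expand.grid() treats a single list argument as the set of vectors to combine, and a data frame is a list of columns, so expand.grid(my.df) should give the same grid as the named-vector call in Answer 1. A quick check, assuming the my.df from the question (a, b and same are illustrative names):

```r
# Data from the question
my.df <- data.frame(
  x = c(0.4597406, 0.4579697, 0.5593259, 0.3607346, 0.8377520, 0.5597406),
  y = c(0.8439140, 0.7461805, 0.6646701, 0.7792931, 1.0445919, 1.0445919)
)

a <- expand.grid(my.df)                     # data frame passed as a list
b <- expand.grid(x = my.df$x, y = my.df$y)  # columns passed one by one

dim(a)

# Compare the two grids column by column
same <- identical(a$x, b$x) && identical(a$y, b$y)
same
```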

Answer 3

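A cross join is helpful in this case. Since you did not provide a reproducible example, I created my own dataset.

library(sqldf)

df <- data.frame(x = runif(5), y = runif(5))
# sqldf queries data frames by name, so put each column in its own data frame
xx <- data.frame(df$x)
yy <- data.frame(df$y)
sqldf("SELECT * FROM xx CROSS JOIN yy")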

Answer 4

expand.grid() will give you all possible combinations, but not only the unique ones. If you need the latter, you can use a function like this:

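unique_comb <- function(data) {
  # distinct values of each column
  x.cur <- unique(data$x)
  y.cur <- unique(data$y)
  n.x <- length(x.cur)
  n.y <- length(y.cur)
  # one row per (x, y) pair
  matrix.com <- matrix(0, ncol = 2, nrow = n.x * n.y)
  ind <- 1
  # seq_len() keeps the loops empty when a column has no values
  for (i in seq_len(n.x)) {
    for (j in seq_len(n.y)) {
      matrix.com[ind, ] <- c(x.cur[i], y.cur[j])
      ind <- ind + 1
    }
  }
  return(matrix.com)
}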

Or, as JTT pointed out, this can be done with:

expand.grid(unique(data$x),unique(data$y))
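To make the difference concrete, here is a small sketch comparing the full grid with the unique one, assuming the my.df from the question (all.comb and uniq.comb are illustrative names). The y column holds 1.0445919 twice, so it has only five distinct values.

```r
# Data from the question
my.df <- data.frame(
  x = c(0.4597406, 0.4579697, 0.5593259, 0.3607346, 0.8377520, 0.5597406),
  y = c(0.8439140, 0.7461805, 0.6646701, 0.7792931, 1.0445919, 1.0445919)
)

# Every pairing, duplicates included
all.comb <- expand.grid(x = my.df$x, y = my.df$y)

# Pairings of distinct values only
uniq.comb <- expand.grid(x = unique(my.df$x), y = unique(my.df$y))

nrow(all.comb)          # 6 * 6 = 36
nrow(uniq.comb)         # 6 * 5 = 30
nrow(unique(all.comb))  # dropping duplicate rows afterwards also leaves 30
```

Unlike unique_comb(), which returns a matrix, both expand.grid() calls return a data frame.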

Answer 5

You can use the merge function this way:

dat <- cars[1:6,1:2]
dat
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10

merge(dat$speed,dat$dist,by=NULL)
   x  y
1  4  2
2  4  2
3  7  2
4  7  2
5  8  2
6  9  2
7  4 10
8  4 10
9  7 10
10 7 10
11 8 10
12 9 10
13 4  4
14 4  4
15 7  4
16 7  4
17 8  4
18 9  4
19 4 22
20 4 22
21 7 22
22 7 22
23 8 22
24 9 22
25 4 16
26 4 16
27 7 16
28 7 16
29 8 16
30 9 16
31 4 10
32 4 10
33 7 10
34 7 10
35 8 10
36 9 10
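In the output above the columns come out as x and y rather than speed and dist. A possible variation, sketched here under the assumption that keeping the original names matters, is to wrap each vector in a one-column data frame before merging; with by = NULL, merge() returns the Cartesian product of the two data frames (res is an illustrative name).

```r
# First six rows of the built-in cars dataset, as in the answer
dat <- cars[1:6, 1:2]

# One-column data frames keep their column names through the merge
res <- merge(
  data.frame(speed = dat$speed),
  data.frame(dist = dat$dist),
  by = NULL  # no key columns: cross join
)

names(res)
nrow(res)
head(res)
```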

Answer 6

I know everyone is throwing expand.grid() at you, so here is another option...

my.df <- structure(list(x = c(0.4597406, 0.4579697, 0.5593259, 0.3607346, 0.837752, 0.5597406), 
                        y = c(0.843914, 0.7461805, 0.6646701, 0.7792931, 1.0445919, 1.0445919)), 
                   .Names = c("x", "y"), row.names = c(NA, -6L), class = "data.frame")

my.df
#>           x         y
#> 1 0.4597406 0.8439140
#> 2 0.4579697 0.7461805
#> 3 0.5593259 0.6646701
#> 4 0.3607346 0.7792931
#> 5 0.8377520 1.0445919
#> 6 0.5597406 1.0445919

tidyr has a complete() function that "completes" the combinations in your data, which I believe is what you are after.

tidyr::complete(my.df, x, y)
#> # A tibble: 30 x 2
#>            x         y
#>        <dbl>     <dbl>
#> 1  0.3607346 0.6646701
#> 2  0.3607346 0.7461805
#> 3  0.3607346 0.7792931
#> 4  0.3607346 0.8439140
#> 5  0.3607346 1.0445919
#> 6  0.4579697 0.6646701
#> 7  0.4579697 0.7461805
#> 8  0.4579697 0.7792931
#> 9  0.4579697 0.8439140
#> 10 0.4579697 1.0445919
#> # ... with 20 more rows

Note: this produces unique combinations only; rows 5 and 6 of your expected output are identical.

6 answers

Answer 1
2 2017-01-24 06:04:32

Answer 2
2 2017-01-24 06:04:51

Answer 3
2 2017-01-24 06:06:17

Answer 4
2 2017-01-24 06:58:19

Answer 5
0 2017-01-24 06:07:29

Answer 6
0 2017-02-06 22:57:04
