Given an R dataframe with column A, how do I create two new columns containing all ordered combinations of A

Question

I have a data.frame with one id column (x below), and a number of variables (y1,y2 below).

    x y1 y2
1   1 43 55
2   2 51 53
[...]

What I would like to generate from this is a dataframe where the first two columns cover every ordered combination of x (except where they are equal) along with columns for each variable related to the order. The data frame header and first two rows would look like this (did this by hand, excuse errors):

xi xj y1i y1j y2i y2j
 1  2  43  51  55  53
 2  1  51  43  53  55
[...]

So each row would container a source and destination (i and j) and then values for y1 at each source and destination.

I'm slowly learning R data manipulation, but this one is stumping me. Kudos for the one line does-it-all answer, as well as a more readable didactic answer.

Answer 1

This works (apart perhaps from order)

firstdf  <- data.frame(x  = c( 1, 2, 4, 5), 
                       y1 = c(43,51,57,49), y2 = c(55,53,47,44)) 
co       <- combn(firstdf$x,2)
seconddf <- data.frame(xi = c(co[1,], co[2,]), xj = c(co[2,], co[1,]))
thirddf  <- merge(merge(seconddf, firstdf, by.x = "xj", by.y = "x" ),
                  firstdf, by.x = "xi", by.y = "x", suffixes = c("j", "i") )

to produce

> thirddf
   xi xj y1j y2j y1i y2i
1   1  2  51  53  43  55
2   1  5  49  44  43  55
3   1  4  57  47  43  55
4   2  4  57  47  51  53
5   2  1  43  55  51  53
6   2  5  49  44  51  53
7   4  5  49  44  57  47
8   4  1  43  55  57  47
9   4  2  51  53  57  47
10  5  1  43  55  49  44
11  5  2  51  53  49  44
12  5  4  57  47  49  44

where the first and fifth rows match your example.

If you take firstdf as given and insist on one line then you can turn this into

merge(merge(data.frame(xi = c(combn(firstdf$x,2)[1,], combn(firstdf$x,2)[2,]), xj = c(combn(firstdf$x,2)[2,], combn(firstdf$x,2)[1,])), firstdf, by.x = "xj", by.y = "x" ), firstdf, by.x = "xi", by.y = "x", suffixes = c("j", "i") )

but I don't really see the point

Answer 2

Two lines is the best I can do and still keep it sensible: ( Edit: see bottom of answer for one-liner.)

Create some data:

n <- 4
a <- cbind(x=LETTERS[1:n], y=letters[1:n])
a

     x   y  
[1,] "A" "a"
[2,] "B" "b"
[3,] "C" "c"
[4,] "D" "d"

The code:

f <- function(x, i){cbind(i, x[i[,1],], x[i[,2],])}
f(a, t(combn(seq_len(nrow(a)), 2)))

The results:

             x   y   x   y  
[1,] "1" "2" "A" "a" "B" "b"
[2,] "1" "3" "A" "a" "C" "c"
[3,] "1" "4" "A" "a" "D" "d"
[4,] "2" "3" "B" "b" "C" "c"
[5,] "2" "4" "B" "b" "D" "d"
[6,] "3" "4" "C" "c" "D" "d"

EDIT

This can be turned into a one-liner by making use of anonymous functions:

(function(x, i=t(combn(seq_len(nrow(a)), 2))){cbind(i, x[i[,1],], x[i[,2],])})(a)

             x   y   x   y  
[1,] "1" "2" "A" "a" "B" "b"
[2,] "1" "3" "A" "a" "C" "c"
[3,] "1" "4" "A" "a" "D" "d"
[4,] "2" "3" "B" "b" "C" "c"
[5,] "2" "4" "B" "b" "D" "d"
[6,] "3" "4" "C" "c" "D" "d"

Answer 3

I'm not sure what you exactly want in general, but as far as my understanding, this may be close to what you want:

> library(combinat) # for permn
> library(plyr) # for llply
> 
> # sample data
> d <- data.frame(x = 1:3, y1 = rnorm(3), y2 = rnorm(3))
> d
  x          y1         y2
1 1 -0.17525893 -1.1660321
2 2 -0.05585689 -0.2059244
3 3  0.90500983 -1.3067601
> 
> # permutation of rows
> idx <- permn(nrow(d))
> idx
[[1]]
[1] 1 2 3

... snip ...

[[6]]
[1] 2 1 3

> 
> # a list of perm-ed data.frame
> d2 <- llply(idx, function(i)data.frame(idx = 1:nrow(d), d[i,]))
> d2
[[1]]
  idx x          y1         y2
1   1 1 -0.17525893 -1.1660321
2   2 2 -0.05585689 -0.2059244
3   3 3  0.90500983 -1.3067601

... snip ...

[[6]]
  idx x          y1         y2
2   1 2 -0.05585689 -0.2059244
1   2 1 -0.17525893 -1.1660321
3   3 3  0.90500983 -1.3067601

> 
> # merge htam
> d3 <- subset(Reduce(function(df1, df2) merge(df1, df2, by="idx"), d2), select = -c(idx))
> d3
  x.x        y1.x       y2.x x.y        y1.y       y2.y x.x.1      y1.x.1     y2.x.1 x.y.1      y1.y.1     y2.y.1 x.x.2      y1.x.2     y2.x.2 x.y.2
1   1 -0.17525893 -1.1660321   1 -0.17525893 -1.1660321     3  0.90500983 -1.3067601     3  0.90500983 -1.3067601     2 -0.05585689 -0.2059244     2
2   2 -0.05585689 -0.2059244   3  0.90500983 -1.3067601     1 -0.17525893 -1.1660321     2 -0.05585689 -0.2059244     3  0.90500983 -1.3067601     1
3   3  0.90500983 -1.3067601   2 -0.05585689 -0.2059244     2 -0.05585689 -0.2059244     1 -0.17525893 -1.1660321     1 -0.17525893 -1.1660321     3
       y1.y.2     y2.y.2
1 -0.05585689 -0.2059244
2 -0.17525893 -1.1660321
3  0.90500983 -1.3067601
> 
> # and here is the one-liner version
> subset(Reduce(function(df1, df2) merge(df1, df2, by="idx"), llply(permn(nrow(d)), function(i)data.frame(idx=1:nrow(d), d[i,]))), select=-c(idx))
  x.x        y1.x       y2.x x.y        y1.y       y2.y x.x.1      y1.x.1     y2.x.1 x.y.1      y1.y.1     y2.y.1 x.x.2      y1.x.2     y2.x.2 x.y.2
1   1 -0.17525893 -1.1660321   1 -0.17525893 -1.1660321     3  0.90500983 -1.3067601     3  0.90500983 -1.3067601     2 -0.05585689 -0.2059244     2
2   2 -0.05585689 -0.2059244   3  0.90500983 -1.3067601     1 -0.17525893 -1.1660321     2 -0.05585689 -0.2059244     3  0.90500983 -1.3067601     1
3   3  0.90500983 -1.3067601   2 -0.05585689 -0.2059244     2 -0.05585689 -0.2059244     1 -0.17525893 -1.1660321     1 -0.17525893 -1.1660321     3
       y1.y.2     y2.y.2
1 -0.05585689 -0.2059244
2 -0.17525893 -1.1660321
3  0.90500983 -1.3067601

If you provide information in more detail, probably you can get better answers.

Answer 4

Well, it's nowhere close to a one-liner (which I kind of doubt is possible) but here's a 'naive' approach:

dat <- data.frame(x=1:5,y1=6:10,y2=11:15)

#Collect all ordered pairs of elements of x
tmp <- expand.grid(dat$x,dat$x)
tmp <- tmp[tmp[,1] != tmp[,2],]

#Init a matrix to hold the results
rs <- as.matrix(cbind(tmp,matrix(NA,nrow(tmp),4)))

#Loop through each ordered pair
for (i in 1:nrow(rs)){
    rs[i,3:6] <- c(dat$y1[rs[i,1:2]],dat$y2[rs[i,1:2]])
}

I didn't name the columns, but that's easily done after the fact.

Not very elegant, but maybe something to get you started...

Given an R dataframe with column A, how do I create two new columns containing all ordered combinations of A

Question

4 answers

solution1
4 ACCPTED 2011-06-29 23:09:04

solution2
4 2011-06-29 23:35:44

solution3
2 2011-06-29 22:38:41

solution4
1 2011-06-29 22:32:31

Given an R dataframe with column A, how do I create two new columns containing all ordered combinations of A

Question

4 answers

solution1 4 ACCPTED 2011-06-29 23:09:04

solution2 4 2011-06-29 23:35:44

solution3 2 2011-06-29 22:38:41

solution4 1 2011-06-29 22:32:31

solution1
4 ACCPTED 2011-06-29 23:09:04

solution2
4 2011-06-29 23:35:44

solution3
2 2011-06-29 22:38:41

solution4
1 2011-06-29 22:32:31