
R SQL create a pairwise dataset without repetitions

I am working in R with SQL (the sqldf package) on a dataset like the following:

View(dataset)

key1    key2    id    ...
01/01   XXX     A     ...
01/01   XXX     B     ...
01/01   YYY     C     ...
01/01   YYY     D     ...
02/01   XXX     A     ...
02/01   XXX     B     ...
02/01   XXX     C     ...

I would like to create a pairwise dataset with one row per pair of ids within each group identified by key1 and key2, as follows:

key1    key2    id_1    id_2    
01/01   XXX     A       B  
01/01   YYY     C       D
02/01   XXX     A       B
02/01   XXX     A       C
02/01   XXX     C       B

I have used

sqldf(c('select a.key1, a.key2, a.id as id_1, 
                  b.id as id_2 
                  from dataset a
                  inner join dataset b on a.key1=b.key1 and a.key2=b.key2  and a.id!=b.id'))

The problem is that with this query I obtain

key1    key2    id_1    id_2    
01/01   XXX     A       B
01/01   XXX     B       A    
01/01   YYY     C       D
01/01   YYY     D       C
02/01   XXX     A       B
02/01   XXX     B       A
02/01   XXX     A       C
02/01   XXX     C       A
02/01   XXX     C       B
02/01   XXX     B       C

I would like to avoid these repetitions, since I want to make some comparisons and it doesn't matter which id ends up in column id_1 and which in id_2.

Thank you very much!

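Change the join condition from a.id != b.id to a.id < b.id. With !=, every pair matches twice, once in each order; with <, only the ordering in which id_1 sorts before id_2 is kept, so each unordered pair appears exactly once.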
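
A minimal sketch of that change, assuming the sample data from the question (only the three columns shown, rebuilt here by hand) and assuming the join was meant to match key1 to key1 and key2 to key2. The name pairs and the order by clause are additions for illustration, not part of the original query.

```r
library(sqldf)

# Sample data rebuilt from the question; the elided columns are left out
dataset <- data.frame(
  key1 = c("01/01", "01/01", "01/01", "01/01", "02/01", "02/01", "02/01"),
  key2 = c("XXX", "XXX", "YYY", "YYY", "XXX", "XXX", "XXX"),
  id   = c("A", "B", "C", "D", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Self-join within each (key1, key2) group; a.id < b.id keeps each
# unordered pair once and also drops the self-pairs (A, A), (B, B), ...
pairs <- sqldf("
  select a.key1, a.key2, a.id as id_1, b.id as id_2
  from dataset a
  inner join dataset b
    on a.key1 = b.key1
   and a.key2 = b.key2
   and a.id < b.id
  order by a.key1, a.key2, id_1, id_2
")

print(pairs)
```

On this sample the result should have five rows, one per unordered pair. The third pair of the 02/01 / XXX group comes out as (B, C) rather than (C, B) as written in the question, which is harmless since the order within a pair does not matter.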

