I am working in R with SQL (via the sqldf package) on a dataset like the following:
View(dataset)
key1 key2 id ...
01/01 XXX A ...
01/01 XXX B ...
01/01 YYY C ...
01/01 YYY D ...
02/01 XXX A ...
02/01 XXX B ...
02/01 XXX C ...
I would like to create a pairwise dataset containing every pair of ids within each group identified by key1 and key2, as follows:
key1 key2 id_1 id_2
01/01 XXX A B
01/01 YYY C D
02/01 XXX A B
02/01 XXX A C
02/01 XXX C B
I have used
sqldf(c('select a.key1, a.key2, a.id as id_1,
b.id as id_2
from dataset a
inner join dataset b on a.key1=b.key1 and a.key2=b.key2 and a.id!=b.id'))
The problem is that with this query I obtain
key1 key2 id_1 id_2
01/01 XXX A B
01/01 XXX B A
01/01 YYY C D
01/01 YYY D C
02/01 XXX A B
02/01 XXX B A
02/01 XXX A C
02/01 XXX C A
02/01 XXX C B
02/01 XXX B C
I would like to avoid these repetitions: I want to make some comparisons, and it doesn't matter which id ends up in column id_1 and which in id_2.
Thank you very much!
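Change the join condition from a.id != b.id
to a.id < b.id
Each unordered pair then appears only once, with the smaller id in id_1 (so the last pair comes out as B C rather than C B).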
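The a.id < b.id condition can be tried on the sample data from the question. This is a minimal sketch, assuming the data frame is named dataset with character columns key1, key2 and id; the result name id_pairs and the order by clause are additions for illustration only.

```r
library(sqldf)

# Sample data copied from the question (only the three columns shown there).
dataset <- data.frame(
  key1 = c("01/01", "01/01", "01/01", "01/01", "02/01", "02/01", "02/01"),
  key2 = c("XXX", "XXX", "YYY", "YYY", "XXX", "XXX", "XXX"),
  id   = c("A", "B", "C", "D", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Self-join within each (key1, key2) group; a.id < b.id keeps one row
# per unordered pair and also excludes pairing a row with itself.
id_pairs <- sqldf("
  select a.key1, a.key2, a.id as id_1, b.id as id_2
  from dataset a
  inner join dataset b
    on a.key1 = b.key1 and a.key2 = b.key2 and a.id < b.id
  order by a.key1, a.key2, id_1, id_2
")

print(id_pairs)
```

On this sample the query returns five rows instead of ten. Note that the comparison relies on id values being distinct within a group; rows sharing the same id would not be paired with each other.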