PROC SQL in SAS - All Pairs of Items

I have a dataset in which I need to look at all pairs of items that occur together within the same group. I've created a toy example below to explain.

BUNCH    FRUIT
1        apples
1        bananas
1        mangos
2        apples
3        bananas
3        apples
4        bananas
4        apples

What I want is a listing of all possible pairs, with a count of how often each pair occurs together within a bunch. My output would ideally look like this:

FRUIT1    FRUIT2     FREQUENCY
APPLES    BANANAS    3
APPLES    MANGOS     1

My end goal is to produce something that I can eventually import into Gephi for a network analysis. For this I need a Source and a Target column (FRUIT1 and FRUIT2 above).

I think there are a few other ways to approach this without PROC SQL (maybe PROC TRANSPOSE), but this is where I've started.


SOLUTION

Thanks for the help. Sample code below for anyone interested in something similar:

proc sql;
    /* Self-join FRUITS on BUNCH and count each pair of different fruits */
    create table fruit_combo as
    select a.FRUIT as FRUIT1, b.FRUIT as FRUIT2, count(*) as FREQUENCY
    from FRUITS a, FRUITS b
    where a.BUNCH = b.BUNCH and a.FRUIT ne b.FRUIT
    group by FRUIT1, FRUIT2;
quit;

The simplest approach is to join the table to itself (a Cartesian self-join) on t1.ID = t2.ID and t1.FRUIT ne t2.FRUIT, where ID is the grouping column (BUNCH in your example). That will generate the full set of combinations, which you can then summarize.
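In case it helps, the two steps described here (build the combinations, then summarize them) could be sketched roughly as follows. This assumes a FRUITS dataset with BUNCH and FRUIT columns like the toy example; fruit_pairs and fruit_pair_counts are names made up for illustration, and PROC FREQ is just one possible way to do the summary.

```sas
/* Assumes a dataset FRUITS with columns BUNCH and FRUIT (toy example). */
/* fruit_pairs and fruit_pair_counts are hypothetical names.            */

/* Step 1: self-join to list every ordered pair of different fruits     */
/* that share a bunch.                                                  */
proc sql;
    create table fruit_pairs as
    select t1.BUNCH,
           t1.FRUIT as FRUIT1,
           t2.FRUIT as FRUIT2
    from FRUITS as t1
         inner join FRUITS as t2
         on t1.BUNCH = t2.BUNCH
    where t1.FRUIT ne t2.FRUIT;
quit;

/* Step 2: summarize by counting how often each pair occurs. */
proc freq data=fruit_pairs noprint;
    tables FRUIT1*FRUIT2 /
        out=fruit_pair_counts(drop=PERCENT rename=(COUNT=FREQUENCY));
run;
```

Because the join keeps both orderings, every pair shows up twice here (once as apples/bananas and once as bananas/apples), so an extra ordering condition is needed if each pair should appear only once.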

Here's a copy/paste-ready version of the solution posted above, with sample data. A quick look at its output shows a problem: duplicate count rows for banana-apple and apple-banana. To get the desired result, an additional restriction is required (a.FRUIT gt b.FRUIT). Note that gt puts the alphabetically later fruit in FRUIT1; use lt instead if you want the earlier one first, as in the question's sample output.

data FRUITS;
    input BUNCH FRUIT $;
cards;
1        apples
1        bananas
1        mangos
2        apples
3        bananas
3        apples
4        bananas
4        apples
;
run;


/* Quick check of how often each fruit appears */
proc freq data=FRUITS;
    tables FRUIT;
run;


proc sql;
    create table fruit_combo as
    select a.FRUIT as FRUIT1, b.FRUIT as FRUIT2, count(*) as FREQUENCY
    from FRUITS a, FRUITS b
    where a.BUNCH=b.BUNCH 
     and a.FRUIT ne b.FRUIT
     and a.FRUIT gt b.FRUIT
    group by FRUIT1, FRUIT2;
    quit;

proc print data=fruit_combo; run;
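Since the end goal is Gephi, here is a minimal sketch of one way to write the result out as an edge list. It assumes the fruit_combo table created by the PROC SQL step above, and that Gephi's spreadsheet import picks up columns named Source, Target and Weight; gephi_edges and the output file name are placeholders, so adjust them to your setup.

```sas
/* Assumes fruit_combo (FRUIT1, FRUIT2, FREQUENCY) from the PROC SQL step. */
/* gephi_edges and the CSV path are hypothetical names for illustration.   */

/* Rename the columns to the headers an edge list typically uses. */
data gephi_edges;
    set fruit_combo(rename=(FRUIT1=Source FRUIT2=Target FREQUENCY=Weight));
run;

/* Write a CSV file with a header row. */
proc export data=gephi_edges
    outfile="gephi_edges.csv"   /* placeholder path */
    dbms=csv
    replace;
run;
```

With the gt restriction in place, each pair appears once, which suits importing the edges as undirected.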
