PROC SQL in SAS - All Pairs of Items
I have a dataset in which I need to look at all pairs of items that occur together within another group. I've created a toy example below to explain further.
BUNCH FRUITS
1 apples
1 bananas
1 mangos
2 apples
3 bananas
3 apples
4 bananas
4 apples
What I want is a list of all possible pairs, with a count of how many bunches each pair occurs together in. Ideally, my output would look like this:
FRUIT1 FRUIT2 FREQUENCY
APPLES BANANAS 3
APPLES MANGOS 1
BANANAS MANGOS 1
My end goal is to produce something I can import into Gephi for network analysis. For this I need a Source and a Target column (FRUIT1 and FRUIT2 above).
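Gephi's spreadsheet (CSV) importer expects an edges table with columns named Source and Target, and it treats a Weight column as the edge weight. One option is to alias the columns that way directly in PROC SQL and then write a CSV with PROC EXPORT. The sketch below assumes the toy data above. The table name gephi_edges and the file name fruit_edges.csv are hypothetical.

```sas
/* Toy data from the question */
data FRUITS;
  input BUNCH FRUIT $;
  datalines;
1 apples
1 bananas
1 mangos
2 apples
3 bananas
3 apples
4 bananas
4 apples
;
run;

/* Edge list: one row per unordered pair of fruits that share a bunch.
   Column names follow Gephi's edges-table convention. */
proc sql;
  create table gephi_edges as
  select a.FRUIT as Source,
         b.FRUIT as Target,
         count(*) as Weight
  from FRUITS as a, FRUITS as b
  where a.BUNCH = b.BUNCH
    and a.FRUIT lt b.FRUIT  /* each pair once, alphabetically earlier fruit first */
  group by a.FRUIT, b.FRUIT;
quit;

/* Write the edge list as CSV; the file name is hypothetical */
proc export data=gephi_edges
    outfile="fruit_edges.csv"
    dbms=csv
    replace;
run;
```

Each pair appears only once and the order within a pair carries no meaning, so these edges would normally be imported as undirected.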
I think there are a few other ways to approach this without PROC SQL (maybe using PROC TRANSPOSE), but this is where I've started.
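Below is one possible sketch of the non-SQL route, and it is not the only way to do it. It has four steps:

1. Sort the data by BUNCH.
2. Use PROC TRANSPOSE to put each bunch's fruits on a single row.
3. Generate the pairs with nested DO loops over an array.
4. Count the pairs with PROC FREQ.

The intermediate table names (fruits_sorted, wide, pairs, fruit_combo_t) and the F_ column prefix are illustrative choices.

```sas
/* Toy data from the question */
data FRUITS;
  input BUNCH FRUIT $;
  datalines;
1 apples
1 bananas
1 mangos
2 apples
3 bananas
3 apples
4 bananas
4 apples
;
run;

/* Sort fruits alphabetically within each bunch; NODUPKEY drops a fruit
   listed twice in the same bunch so it cannot be paired with itself */
proc sort data=FRUITS out=fruits_sorted nodupkey;
  by BUNCH FRUIT;
run;

/* One row per bunch: BUNCH F_1 F_2 F_3 ... */
proc transpose data=fruits_sorted out=wide(drop=_NAME_) prefix=F_;
  by BUNCH;
  var FRUIT;
run;

/* Nested loops emit every pair (i < j) once per bunch */
data pairs(keep=FRUIT1 FRUIT2);
  set wide;
  array f{*} F_:;
  length FRUIT1 FRUIT2 $ 8;
  do i = 1 to dim(f) - 1;
    do j = i + 1 to dim(f);
      /* smaller bunches leave the trailing F_ columns missing */
      if not missing(f{j}) then do;
        FRUIT1 = f{i};
        FRUIT2 = f{j};
        output;
      end;
    end;
  end;
run;

/* Count how many bunches each pair shares */
proc freq data=pairs noprint;
  tables FRUIT1*FRUIT2 / out=fruit_combo_t(drop=PERCENT rename=(COUNT=FREQUENCY));
run;
```

The values are sorted before the transpose, so FRUIT1 is always the alphabetically earlier fruit. This matches the column order in the desired output.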
SOLUTION
Thanks for the help. Sample code below for anyone interested in something similar:
proc sql;
create table fruit_combo as
select a.FRUIT as FRUIT1, b.FRUIT as FRUIT2, count(*) as FREQUENCY
from FRUITS a, FRUITS b
/* self-join on BUNCH, excluding a fruit paired with itself */
where a.BUNCH=b.BUNCH and a.FRUIT ne b.FRUIT
group by FRUIT1, FRUIT2;
quit;
The simplest approach is to join the table to itself on t1.BUNCH=t2.BUNCH and t1.FRUIT ne t2.FRUIT. This is a Cartesian product of the table with itself, filtered by those conditions. It generates the full set of fruit combinations within each bunch, which you can then summarize.
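Here is a sketch of that two-step idea, assuming the toy data from the question. The first PROC SQL statement materializes every joined row, and the second one counts them. The table names all_pairs and pair_counts are illustrative.

```sas
/* Toy data from the question */
data FRUITS;
  input BUNCH FRUIT $;
  datalines;
1 apples
1 bananas
1 mangos
2 apples
3 bananas
3 apples
4 bananas
4 apples
;
run;

proc sql;
  /* Step 1: self-join on BUNCH, excluding a fruit paired with itself */
  create table all_pairs as
  select t1.BUNCH, t1.FRUIT as FRUIT1, t2.FRUIT as FRUIT2
  from FRUITS as t1
       inner join FRUITS as t2
       on t1.BUNCH = t2.BUNCH and t1.FRUIT ne t2.FRUIT;

  /* Step 2: summarize the joined rows into pair frequencies */
  create table pair_counts as
  select FRUIT1, FRUIT2, count(*) as FREQUENCY
  from all_pairs
  group by FRUIT1, FRUIT2;
quit;
```

With ne alone, every pair is kept in both orders. For example, apples/bananas and bananas/apples each get FREQUENCY 3, so pair_counts has 6 rows rather than 3. An extra ordering condition such as a.FRUIT gt b.FRUIT keeps only one row per pair.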
Here's a copy/paste version of the above. A careful reading shows an error: it produces duplicate counts, with one row for banana-apple and another for apple-banana. To get the desired result, an additional restriction is required (a.FRUIT gt b.FRUIT).
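The effect of the extra condition can be counted directly. For a bunch containing $k$ distinct fruits, the two join conditions keep these numbers of rows:

```latex
\begin{aligned}
\#\{(a,b) : a \ne b\} &= k(k-1) \\
\#\{(a,b) : a > b\}   &= \binom{k}{2} = \frac{k(k-1)}{2}
\end{aligned}
```

The toy data has bunch sizes 3, 1, 2 and 2. With ne alone that gives 6 + 0 + 2 + 2 = 10 joined rows. With gt it gives 3 + 0 + 1 + 1 = 5, which is exactly the total of the FREQUENCY column in the deduplicated result (3 + 1 + 1).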
data FRUITS ;
input BUNCH FRUIT $;
cards;
1 apples
1 bananas
1 mangos
2 apples
3 bananas
3 apples
4 bananas
4 apples
;
run;
proc freq data=FRUITS;
tables FRUIT;
run;
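proc sql;
create table fruit_combo as
select a.FRUIT as FRUIT1, b.FRUIT as FRUIT2, count(*) as FREQUENCY
from FRUITS a, FRUITS b
where a.BUNCH=b.BUNCH
/* gt keeps each pair once and already excludes a.FRUIT = b.FRUIT; */
/* use lt instead to put the alphabetically earlier fruit in FRUIT1 */
and a.FRUIT gt b.FRUIT
group by FRUIT1, FRUIT2;
quit;
proc print data=fruit_combo; run;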