How to sum two frequency datasets with the same variables?
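Hi, I want to combine two frequency datasets that count how many matches each tennis player has won. The two datasets have exactly the same layout.
How I create the datasets: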
PROC FREQ data=projet.matchs;
TABLES player1 / out = table1;
run;
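+---------------------+-----------+-------------+-------------------+--------------------+
| player1             | Fréquence | Pourcentage | Fréquence cumulée | Pourcentage cumulé |
+---------------------+-----------+-------------+-------------------+--------------------+
| Adrian Mannarino    | 3         | 1.18        | 3                 | 1.18               |
| Agnieszka Radwanska | 2         | 0.79        | 5                 | 1.97               |
| Ajla Tomljanovic    | 1         | 0.39        | 6                 | 2.36               |
| Albert Ramos        | 1         | 0.39        | 7                 | 2.76               |
+---------------------+-----------+-------------+-------------------+--------------------+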
The second dataset, table2:
PROC FREQ data=projet.matchs;
TABLES player2 / out= table2;
run;
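+------------------+-----------+-------------+-------------------+--------------------+
| player2          | Fréquence | Pourcentage | Fréquence cumulée | Pourcentage cumulé |
+------------------+-----------+-------------+-------------------+--------------------+
| Adrian Mannarino | 1         | 0.39        | 1                 | 0.39               |
| Alex Bolt        | 1         | 0.39        | 2                 | 0.79               |
| Alex De Minaur   | 1         | 0.39        | 3                 | 1.18               |
| Alexander Zverev | 3         | 1.18        | 6                 | 2.36               |
+------------------+-----------+-------------+-------------------+--------------------+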
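What I want is a new dataset holding the sum of table1 and table2. My datasets are much larger; I have only shown the first 4 observations of each.
Any help would be greatly appreciated! Thanks.
How about this? Does it work for you?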
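data combined / view=combined;
  * rename player2 so both tables share one key variable;
  set table1 table2(rename=(player2=player1));
run;
proc means data=combined nway sum;
  class player1;
  * COUNT and PERCENT are the variables PROC FREQ writes to an OUT= dataset;
  var count percent;
run;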
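A variant that skips the two intermediate tables can be sketched as follows. It assumes player1 and player2 in projet.matchs are character variables of the same length (otherwise the longer names may be truncated); the names all_players, player and total_freq are hypothetical.

* Put both player columns into one column, one row per appearance;
data all_players / view=all_players;
  set projet.matchs(keep=player1 rename=(player1=player))
      projet.matchs(keep=player2 rename=(player2=player));
run;

* A single PROC FREQ then counts every appearance in either column;
proc freq data=all_players;
  tables player / out=total_freq;
run;

Here the percentages are computed over all rows of the stacked data rather than added up from two separate tables.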
You can combine the tables with proc sql, using a full join and the sum function.
Have1 dataset:
+---------------------+-----------+-------------+------------------+-------------------+
| player1             | Frequence | Pourcentage | Frequencecumulee | Pourcentagecumule |
+---------------------+-----------+-------------+------------------+-------------------+
| Adrian Mannarino    | 3         | 1.18        | 3                | 1.18              |
| Agnieszka Radwanska | 2         | 0.79        | 5                | 1.97              |
| Ajla Tomljanovic    | 1         | 0.39        | 6                | 2.36              |
| Albert Ramos        | 1         | 0.39        | 7                | 2.76              |
+---------------------+-----------+-------------+------------------+-------------------+
Have2 dataset:
+------------------+-----------+-------------+------------------+-------------------+
| player2          | Frequence | Pourcentage | Frequencecumulee | Pourcentagecumule |
+------------------+-----------+-------------+------------------+-------------------+
| Adrian Mannarino | 1         | 0.39        | 1                | 0.39              |
| Alex Bolt        | 1         | 0.39        | 2                | 0.79              |
| Alex De Minaur   | 1         | 0.39        | 3                | 1.18              |
| Alexander Zverev | 3         | 1.18        | 6                | 2.36              |
+------------------+-----------+-------------+------------------+-------------------+
Solution:
proc sql noprint;
create table want1 as
select
coalesce(player1,player2) as player,
sum(t1.Frequence,t2.Frequence) as Frequence,
sum(t1.Pourcentage,t2.Pourcentage) as Pourcentage,
sum(t1.Frequencecumulee,t2.Frequencecumulee) as Frequencecumulee,
sum(t1.Pourcentagecumule,t2.Pourcentagecumule) as Pourcentagecumule
from
have1 t1
full join
have2 t2
on
strip(player1)=strip(player2);
quit;
Output:
+---------------------+-----------+-------------+------------------+-------------------+
| player              | Frequence | Pourcentage | Frequencecumulee | Pourcentagecumule |
+---------------------+-----------+-------------+------------------+-------------------+
| Adrian Mannarino    | 4         | 1.57        | 4                | 1.57              |
| Agnieszka Radwanska | 2         | 0.79        | 5                | 1.97              |
| Ajla Tomljanovic    | 1         | 0.39        | 6                | 2.36              |
| Albert Ramos        | 1         | 0.39        | 7                | 2.76              |
| Alex Bolt           | 1         | 0.39        | 2                | 0.79              |
| Alex De Minaur      | 1         | 0.39        | 3                | 1.18              |
| Alexander Zverev    | 3         | 1.18        | 6                | 2.36              |
+---------------------+-----------+-------------+------------------+-------------------+
Or you can try a data step + proc summary:
data want2;
set have2(rename=(player2=player)) have1(rename=(player1=player));
run;
proc summary data=want2 nway;
var Frequence Pourcentage Frequencecumulee Pourcentagecumule;
class player;
output out=want2 (drop=_:) sum=;
run;
Output:
+---------------------+-----------+-------------+------------------+-------------------+
| player              | Frequence | Pourcentage | Frequencecumulee | Pourcentagecumule |
+---------------------+-----------+-------------+------------------+-------------------+
| Adrian Mannarino    | 4         | 1.57        | 4                | 1.57              |
| Agnieszka Radwanska | 2         | 0.79        | 5                | 1.97              |
| Ajla Tomljanovic    | 1         | 0.39        | 6                | 2.36              |
| Albert Ramos        | 1         | 0.39        | 7                | 2.76              |
| Alex Bolt           | 1         | 0.39        | 2                | 0.79              |
| Alex De Minaur      | 1         | 0.39        | 3                | 1.18              |
| Alexander Zverev    | 3         | 1.18        | 6                | 2.36              |
+---------------------+-----------+-------------+------------------+-------------------+
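Note that adding the percentage and cumulative columns of two separate tables does not give percentages or running totals of the combined data (the summed percentages total 200, for instance). One possible fix is to keep only the summed Frequence and let PROC FREQ recompute the rest. This is a sketch that assumes want2 from the step above; want3 is a hypothetical name.

* Treat the summed counts as weights and rebuild the frequency table;
proc freq data=want2;
  weight Frequence;
  * OUTCUM adds CUM_FREQ and CUM_PCT next to COUNT and PERCENT;
  tables player / out=want3 outcum;
run;

The resulting want3 holds one row per player with COUNT, PERCENT, CUM_FREQ and CUM_PCT computed from the combined counts.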
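Use the ODS table output instead. This gives you a nice clean version. The table named temp is the output from proc freq, which I then clean up into a displayable table called want. It is fairly generic, so change the dataset name and variable names in the first step and everything else should work as is.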
*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
table sex age;
run;
*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable'
variable_value='Variable Value';
run;
*Display;
proc print data=want(obs=20) label;
run;
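Applied to the question, the same ODS table could feed a summing step. This is only a sketch, assuming player1 and player2 are character variables in projet.matchs; the names long and want_total and the length of 50 are hypothetical.

* One ODS table holds the frequencies of both variables;
ods table onewayfreqs=temp;
proc freq data=projet.matchs;
  table player1 player2;
run;

* Collapse the per-variable columns into a single player column;
data long;
  length player $50;
  set temp;
  player=strip(vvaluex(scan(table, 2)));
  keep player frequency;
run;

* Add up the counts of players that appear under both variables;
proc summary data=long nway;
  class player;
  var frequency;
  output out=want_total(drop=_:) sum=;
run;

Only the raw counts are summed here; percentages and cumulative values would need to be recomputed from the totals.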