I have 100+ datasets that all contain the same three columns (code_num, replicate, total_qty); each dataset holds a single, distinct code_num.
data code_num_1
code_num replicate total_qty
12345 376 45
12345 76 67
12345 943 300
.
.
data code_num_2
code_num replicate total_qty
12234 85 746
12234 900 35
12234 726 273
.
.
and so on.
I would like to run those datasets through a data step if possible:
data test;
set test_; /* datasets will go here */
if _N_ in(&PercentileRow10,&PercentileRow20,&PercentileRow30,&PercentileRow40,&PercentileRow50,&PercentileRow60,&PercentileRow70, &PercentileRow80,&PercentileRow90);
run;
Note: the &PercentileRow macro variables hold the row numbers at which the percentiles fall in each dataset. The quantity column determines the percentiles. I run this step beforehand:
proc sql noprint;
create table ___ as select code_num, replicate, sum(qty) as total_qty from ____ group by code_num, replicate order by total_qty; quit;
Ideally, I would like to obtain the percentiles of each dataset and create a new dataset that holds each percentile, the replicate at which it occurred, and the total quantity. Could I use a macro and a do loop to run my datasets through this data step to produce the new datasets?
data code_num_1_perc
percentile replicate qty
10 87 45
20 933 65
30 34 100
.
.
90 467 837
This is my ideal output for each dataset code_num_#, if possible.
If I understand the requirements correctly, the proposed methodology is flawed.
For example, the median (50th percentile) of a series such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 is 5.5. Since 5.5 is not a value in the data set, how would a replicate number be selected?
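To see the point above concretely, here is a minimal sketch that builds that ten-value series and asks PROC MEANS for its median. The data set name series and the variable name x are made up for illustration. With the default quantile definition, the reported median is 5.5, which matches no row in the data.

```sas
/* Hypothetical data set: the values 1 through 10, one per row. */
data series;
  do x = 1 to 10;
    output;
  end;
run;

/* Report the median (50th percentile) of x. */
proc means data=series median;
  var x;
run;
```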
My recommendation would be a different process altogether. Look into PROC RANK to see how ties are handled and how you'd like them handled. You didn't specify which variable would be used to calculate the percentiles; the code below uses total_qty.
data combined;
length source data_set_name $50.;
set code_num_: indsname = source;
data_set_name = source;
run;
proc rank data=combined out=combined_deciles groups=10;
by data_set_name;
var total_qty;
ranks PRanks;
run;
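/* BY-group processing needs sorted input; the lowest total_qty comes first in each decile */
proc sort data=combined_deciles;
by data_set_name PRanks total_qty;
run;
/* keep the first row of each decile within each source data set */
data want;
set combined_deciles;
by data_set_name PRanks;
if first.PRanks;
run;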
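For anyone who wants to try this approach end to end, the following is a rough sketch on made-up data, not a definitive implementation. The two input data sets and their quantities are invented for illustration, and the percentile column is my own addition to mimic the layout asked for in the question. It assumes that the first row of decile group k (PROC RANK with GROUPS=10 numbers the groups 0 through 9) is an acceptable stand-in for the k*10th percentile. The explicit sort is a precaution so that the BY statements always see ordered input.

```sas
/* Hypothetical test data: two small data sets named like those in the question. */
data code_num_1;
  code_num = 12345;
  do replicate = 1 to 20;
    total_qty = replicate * 5;   /* made-up quantities */
    output;
  end;
run;

data code_num_2;
  code_num = 12234;
  do replicate = 1 to 20;
    total_qty = replicate * 7;   /* made-up quantities */
    output;
  end;
run;

/* Stack every data set whose name starts with code_num_ and record its source. */
data combined;
  length source data_set_name $50;
  set code_num_: indsname=source;
  data_set_name = source;   /* the INDSNAME= variable itself is not written out */
run;

/* Order rows by source and quantity so the BY statements below are valid. */
proc sort data=combined;
  by data_set_name total_qty;
run;

/* Assign each row to a decile group (0 through 9) within its source data set. */
proc rank data=combined out=combined_deciles groups=10;
  by data_set_name;
  var total_qty;
  ranks PRanks;
run;

/* Keep the first row of groups 1 through 9 and label it 10, 20, ..., 90. */
data want;
  set combined_deciles;
  by data_set_name PRanks;
  if first.PRanks and PRanks > 0;
  percentile = PRanks * 10;
  keep data_set_name code_num percentile replicate total_qty;
run;
```

Because everything stays in one stacked table keyed by data_set_name, no macro loop over the 100+ data sets is needed. If separate output data sets are still wanted, want could be split afterwards.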