How can I split a big data set into small tables in SAS?

I have a large data set of branches and accounts. I would like to split it into smaller tables by the variable BRANCH. Is there a way to do so, even with PROC TABULATE or PROC REPORT?

My code:

PROC SQL ;
    CREATE TABLE Branch_trans as
    SELECT  Branch,
            account_id
    FROM work.BRANCH
;
QUIT ;

If you want to create separate data sets by branch, you can use a macro to do so. The macro below gets the distinct branch values and subsets the data into individual data sets suffixed 1, 2, 3, etc.

The macro first has to determine the distinct branches, and each output data set takes its own pass over the input, so this will take some time to complete if your data set is large. You can run the subsets in parallel to make it faster, but the code will become more complex.

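%macro splitData(group=, data=, out=);
    %local i groupvalue groupvalues;

    /* Collect the distinct values of the grouping variable into one
       pipe-delimited macro variable. */
    proc sql noprint;
        select distinct &group.
        into :groupvalues separated by '|'
        from &data.
        ;
    quit;

    /* Create one output data set per distinct value: &out._1, &out._2, ... */
    %do i = 1 %to %sysfunc(countw(&groupvalues., |));
        %let groupvalue = %scan(&groupvalues., &i., |);

        data &out._&i.;
            set &data.;
            /* The quotes assume &group. is a character variable. */
            where &group. = "&groupvalue.";
        run;
    %end;

%mend;
%splitData(data=sashelp.cars, group=origin, out=want);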

Use PROC PRINT with a BY statement, like this:

PROC PRINT DATA=have ;
BY Branch ;
RUN ;
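BY-group processing expects the input to be sorted (or indexed) by the BY variable, so an unsorted table would normally be sorted first. A minimal sketch, assuming the data set is called have as above, that it contains the account_id column from the question, and using have_sorted as a hypothetical name for the sorted copy:

```sas
/* Sort by Branch first; BY-group processing expects sorted input. */
/* have_sorted is a hypothetical name for the sorted copy.         */
PROC SORT DATA=have OUT=have_sorted ;
    BY Branch ;
RUN ;

/* Print one report section per branch. */
PROC PRINT DATA=have_sorted NOOBS ;
    BY Branch ;
    VAR account_id ;
RUN ;
```

This produces one printed section per branch without creating any additional data sets, which may be enough if the goal is only a per-branch report.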

Maybe this helps, although it goes a bit beyond your question. This solution splits the data by branch, so you can adjust the report for each branch if you want:

    /* Example table: branch 3 has 2 rows. */
    data fulldata;
        branchid=1; a="aaa"; output;
        branchid=2; a="bbb"; output;
        branchid=3; a="ccc"; output;
        branchid=3; a="ddd"; output;
    run;

    %macro x;
        %local i branchcount;

        /* Keep one row per distinct branchid. */
        proc sort data=fulldata out=temptable nodupkey;
            by branchid;
        run;

        %let branchcount=0;

        /* Store each branchid in &branch1, &branch2, ... and the number of
           branches in &branchcount. */
        data _null_;
            set temptable end=last;
            call symputx(cats("branch", _n_), branchid, 'L');
            if last then call symputx("branchcount", _n_, 'L');
        run;

        /* Loop over the branches and split the table by branchid. */
        %do i = 1 %to &branchcount;
            data branch&i;
                set fulldata;
                where branchid = &&branch&i;
            run;

            /* Adjust the report per branch as needed. */
            proc report data=branch&i;
            run;
        %end;
    %mend;
    %x;

    /* This creates 3 tables; the third has 2 rows. */
    /* Important: branchid is numeric here. For a character variable,
       quote the value: where branchid = "&&branch&i"; */
