SAS group by counters per variable - primary key creation

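I have some data that needs to be split into 12 or so different groups. There is no key, and the order of the data matters.

The data contains a number of groups, and those groups contain single and/or nested groups. Because the data is in a hierarchical format, each group will be split out. Each "GROUP" has its own format, and they all then need to be joined back together into one row (or several rows).

Sample data file: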

"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""

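This is the hierarchy that should exist once the data has been input. I expect there will later be several tables that can be joined together. (The numbers illustrate the parent-child levels.)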

1. Transaction [TRANS]
   1.1. Meter Point [MTPNT]
      1.1.1. Asset [ASSET]
         1.1.1.1. Meter [METER]
         1.1.1.2. Converter [CONVE]
         1.1.1.3. Register Details [REGST]
            1.1.1.3.1. Reading [READG]
         1.1.1.4. Market Participant [MKPRT]
         1.1.1.5. Name [NAME]
            1.1.1.5.1. Address [ADDRS]
            1.1.1.5.2. Contact Mechanism [CONTM]
   1.2. Appointment [APPNT]
   1.3. Name [NAME]
      1.3.1. Address [ADDRS]
      1.3.2. Contact Mechanism [CONTM]
   1.4. Market Participant [MKPRT]

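This is gas industry data, so in this flow each MTPNT can have many ASSETs, and each ASSET can have many REGSTs, because that is where the meter readings (READG) are held.

I have tried BY-group processing with first. variables, but I have not dealt with this type of data before. I need a way to create a key for each grouping so that, once the groups have been split and their fields defined, they can be joined back together.

I have also tried manipulating the infile so that all the data for each TRANS appears on one line, but I still have trouble applying the fields, and the ordering is of utmost importance.

I have managed to get keys for some of the groups, but they do not quite fit back together after the split.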

data TRANS;
    set mpancreate_a;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then 
        do;
            if DataItmGrp = "TRANS" then 
                TRANSKey+1;
        end;
run;

data TRANS;
    set TRANS;
    TRANSKey2 + 1;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then
        do;
            if DataItmGrp = "TRANS" then
                TRANSKEY2=1;
        end;


run;

data MTPNT;
    set TRANS;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then
        do;
            if DataItmGrp = "MTPNT" then
                MTPNTKEY+1;
        end;
run;

data MTPNT;
    set MTPNT;
    by  MTPNTKEY NOTSORTED;

    if first.MTPNTKEY  and DataItmGrp = "MTPNT" then
        MTPNTKEY2=0;
    MTPNTKEY2+1;
run;

data ASSET;
    set MTPNT;

    IF MTPNTKEY = 0 THEN
        MTPNTKEY2=0;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then
        do;
            if DataItmGrp = "ASSET" then
                ASSETKEY+1;
        end;
run;

data ASSET;
    set ASSET;
    by  ASSETKEY NOTSORTED;

    if first.ASSETKEY  and DataItmGrp = "ASSET" then
        ASSETKEY2=0;
    ASSETKEY2+1;

    IF ASSETKEY =0 THEN
        ASSETKEY2=0;
run;

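What I want is a counter for each group found, plus a counter retained within that particular group, but I cannot work out how to step in and out of the groupings according to the hierarchy above.

I am hoping that once I have these keys I can split the data by group and then join it back together.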


        _n_     TRANS   TRANS2  MTPNT   MTPNT2
TRANS   1       1       0       0       0
MTPNT   2       2       1       1       1
ASSET   3       3       1       2       1
METER   4       4       1       3       1
READG   5       5       1       4       1
MTPNT   6       6       1       1       2
ASSET   7       7       1       2       2
METER   8       8       1       3       2
READG   9       9       1       4       2
APPNT   10      10      1       5       2
TRANS   11      1       2       6       2
MTPNT   12      2       2       1       3
ASSET   13      3       2       2       3
METER   14      4       2       3       3
READG   15      5       2       4       3
MTPNT   16      6       2       1       4
ASSET   17      7       2       2       4
METER   18      8       2       3       4
READG   19      9       2       4       4
APPNT   20      10      2       5       4   




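Inputting hierarchical data from a data file that has no explicit markup is problematic. My best advice is to work out which salient values you want to extract and in what context you want to know them. For this question, the simplest first approach is a single monolithic table with categorical variables that capture the path descending to the salient value (the meter reading).

A more complex situation is one where the first token on each line drives both the input for that line and the output table it belongs to. Because some tiers have no absolute or relative landmark position in the hierarchy (as with NAME and MKPRT), there is no 100% reliable way to place them in it, and that also affects the placement of items read from subsequent data lines.

Depending on the true real-world complexity and how well the data adheres to the rules, you may or may not "miss" reading some values.

Suppose the simpler goal is to get the meter readings.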

data want;

length tier level1-level6 $8 path $64 meterReadingString $8 dummy $1;
retain level1-level5 path;   /* level6 (READG) is set and output on the same line */
attrib readingdate informat=yymmdd10. format=yymmdd10.;

infile cards dsd missover;

input @1 tier @;   /* held input: read the tier token, do not advance to the next line yet */

/* =: is a prefix comparison, so a tier is accepted whenever the retained path  */
/* starts with its parent path. A plain = would reject the second ASSET,        */
/* because path still ends in /READG from the previous reading.                 */

if tier="TRANS" then do;
  level1 = tier;
  call missing (of level2-level6);
  path = catx("/", of level:);
end;

if tier="MTPNT" and path=:"TRANS" then do;
  level2 = tier;
  call missing (of level3-level6);
  path = catx("/", of level:);
end;

if tier="ASSET" and path=:"TRANS/MTPNT" then do;
  level3 = tier;
  call missing (of level4-level6);
  path = catx("/", of level:);
end;

if tier="METER" and path=:"TRANS/MTPNT/ASSET" then do;
  level4 = tier;
  call missing (of level5-level6);
  path = catx("/", of level:);
end;

if tier="REGST" and path=:"TRANS/MTPNT/ASSET/METER" then do;
  level5 = tier;
  call missing (level6);
  path = catx("/", of level:);
end;

if tier="READG" and path=:"TRANS/MTPNT/ASSET/METER/REGST" then do;
  level6 = tier;
  path = catx("/", of level:);
  input @1 tier readingdate dummy meterReadingString @;   /* reread the line according to its tier */

  meterReading = input(meterReadingString, best12.);

  output;   /* one row per reading, carrying the full path */
end;

datalines;
"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""
run;

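You can use this as the basis for a more complex reader that has a separate output <tier> data set for each tier, or path of tiers, encountered. Each tier needs its own input statement, similar to the way READG is read.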
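As a rough illustration of that idea, the sketch below writes one data set per tier and carries running counters as surrogate keys, so the pieces can be joined back together afterwards. It is only a sketch under assumptions: the data set names (trans_t, mtpnt_t, asset_t, readg_t), the key names and the field names (transRef, mtpntRef, assetStatus, serialNo) are hypothetical, the meaning of each field position is guessed from the sample file, and only four tiers are handled.

```sas
/* Sketch only: data set, key and field names are hypothetical, and the    */
/* field positions are guessed from the sample file in the question.      */
data trans_t (keep=transKey transRef)
     mtpnt_t (keep=transKey mtpntKey mtpntRef)
     asset_t (keep=transKey mtpntKey assetKey assetStatus serialNo)
     readg_t (keep=transKey mtpntKey assetKey readingDate meterReading);

  length tier $8 transRef mtpntRef $16 assetStatus $8 serialNo $20
         meterReadingString $8 skip $1;
  informat readingDate yymmdd10.;
  format   readingDate yymmdd10.;

  infile datalines dsd truncover;
  input tier @;                 /* hold the line and branch on its first token */

  /* Sum statements retain the counters across lines, so every child row   */
  /* carries the key of the most recent parent row. The keys are never     */
  /* reset, so they are unique across the whole file.                      */
  select (tier);
    when ("TRANS") do;
      transKey + 1;
      input transRef;                           /* field 2 */
      output trans_t;
    end;
    when ("MTPNT") do;
      mtpntKey + 1;
      input skip mtpntRef;                      /* field 3 */
      output mtpnt_t;
    end;
    when ("ASSET") do;
      assetKey + 1;
      /* fields 3 and 10; repeating skip discards the fields in between */
      input skip assetStatus skip skip skip skip skip skip serialNo;
      output asset_t;
    end;
    when ("READG") do;
      input readingDate skip meterReadingString;
      meterReading = input(meterReadingString, best12.);
      output readg_t;
    end;
    otherwise;                  /* tiers not handled in this sketch are ignored */
  end;

datalines;
"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""
;
run;

/* Join the tiers back together on the surrogate keys. */
proc sql;
  create table readings as
  select t.transRef, m.mtpntRef, a.assetStatus, a.serialNo,
         r.readingDate, r.meterReading
  from readg_t r
       inner join asset_t a on r.assetKey = a.assetKey
       inner join mtpnt_t m on a.mtpntKey = m.mtpntKey
       inner join trans_t t on m.transKey = t.transKey
  order by r.assetKey;
quit;
```

Unlike the path-based reader, this version does not check that a line appears under a valid parent, so a stray READG before any ASSET would pick up a stale or zero key. Combining the key counters with the path checks would cover that case, and tiers such as NAME and MKPRT, which can occur at more than one level, would still need a rule for deciding which parent they belong to.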

