SAS group by counters per variable - primary key creation

I have some data that needs to be split into 12 or so different groups. There is no key, and the order the data is in is important.

The data has a number of groups, and those groups contain single and/or nested groups. Each group will be split out, as the data is in a hierarchical format, so each "GROUP" has its own format, and they then all need to be joined up onto one row (or several rows).

Sample data file:

"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""

This is the hierarchy that should exist when the data is input. I am thinking there could be several tables that can be joined together later. (The numbers illustrate the parent/child levels.)

1. Transaction [TRANS]
   1.1. Meter Point [MTPNT]
      1.1.1. Asset [ASSET]
         1.1.1.1. Meter [METER]
         1.1.1.2. Converter [CONVE]
         1.1.1.3. Register Details [REGST]
            1.1.1.3.1. Reading [READG]
         1.1.1.4. Market Participant [MKPRT]
         1.1.1.5. Name [NAME]
            1.1.1.5.1. Address [ADDRS]
            1.1.1.5.2. Contact Mechanism [CONTM]
   1.2. Appointment [APPNT]
   1.3. Name [NAME]
      1.3.1. Address [ADDRS]
      1.3.2. Contact Mechanism [CONTM]
   1.4. Market Participant [MKPRT]

This is industry GAS data, so in this flow you can have many ASSET records per MTPNT, and each of those ASSET records can have many REGST records, because that is where the meter reading is kept (in READG).

I have tried using BY groups and iterative FIRST. processing, but I have not worked with this type of data before. I need a way to create a key per grouping so that, once the data is split up and the fields are defined, the pieces can be joined back together.

I have tried manipulating the infile so that all the data for each TRANS appears on one line, but then I still have the issue of applying the fields, and ordering is paramount.

I have managed to get keys for some of the groups, but after splitting, they don't quite join back together.

data TRANS;
    set mpancreate_a;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then 
        do;
            if DataItmGrp = "TRANS" then 
                TRANSKey+1;
        end;
run;

data TRANS;
    set TRANS;
    TRANSKey2 + 1;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then
        do;
            if DataItmGrp = "TRANS" then
                TRANSKEY2=1;
        end;


run;

data MTPNT;
    set TRANS;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then
        do;
            if DataItmGrp = "MTPNT" then
                MTPNTKEY+1;
        end;
run;

data MTPNT;
    set MTPNT;
    by  MTPNTKEY NOTSORTED;

    if first.MTPNTKEY  and DataItmGrp = "MTPNT" then
        MTPNTKEY2=0;
    MTPNTKEY2+1;
run;

data ASSET;
    set MTPNT;

    IF MTPNTKEY = 0 THEN
        MTPNTKEY2=0;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then
        do;
            if DataItmGrp = "ASSET" then
                ASSETKEY+1;
        end;
run;

data ASSET;
    set ASSET;
    by  ASSETKEY NOTSORTED;

    if first.ASSETKEY  and DataItmGrp = "ASSET" then
        ASSETKEY2=0;
    ASSETKEY2+1;

    IF ASSETKEY =0 THEN
        ASSETKEY2=0;
run;

I want a counter for each group found, and a retained counter for that particular group, but I cannot work out how to get in and out of the groupings based on the hierarchy above.
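A minimal sketch of that idea, done in one DATA step instead of one step per group, assuming the input is in file order with the record tag in DataItmGrp (as in the question's mpancreate_a). Each tier that opens a group bumps its own never-reset counter, copies it into a retained "current key", and clears the keys of all deeper levels; every other row simply inherits the current keys of the groups it sits in. The data set names groups and keyed, and the variables RECNO, REGSTKEY and nTRANS/nMTPNT/nASSET/nREGST, are hypothetical. Tiers that can appear at more than one level (NAME, MKPRT) are deliberately not handled here.

```sas
/* Hypothetical stand-in for mpancreate_a: only the record tag of each line
   of the sample file, in file order. */
data groups;
    length DataItmGrp $5;
    input DataItmGrp $;
    datalines;
TRANS
MTPNT
ASSET
METER
REGST
READG
ASSET
METER
REGST
READG
ASSET
METER
REGST
READG
APPNT
;
run;

data keyed;
    set groups;
    RECNO = _n_;   /* keep the original position, since order matters */
    /* keys of the groups the current row sits inside, carried across rows */
    retain TRANSKEY MTPNTKEY ASSETKEY REGSTKEY 0;
    /* nXXXX + 1 are sum statements: implicitly retained and never reset,
       so every key value is unique across the whole file */
    select (DataItmGrp);
        when ('TRANS') do;
            nTRANS + 1;
            TRANSKEY = nTRANS;
            MTPNTKEY = 0; ASSETKEY = 0; REGSTKEY = 0;   /* close all deeper groups */
        end;
        when ('MTPNT') do;
            nMTPNT + 1;
            MTPNTKEY = nMTPNT;
            ASSETKEY = 0; REGSTKEY = 0;
        end;
        when ('ASSET') do;
            nASSET + 1;
            ASSETKEY = nASSET;
            REGSTKEY = 0;
        end;
        when ('REGST') do;
            nREGST + 1;
            REGSTKEY = nREGST;
        end;
        when ('APPNT') do;   /* direct child of TRANS: leaves the MTPNT branch */
            MTPNTKEY = 0; ASSETKEY = 0; REGSTKEY = 0;
        end;
        otherwise;           /* METER, READG, ... inherit the current keys */
    end;
    drop nTRANS nMTPNT nASSET nREGST;
run;
```

Because every row carries the keys of all its ancestors, a READG row can later be matched to its ASSET on ASSETKEY alone, without going through REGST.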

I'm hoping that once I have these keys, I can split the data by group and then left join it back together.
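A sketch of the split-and-rejoin step, assuming each per-tier table already carries the keys of its ancestors (as a counter pass of the kind described above would produce). The table names asset, readg and joined, the column names and the key values are hypothetical; the field values are taken from the sample file.

```sas
/* Hypothetical per-tier tables after the split; ASSETKEY is the join key */
data asset;
    infile datalines dsd truncover;
    length ASSETKEY 8 action $5 serial $14;
    input ASSETKEY action serial;
    datalines;
1,REPRT,E6S05633099999
2,REMVE,E6S05633099999
3,INSTL,E6S06769699999
;
run;

data readg;
    infile datalines dsd truncover;
    length READGKEY ASSETKEY 8 reading $5;   /* character keeps leading zeros */
    format readingdate yymmdd10.;
    input READGKEY ASSETKEY readingdate :yymmdd8. reading;
    datalines;
1,1,20180216,00990
2,2,20180216,00990
3,3,20180216,00000
;
run;

/* A left join keeps every ASSET, even one that has no READG row */
proc sql;
    create table joined as
    select a.ASSETKEY, a.action, a.serial, r.readingdate, r.reading
    from asset as a
         left join readg as r
         on a.ASSETKEY = r.ASSETKEY
    order by a.ASSETKEY;
quit;
```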


        _n_     TRANS   TRANS2  MTPNT   MTPNT2
TRANS   1       1       0       0       0
MTPNT   2       2       1       1       1
ASSET   3       3       1       2       1
METER   4       4       1       3       1
READG   5       5       1       4       1
MTPNT   6       6       1       1       2
ASSET   7       7       1       2       2
METER   8       8       1       3       2
READG   9       9       1       4       2
APPNT   10      10      1       5       2
TRANS   11      1       2       6       2
MTPNT   12      2       2       1       3
ASSET   13      3       2       2       3
METER   14      4       2       3       3
READG   15      5       2       4       3
MTPNT   16      6       2       1       4
ASSET   17      7       2       2       4
METER   18      8       2       3       4
READG   19      9       2       4       4
APPNT   20      10      2       5       4   




Inputting hierarchical data from a data file that has no definitive markers is problematic. The best suggestion I have is to work out which salient values you want to extract and in what context you want to know them. For this problem, the simplest first approach would be a single monolithic table with categorical variables that capture the path descending to the salient value (the meter reading).

A more complex approach would be to let the first token in each line drive the input for that line and the output table it belongs to. Since there are no landmarks for absolute or relative position in the hierarchy (as with NAME and MKPRT, which can each appear at more than one level), there is no 100% reliable way to place those records in the hierarchy, and that can also affect the placement of items read from subsequent data lines.

Depending on the true complexity of the real data and how closely it adheres to these rules, you may or may not 'miss out' on reading some values.

Suppose the simpler goal is just to get the meter readings.

data want;

length tier level1-level6 $8 path $64 meterReadingString $8 dummy $1;
retain level1-level5 path;
attrib readingdate informat=yymmdd10. format=yymmdd10.;

infile cards dsd missover;

input @1 tier @; * held input - dont advance read line yet;

* each tier is accepted only when its parent level is already set, so that
  repeated siblings (the 2nd and 3rd ASSET) are recognised as well;
if tier="TRANS" then do;
  level1 = tier;
  call missing (of level2-level6);
  path = catx("/", of level:);
end;

if tier="MTPNT" and level1="TRANS" then do;
  level2 = tier;
  call missing (of level3-level6);
  path = catx("/", of level:);
end;

if tier="ASSET" and level2="MTPNT" then do;
  level3 = tier;
  call missing (of level4-level6);
  path = catx("/", of level:);
end;

if tier="METER" and level3="ASSET" then do;
  level4 = tier;
  call missing (of level5-level6);
  path = catx("/", of level:);
end;

if tier="REGST" and level4="METER" then do;
  level5 = tier;
  call missing (level6);
  path = catx("/", of level:);
end;

if tier="READG" and level5="REGST" then do;
  level6 = tier;
  path = catx("/", of level:);
  input @1 tier readingdate dummy meterReadingString @; * reread line according to tier;

  meterReading = input(meterReadingString, best12.);

  if path = "TRANS/MTPNT/ASSET/METER/REGST/READG" then OUTPUT;
end;

datalines;
"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""
;
run;

You can use this as the basis of a more complicated reader that has a different output <tier> data set for each tier, or path to tier, encountered. You would need a different input statement per tier, similar to how READG is read.
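A sketch of such a reader, assuming the key-per-tier idea from the question: the tag read with a held INPUT picks both the INPUT statement for the rest of the line and the output data set. Only ASSET and READG are written out here, and only a few fields of each are kept. The data set and variable names (asset, readg, action, serial, reading, skip, the *KEY and n* variables) are hypothetical, and the field positions are read off the sample lines rather than any official layout.

```sas
data asset (keep=TRANSKEY MTPNTKEY ASSETKEY action serial)
     readg (keep=TRANSKEY MTPNTKEY ASSETKEY REGSTKEY readingdate reading);
    infile datalines dsd truncover;
    length tier $5 skip $1 action $5 serial $14 reading $5;
    format readingdate yymmdd10.;
    /* current ancestor keys, carried from line to line */
    retain TRANSKEY MTPNTKEY ASSETKEY REGSTKEY 0;
    input tier @;   /* read the record tag and hold the line */
    select (tier);
        when ('TRANS') do;
            nTRANS + 1; TRANSKEY = nTRANS;
            MTPNTKEY = 0; ASSETKEY = 0; REGSTKEY = 0;
        end;
        when ('MTPNT') do;
            nMTPNT + 1; MTPNTKEY = nMTPNT;
            ASSETKEY = 0; REGSTKEY = 0;
        end;
        when ('ASSET') do;
            nASSET + 1; ASSETKEY = nASSET; REGSTKEY = 0;
            /* fields 2-10 of an ASSET line: keep the 3rd (action) and 10th (serial) */
            input skip action skip skip skip skip skip skip serial;
            output asset;
        end;
        when ('REGST') do;
            nREGST + 1; REGSTKEY = nREGST;
        end;
        when ('READG') do;
            input readingdate :yymmdd8. skip reading;
            output readg;
        end;
        otherwise;   /* METER, APPNT, ... would get their own INPUT and OUTPUT here */
    end;
    datalines;
"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""
;
run;
```

Since both tables carry ASSETKEY, they can be left joined back together on it; the never-reset n* counters keep each key unique across the file.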
