
SAS group by counters per variable - primary key creation

I have some data that needs to be split into 12 or so different groups. There are no keys, and the order of the data matters.

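The data has multiple groups, and those groups contain single and/or nested groups. Because the data is in a hierarchical format, each group will be split out. Each "GROUP" has its own format, and they all then need to be joined back together into one row (or several rows).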

Sample data file:

"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""

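This is the hierarchy that should exist once the data is input. I am thinking there may later be several tables that can be joined together. (The numbers illustrate the parent-child levels.)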

1. Transaction [TRANS]
   1.1. Meter Point [MTPNT]
      1.1.1. Asset [ASSET]
         1.1.1.1. Meter [METER]
         1.1.1.2. Converter [CONVE]
         1.1.1.3. Register Details [REGST]
            1.1.1.3.1. Reading [READG]
         1.1.1.4. Market Participant [MKPRT]
         1.1.1.5. Name [NAME]
            1.1.1.5.1. Address [ADDRS]
            1.1.1.5.2. Contact Mechanism [CONTM]
   1.2. Appointment [APPNT]
   1.3. Name [NAME]
      1.3.1. Address [ADDRS]
      1.3.2. Contact Mechanism [CONTM]
   1.4. Market Participant [MKPRT]

This is gas industry data, so in this flow each MTPNT can have many ASSETs, and each of those ASSETs can have many REGSTs, since that is where the meter readings (READG) are held.

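I have tried BY-group and first. processing, but I have not handled this type of data before. I need a way to create a key for each grouping so that, once the data is split and the fields are defined, it can be joined back together.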

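I have tried manipulating the infile so that all the data for each TRANS appears on one line, but I still have problems applying the fields, and the ordering is paramount.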

I have managed to get some keys for some of the groups, but after splitting they do not quite fit back together.

data TRANS;
    set mpancreate_a;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then 
        do;
            if DataItmGrp = "TRANS" then 
                TRANSKey+1;
        end;
run;

data TRANS;
    set TRANS;
    TRANSKey2 + 1;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then
        do;
            if DataItmGrp = "TRANS" then
                TRANSKEY2=1;
        end;


run;

data MTPNT;
    set TRANS;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then
        do;
            if DataItmGrp = "MTPNT" then
                MTPNTKEY+1;
        end;
run;

data MTPNT;
    set MTPNT;
    by  MTPNTKEY NOTSORTED;

    if first.MTPNTKEY  and DataItmGrp = "MTPNT" then
        MTPNTKEY2=0;
    MTPNTKEY2+1;
run;

data ASSET;
    set MTPNT;

    IF MTPNTKEY = 0 THEN
        MTPNTKEY2=0;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then
        do;
            if DataItmGrp = "ASSET" then
                ASSETKEY+1;
        end;
run;

data ASSET;
    set ASSET;
    by  ASSETKEY NOTSORTED;

    if first.ASSETKEY  and DataItmGrp = "ASSET" then
        ASSETKEY2=0;
    ASSETKEY2+1;

    IF ASSETKEY =0 THEN
        ASSETKEY2=0;
run;

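I would like a counter for each group found, plus a counter retained within that particular group, but I cannot work out how to move in and out of the groupings according to the hierarchy above.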

I am hoping that once I have these keys, I can split the data by group and then join the pieces back together.


        _n_     TRANS   TRANS2  MTPNT   MTPNT2
TRANS   1       1       0       0       0
MTPNT   2       2       1       1       1
ASSET   3       3       1       2       1
METER   4       4       1       3       1
READG   5       5       1       4       1
MTPNT   6       6       1       1       2
ASSET   7       7       1       2       2
METER   8       8       1       3       2
READG   9       9       1       4       2
APPNT   10      10      1       5       2
TRANS   11      1       2       6       2
MTPNT   12      2       2       1       3
ASSET   13      3       2       2       3
METER   14      4       2       3       3
READG   15      5       2       4       3
MTPNT   16      6       2       1       4
ASSET   17      7       2       2       4
METER   18      8       2       3       4
READG   19      9       2       4       4
APPNT   20      10      2       5       4   




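Inputting hierarchical data from a data file that has no explicit markup is problematic. My best advice is to understand which salient values you want to extract and in what context you want to know them. For this problem, the simplest first approach is a single monolithic table with categorical variables that capture the path descending to the salient value (the meter reading).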

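A more complicated case is one where the first token in each line drives both the input for that line and which output table it belongs to. Because some tiers carry no landmark for their absolute or relative position in the hierarchy (NAME and MKPRT, for example, can appear at more than one level), there is no 100% reliable way to place them in the hierarchy. That in turn affects the placement of items read from subsequent data lines.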

Depending on how complex the real-world data truly is, and how well it adheres to the rules, you may or may not "miss" reading some values.
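One way to gauge that risk before writing any tier-specific input statements is to take an inventory of the tier tokens that actually occur in the file. The following is a minimal sketch, not part of the original answer: the file path `transactions.csv` and the data set name `tier_inventory` are hypothetical, and only the first field of each line is read.

```sas
/* Sketch: list which tier tokens occur and how often.            */
/* "transactions.csv" is a hypothetical path; substitute your own. */
data tier_inventory;
  infile "transactions.csv" dsd truncover;
  length tier $8;
  input tier;        /* read only the first field of each line */
  lineno = _n_;      /* keep the original line order           */
run;

proc freq data=tier_inventory;
  tables tier / nocum nopercent;
run;
```

Any tier that shows up here but has no branch in the reader is a place where values could be skipped.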

Suppose the simpler goal is just to get the meter readings.

data want;

length tier level1-level6 $8 path $64 meterReadingString $8 dummy $1;
retain level1-level5 path;
attrib readingdate informat=yymmdd10. format=yymmdd10.;

infile cards dsd missover;

input @1 tier @; * held input - dont advance read line yet;

/* The =: operator is a prefix comparison. The retained path still holds  */
/* the previous branch, so a tier is accepted when the path starts with   */
/* its parent path. An exact = match would skip the 2nd and later ASSETs. */

if tier="TRANS" then do;
  level1 = tier;
  call missing (of level2-level6);
  path = catx("/", of level:);
end;

if tier="MTPNT" and path=:"TRANS" then do;
  level2 = tier;
  call missing (of level3-level6);
  path = catx("/", of level:);
end;

if tier="ASSET" and path=:"TRANS/MTPNT" then do;
  level3 = tier;
  call missing (of level4-level6);
  path = catx("/", of level:);
end;

if tier="METER" and path=:"TRANS/MTPNT/ASSET" then do;
  level4 = tier;
  call missing (of level5-level6);
  path = catx("/", of level:);
end;

if tier="REGST" and path=:"TRANS/MTPNT/ASSET/METER" then do;
  level5 = tier;
  call missing (level6);
  path = catx("/", of level:);
end;

if tier="READG" and path=:"TRANS/MTPNT/ASSET/METER/REGST" then do;
  level6 = tier;
  path = catx("/", of level:);
  input @1 tier readingdate dummy meterReadingString @; * reread line according to tier;

  /* the reading arrives as a quoted string such as "00990" */
  meterReading = input(meterReadingString, best12.);

  if path = "TRANS/MTPNT/ASSET/METER/REGST/READG" then OUTPUT;
end;

datalines;
"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""
run;

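You can use this as the basis for a more complicated reader that has a different output <tier> data set for each tier, or path to a tier, that is encountered. Each tier needs its own input statement, similar to how READG is read.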
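A rough sketch of such a reader follows. It is not from the original answer and rests on several assumptions: the file path `transactions.csv`, the key variable names, and the `rawline` variable are all hypothetical, and the field layouts are not parsed at all (each row just keeps its unparsed line). The keys are running counters that are unique per tier across the whole file, rather than counters that restart within each parent, and REGST is attached to ASSET as in the numbered hierarchy in the question.

```sas
/* Sketch: one data set per tier, each row carrying its parents' keys. */
/* File path, key names and rawline are assumptions for illustration.  */
data trans (keep=transKey rawline)
     mtpnt (keep=transKey mtpntKey rawline)
     asset (keep=transKey mtpntKey assetKey rawline)
     meter (keep=transKey mtpntKey assetKey meterKey rawline)
     regst (keep=transKey mtpntKey assetKey regstKey rawline)
     readg (keep=transKey mtpntKey assetKey regstKey readgKey rawline)
     other (keep=transKey tier rawline);

  infile "transactions.csv" dsd truncover lrecl=400;
  length tier $8 rawline $400;

  input tier @;          /* hold the line; only the first token is read */
  rawline = _infile_;    /* keep the unparsed record for later parsing  */

  /* Sum statements retain their counters across rows, so a child row */
  /* automatically sees the most recent key of every parent tier.     */
  select (tier);
    when ("TRANS") do; transKey + 1; output trans; end;
    when ("MTPNT") do; mtpntKey + 1; output mtpnt; end;
    when ("ASSET") do; assetKey + 1; output asset; end;
    when ("METER") do; meterKey + 1; output meter; end;
    when ("REGST") do; regstKey + 1; output regst; end;
    when ("READG") do; readgKey + 1; output readg; end;
    otherwise output other;  /* APPNT, NAME, MKPRT, ...: parent is ambiguous */
  end;
run;

/* Rejoin the readings to their ancestors through the keys */
proc sql;
  create table readings as
  select t.transKey, m.mtpntKey, a.assetKey, r.regstKey, g.readgKey,
         g.rawline as readgLine
  from readg as g
       inner join regst as r on g.regstKey = r.regstKey
       inner join asset as a on r.assetKey = a.assetKey
       inner join mtpnt as m on a.mtpntKey = m.mtpntKey
       inner join trans as t on m.transKey = t.transKey;
quit;
```

Tiers that can sit at more than one level are sent to `other` with only the transaction key, because the data alone does not say which parent they belong to. Replacing `rawline` with a tier-specific input statement inside each `when` branch would turn this into a full reader.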
