SAS group by counters per variable - primary key creation
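I have some data that needs to be split into 12 or so different groups. There are no keys, and the order of the data matters.
The data contains multiple groups, and those groups contain single and/or nested groups. Because the data is in a hierarchical format, each group will be split out. Each "GROUP" has its own format, and all of them then need to be joined back together onto one row (or several rows).
Sample data file: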
"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""
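This is the hierarchy that should exist once the data has been read in. I am thinking there may later be several tables that can be joined together. (The numbers illustrate the parent/child levels.)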
1. Transaction [TRANS]
1.1. Meter Point [MTPNT]
1.1.1. Asset [ASSET]
1.1.1.1. Meter [METER]
1.1.1.2. Converter [CONVE]
1.1.1.3. Register Details [REGST]
1.1.1.3.1. Reading [READG]
1.1.1.4. Market Participant [MKPRT]
1.1.1.5. Name [NAME]
1.1.1.5.1. Address [ADDRS]
1.1.1.5.2. Contact Mechanism [CONTM]
1.2. Appointment [APPNT]
1.3. Name [NAME]
1.3.1. Address [ADDRS]
1.3.2. Contact Mechanism [CONTM]
1.4. Market Participant [MKPRT]
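This is gas industry data, so in this flow each MTPNT can have many ASSETs, and each of those ASSETs can have many REGSTs, since that is where the meter readings are held in READG.
I have tried BY-group processing with FIRST. to iterate, but I have not dealt with this type of data before. I need a way to create a key for each grouping so that, once the data has been split and the fields defined, it can be joined back together.
I have tried manipulating the infile so that all the data for each TRANS appears on one line, but I still have trouble applying the fields, and the ordering is paramount.
I have managed to get some keys for some of the groups, but after splitting they do not quite join back together.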
data TRANS;
    set mpancreate_a;
    by DataItmGrp NOTSORTED;
    if first.DataItmGrp then
        do;
            if DataItmGrp = "TRANS" then
                TRANSKey+1;
        end;
run;

data TRANS;
    set TRANS;
    TRANSKey2 + 1;
    by DataItmGrp NOTSORTED;
    if first.DataItmGrp then
        do;
            if DataItmGrp = "TRANS" then
                TRANSKEY2=1;
        end;
run;

data MTPNT;
    set TRANS;
    by DataItmGrp NOTSORTED;
    if first.DataItmGrp then
        do;
            if DataItmGrp = "MTPNT" then
                MTPNTKEY+1;
        end;
run;

data MTPNT;
    set MTPNT;
    by MTPNTKEY NOTSORTED;
    if first.MTPNTKEY and DataItmGrp = "MTPNT" then
        MTPNTKEY2=0;
    MTPNTKEY2+1;
run;

data ASSET;
    set MTPNT;
    IF MTPNTKEY = 0 THEN
        MTPNTKEY2=0;
    by DataItmGrp NOTSORTED;
    if first.DataItmGrp then
        do;
            if DataItmGrp = "ASSET" then
                ASSETKEY+1;
        end;
run;

data ASSET;
    set ASSET;
    by ASSETKEY NOTSORTED;
    if first.ASSETKEY and DataItmGrp = "ASSET" then
        ASSETKEY2=0;
    ASSETKEY2+1;
    IF ASSETKEY =0 THEN
        ASSETKEY2=0;
run;
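I want a counter for each group found, plus a counter that is kept within that particular group, but I cannot work out how to move in and out of the groupings according to the hierarchy above.
I am hoping that once I have these keys I can split the data by group and then join the pieces back together.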
DataItmGrp  _n_  TRANS  TRANS2  MTPNT  MTPNT2
TRANS         1      1       0      0       0
MTPNT         2      2       1      1       1
ASSET         3      3       1      2       1
METER         4      4       1      3       1
READG         5      5       1      4       1
MTPNT         6      6       1      1       2
ASSET         7      7       1      2       2
METER         8      8       1      3       2
READG         9      9       1      4       2
APPNT        10     10       1      5       2
TRANS        11      1       2      6       2
MTPNT        12      2       2      1       3
ASSET        13      3       2      2       3
METER        14      4       2      3       3
READG        15      5       2      4       3
MTPNT        16      6       2      1       4
ASSET        17      7       2      2       4
METER        18      8       2      3       4
READG        19      9       2      4       4
APPNT        20     10       2      5       4
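Reading hierarchical data from a data file that has no explicit markers is problematic. My best advice is to work out which salient values you want to extract and in what context you want to know them. For this problem, the simplest first approach is a single monolithic table with categorical variables that capture the path descending to the salient value (the meter reading).
A more complex scenario is one in which the first token of each line drives both the input for that line and the output table it belongs to. Because some items (such as NAME and MKPRT) carry no landmark for their absolute or relative position in the hierarchy, there is no 100% reliable way to place them in it, and that also affects the placement of items read from subsequent data lines.
Depending on the true real-world complexity and on how closely the files adhere to the rules, you may or may not "miss" reading some values.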
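One way to gauge that risk before writing the reader is to tally which tiers actually occur in a file. The step below is only a sketch: the data set name tiers is an assumption, and the datalines are a few records copied from the sample file.

```sas
/* Sketch: read only the first token of every record and count the tiers. */
data tiers;
  length tier $8;
  infile datalines dsd missover;
  input tier;            /* only the leading field is needed */
datalines;
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""
;
run;

/* One row per tier with its record count. */
proc freq data=tiers;
  tables tier / nocum nopercent;
run;
```

Any tier that shows up in the tally but has no branch in the reader is a place where values would be skipped.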
Suppose the simpler goal is just to get the meter readings.
data want;
  length tier level1-level6 $8 path $64 meterReadingString $8 dummy $1;
  retain level1-level5 path;
  attrib readingdate informat=yymmdd10. format=yymmdd10.;
  infile cards dsd missover;
  input @1 tier @;  /* held input: keep the line so it can be re-read per tier */

  /* The =: operator compares only the leading characters of path, so a tier
     can re-enter the hierarchy after a deeper path (a second ASSET after a
     READG, for example). */
  if tier="TRANS" then do;
    level1 = tier;
    call missing (of level2-level6);
    path = catx("/", of level:);
  end;
  if tier="MTPNT" and path=:"TRANS" then do;
    level2 = tier;
    call missing (of level3-level6);
    path = catx("/", of level:);
  end;
  if tier="ASSET" and path=:"TRANS/MTPNT" then do;
    level3 = tier;
    call missing (of level4-level6);
    path = catx("/", of level:);
  end;
  if tier="METER" and path=:"TRANS/MTPNT/ASSET" then do;
    level4 = tier;
    call missing (of level5-level6);
    path = catx("/", of level:);
  end;
  if tier="REGST" and path=:"TRANS/MTPNT/ASSET/METER" then do;
    level5 = tier;
    call missing (of level6-level6);
    path = catx("/", of level:);
  end;
  if tier="READG" and path=:"TRANS/MTPNT/ASSET/METER/REGST" then do;
    level6 = tier;
    path = catx("/", of level:);
    input @1 tier readingdate dummy meterReadingString @;  /* re-read the line according to tier */
    meterReading = input(meterReadingString, best12.);
    if path = "TRANS/MTPNT/ASSET/METER/REGST/READG" then OUTPUT;
  end;
datalines;
"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""
run;
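You can use this as the basis for a more complex reader that has separate output <tier> data sets for each tier, or path of tiers, encountered. Each tier would need its own input statement, similar to the way READG is read.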
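As a rough sketch of such a reader, the step below writes one table per tier and carries parent/child counters on every row so the tables can be joined again. The key names (transKey, mtpntKey, assetKey, regstKey) and the generic field names f2 and f3 are assumptions made for illustration, since the real field meanings are not given; only the READG layout follows the reader above.

```sas
/* Sketch only: key names and f2/f3 are hypothetical placeholders. */
data trans (keep=transKey f2)
     mtpnt (keep=transKey mtpntKey f2 f3)
     asset (keep=transKey mtpntKey assetKey f2 f3)
     readg (keep=transKey mtpntKey assetKey regstKey readingDate meterReading);
  length tier $8 f2 f3 $16 dummy $1 meterReadingString $8;
  attrib readingDate informat=yymmdd10. format=yymmdd10.;
  infile datalines dsd missover;
  input @1 tier @;                 /* peek at the tier, hold the line */
  select (tier);
    when ("TRANS") do;
      transKey + 1;                /* sum statement: value is retained */
      call missing(mtpntKey, assetKey, regstKey);  /* child counters restart */
      input @1 tier f2;
      output trans;
    end;
    when ("MTPNT") do;
      mtpntKey + 1;
      call missing(assetKey, regstKey);
      input @1 tier f2 f3;
      output mtpnt;
    end;
    when ("ASSET") do;
      assetKey + 1;
      call missing(regstKey);
      input @1 tier f2 f3;
      output asset;
    end;
    when ("REGST") do;
      regstKey + 1;                /* counted here, no table written in this sketch */
    end;
    when ("READG") do;
      input @1 tier readingDate dummy meterReadingString;
      meterReading = input(meterReadingString, best12.);
      output readg;
    end;
    otherwise;                     /* tiers not handled in this sketch are skipped */
  end;
datalines;
"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""
;
run;
```

Because each child counter restarts under its parent, a row is identified by the combination of its keys. For example, readg can be joined back to asset on transKey, mtpntKey and assetKey. This sketch does not validate the hierarchy the way the path checks above do, so a stray record would simply inherit whatever keys are current.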