Python: output a fixed-width format text file with special lines, as SAS does
I have the following sample data:
# df
VAR1 SEQ VAR2 VAR3 DATE VAR4 VAR5 VAR6 VAR7
AAA 1 YYY 01 20000630 AL 11111 ABCD PA
BBB 1 YYY 01 20100701 GA 12345 EDED NY
BBB 2 YYY 01 20150815 GA 12345 NY
BBB 3 YYY 01 19950105 GA 12345 YTRU NY
BBB 4 YYY 01 20000701 GA 12345 IIII NY
BBB 5 YYY 01 20210701 GA 12345 NY
CCC 1 NNN 01 20210630 CA 33333 SSSS NJ
CCC 2 NNN 01 20210629 CA 33333 NJ
In SAS, we can export a fixed-width format file as follows:
BLANK_VAR1 = " "
%MACRO FRIST;
PUT @ 1 "00FIRST"
@ 8 VAR1 $CHAR5.
@ 13 BLANK_VAR1 $CHAR2.
@ 15 VAR2 $CHAR3.
;
%MEND FRIST;
%MACRO SECOND;
PUT @ 1 "00SECOND"
@ 9 VAR3 $CHAR2.
@ 11 BLANK_VAR1 $CHAR2.
@ 13 VAR4 $CHAR2.
@ 15 VAR5 $CHAR5.
;
%MEND SECOND;
%MACRO THIRD(sequence);
num = &sequence.;
PUT @ 1 num Z2.0
@ 3 "THIRD" $CHAR5.
@ 8 DATE $CHAR8.
;
%MEND THIRD;
%MACRO FOURTH(sequence);
num = &sequence.;
PUT @ 1 num Z2.0
@ 3 "FOURTH" $CHAR5.
@ 9 VAR6 $CHAR25.
@ 34 BLANK_VAR1 $CHAR2.
@ 36 VAR7 $CHAR2.
;
%MEND FOURTH;
filename outtmp "/home/folder/outfile_tmp";
DATA _NULL_;
SET df;
BY VAR1 SEQ;
FILE outtmp;
IF FIRST.VAR1 THEN DO;
%FRIST;
%SECOND;
REC_CNT = 0;
END;
REC_CNT + 1;
IF REC_CNT LE 3 THEN DO;
%THIRD(REC_CNT);
IF VAR6 NE ' ' THEN DO;
%FOURTH(REC_CNT);
END;
END;
RUN;
filename output "/home/folder/output";
%MACRO INREC;
PUT 001 RECIN $CHAR150.;
%MEND INREC;
%MACRO FILE_FIRST;
DATE = TODAY();
PUT @ 1 "###FIRSTLINE###"
@ 16 DATE JULIAN5.
@ 21 BLANK_VAR1 $CHAR2.
@ 23 "###FIRSTLINEEND###"
;
%MEND FILE_FIRST;
%MACRO FILE_LAST;
DATE = TODAY();
PUT @ 1 "###LASTLINE###"
@ 15 DATE JULIAN5.
@ 20 BLANK_VAR1 $CHAR2.
@ 22 "###LASTLINEEND###"
;
%MEND FILE_LAST;
DATA output;
INFILE outtmp truncover;
INPUT
@ 001 RECIN $CHAR150.;
RUN;
DATA _NULL_;
SET output end=last;
file output lrecl=256 ;
IF _N_ = 1 THEN DO;
%FILE_FIRST;
END;
%INREC;
IF last THEN DO;
%FILE_LAST;
END;
RUN;
The output file looks like this:
###FIRSTLINE###21182 ###LASTLINEEND###
00FIRSTAAA YYY
00SECOND01 AL11111
01THIRD20000630
01FOURTHABCD PA
00FIRSTBBB YYY
00SECOND01 GA12345
01THIRD20100701
01FOURTHEDED NY
02THIRD20150815
03THIRD19950105
03FOURTHYTRU NY
00FIRSTCCC NNN
00SECOND01 CA33333
01THIRD20210630
01FOURTHSSSS NJ
###LASTLINE###21182 ###LASTLINEEND###
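The logic of the program above is:
- For each VAR1, output the FIRST and SECOND parts only once.
- Output the THIRD part while SEQ is 3 or less. Records with SEQ greater than 3 are not output; they are ignored.
- Output the FOURTH part only if VAR6 is not missing.
- In the THIRD and FOURTH parts, the first two characters run from 01 to 03, depending on the record.
How can this format be replicated in Python?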
I found that np.savetxt() with the fmt parameter might be one way; however, the file has to keep the same row order as the original data frame.
pandas has the function read_fwf() for reading fixed-width format files; however, there is no to_fwf() function for exporting them.
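Without a to_fwf(), one option is to lay out each record by hand. The sketch below is one possible way to imitate the SAS "PUT @column value $CHARw." pointer in plain Python. The helper name put_at is made up for this illustration and is not part of pandas or NumPy; the column positions in the example are taken from the %FRIST macro above.

```python
def put_at(fields):
    """Build one fixed-width line from (column, text, length) triples.

    column is 1-based, like the SAS @ pointer; text is cut or blank-padded
    to exactly `length` characters, similar to $CHARw.
    """
    line = []
    for column, text, length in fields:
        start = column - 1
        # Pad the line with blanks so that it reaches the end of this field.
        if len(line) < start + length:
            line.extend(' ' * (start + length - len(line)))
        # Overwrite the field's slot with the padded (or truncated) text.
        line[start:start + length] = str(text)[:length].ljust(length)
    return ''.join(line)


# FIRST record for VAR1='AAA' and VAR2='YYY'
first = put_at([(1, '00FIRST', 7), (8, 'AAA', 5), (13, '', 2), (15, 'YYY', 3)])
print(repr(first))
```

Because every field is placed by its column number, the order of the triples does not matter, and a later field simply overwrites an earlier one if they overlap.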
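I have been stuck on this for several days, so any ideas would help!
This isn't a great way to do it, but maybe it gives you an idea of how to implement the logic. I'm just building a list, which you can then write out, although you should probably use a file writer, as JonSG did in his (deleted) answer. There may be a better way using dataclasses, but that's not my specialty.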
import pandas as pd

# Read every column as text so that values such as "01" keep their leading
# zero, and keep empty cells as "" instead of NaN.
df = pd.read_csv(r"h:\temp\df_text.csv", dtype=str, keep_default_na=False)

outlist = []
for index, row in df.iterrows():
    seq = int(row.SEQ)
    if seq == 1:
        # FIRST: VAR1 in columns 8-12, two blanks, VAR2 from column 15
        outlist.append('00FIRST' + row.VAR1.ljust(5) + '  ' + row.VAR2)
        # SECOND: VAR3, two blanks, VAR4, VAR5
        outlist.append('00SECOND' + row.VAR3 + '  ' + row.VAR4 + row.VAR5)
    if seq <= 3:
        seqval = str(seq).zfill(2)  # zero-padded prefix: 01, 02, 03
        outlist.append(seqval + 'THIRD' + row.DATE)
        if row.VAR6.strip() != '':
            # FOURTH: VAR6 padded to 25 characters, two blanks, VAR7
            outlist.append(seqval + 'FOURTH' + row.VAR6.ljust(25) + '  ' + row.VAR7)
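The loop above does not produce the first and last lines, and it relies on SEQ rather than on a per-group record counter like REC_CNT in the SAS step. A possible extension is sketched below. It assumes that the rows are already in the desired order, that all columns hold strings, and that the SAS JULIAN5. format corresponds to strftime('%y%j') (two-digit year followed by the day of the year). The names build_lines and write_lines are hypothetical and chosen only for this example.

```python
import datetime as dt

import pandas as pd


def build_lines(df, today=None):
    """Return all fixed-width lines for df, including the first and last line."""
    today = today or dt.date.today()
    julian = today.strftime('%y%j')  # two-digit year + day of year
    lines = ['###FIRSTLINE###' + julian + '  ' + '###FIRSTLINEEND###']
    # sort=False keeps the groups in their order of first appearance.
    for _, group in df.groupby('VAR1', sort=False):
        head = group.iloc[0]
        lines.append('00FIRST' + head.VAR1.ljust(5) + '  ' + head.VAR2.ljust(3))
        lines.append('00SECOND' + head.VAR3.ljust(2) + '  '
                     + head.VAR4.ljust(2) + head.VAR5.ljust(5))
        # Only the first three records of each group are written.
        rows = group.head(3).itertuples(index=False)
        for count, row in enumerate(rows, start=1):
            num = f'{count:02d}'
            lines.append(num + 'THIRD' + row.DATE.ljust(8))
            if row.VAR6.strip():  # skip FOURTH when VAR6 is blank
                lines.append(num + 'FOURTH' + row.VAR6.ljust(25)
                             + '  ' + row.VAR7.ljust(2))
    lines.append('###LASTLINE###' + julian + '  ' + '###LASTLINEEND###')
    return lines


def write_lines(lines, path):
    """Write the lines to path, one record per line."""
    with open(path, 'w') as fh:
        fh.write('\n'.join(lines) + '\n')


# Small sample taken from the first three rows of the data in the question.
columns = ['VAR1', 'SEQ', 'VAR2', 'VAR3', 'DATE', 'VAR4', 'VAR5', 'VAR6', 'VAR7']
df = pd.DataFrame(
    [('AAA', '1', 'YYY', '01', '20000630', 'AL', '11111', 'ABCD', 'PA'),
     ('BBB', '1', 'YYY', '01', '20100701', 'GA', '12345', 'EDED', 'NY'),
     ('BBB', '2', 'YYY', '01', '20150815', 'GA', '12345', '', 'NY')],
    columns=columns)

for line in build_lines(df, today=dt.date(2021, 7, 1)):
    print(line)
```

Calling write_lines(build_lines(df), path) with a path of your choice then writes the file in one go. Passing today explicitly is only there to make the date in the marker lines reproducible.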