Combine CSV files and concatenate some columns into single column via AIX awk, sed, ksh
I have multiple files that look like this:
File_1.csv:
"Job Id", "Batch Id","Id","Success","Created","Error","Col1","Col2","Col3"
aaabbb111,xxxyyy999,"false","false","Horrible_Error: Really Bad Error occured: yeah", "Val1", "Val2", "Val3"
cccddd222,pppqqq888,"","false","Horrible_Error: Anoter Bad Error occured: ouch", "Val1", "Val2", "Val3"
File_2.csv:
"Job Id", "Batch Id","Id","Success","Created","Error","Col1","Col2","Col3","Col4", "Col5"
aaabbb111,xxxyyy999,"false","false","Horrible_Error: Really Bad Error occured: oops","Val1","Val2","Val3","Val4","Val5"
cccddd222,pppqqq888,"","false","Horrible_Error: Anoter Bad Error occured: oh-no", "Val1","Val1","Val2","Val3","Val4","Val5"
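The first 6 columns of every file always have the same names. The remaining columns vary in both name and number, and I would like to capture them as a single column, wrapped in double quotes, square brackets, curly braces, or any other notation that marks them as one piece of data.
I need to be able to combine these files into a single file that looks something like this. The header is optional and shown only for illustration: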
"File_Name"|"Job Id"|"Batch Id"|"Id"|"Success"|"Created"|"Error"|"Tran_Header"|"Tran_Record"
File_1.csv|aaabbb111|xxxyyy999|"false"|"false"|"Horrible_Error: Really Bad Error occured: yeah"|["Col1","Col2","Col3"]|["Val1","Val2","Val3"]
File_1.csv|cccddd222|pppqqq888|""|"false"|"Horrible_Error: Anoter Bad Error occured: ouch"|["Col1","Col2","Col3"]|["Val1","Val2","Val3"]
File_2.csv|aaabbb111|xxxyyy999|"false"|"false"|"Horrible_Error: Really Bad Error occured: oops"|["Col1","Col2","Col3","Col4", "Col5"]|["Val1","Val2","Val3","Val4","Val5"]
File_2.csv|cccddd222|pppqqq888|""|"false"|"Horrible_Error: Anoter Bad Error occured: oh-no"|["Col1","Col2","Col3","Col4", "Col5"]|["Val1","Val1","Val2","Val3","Val4","Val5"]
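I tried the following to merge the files, but this code sometimes chokes on replacing the double quotes, and my ETL tool in turn chokes on parsing the concatenated set of columns (I also don't know how to capture the headers into a separate column):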
outdirectory=/some/directory
outfilename=some_file_name.csv
for i in *.csv
do
filename=$(echo "${i}")
tail +2 "${i}" | sed -e 's/,/#|#/1' -e 's/,/#|#/1' -e 's/,/#|#/1' -e 's/,/#|#/1' -e 's/,/#|#/1' -e 's/,/#|#/1' -e s/\"//g -e "s/^/#${filename}/" -e s/$/#/ | sed s/#/\"/g >> "${outdirectory}/${outfilename}"
mv $i $srcdir/
done
Any help or ideas would be greatly appreciated. I know next to nothing about UNIX shell scripting. Almost forgot, I'm on AIX v6.2
A solution using awk (I used gnu-awk):
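awk 'BEGIN{FS=",";OFS="|"}
{
    if(FNR==1){
        # First line of each file is its header; print the combined header only once.
        if(NR==1){
            print "\"File_Name\"",$1,$2,$3,$4,$5,$6,"\"Tran_Header\"","\"Tran_Record\"";
        }
        # Blanking the fixed fields rebuilds $0 with OFS, leaving a run of "|" in front.
        $1=$2=$3=$4=$5=$6="";
        gsub("[|]+",",",$0);    # turn each run of "|" back into a single comma
        gsub("^,","",$0);       # drop the comma left by the blanked fields
        titleCol = $0;          # remaining header names for this file
    }else{
        # Data rows in the sample have 5 leading fields, one fewer than the header.
        temp = FILENAME OFS $1 OFS $2 OFS $3 OFS $4 OFS $5 OFS "["titleCol"]";
        $1=$2=$3=$4=$5="";
        gsub("[|]+",",",$0);
        gsub("^,","",$0);
        print temp OFS "["$0"]";
    }
}' *.csv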
You get:
"File_Name"|"Job Id"|"Batch Id"|"Id"|"Success"|"Created"|"Error"|"Tran_Header"|"Tran_Record"
File_1.csv|aaabbb111|xxxyyy999|"false"|"false"|"Horrible_Error: Really Bad Error occured: yeah"|["Col1","Col2","Col3"]|["Val1","Val2","Val3"]
File_1.csv|cccddd222|pppqqq888|""|"false"|"Horrible_Error: Anoter Bad Error occured: ouch"|["Col1","Col2","Col3"]|["Val1","Val2","Val3"]
File_2.csv|aaabbb111|xxxyyy999|"false"|"false"|"Horrible_Error: Really Bad Error occured: oops"|["Col1","Col2","Col3","Col4","Col5"]|["Val1","Val2","Val3","Val4","Val5"]
File_2.csv|cccddd222|pppqqq888|""|"false"|"Horrible_Error: Anoter Bad Error occured: oh-no"|["Col1","Col2","Col3","Col4","Col5"]|["Val1","Val1","Val2","Val3","Val4","Val5"]
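Since the question is about AIX rather than gnu-awk, here is a rough variant of the same idea that sticks to plain POSIX awk features. It joins the fields in a loop instead of rebuilding $0, and it trims the spaces that follow some commas in the sample input (for example `, "Val1"`), which the rebuild approach would copy through. The names `combine_csv`, `trim` and `join_fields` are made up for this sketch, and I have not run it on AIX, so treat it as a starting point. Like the answer above, it assumes 6 fixed header fields but only 5 leading fields in the data rows (that is what the sample shows), and it splits on every comma, so a quoted value that contains a comma will break it.

```shell
#!/bin/sh
# Hypothetical helper: combine_csv FILE... writes pipe-delimited rows to stdout.
combine_csv() {
    awk '
    BEGIN { FS = ","; OFS = "|" }

    # Remove spaces and tabs around a single field value.
    function trim(s) {
        gsub(/^[ \t]+/, "", s)
        gsub(/[ \t]+$/, "", s)
        return s
    }

    # Join fields first..last of the current line with sep, trimming each one.
    function join_fields(first, last, sep,    i, out) {
        out = ""
        for (i = first; i <= last; i++) {
            if (i > first)
                out = out sep
            out = out trim($i)
        }
        return out
    }

    FNR == 1 {
        # Header line: print the combined header once, keep the extra names.
        if (NR == 1)
            print "\"File_Name\"", join_fields(1, 6, OFS), "\"Tran_Header\"", "\"Tran_Record\""
        title = join_fields(7, NF, ",")
        next
    }

    {
        # Data line: assumption taken from the sample, 5 leading fields here.
        print FILENAME, join_fields(1, 5, OFS), "[" title "]", "[" join_fields(6, NF, ",") "]"
    }' "$@"
}
```

With the variables from the question it could be called as `combine_csv *.csv > "${outdirectory}/${outfilename}"`, provided the output directory is not the one the `*.csv` glob is reading from.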