Linux File Sorting based on subset of data
OS - Red Hat Enterprise Linux 8.2 (Ootpa)
Following is my flat file.
G010XX OLTP PDOA210105210304210105000000000000000000000000F
G2019998199916 86010027472 XXXLXXSEXXX860 XXUPMU TEST
G2019998199916 86010027472 XXXLXXSEXXX860 XXUPMU TEST
G2019998199916 86010027472 XXXLXXSEXXX860 XXUPMU TEST
G2019998199916 86010027524 XXXLXXSEXXX860 XXUPMU TEST
G2019998199916 86010027524 XXXLXXSEXXX860 XXUPMU TEST
G2019998199916 86010027524 XXXLXXSEXXX860 XXUPMU TEST
G2029998199916 86010027472 XXXLXXSEXXXXXW-00000000000000000
G2029998199916 86010027472 XXXLXXSEXXXXXW-00000000000000001
G2029998199916 86010027472 XXXLXXSEXXXXXW000000000039213488
G2029998199916 86010027524 XXXLXXSEXXXXXW000000000000000000
G2029998199916 86010027524 XXXLXXSEXXXXXW000000000000000002
G2029998199916 86010027524 XXXLXXSEXXXXXW000000000000099357
G2039998199916 86010027472 XXXLXXSEXXXXXW201210201210095900
G2039998199916 86010027472 XXXLXXSEXXXXXW201210201210110141
G2039998199916 86010027472 XXXLXXSEXXXXXW201210201210141946
G2039998199916 86010027524 XXXLXXSEXXXXXW201210201210163210
G2039998199916 86010027524 XXXLXXSEXXXXXW201211201211141445
G2039998199916 86010027524 XXXLXXSEXXXXXW201211201211144629
G2049998199916 86010027472 XXXLXXSEXXXXXW201210201210095900
G2049998199916 86010027472 XXXLXXSEXXXXXW201210201210110141
G2049998199916 86010027472 XXXLXXSEXXXXXW201210201210141946
G2049998199916 86010027524 XXXLXXSEXXXXXW201210201210163210
G2049998199916 86010027524 XXXLXXSEXXXXXW201211201211141445
G2049998199916 86010027524 XXXLXXSEXXXXXW201211201211144629
G020000011140000000000000000000000.000
My requirement is to
Expected Output
G010XX OLTP PDOA210105210304210105000000000000000000000000F
G2019998199916 86010027472 XXXLXXSEXXX860 XXUPMU TEST
G2029998199916 86010027472 XXXLXXSEXXXXXW-00000000000000000
G2039998199916 86010027472 XXXLXXSEXXXXXW201210201210095900
G2049998199916 86010027472 XXXLXXSEXXXXXW201210201210095900
G2039998199916 86010027472 XXXLXXSEXXXXXW201210201210110141
G2049998199916 86010027472 XXXLXXSEXXXXXW201210201210110141
G2039998199916 86010027472 XXXLXXSEXXXXXW201210201210141946
G2049998199916 86010027472 XXXLXXSEXXXXXW201210201210141946
G2019998199916 86010027524 XXXLXXSEXXX860 XXUPMU TEST
G2029998199916 86010027524 XXXLXXSEXXXXXW000000000000000000
G2039998199916 86010027524 XXXLXXSEXXXXXW201210201210163210
G2049998199916 86010027524 XXXLXXSEXXXXXW201210201210163210
G2039998199916 86010027524 XXXLXXSEXXXXXW201211201211141445
G2049998199916 86010027524 XXXLXXSEXXXXXW201211201211141445
G2039998199916 86010027524 XXXLXXSEXXXXXW201211201211144629
G2049998199916 86010027524 XXXLXXSEXXXXXW201211201211144629
G020000011140000000000000000000000.000
I have tried two approaches, but neither gives the desired output. oa is my file name.
[tmp] $ (
> grep "^G010" oa && \
> ( \
> grep "^G201" oa|sort -u -k 1.1,1.4 -k 1.23,1.56 && \
> grep "^G202" oa|sort -u -k 1.1,1.4 -k 1.23,1.56 && \
> grep -E "^(G203|G204|G205|G206)" oa | sort -k 1.23,1.56 -k 2.71,2.88 -k 3.1,3.4 \
> ) && \
> grep "^G020" oa
> )
G010KR OLTP PDOA210105210304210105000000000000000000000000F
G2019998199916 86010027472 SCBLKRSEXXX860 KRUPMU TEST
G2019998199916 86010027524 SCBLKRSEXXX860 KRUPMU TEST
G2029998199916 86010027472 SCBLKRSEXXXKRW-00000000000000000
G2029998199916 86010027524 SCBLKRSEXXXKRW000000000000000000
G2039998199916 86010027472 SCBLKRSEXXXKRW201210201210110141
G2049998199916 86010027472 SCBLKRSEXXXKRW201210201210110141
G2039998199916 86010027472 SCBLKRSEXXXKRW201210201210141946
G2049998199916 86010027472 SCBLKRSEXXXKRW201210201210141946
G2039998199916 86010027472 SCBLKRSEXXXKRW201210201210095900
G2049998199916 86010027472 SCBLKRSEXXXKRW201210201210095900
G2039998199916 86010027524 SCBLKRSEXXXKRW201211201211141445
G2049998199916 86010027524 SCBLKRSEXXXKRW201211201211141445
G2039998199916 86010027524 SCBLKRSEXXXKRW201210201210163210
G2049998199916 86010027524 SCBLKRSEXXXKRW201210201210163210
G2039998199916 86010027524 SCBLKRSEXXXKRW201211201211144629
G2049998199916 86010027524 SCBLKRSEXXXKRW201211201211144629
G020000011140000000000000000000000.000
[tmp] $ (
> grep "^G010" oa && \
> ( \
> grep "^G201" oa|sort -u -k 1.1,1.4 -k 1.23,1.56 && \
> grep "^G202" oa|sort -u -k 1.1,1.4 -k 1.23,1.56 && \
> grep -E "^(G203|G204|G205|G206)" oa | sort -k 1.23,1.56 -k 2.71,2.88 -k 3.1,3.4 \
> ) | sort -k 1.23,1.56 && \
> grep "^G020" oa
> )
G010KR OLTP PDOA210105210304210105000000000000000000000000F
G2019998199916 86010027472 SCBLKRSEXXX860 KRUPMU TEST
G2029998199916 86010027472 SCBLKRSEXXXKRW-00000000000000000
G2039998199916 86010027472 SCBLKRSEXXXKRW201210201210095900
G2039998199916 86010027472 SCBLKRSEXXXKRW201210201210110141
G2039998199916 86010027472 SCBLKRSEXXXKRW201210201210141946
G2049998199916 86010027472 SCBLKRSEXXXKRW201210201210095900
G2049998199916 86010027472 SCBLKRSEXXXKRW201210201210110141
G2049998199916 86010027472 SCBLKRSEXXXKRW201210201210141946
G2019998199916 86010027524 SCBLKRSEXXX860 KRUPMU TEST
G2029998199916 86010027524 SCBLKRSEXXXKRW000000000000000000
G2039998199916 86010027524 SCBLKRSEXXXKRW201210201210163210
G2039998199916 86010027524 SCBLKRSEXXXKRW201211201211141445
G2039998199916 86010027524 SCBLKRSEXXXKRW201211201211144629
G2049998199916 86010027524 SCBLKRSEXXXKRW201210201210163210
G2049998199916 86010027524 SCBLKRSEXXXKRW201211201211141445
G2049998199916 86010027524 SCBLKRSEXXXKRW201211201211144629
G020000011140000000000000000000000.000
Awk would be an ideal candidate for this (GNU awk is needed for the arrays of arrays and the sorted array traversal):
awk 'NR==1 { print;next } $3 == "" { endline=$0;next } { code=substr($1,1,4);map[code][$2][$3]=$0} END {PROCINFO["sorted_in"]="@ind_str_asc";for (i in map) { for (j in map[i]) { for (k in map[i][j]) { print map[i][j][k] } } } print endline }' oa
Explanation:
awk 'NR==1 {
print; # Print the line
next # Skip to the next line
}
$3 == "" {
endline=$0; # Remember the trailer line (the one whose 3rd whitespace-delimited field is empty)
next
}
{
code=substr($1,1,4); # Extract the first 4 characters into a variable code
map[code][$2][$3]=$0 # Store the line in a three-dimensional array indexed by code, 2nd field and 3rd field
}
END {
PROCINFO["sorted_in"]="@ind_str_asc"; # Set the ordering of the array
for (i in map) {
for (j in map[i]) {
for (k in map[i][j]) {
print map[i][j][k] # Loop through the array and print the entries
}
}
}
print endline # Print the end line
}' oa
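Because the nested loops above iterate over the record code first, the lines come out grouped by code and then by account, whereas the expected output in the question is grouped by account first. A minimal sketch of one alternative, using plain POSIX awk, sort and cut in a decorate-sort-undecorate pipeline, is shown here. It rests on assumptions that are inferred from the sample data rather than stated in the question: fields are whitespace-separated as shown, only the first G201 and the first G202 line per account are kept, and within one account the third field differs only in its trailing timestamp, so comparing the whole field orders the lines by timestamp. The function name sort_records is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reorder the detail records of a file such as "oa".
# Assumptions (inferred from the sample, not confirmed by the question):
#   $1 starts with the 4-character record code, $2 is the account,
#   $3 ends with the timestamp and has a constant prefix per account.
sort_records() {
    file=$1
    tab=$(printf '\t')

    grep '^G010' "$file"    # header record, printed unchanged

    # Decorate every detail line with "account, rank, field 3, code",
    # sort on those keys, then strip the decoration again.
    awk -v OFS='\t' '
        /^G010/ || /^G020/ { next }            # header and trailer are handled separately
        {
            code = substr($1, 1, 4)
            rank = (code == "G201") ? 1 : (code == "G202") ? 2 : 3
            if (rank < 3) {
                if (seen[code, $2]++) next     # keep only the first G201/G202 per account
                print $2, rank, "", code, $0
            } else {
                print $2, rank, $3, code, $0   # G203, G204, ... ordered by field 3, then code
            }
        }' "$file" |
        LC_ALL=C sort -t "$tab" -k1,1 -k2,2n -k3,3 -k4,4 |
        cut -f5-

    grep '^G020' "$file"    # trailer record, printed unchanged
}
```

After defining the function, running sort_records oa prints the reordered records to standard output. LC_ALL=C is set so that sort compares bytes rather than following the current locale's collation rules; if the real file is fixed-width rather than whitespace-separated, the field references in the awk part would need to become substr() calls on the relevant columns.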