
Linux File Sorting based on subset of data

OS: Red Hat Enterprise Linux 8.2 (Ootpa)

The following is my flat file.

G010XX   OLTP    PDOA210105210304210105000000000000000000000000F                        
G2019998199916        86010027472                       XXXLXXSEXXX860  XXUPMU TEST     
G2019998199916        86010027472                       XXXLXXSEXXX860  XXUPMU TEST     
G2019998199916        86010027472                       XXXLXXSEXXX860  XXUPMU TEST     
G2019998199916        86010027524                       XXXLXXSEXXX860  XXUPMU TEST     
G2019998199916        86010027524                       XXXLXXSEXXX860  XXUPMU TEST     
G2019998199916        86010027524                       XXXLXXSEXXX860  XXUPMU TEST     
G2029998199916        86010027472                       XXXLXXSEXXXXXW-00000000000000000
G2029998199916        86010027472                       XXXLXXSEXXXXXW-00000000000000001
G2029998199916        86010027472                       XXXLXXSEXXXXXW000000000039213488
G2029998199916        86010027524                       XXXLXXSEXXXXXW000000000000000000
G2029998199916        86010027524                       XXXLXXSEXXXXXW000000000000000002
G2029998199916        86010027524                       XXXLXXSEXXXXXW000000000000099357
G2039998199916        86010027472                       XXXLXXSEXXXXXW201210201210095900
G2039998199916        86010027472                       XXXLXXSEXXXXXW201210201210110141
G2039998199916        86010027472                       XXXLXXSEXXXXXW201210201210141946
G2039998199916        86010027524                       XXXLXXSEXXXXXW201210201210163210
G2039998199916        86010027524                       XXXLXXSEXXXXXW201211201211141445
G2039998199916        86010027524                       XXXLXXSEXXXXXW201211201211144629
G2049998199916        86010027472                       XXXLXXSEXXXXXW201210201210095900
G2049998199916        86010027472                       XXXLXXSEXXXXXW201210201210110141
G2049998199916        86010027472                       XXXLXXSEXXXXXW201210201210141946
G2049998199916        86010027524                       XXXLXXSEXXXXXW201210201210163210
G2049998199916        86010027524                       XXXLXXSEXXXXXW201211201211141445
G2049998199916        86010027524                       XXXLXXSEXXXXXW201211201211144629
G020000011140000000000000000000000.000                                                  
  1. Positions 1-4 are the record type
  2. Positions 23-54 are the account number
  3. Positions 71-88 are the transaction date details (only for record types G203 and G204)
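
The layout above can be checked by slicing each record with awk's substr. This is only a sketch: show_fields is a hypothetical helper name, and the column ranges are taken as stated in the list.

```sh
# show_fields FILE: print record type, account number and transaction date
# for the G201-G204 records (show_fields is a hypothetical helper name).
show_fields() {
    awk '/^G20[1-4]/ {
        type = substr($0, 1, 4)      # positions 1-4
        acct = substr($0, 23, 32)    # positions 23-54
        date = substr($0, 71, 18)    # positions 71-88
        gsub(/ +$/, "", acct)        # drop the right padding
        # Only G203 and G204 carry a transaction date in this layout.
        if (type != "G203" && type != "G204") date = "-"
        printf "%s %s %s\n", type, acct, date
    }' "$1"
}
```

Running show_fields oa prints one line per record, which makes it easy to see which sort keys each record type actually carries.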

My requirement is to:

  1. Eliminate duplicates on records G201 and G202 based on account number
  2. Perform sorting as follows: Level 1 - sort by account number; Level 2 - sort by record type; Level 2 - sort by transaction date (only available in G203 and G204)

Expected output:

G010XX   OLTP    PDOA210105210304210105000000000000000000000000F                        
G2019998199916        86010027472                       XXXLXXSEXXX860  XXUPMU TEST     
G2029998199916        86010027472                       XXXLXXSEXXXXXW-00000000000000000
G2039998199916        86010027472                       XXXLXXSEXXXXXW201210201210095900
G2049998199916        86010027472                       XXXLXXSEXXXXXW201210201210095900
G2039998199916        86010027472                       XXXLXXSEXXXXXW201210201210110141
G2049998199916        86010027472                       XXXLXXSEXXXXXW201210201210110141
G2039998199916        86010027472                       XXXLXXSEXXXXXW201210201210141946
G2049998199916        86010027472                       XXXLXXSEXXXXXW201210201210141946
G2019998199916        86010027524                       XXXLXXSEXXX860  XXUPMU TEST     
G2029998199916        86010027524                       XXXLXXSEXXXXXW000000000000000000
G2039998199916        86010027524                       XXXLXXSEXXXXXW201210201210163210
G2049998199916        86010027524                       XXXLXXSEXXXXXW201210201210163210
G2039998199916        86010027524                       XXXLXXSEXXXXXW201211201211141445
G2049998199916        86010027524                       XXXLXXSEXXXXXW201211201211141445
G2039998199916        86010027524                       XXXLXXSEXXXXXW201211201211144629
G2049998199916        86010027524                       XXXLXXSEXXXXXW201211201211144629
G020000011140000000000000000000000.000                                                  

I have tried two approaches, but neither gives the desired output. oa is my file name.

[tmp] $ (
> grep "^G010" oa && \
> ( \
> grep "^G201" oa|sort -u -k 1.1,1.4 -k 1.23,1.56 && \
> grep "^G202" oa|sort -u -k 1.1,1.4 -k 1.23,1.56 && \
> grep -E "^(G203|G204|G205|G206)" oa | sort -k 1.23,1.56 -k 2.71,2.88 -k 3.1,3.4 \
> ) && \
> grep "^G020" oa
> )
G010KR   OLTP    PDOA210105210304210105000000000000000000000000F
G2019998199916        86010027472                       SCBLKRSEXXX860  KRUPMU TEST
G2019998199916        86010027524                       SCBLKRSEXXX860  KRUPMU TEST
G2029998199916        86010027472                       SCBLKRSEXXXKRW-00000000000000000
G2029998199916        86010027524                       SCBLKRSEXXXKRW000000000000000000
G2039998199916        86010027472                       SCBLKRSEXXXKRW201210201210110141
G2049998199916        86010027472                       SCBLKRSEXXXKRW201210201210110141
G2039998199916        86010027472                       SCBLKRSEXXXKRW201210201210141946
G2049998199916        86010027472                       SCBLKRSEXXXKRW201210201210141946
G2039998199916        86010027472                       SCBLKRSEXXXKRW201210201210095900
G2049998199916        86010027472                       SCBLKRSEXXXKRW201210201210095900
G2039998199916        86010027524                       SCBLKRSEXXXKRW201211201211141445
G2049998199916        86010027524                       SCBLKRSEXXXKRW201211201211141445
G2039998199916        86010027524                       SCBLKRSEXXXKRW201210201210163210
G2049998199916        86010027524                       SCBLKRSEXXXKRW201210201210163210
G2039998199916        86010027524                       SCBLKRSEXXXKRW201211201211144629
G2049998199916        86010027524                       SCBLKRSEXXXKRW201211201211144629
G020000011140000000000000000000000.000

[tmp] $ (
> grep "^G010" oa && \
> ( \
> grep "^G201" oa|sort -u -k 1.1,1.4 -k 1.23,1.56 && \
> grep "^G202" oa|sort -u -k 1.1,1.4 -k 1.23,1.56 && \
> grep -E "^(G203|G204|G205|G206)" oa | sort -k 1.23,1.56 -k 2.71,2.88 -k 3.1,3.4 \
> ) | sort -k 1.23,1.56 && \
> grep "^G020" oa
> )
G010KR   OLTP    PDOA210105210304210105000000000000000000000000F
G2019998199916        86010027472                       SCBLKRSEXXX860  KRUPMU TEST
G2029998199916        86010027472                       SCBLKRSEXXXKRW-00000000000000000
G2039998199916        86010027472                       SCBLKRSEXXXKRW201210201210095900
G2039998199916        86010027472                       SCBLKRSEXXXKRW201210201210110141
G2039998199916        86010027472                       SCBLKRSEXXXKRW201210201210141946
G2049998199916        86010027472                       SCBLKRSEXXXKRW201210201210095900
G2049998199916        86010027472                       SCBLKRSEXXXKRW201210201210110141
G2049998199916        86010027472                       SCBLKRSEXXXKRW201210201210141946
G2019998199916        86010027524                       SCBLKRSEXXX860  KRUPMU TEST
G2029998199916        86010027524                       SCBLKRSEXXXKRW000000000000000000
G2039998199916        86010027524                       SCBLKRSEXXXKRW201210201210163210
G2039998199916        86010027524                       SCBLKRSEXXXKRW201211201211141445
G2039998199916        86010027524                       SCBLKRSEXXXKRW201211201211144629
G2049998199916        86010027524                       SCBLKRSEXXXKRW201210201210163210
G2049998199916        86010027524                       SCBLKRSEXXXKRW201211201211141445
G2049998199916        86010027524                       SCBLKRSEXXXKRW201211201211144629
G020000011140000000000000000000000.000
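
A likely cause of the misordering in both attempts is the key syntax. In -k F.C, C is counted from the start of field F, and without -t the fields are split on runs of blanks. So 2.71 does not mean column 71 of the line: it starts 70 characters past the beginning of the second field, and 3.1,3.4 covers the blanks in front of the third field. Anchoring every key in field 1, which starts at column 1, gives true column positions. The sketch below applies that, and replaces the final re-sort with a stable one (sort -s, available in GNU and BSD sort) so the order built by the inner commands survives within each account. sort_oa is a hypothetical function name, the 23-56 key range is copied from the commands above, and LC_ALL=C is an assumption to force byte-wise comparison.

```sh
# sort_oa FILE: the pipeline above with column-based keys
# (sort_oa is a hypothetical name; LC_ALL=C forces byte-wise comparison).
sort_oa() {
    f=$1
    grep '^G010' "$f"                      # header record
    {
        # One G201 and one G202 record per account number (columns 23-56).
        grep '^G201' "$f" | LC_ALL=C sort -u -k 1.23,1.56
        grep '^G202' "$f" | LC_ALL=C sort -u -k 1.23,1.56
        # Account, then transaction date (columns 71-88), then record type.
        grep -E '^G20[3-6]' "$f" |
            LC_ALL=C sort -k 1.23,1.56 -k 1.71,1.88 -k 1.1,1.4
    } |
        # Stable sort: group by account, keep the order built above.
        LC_ALL=C sort -s -k 1.23,1.56
    grep '^G020' "$f"                      # trailer record
}
```

It would be called as sort_oa oa. The stable sort matters here: a plain sort -k 1.23,1.56 falls back to comparing whole lines for equal accounts, which groups all G203 records ahead of the G204 records.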

Awk would be an ideal candidate for this (GNU awk for array sorting):

awk 'NR==1 { print;next } $3 == "" { endline=$0;next } { code=substr($1,1,4);map[code][$2][$3]=$0} END {PROCINFO["sorted_in"]="@ind_str_asc";for (i in map) { for (j in map[i]) { for (k in map[i][j]) { print map[i][j][k] } } } print endline }' oa

Explanation:

awk 'NR==1 { 
             print; # Print the line
             next # Skip to the next line
            } 
   $3 == "" { 
             endline=$0; # Save the line whose 3rd whitespace-delimited field is empty (the G020 trailer) in endline
             next 
            } 
            { 
             code=substr($1,1,4); # Extract the first 4 characters into a variable code
             map[code][$2][$3]=$0 # Store the line in a 3-dimensional array indexed by code, then by the 2nd and 3rd fields
             } 
         END {
              PROCINFO["sorted_in"]="@ind_str_asc"; # Set the ordering of the array
              for (i in map) { 
                for (j in map[i]) { 
                  for (k in map[i][j]) { 
                     print map[i][j][k] # Loop through the array and print the entries
                  } 
                 } 
               } 
               print endline # Print the end line
              }' oa
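
If the output has to be grouped by account number first, as in the expected output in the question, one alternative is a decorate-sort-undecorate pipeline: awk prefixes each record with a key (account number, transaction date, record type), sort orders on that key, and cut strips it off again. This is a sketch under assumptions, not a tested production script: the column positions are the ones stated in the question, the account and date columns are assumed to contain no '|' character, and sort_by_account is a hypothetical function name. It needs only POSIX awk, so GNU awk's array sorting is not required.

```sh
# sort_by_account FILE: header, then records ordered by account, date, type,
# then trailer (sort_by_account is a hypothetical helper name).
sort_by_account() {
    file=$1
    grep '^G010' "$file"                       # header record
    grep -v -e '^G010' -e '^G020' "$file" |
    awk '{
        type = substr($0, 1, 4)                # positions 1-4
        acct = substr($0, 23, 32)              # positions 23-54
        if (type == "G201" || type == "G202") {
            # Keep only the first record per type and account.
            if (seen[type, acct]++) next
            date = ""                          # empty date sorts first
        } else {
            date = substr($0, 71, 18)          # positions 71-88
        }
        # Decorate: key fields first, original record last.
        printf "%s|%s|%s|%s\n", acct, date, type, $0
    }' |
    LC_ALL=C sort -t '|' -k 1,1 -k 2,2 -k 3,3 |
    cut -d '|' -f 4-                           # undecorate
    grep '^G020' "$file"                       # trailer record
}
```

Because G201 and G202 get an empty date, they sort ahead of the dated records for the same account, and the record type then breaks ties between G203 and G204 records that share a date.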
