Merge two rows in UNIX using Sed / Awk

Consider a source file on UNIX containing the pipe-delimited rows below. This example has five rows. Rows 1, 2, and 4 are fine, but rows 3 and 5 are each split across two lines because their text contains a newline. I need to merge row 3 into a single line and row 5 into a single line by removing only that embedded newline, and then load the result into an Oracle table.

How can this be achieved using sed, awk, or any other UNIX command?

Example input:

 1. 9187-001|COS 60W 16G T1A CLV|||||10  
 2. 9184-002|COS 48W 28G NT SKO|FOOTAGE/SEQUENCE GRIDS||||10  
 3. 9679-229|COS 56G 40G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES  
(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10  
 4. 9184-230|COS48W 48G NT LIF SKO|LIFE STORE COSMETIC FOOTAGE/SEQUENCE GRID||||10  
 5. 9679-230|COS 56G 44G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES  
(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10 

Desired output:

1. 9187-001|COS 60W 16G T1A CLV|||||10  
2. 9184-002|COS 48W 28G NT SKO|FOOTAGE/SEQUENCE GRIDS||||10  
3. 9679-229|COS 56G 40G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10  
4. 9184-230|COS48W 48G NT LIF SKO|LIFE STORE COSMETIC FOOTAGE/SEQUENCE GRID||||10  
5. 9679-230|COS 56G 44G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10

Using Perl:

perl -00pe 's/\n(?!\h*\d|\z)//g' file

OR

$ perl -00pe 's/\n(?=\()//g' file
 1. 9187-001|COS 60W 16G T1A CLV|||||10  
 2. 9184-002|COS 48W 28G NT SKO|FOOTAGE/SEQUENCE GRIDS||||10  
 3. 9679-229|COS 56G 40G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10  
 4. 9184-230|COS48W 48G NT LIF SKO|LIFE STORE COSMETIC FOOTAGE/SEQUENCE GRID||||10  
 5. 9679-230|COS 56G 44G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10 
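Since the question asks about sed specifically, the same idea as the second Perl command (delete a newline that is immediately followed by `(`) can be sketched in sed. This is a minimal sketch that assumes, like that command, that every continuation line starts with `(`; the function name `merge_paren_rows` is hypothetical. It also drops any trailing blanks in front of the removed newline, and it reads the whole file into the pattern space, so it suits files that fit in memory.

```sh
# Hypothetical helper: join every line that starts with "(" onto the line before it.
# Assumes continuation lines always begin with "(" (true for the sample data only).
merge_paren_rows() {
  # :a / $!N / $!ba  -> keep appending the next line until the last one (slurp the file)
  # s/.../(/g        -> delete trailing blanks plus the newline in front of each "("
  sed -e ':a' -e '$!N' -e '$!ba' -e 's/[[:space:]]*\n(/(/g' "$@"
}

# usage: merge_paren_rows file > merged.txt
```

Using `$!N` rather than a bare `N` keeps a one-line file from hitting `N` at end of input, where sed implementations differ on whether the pattern space is printed.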

It appears that each record should have 7 fields, so keep joining lines until a record has 7:

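awk -F'|' '
    {$0 = prev $0}              # prepend any partial record buffered so far
    NF < 7 {prev = $0}          # still short of 7 fields: keep buffering
    NF == 7 {print; prev=""}    # complete record: print it and reset the buffer
' file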
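The field-count idea can be wrapped in a small shell function that takes the expected count as a parameter. This is a sketch under assumptions: the name `merge_by_field_count` is hypothetical, it assumes no `|` ever appears inside a quoted field, and, unlike the one-liner above, a record with more fields than expected is printed rather than skipped, and an incomplete record left at end of file is printed as-is instead of being silently dropped.

```sh
# Hypothetical helper: merge physical lines until each record has n "|"-separated fields.
# Assumes no "|" appears inside a quoted field.
merge_by_field_count() {
  n=$1
  shift
  awk -F'|' -v n="$n" '
    { $0 = prev $0 }                     # prepend any partial record buffered so far
    NF < n { prev = $0; next }           # still too few fields: keep buffering
    { print; prev = "" }                 # n or more fields: emit the record
    END { if (prev != "") print prev }   # flush an incomplete trailing record
  ' "$@"
}

# usage: merge_by_field_count 7 file > merged.txt
```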

But really, you should be using a proper CSV parser:

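perl -MText::CSV -Mautodie -E '
    # binary => 1 allows newlines inside quoted fields
    $csv = Text::CSV->new({binary => 1, sep_char => "|", quote_space => 0});
    open $fh, "<", shift;
    while ($row = $csv->getline($fh)) {            # one logical record, even if it spans lines
        $csv->combine( map {s/\n//g; $_} @$row );  # strip embedded newlines, re-quote as needed
        say $csv->string();
    }
' file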
 1. 9187-001|COS 60W 16G T1A CLV|||||10  
 2. 9184-002|COS 48W 28G NT SKO|FOOTAGE/SEQUENCE GRIDS||||10  
 3. 9679-229|COS 56G 40G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES  (ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10  
 4. 9184-230|COS48W 48G NT LIF SKO|LIFE STORE COSMETIC FOOTAGE/SEQUENCE GRID||||10  
 5. 9679-230|COS 56G 44G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES  (ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10 
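If a CSV module is not available, the rule the parser relies on here (a newline inside an open double-quoted field belongs to that field) can be approximated in plain awk by counting quote characters: while the running count is odd, a quoted field is still open, so the next physical line is appended. This is a hedged sketch, not a full CSV parser: the function name `merge_open_quotes` is hypothetical, it assumes quotes are escaped by doubling (`""`, as in the sample data) and never by backslash, and unlike the Text::CSV version it leaves the rest of each row untouched instead of re-quoting it.

```sh
# Hypothetical helper: join physical lines while a double-quoted field is still open.
# Assumes quotes are escaped by doubling ("") as in the sample data; not a full CSV parser.
merge_open_quotes() {
  awk '
    {
      buf = buf $0                  # append this physical line to the current record
      q += gsub(/"/, "&")           # count quote characters (a doubled "" adds two)
      if (q % 2 == 0) {             # even total: every quoted field is closed
        print buf
        buf = ""
        q = 0
      }
    }
    END { if (buf != "") print buf }   # unterminated quote at EOF: emit what is left
  ' "$@"
}

# usage: merge_open_quotes file > merged.txt
```

Because a doubled `""` always adds two to the count, escaped quotes never flip the open/closed state; only the unpaired opening and closing quotes do.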

With GNU awk for multi-char RS:

$ awk -v RS='^$' -v ORS= '{gsub(/\s*\n\(/,"(")}1' file
 1. 9187-001|COS 60W 16G T1A CLV|||||10
 2. 9184-002|COS 48W 28G NT SKO|FOOTAGE/SEQUENCE GRIDS||||10
 3. 9679-229|COS 56G 40G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10
 4. 9184-230|COS48W 48G NT LIF SKO|LIFE STORE COSMETIC FOOTAGE/SEQUENCE GRID||||10
 5. 9679-230|COS 56G 44G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10
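`RS='^$'` (read the whole file as one record) and `\s` are GNU awk extensions. Where only a POSIX awk is available, a streaming variant that holds just one line in memory could look like the sketch below; the function name `merge_paren_lines` is hypothetical, and it makes the same assumption as the command above, that a continuation line starts with `(`.

```sh
# Hypothetical helper: POSIX-awk version that streams the file, buffering one line.
# Assumes a continuation line always starts with "(" as in the sample data.
merge_paren_lines() {
  awk '
    have && /^[(]/ {                # continuation: glue it onto the buffered line
      sub(/[ \t]+$/, "", prev)      # drop trailing blanks before joining
      prev = prev $0
      next
    }
    have { print prev }             # any other line: the buffered line is complete
    { prev = $0; have = 1 }         # buffer the current line
    END { if (have) print prev }
  ' "$@"
}

# usage: merge_paren_lines file > merged.txt
```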

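This can also be done with awk, buffering each numbered row until the next one starts so that continuation lines are appended to it:

awk '/^ *[0-9]+\. /{if (row != "") print row; row = $0; next} {sub(/[ \t]+$/, "", row); row = row $0} END{if (row != "") print row}' file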
