
Split a file on a search pattern and cap the number of lines in each split file using awk

I need to split a file on the prefix that precedes the underscore, and each resulting file may contain at most 5 lines. If a group has more than 5 lines, the remaining lines should go into additional split files with different names, so that lines sharing a prefix stay grouped together.

For example, my file contains:

ADD1_5001AB
ADD1_5002AB
ADD1_5003BC
ADD1_5004AB
ADD1_5005AB
ADD1_5006BC
ADD1_5007AB
ADD1_5008AB
ADD1_5009BC
ADD1_5010AB
ADD1_5011AB
ADD1_5012BC
ADD2_5100XY
ADD2_5101YZ
CANC1_5200AB
CANC1_5201BC
CANC2_5301GH
CANC2_5302FG

So the result should be 6 files.

1st file should contain,

ADD1_5001AB
ADD1_5002AB
ADD1_5003BC
ADD1_5004AB
ADD1_5005AB

2nd file should contain,

ADD1_5006BC
ADD1_5007AB
ADD1_5008AB
ADD1_5009BC
ADD1_5010AB

3rd file should contain,

ADD1_5011AB
ADD1_5012BC

4th file should contain,

ADD2_5100XY
ADD2_5101YZ

5th file should contain,

CANC1_5200AB
CANC1_5201BC

6th file should contain,

CANC2_5301GH
CANC2_5302FG

Kindly help.

You could use

awk -F _ 'prefix != $1 || line == 5 { line = 0; ++slab; out = sprintf("out%02d.txt", slab); prefix = $1 } { ++line; print > out }' input.txt

Here input.txt is the input file. This works as follows: -F _ sets _ as the field separator, so $1 is the prefix before the first _ (for ADD1_5001AB, $1 is ADD1). Then:

# prefix contains the last seen first field. When it changes or when the last
# slab grew to five lines long, we need to start a new output file. So
prefix != $1 || line == 5 {
  line = 0                            # reset line counter
  ++slab                              # increase slab number
  out = sprintf("out%02d.txt", slab)  # use that number to generate a new output
                                      # file name
  prefix = $1                         # and remember the new prefix
}

# then, for all lines:
{
  ++line                              # increase line counter
  print > out                         # and print the line to the current output
                                      # file.
}

As given, this will generate files out01.txt, out02.txt and so forth; for the sample input above that means out01.txt through out06.txt. Change the format string in the sprintf call to customize the names.
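As one possible customization, here is a sketch that puts the prefix into the file name and closes each output file once it is finished, which avoids running into the open-file limit some awk implementations have when the input produces many slabs. The naming scheme "<prefix>_<nn>.txt", the part counter and the shortened sample data are assumptions made for illustration; they are not part of the question.

```shell
#!/bin/sh
# Sketch only: same splitting rule as above, plus close() and
# prefix-based file names. The "<prefix>_<nn>.txt" scheme is an assumption.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Shortened sample: seven ADD1 lines followed by two ADD2 lines.
printf '%s\n' ADD1_5001AB ADD1_5002AB ADD1_5003BC ADD1_5004AB \
              ADD1_5005AB ADD1_5006BC ADD1_5007AB \
              ADD2_5100XY ADD2_5101YZ > input.txt

awk -F _ '
prefix != $1 || line == 5 {
  if (out != "") close(out)         # release the file we just finished
  if (prefix != $1) part = 0        # restart numbering for a new prefix
  line = 0                          # reset line counter
  prefix = $1                       # remember the new prefix
  out = sprintf("%s_%02d.txt", prefix, ++part)
}
{ ++line; print > out }
' input.txt

# Show how many lines ended up in each file.
wc -l ADD1_01.txt ADD1_02.txt ADD2_01.txt
```

With this sample, ADD1_01.txt should hold the first five ADD1 lines, ADD1_02.txt the remaining two, and ADD2_01.txt the two ADD2 lines. The script works in a fresh temporary directory so it does not overwrite anything in the current one.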
