How to split a delimited text file in Linux by number of records when data fields contain the end-of-record separator

Question

Problem Statement:

I have a delimited text file offloaded from Teradata which happens to have "\n" (newline characters, i.e. EOL markers) inside data fields.

The same EOL marker also terminates each complete record.

I need to split this file into two or more files (based on a number of records that I specify), keeping the newline characters inside data fields and splitting only at the line breaks that end a record.

Example:

1|Alan
Wake|15
2|Nathan
Drake|10
3|Gordon
Freeman|11

Expectation :

file1.txt

1|Alan
Wake|15
2|Nathan
Drake|10

file2.txt

3|Gordon
Freeman|11

What I have tried:

 awk 'BEGIN{RS="\n"}NR%2==1{x="SplitF"++i;}{print > x}' inputfile.txt

The code can't distinguish between newlines inside data fields and the newlines that end a record. Is there a way to achieve this?

EDIT: I have changed the problem statement and the example. Please share your thoughts on the new example.

Answer 1

Use the following awk approach:

awk '{ r=(r!="")?r RS $0 : $0; if(NR%4==0){ print r > ("file" ++i ".txt"); r="" } }
       END{ if(r!="") print r > ("file" ++i ".txt") }' inputfile.txt

NR%4==0 - each logical record occupies two physical lines, so putting two records in each file means starting a new file after every 4 input lines
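The same arithmetic can be parameterised instead of hard-coding 4. The following is only a sketch, assuming every record spans the same fixed number of physical lines; the variable names lines and recs and the sample data are illustrative and not part of the original answer.

```shell
# Work in a scratch directory with a hypothetical sample input in which
# every record spans exactly two physical lines.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '1|Alan\nWake|15\n2|Nathan\nDrake|10\n3|Gordon\nFreeman|11\n' > inputfile.txt

# lines = physical lines per record, recs = records per output file
# (assumed names; adjust both to the real data).
awk -v lines=2 -v recs=2 '
    # The first line of each chunk of lines*recs input lines opens a new file.
    (NR - 1) % (lines * recs) == 0 {
        if (out != "") close(out)
        out = "file" (++i) ".txt"
    }
    # Every line is copied unchanged to the current output file.
    { print > out }
' inputfile.txt
```

With lines=2 and recs=2 this reproduces the NR%4 behaviour; it does not help when records span a varying number of lines.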

Results :

> cat file1.txt 
1|Alan
Wake
2|Nathan
Drake

> cat file2.txt 
3|Gordon
Freeman
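If records do not always span exactly two lines, a different technique is to count delimiters rather than lines. The sketch below is not the approach above; it assumes that every record has a fixed number of fields (3 here), that the delimiter never appears inside a field, and that the last field never contains a newline. The names fields, recs, pipes and complete, and the sample data, are hypothetical.

```shell
# Scratch directory with a hypothetical input; record 2 spans three lines.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '1|Alan\nWake|15\n2|Nathan\nJ.\nDrake|10\n3|Gordon\nFreeman|11\n' > inputfile.txt

# fields = fields per record, recs = records per output file (assumed names).
awk -v fields=3 -v recs=2 '
    {
        # Open the next output file lazily, when a new chunk begins.
        if (out == "") out = "file" (++i) ".txt"
        print > out

        # gsub returns the number of substitutions, i.e. the number of
        # delimiters on this line; it runs on a copy so $0 stays intact.
        line = $0
        pipes += gsub(/[|]/, "", line)

        # A record is complete once fields-1 delimiters have been seen.
        if (pipes >= fields - 1) {
            pipes = 0
            if (++complete == recs) { close(out); out = ""; complete = 0 }
        }
    }
' inputfile.txt
```

The assumption about the last field matters: a newline there would make the record look complete one line too early.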

Answer 2

If you are using GNU awk you can do this by setting RS to a regular expression that matches the record key and using RT to recover the matched key, e.g.:

parse.awk

BEGIN { RS="[0-9]\\|" }

# Skip the empty first record by checking NF (Note: this will also skip
# any empty records later in the input)
NF {
  # Send record with the appropriate key to a numbered file
  printf("%s", d $0) > "file" i ".txt"
}

# When we found enough records, close current file and 
# prepare i for opening the next one
#
# Note: NR-1 because of the empty first record
(NR-1)%n == 0 { 
  close("file" i ".txt")
  i++
}

# Remember the matched key (RT) in d; it belongs to the next record,
# because the input starts with an empty first record
{ d=RT }

Run it like this:

gawk -f parse.awk n=2 infile

Where n is the number of records to put into each file.

Output:

file1.txt

1|Alan
Wake|15
2|Nathan
Drake|10

file2.txt

3|Gordon
Freeman|11
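Note that [0-9] in RS matches a single digit, so a multi-digit key such as 10| would be split after its first digit. A possible variant that avoids RT and works with POSIX awk is sketched below. It assumes that every record starts with a numeric key followed by the delimiter, and that no continuation line starts the same way; the sample keys and the names count and out are hypothetical.

```shell
# Scratch directory with a hypothetical input that uses multi-digit keys.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '9|Alan\nWake|15\n10|Nathan\nDrake|10\n11|Gordon\nFreeman|11\n' > infile

# n = number of records per output file, as in the answer above.
awk -v n=2 '
    # A line beginning with digits and a pipe starts a new record.
    /^[0-9]+[|]/ {
        # Every n-th record start switches to the next output file.
        if (count % n == 0) {
            if (out != "") close(out)
            out = "file" (++i) ".txt"
        }
        count++
    }
    # Copy each line, including continuation lines, to the current file.
    out != "" { print > out }
' infile
```

Lines that appear before the first record start are skipped, since no output file is open yet.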

2 answers

solution1
2 2017-06-16 11:17:53

solution2
0 ACCEPTED 2017-06-16 11:56:11
