
How to split a delimited text file in Linux by number of records when data fields contain the end-of-record separator

Problem Statement:

I have a delimited text file exported from Teradata that contains "\n" (newline, i.e. EOL characters) inside data fields.

The same EOL character also terminates each record.

I need to split this file into two or more files, based on a record count that I specify, keeping the newlines inside data fields and breaking only at the newline that ends a record.

Example:

1|Alan
Wake|15
2|Nathan
Drake|10
3|Gordon
Freeman|11

Expected output:

file1.txt

1|Alan
Wake|15
2|Nathan
Drake|10  

file2.txt

3|Gordon
Freeman|11 

What I have tried:

 awk 'BEGIN{RS="\n"}NR%2==1{x="SplitF"++i;}{print > x}' inputfile.txt

This code cannot distinguish the newlines inside data fields from the newlines that end a record. Is there a way to achieve this?

EDIT: I have changed the problem statement and the example. Please share your thoughts on the new example.

Use the following awk approach:

awk '{ r = (r != "") ? r RS $0 : $0
       if (NR % 4 == 0) { print r > ("file" (++i) ".txt"); r = "" } }
     END { if (r != "") print r > ("file" (++i) ".txt") }' inputfile.txt
  • NR%4==0: each logical record spans two physical lines, so with two records per file the output is split after every 4 lines. This assumes every record contains exactly one embedded newline.
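The hard-coded 4 above is two physical lines per record times two records per file. A minimal sketch that turns both numbers into parameters, still assuming every record spans the same number of physical lines (lines_per_rec and recs_per_file are names made up for this illustration, and the sample input is recreated with printf):

```shell
# Recreate the sample input from the question (three records, two lines each).
printf '1|Alan\nWake|15\n2|Nathan\nDrake|10\n3|Gordon\nFreeman|11\n' > inputfile.txt

# lines_per_rec and recs_per_file are illustrative names, not from the answer above.
awk -v lines_per_rec=2 -v recs_per_file=2 '
BEGIN { chunk = lines_per_rec * recs_per_file }   # physical lines per output file
(NR - 1) % chunk == 0 {                           # first line of a new chunk
    if (out != "") close(out)                     # keep only one output file open
    out = "file" (++i) ".txt"
}
{ print > out }
' inputfile.txt
```

Writing each line straight to the current file avoids building the whole chunk in memory, which may matter for a large export.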

Results:

> cat file1.txt 
1|Alan
Wake|15
2|Nathan
Drake|10

> cat file2.txt 
3|Gordon
Freeman|11

If you are using GNU awk, you can do this by setting RS to a regular expression that matches the record key, e.g.:

parse.awk

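# Treat the record key (a digit followed by "|") as the record separator.
# Note: this assumes single-digit keys and that no other digit in the
# data is directly followed by "|".
BEGIN { RS="[0-9]\\|" }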

# Skip the empty first record by checking NF (Note: this will also skip
# any empty records later in the input)
NF {
  # Send record with the appropriate key to a numbered file
  printf("%s", d $0) > "file" i ".txt"
}

# When we found enough records, close current file and 
# prepare i for opening the next one
#
# Note: NR-1 because of the empty first record
(NR-1)%n == 0 { 
  close("file" i ".txt")
  i++
}

# Remember the record key in d: the key of the next record is matched
# as the separator (RT) that ends the current one
{ d=RT }

Run it like this:

gawk -f parse.awk n=2 infile

Where n is the number of records to put into each file.

Output:

file1.txt

1|Alan
Wake|15
2|Nathan
Drake|10

file2.txt

3|Gordon
Freeman|11
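Both answers above lean on something specific to the sample: a fixed number of lines per record, or a single-digit key followed by |. If the number of fields per record is known, another option might be to count delimiters and treat a record as complete once it has collected nf-1 pipes. A rough sketch in portable awk, assuming every record has exactly nf fields, that | never appears inside a field value, and that the embedded newline never falls in the last field (nf and n are variables chosen for this illustration):

```shell
# Recreate the sample input from the question.
printf '1|Alan\nWake|15\n2|Nathan\nDrake|10\n3|Gordon\nFreeman|11\n' > inputfile.txt

# nf = fields per record, n = records per output file (both illustrative).
awk -v nf=3 -v n=2 '
{
    # gsub returns the number of matches; replacing "|" with "&" leaves the line unchanged
    pipes += gsub(/\|/, "&")
    rec = (started ? rec "\n" : "") $0
    started = 1
}
pipes >= nf - 1 {                      # the record now has all of its delimiters
    if (count % n == 0) {              # time to start a new output file
        if (out != "") close(out)
        out = "file" (++i) ".txt"
    }
    print rec > out
    count++
    rec = ""; started = 0; pipes = 0
}
END {
    # A trailing incomplete record, if any, goes to the last file.
    if (started) { if (out == "") out = "file1.txt"; print rec > out }
}
' inputfile.txt
```

Unlike the RS-based script, this does not need GNU awk, but it cannot tell where a record ends if a newline can occur inside the final field.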
