How to split a delimited text file in Linux by record count when data fields contain the end-of-record delimiter

Question

Problem Statement:

I have a delimited text file offloaded from Teradata that happens to have "\n" (newline, i.e. EOL) characters inside some data fields.

The same EOL marker also terminates each complete record.

I need to split this file into two or more files (based on a record count that I supply), keeping the newline characters inside data fields and breaking only at the newline that ends a record.

Example:

1|Alan
Wake|15
2|Nathan
Drake|10
3|Gordon
Freeman|11

Expected output:

file1.txt

1|Alan
Wake|15
2|Nathan
Drake|10

file2.txt

3|Gordon
Freeman|11

What I have tried:

 awk 'BEGIN{RS="\n"}NR%2==1{x="SplitF"++i;}{print > x}' inputfile.txt

The code can't distinguish newlines inside data fields from the newlines that end records. Is there a way to achieve this?

EDIT: I have updated the problem statement with a new example. Please share your thoughts on it.

Answer 1

Use the following awk approach:

awk '{ r=(r!="")?r RS $0 : $0; if(NR%4==0){ print r > ("file" ++i ".txt"); r="" } }
       END{ if(r) print r > ("file" ++i ".txt") }' inputfile.txt

NR%4==0: each logical record occupies two physical lines, so putting two records in each file means starting a new file after every 4 lines. This assumes every record contains exactly one embedded newline.

Results:

> cat file1.txt
1|Alan
Wake|15
2|Nathan
Drake|10

> cat file2.txt
3|Gordon
Freeman|11
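The fixed count of 4 lines only holds when every record has exactly one embedded newline. A possible alternative, sketched below, counts delimiters instead of lines and treats a record as complete once it has accumulated the expected number of "|" characters. The variable names `fields` and `per` and the sample `inputfile.txt` contents are assumptions for illustration. It also assumes that every record has the same number of fields, that "|" never appears inside data, and that the last field contains no newline.

```shell
# Recreate the question's sample input (assumed contents).
printf '1|Alan\nWake|15\n2|Nathan\nDrake|10\n3|Gordon\nFreeman|11\n' > inputfile.txt

# fields = number of fields per logical record (assumed fixed)
# per    = number of logical records per output file
awk -v fields=3 -v per=2 '
# Write one complete logical record, opening a new file every "per" records.
function emit(r) {
    if (count % per == 0) {
        if (out != "") close(out)
        out = "file" (++n) ".txt"
    }
    print r > out
    count++
}
{
    # Append this physical line to the record being assembled.
    rec = (have ? rec "\n" : "") $0
    have = 1
    # gsub on a copy returns how many delimiters the record holds so far.
    tmp = rec
    if (gsub(/[|]/, "", tmp) >= fields - 1) {
        emit(rec)
        rec = ""
        have = 0
    }
}
# Flush a trailing incomplete record, if any, rather than dropping it.
END { if (have) emit(rec) }
' inputfile.txt
```

With the sample input this should leave the first two records in file1.txt and the third in file2.txt, regardless of how many lines each record spans.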

Answer 2

If you are using GNU awk, you can do this by setting RS appropriately, e.g.:

parse.awk

BEGIN { RS="[0-9]\\|" }

# Skip the empty first record by checking NF (Note: this will also skip
# any empty records later in the input)
NF {
  # Send record with the appropriate key to a numbered file
  printf("%s", d $0) > "file" i ".txt"
}

# When we found enough records, close current file and 
# prepare i for opening the next one
#
# Note: NR-1 because of the empty first record
(NR-1)%n == 0 { 
  close("file" i ".txt")
  i++
}

# Remember the record key in d, again
# because of the empty first record
{ d=RT }

Run it like this:

gawk -f parse.awk n=2 infile

Where n is the number of records to put into each file.

Output:

file1.txt

1|Alan
Wake|15
2|Nathan
Drake|10

file2.txt

3|Gordon
Freeman|11
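Note that RS="[0-9]\\|" matches only a single digit before the "|", so a key such as 10 would be split between two records, and RT is specific to GNU awk. A rough POSIX awk variant of the same idea is sketched below: a physical line that begins with a numeric key followed by "|" is taken as the start of a new record. This rests on the assumption that continuation lines never begin with digits followed by "|"; the sample `infile` contents are taken from the question.

```shell
# Recreate the question's sample input (assumed contents).
printf '1|Alan\nWake|15\n2|Nathan\nDrake|10\n3|Gordon\nFreeman|11\n' > infile

# n = number of logical records per output file
awk -v n=2 '
# A line that begins with a numeric key starts a new logical record.
/^[0-9]+[|]/ {
    # Open the next numbered file after every n records.
    if (count % n == 0) {
        if (out != "") close(out)
        out = "file" (++i) ".txt"
    }
    count++
}
# Every line, record start or continuation, goes to the current file.
out != "" { print > out }
' infile
```

Because lines are passed through unchanged, embedded newlines are preserved as they appear in the input.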

Answer 1 (score 2, 2017-06-16 11:17:53)

Answer 2 (score 0, accepted, 2017-06-16 11:56:11)