
How to Split a Delimited Text File in Linux Based on Number of Records, When Data Fields Contain the End-of-Record Separator

Problem Statement:

I have a delimited text file offloaded from Teradata which happens to have "\n" (newline characters, or EOL markers) inside data fields.

The same EOL marker also ends each complete line, i.e. each record.

I need to split this file into two or more files (based on a number of records that I choose) while keeping the newline characters inside data fields intact, splitting only at the line breaks that end each record.

Example:

1|Alan
Wake|15
2|Nathan
Drake|10
3|Gordon
Freeman|11

Expectation:

file1.txt

1|Alan
Wake|15
2|Nathan
Drake|10

file2.txt

3|Gordon
Freeman|11

What I have tried:

 awk 'BEGIN{RS="\n"}NR%2==1{x="SplitF"++i;}{print > x}' inputfile.txt

The code can't distinguish the newlines inside data fields from the newlines that end records. Is there a way to achieve this?

EDIT: I have changed the problem statement to use a new example. Please share your thoughts on the new example.
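A quick way to see the mismatch described above is to count both kinds of "lines". The sketch below is only an illustration under stated assumptions: it recreates the example as a hypothetical `inputfile.txt`, and it assumes every record has exactly 3 `|`-separated fields and that `|` never appears inside a data field (only newlines do). Under those assumptions the number of logical records is the total number of `|` characters divided by 2.

```shell
#!/bin/sh
# Recreate the example from the question (hypothetical file name)
printf '1|Alan\nWake|15\n2|Nathan\nDrake|10\n3|Gordon\nFreeman|11\n' > inputfile.txt

# Physical lines: this is what NR counts in the attempt above
lines=$(awk 'END { print NR }' inputfile.txt)

# Logical records, ASSUMING exactly nfields "|"-separated fields per record
# and no "|" inside data; gsub() returns how many "|" it found on each line
records=$(awk -v nfields=3 '{ pipes += gsub(/\|/, "|") } END { print pipes / (nfields - 1) }' inputfile.txt)

echo "physical lines: $lines, logical records: $records"
```

For the example this prints `physical lines: 6, logical records: 3`, which is why any split driven purely by `NR` has to know how many physical lines each record occupies.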

Use the following awk approach:

awk '{ r = (r != "") ? r RS $0 : $0
       if (NR % 4 == 0) { print r > ("file" (++i) ".txt"); r = "" } }
     END { if (r != "") print r > ("file" (++i) ".txt") }' inputfile.txt
  • NR%4==0: each logical record occupies two physical lines, so putting two records in each file means starting a new file every 4 lines.

Results:

> cat file1.txt
1|Alan
Wake|15
2|Nathan
Drake|10

> cat file2.txt
3|Gordon
Freeman|11
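The `NR%4` constant above hard-codes two numbers: 2 physical lines per record times 2 records per file. A minimal sketch that turns both into parameters is shown here; `lines_per_rec`, `n` and `inputfile.txt` are hypothetical names, and it still ASSUMES that every record spans exactly the same number of physical lines. Unlike the version above, it opens each file once and closes it before moving on, so it does not run out of open file descriptors when producing many output files.

```shell
#!/bin/sh
# Example input from the question (each record spans two physical lines)
printf '1|Alan\nWake|15\n2|Nathan\nDrake|10\n3|Gordon\nFreeman|11\n' > inputfile.txt

lines_per_rec=2  # ASSUMPTION: every record spans exactly this many lines
n=2              # logical records per output file

awk -v lpr="$lines_per_rec" -v n="$n" '
  # Every lpr*n physical lines, close the current file and open the next
  (NR - 1) % (lpr * n) == 0 {
    if (out != "") close(out)
    out = "file" (++i) ".txt"
  }
  # Copy each physical line unchanged, so embedded newlines survive
  { print > out }
' inputfile.txt
```

With the example input this writes `file1.txt` (records 1 and 2) and `file2.txt` (record 3). If records can contain different numbers of embedded newlines, a fixed line count per record no longer works.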

If you are using GNU awk you can do this by setting RS appropriately, e.g.:

parse.awk

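# RS is a regex matching the record key: one digit followed by "|".
# Caveat: for a multi-digit key such as "10|" only "0|" matches, so the
# leading "1" stays at the end of the previous record.
BEGIN { RS="[0-9]\\|" }

# Skip the empty first record by checking NF (Note: this will also skip
# any empty records later in the input)
NF {
  # Send record with the appropriate key to a numbered file
  printf("%s", d $0) > ("file" i ".txt")
}

# When we have found enough records, close the current file and
# prepare i for opening the next one
#
# Note: NR-1 because of the empty first record
(NR-1)%n == 0 {
  close("file" i ".txt")
  i++
}

# Remember the record key in d, again,
# because of the empty first record
# (RT holds the text that matched RS; it is specific to GNU awk)
{ d=RT }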

Run it like this:

gawk -f parse.awk n=2 infile

Where n is the number of records to put into each file.

Output:

file1.txt

1|Alan
Wake|15
2|Nathan
Drake|10

file2.txt

3|Gordon
Freeman|11
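A different technique, not used by either answer above, is to count delimiters instead of lines: keep appending physical lines to the current record until it contains the expected number of `|` separators, then treat it as complete. The sketch below ASSUMES every record has exactly `nfields` fields and that `|` never appears inside a data field (only newlines do); `nfields`, `n` and the file names are hypothetical. It uses only POSIX awk features (no `RT`), apart from writing diagnostics to `/dev/stderr`, which exists on Linux.

```shell
#!/bin/sh
# Example input from the question
printf '1|Alan\nWake|15\n2|Nathan\nDrake|10\n3|Gordon\nFreeman|11\n' > inputfile.txt

nfields=3  # ASSUMPTION: every record has exactly 3 "|"-separated fields
n=2        # logical records per output file

awk -v nfields="$nfields" -v n="$n" '
  {
    # Join this physical line onto the record being assembled,
    # putting back the newline that separated them
    rec = (nlines++ == 0) ? $0 : rec "\n" $0

    # gsub() returns the number of "|" found; replacing "|" with "|"
    # leaves rec unchanged. Too few delimiters: the record continues.
    if (gsub(/\|/, "|", rec) < nfields - 1) next

    # rec is now one complete logical record
    if (count % n == 0) {
      if (out != "") close(out)
      out = "file" (++i) ".txt"
    }
    print rec > out
    count++
    rec = ""
    nlines = 0
  }
  END {
    # Leftover lines mean the last record never reached nfields fields
    if (nlines > 0) print "incomplete last record: " rec > "/dev/stderr"
  }
' inputfile.txt
```

Because it never relies on how many physical lines a record spans or on what its key looks like, this variant keeps working when different records contain different numbers of embedded newlines, as long as the fixed field count holds.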
