AWK: splitting a file on multiple delimiter lines

Question

I'm trying to split a file with an AWK one-liner, but the code below that I came up with is not working properly.

awk '
BEGIN { idx=0; file="original_file.split." }
/^REC_DELIMITER.(HIGH|TOP)$/ { idx++ }
/^REC_DELIMITER.TOP$/,/^REC_DELIMITER.(HIGH|TOP)$/ { print > file sprintf("%03d", idx) }
' original_file

The test file is "original_file":

REC_DELIMITER.TOP
lineA1
lineA2
lineA3
REC_DELIMITER.HIGH
lineB1
lineB2
lineB3
REC_DELIMITER.TOP
lineC1
lineC2
lineC3
REC_DELIMITER.HIGH
lineD1
lineD2
lineD3

The AWK code above is for REC_DELIMITER.TOP, and it gives me these files:

original_file.split.001:
REC_DELIMITER.TOP

original_file.split.003:
REC_DELIMITER.TOP

However, I'm trying to get this:

original_file.split.001:
REC_DELIMITER.TOP
lineA1
lineA2
lineA3

original_file.split.003:
REC_DELIMITER.TOP
lineC1
lineC2
lineC3

There will be other record delimiters, and when needed we can run the script for them too, e.g. for REC_DELIMITER.HIGH, which should produce files like these:

original_file.split.002:
REC_DELIMITER.HIGH
lineB1
lineB2
lineB3

original_file.split.004:
REC_DELIMITER.HIGH
lineD1
lineD2
lineD3

Any help is much appreciated. I have been trying to get this working for the past few days, and the AWK code above is the best I was able to come up with. I now need help from the AWK masters. :)

Thank you!

Answer 1

You can try something like this:

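awk '
# A TOP delimiter starts a new output file
/REC_DELIMITER\.TOP/ {
    a=1
    b=0
    file = sprintf("%s.split.%03d", FILENAME, ++n)
}
# A HIGH delimiter starts a new output file as well
/REC_DELIMITER\.HIGH/ {
    b=1
    a=0
    file = sprintf("%s.split.%03d", FILENAME, ++n)
}
# Inside a TOP record
a {
    print $0 > file
}
# Inside a HIGH record
b {
    print $0 > file
}' original_file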
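
As background on why the attempt in the question wrote only the delimiter line: in a range pattern, when the start and end patterns both match the same line, the range covers just that line. Here REC_DELIMITER.TOP matches the start pattern and also the end pattern (HIGH|TOP), so each range opens and closes at once. A small sketch that shows this, using a hypothetical scratch directory and a shortened sample file:

```shell
# Scratch directory and sample data are made up for this illustration.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' REC_DELIMITER.TOP lineA1 lineA2 \
              REC_DELIMITER.HIGH lineB1 \
              REC_DELIMITER.TOP lineC1 > sample

# Same range pattern as in the question, with the default action (print).
# Only the two REC_DELIMITER.TOP lines come out; the data lines never do.
range_out=$(awk '/^REC_DELIMITER.TOP$/,/^REC_DELIMITER.(HIGH|TOP)$/' sample)
printf '%s\n' "$range_out"
```

This is why the answers here track the "inside a wanted record" state with a flag instead of a range.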

Answer 2

You need something like this (untested):

awk -v dtype="TOP" '
BEGIN { dbase = "^REC_DELIMITER\\."; delim = dbase dtype "$" }
$0 ~ dbase { inBlock=0 }
$0 ~ delim { inBlock=1; idx++ }
inBlock { print > sprintf("original_file.split.%03d", idx) }
' original_file
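
Note that the script above increments idx only on the wanted delimiter, so a TOP run and a HIGH run would both write .001 and .002. If the numbering from the question is wanted (001/003 for TOP, 002/004 for HIGH), one possible variant is to count every delimiter line. This is only a sketch: the scratch directory, the loop over the two types and the `out` variable are assumptions added for illustration.

```shell
# Scratch directory holding a copy of the question's sample file.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' REC_DELIMITER.TOP lineA1 lineA2 lineA3 \
              REC_DELIMITER.HIGH lineB1 lineB2 lineB3 \
              REC_DELIMITER.TOP lineC1 lineC2 lineC3 \
              REC_DELIMITER.HIGH lineD1 lineD2 lineD3 > original_file

for dtype in TOP HIGH; do
  awk -v dtype="$dtype" '
    BEGIN { dbase = "^REC_DELIMITER\\."; delim = dbase dtype "$" }
    # Every delimiter line starts a new record: close the previous output,
    # bump the counter and stop printing until the record is known to be wanted.
    $0 ~ dbase {
      if (out != "") close(out)
      inBlock = 0
      out = sprintf("%s.split.%03d", FILENAME, ++idx)
    }
    # The wanted delimiter switches printing back on for this record.
    $0 ~ delim { inBlock = 1 }
    inBlock { print > out }
  ' original_file
done
ls original_file.split.*
```

Closing each output when the next record starts keeps the number of open files small on large inputs, which some awk implementations need.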

Answer 3

awk -v RS=REC_DELIMITER '/^\.TOP\n/{print RS $0 > sprintf("original_file.split.%03d",n)};!++n' original_file

(Give or take an extra newline at the end.)

Generally, when input is supposed to be treated as a series of multi-line records with a special line as delimiter, the most direct approach is to set RS (and often ORS) to that delimiter.

Normally you'd want to add newlines to its beginning and/or end, but this case is a little special, so it's easier without them.

Edited to add: You need GNU Awk for this. Standard Awk considers only the first character of RS.
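
Since the one-liner depends on how the installed awk treats a multi-character RS, it may be worth probing that before relying on it. The following is a rough check, not an official feature test: the input string and the `nrec` variable are made up for the probe.

```shell
# Probe: count records when RS is the two-character string "XY".
# An awk that honours the whole RS sees 2 records ("aXb" and "c");
# an awk that uses only the first character "X" sees 3 ("a", "b", "Yc").
nrec=$(printf 'aXbXYc\n' | awk -v RS=XY 'END { print NR }')
if [ "$nrec" -eq 2 ]; then
  echo "multi-character RS is honoured"
else
  echo "only the first character of RS is used (got $nrec records)"
fi
```

If the probe reports the second case, the line-by-line approaches in the other answers are the safer choice.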

Answer 4

I made some changes so that the different delimiters each go to their own file, even when they occur again later in the file. Make a file like splitter.awk with the contents below, then chmod +x it and run it with ./splitter.awk original_file

#!/usr/bin/awk -f
BEGIN {
  idx=0;
  file="original_file.split.";
  out=""
}
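{
  # On a delimiter line, look up (or assign) the file number for that delimiter
  if ($0 ~ /^REC_DELIMITER\.(TOP|HIGH)/) {
    if (!cnt[$0]) {
      cnt[$0] = ++idx;
    }
    out = cnt[$0];
  }
  # Parenthesize the file name so the concatenation is unambiguous
  print > (file sprintf("%03d", out))
}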
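
To see what this grouping does on the question's sample, here is a small run-through. It is a sketch under a few assumptions: a scratch directory, a condensed copy of the script saved as splitter.awk, and `awk -f` instead of the shebang so it does not depend on where awk is installed.

```shell
# Scratch directory holding a copy of the question's sample file.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' REC_DELIMITER.TOP lineA1 lineA2 lineA3 \
              REC_DELIMITER.HIGH lineB1 lineB2 lineB3 \
              REC_DELIMITER.TOP lineC1 lineC2 lineC3 \
              REC_DELIMITER.HIGH lineD1 lineD2 lineD3 > original_file

# Condensed copy of the script: one output file per distinct delimiter.
cat > splitter.awk <<'EOF'
{
  if ($0 ~ /^REC_DELIMITER\.(TOP|HIGH)/) {
    if (!cnt[$0]) cnt[$0] = ++idx   # first time this delimiter is seen
    out = cnt[$0]
  }
  print > ("original_file.split." sprintf("%03d", out))
}
EOF

awk -f splitter.awk original_file
wc -l original_file.split.*
```

Both TOP records end up together in original_file.split.001 and both HIGH records in original_file.split.002, which differs from the one-file-per-record layout shown in the question.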

Answer 5

I'm not very used to AWK; however, plasticide's answer pointed me in the right direction, and I finally got an AWK script working as required.

In the code below, the first IF sets echo to 0 if any delimiter is found. The second IF sets echo to 1 if a wanted delimiter is found, so that only the wanted records are split out of the file.

awk 'BEGIN {
  idx=0; echo=1; file="original_file.split."
}
{
  # All the delimiters to consider in the given file
  if ($0 ~ /^(REC_DELIMITER\.TOP|REC_DELIMITER\.HIGH|REC_DELIMITER\.LOW|REC_NO_CATEGORY)$/) {
    echo=0
  }
  # Delimiters that should actually be pulled
  if ($0 ~ /^(REC_DELIMITER\.HIGH|REC_DELIMITER\.LOW)$/) {
    idx++; echo=1
  }
  # Print to a file if the wanted delimiter matched
  if (echo) {
    print > (file idx)
  }
}' original_file

Thank you all. I really appreciate it.

Answer 1: score 5, 2013-06-11 20:04:25
Answer 2: score 3, 2013-06-11 20:05:03
Answer 3: score 2, 2013-06-12 01:16:19
Answer 4: score 1, 2013-06-11 20:18:48
Answer 5: score -2, accepted, 2013-06-17 18:02:06
