awk: split a file on every nth occurrence of a delimiter, but the first split file is wrong

Question

I want to split a text file like the one pasted below (sorry for the length) on every nth occurrence of ">". For example, on every 2nd occurrence of ">", but I need to be able to change that number.

test_split.txt:

>eeefkdfn
a
a
a
>c 4ufjdhf
b
b
b
b
>
c
c
> c
d
d
d
d
d
>3
>cr
>c3
e
e
e
e
e
> 5
f
f
f
f
>cr
g
g
g
g
> cr dkjfddf
h
h
h
h

So I want output files like these (only showing the first two):

file_1.txt:

>eeefkdfn
a
a
a
>c 4ufjdhf
b
b
b
b

file_2.txt:

>
c
c
> c
d
d
d
d
d

etc.

Question:

I have been trying to achieve that result using this awk command:

awk '/^>/ {n++} { file = sprintf("file_%s.txt", int(n/2)); print >> file; }' < test_split.txt

Instead of the desired result, I get correct output (split) files, except for the first one, which contains only one occurrence of ">" (instead of two), like this:

cat test_0.txt

>eeefkdfn
a
a
a

cat test_1.txt

>chr1 4ufjdhf
b
b
b
b
>
c
c

Any idea why that is? Thank you!
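Why the first file is short: n is incremented on every ">" line, and int(n/2) maps header 1 to int(0.5) = 0, headers 2 and 3 to 1, headers 4 and 5 to 2, and so on, so only the first file (file 0) ends up with a single record. Subtracting one before dividing, int((n-1)/k), puts headers 1..k in the first group, k+1..2k in the second, and so on. Below is a minimal sketch of the corrected command; the function name split_every and the parameter k are hypothetical, files are numbered from 1, and the input is assumed to start with a ">" line. It also uses > instead of >> (so re-running does not append to old output) and closes each finished file.

```shell
# Hypothetical helper: split_every K < input
# Writes file_1.txt, file_2.txt, ... with K ">" records each.
# Assumes the first input line starts with ">".
split_every() {
    awk -v k="$1" '
        /^>/ { n++ }                              # n = index of the current ">" record
        {
            # Records 1..k -> file 1, k+1..2k -> file 2, ...
            # (the original int(n/2) sent record 1 to file 0 but 2 and 3 to file 1)
            file = sprintf("file_%d.txt", int((n - 1) / k) + 1)
            if (file != prev) {                   # moved on to a new chunk:
                if (prev != "") close(prev)       # close the finished file
                prev = file
            }
            print > file                          # ">" starts each file fresh, ">>" would append to old runs
        }
    '
}
```

Usage: split_every 2 < test_split.txt should produce file_1.txt and file_2.txt as shown above; change the argument to change the group size.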

Answer 1

This seems simpler:

awk 'BEGIN{i=1}/^>/{cont++}cont==3{i++;cont=1}{print > "file_"i".txt"}' file

This gives you the expected result:

$ cat file_1.txt
>eeefkdfn
a
a
a
>c 4ufjdhf
b
b
b
b

$ cat file_2.txt
>
c
c
> c
d
d
d
d
d

Explanation

BEGIN{i=1} : Initializes the file counter.

/^>/{cont++} : Counts every > found at the start of a line.

cont==3{i++;cont=1} : On every third > (which is the first > of the next group), increments the file counter and resets cont to 1. To split on a different n, use n+1 instead of 3.

{print > "file_"i".txt"} : Directs each line to the current output file.
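Since the 3 in cont==3 is just n+1 for n=2, the same counter-reset logic can take n as a parameter. A minimal sketch, wrapped in a hypothetical shell function split_every_n; the parentheses around the concatenated output file name are added as the usual portable form, since not every awk parses an unparenthesized concatenation after >.

```shell
# Hypothetical helper: split_every_n N < input
# Same counter-reset logic as the one-liner above, with 3 replaced by N+1.
split_every_n() {
    awk -v n="$1" '
        BEGIN { i = 1 }                           # output file counter
        /^>/ { cont++ }                           # ">" lines seen in the current group
        cont == n + 1 { i++; cont = 1 }           # the (n+1)th ">" opens the next file
        { print > ("file_" i ".txt") }            # parenthesized for portability
    '
}
```

Usage: split_every_n 2 < file reproduces the output above; split_every_n 3 < file puts three ">" records in each file.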

Answer 2

You can use this awk script for dynamic control over the number n, where the file is split on every nth occurrence of > in the input data:

awk -v n=2 'function ofile() {
   if (op)
      close(op)
   op = sprintf("file_%d.txt", ++p)
}
BEGIN {
   ofile()
}
/^>/ {
   ++i
}
i > n {
   i=1
   ofile()
}
{
   print $0 > op
}
END {
   close(op)
}' file

Here is a one-liner in case you want to copy and paste it:

awk -v n=2 'function ofile() {if (op) close(op); op = sprintf("file_%d.txt", ++p)} BEGIN{ofile()} /^>/{++i} i>n{i=1; ofile()} { print $0 > op }' file
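To see the dynamic control in action, the script can be wrapped in a shell function (the name split_on_nth is hypothetical) and run with a different n. This sketch anchors the pattern as /^>/, as in the question, so that a ">" in the middle of a line does not start a new group; otherwise it follows the script above: the (n+1)th ">" resets the counter and ofile() closes the current file before naming the next one.

```shell
# Hypothetical wrapper: split_on_nth N < input
split_on_nth() {
    awk -v n="$1" '
        function ofile() {                        # close the current file, name the next
            if (op)
                close(op)
            op = sprintf("file_%d.txt", ++p)
        }
        BEGIN { ofile() }                         # first output is file_1.txt
        /^>/ { ++i }                              # ">" lines in the current file
        i > n { i = 1; ofile() }                  # the (n+1)th ">" starts a new file
        { print $0 > op }
        END { close(op) }
    '
}
```

For example, split_on_nth 3 < file writes three ">" records per file instead of two.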

Answer 1: score 3, accepted, 2017-02-17 16:04:54

Answer 2: score 2, 2017-02-17 16:04:26