Awk: Splitting file on nth occurrence of delimiter, wrong first split file
I want to split a text file like the one pasted below (sorry for the length) on every nth occurrence of ">". For example, on every 2nd occurrence of ">", but I need to be able to change that number.
test_split.txt:
>eeefkdfn
a
a
a
>c 4ufjdhf
b
b
b
b
>
c
c
> c
d
d
d
d
d
>3
>cr
>c3
e
e
e
e
e
> 5
f
f
f
f
>cr
g
g
g
g
> cr dkjfddf
h
h
h
h
So I want output files like these (only showing the first two):
file_1.txt:
>eeefkdfn
a
a
a
>c 4ufjdhf
b
b
b
b
file_2.txt:
>
c
c
> c
d
d
d
d
d
etc.
Question:
I have been trying to achieve that result using this awk command:
awk '/^>/ {n++} { file = sprintf("file_%s.txt", int(n/2)); print >> file; }' < test_split.txt
Instead of the desired result, I get correct output (split) files except for the first one, which contains only one occurrence of ">" (instead of two), like this:
cat test_0.txt
>eeefkdfn
a
a
a
cat test_1.txt
>chr1 4ufjdhf
b
b
b
b
>
c
c
Any idea why that is? Thank you!
This seems simpler:
awk 'BEGIN{i=1}/^>/{cont++}cont==3{i++;cont=1}{print > "file_"i".txt"}' file
This gives you the expected result:
$ cat file_1.txt
>eeefkdfn
a
a
a
>c 4ufjdhf
b
b
b
b
$ cat file_2.txt
>
c
c
> c
d
d
d
d
d
Explanation
BEGIN{i=1}
: Initializes the file counter.
/^>/{cont++}
: Counts every > found at the start of a line.
cont==3{i++;cont=1}
: On every third > line, increases the file counter and resets cont to 1, because that line becomes the first > of the next file.
{print > "file_"i".txt"}
: Directs the output to the expected file.
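As for why the command in the question misplaces the first record: n is 1 on the first > line, so int(n/2) is 0, while n=2 and n=3 both give 1. The grouping is therefore off by one from the very first line. A minimal sketch of a fix keeps the original structure and shifts the counter before dividing. The input name demo_split.txt, the output prefix demo_ and the variable k (records per file) are made up for this illustration:

```shell
# Build a tiny sample input (hypothetical file name demo_split.txt).
printf '%s\n' '>r1' 'a' '>r2' 'b' '>r3' 'c' '>r4' 'd' '>r5' 'e' > demo_split.txt

# k = number of ">" records per output file (assumed variable name).
# int((n - 1) / k) + 1 maps n=1..k to 1, n=k+1..2k to 2, and so on.
awk -v k=2 '
/^>/ { n++ }
{
    file = sprintf("demo_%d.txt", int((n - 1) / k) + 1)
    print > file
}' demo_split.txt
```

Unlike the version with close(), this sketch leaves every output file open until awk exits, so it is only suitable when the number of output files is small.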
You can use this awk for dynamic control over the number n, where the file will be split on every nth occurrence of > in the input data:
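awk -v n=2 '
# Close the current output file (if any) and switch to the next name.
function ofile() {
   if (op)
      close(op)
   op = sprintf("file_%d.txt", ++p)
}
BEGIN {
   ofile()          # select file_1.txt before reading any input
}
/>/ {
   ++i              # count the ">" lines seen for the current file
}
i > n {
   i=1              # this ">" line is the first one of the next file
   ofile()
}
{
   print $0 > op
}
END {
   close(op)
}' file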
Here is a one-liner in case you want to copy/paste:
awk -v n=2 'function ofile() {if (op) close(op); op = sprintf("file_%d.txt", ++p)} BEGIN{ofile()} />/{++i} i>n{i=1; ofile()} { print $0 > op }' file
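One way to sanity-check a split is to count the > lines in each output file. The sketch below runs the one-liner on a small made-up input (check_input.txt is a hypothetical name, and its contents are invented for the demonstration) and then counts the headers with grep -c:

```shell
# Hypothetical sample input: five ">" records with one data line each.
printf '%s\n' '>r1' 'a' '>r2' 'b' '>r3' 'c' '>r4' 'd' '>r5' 'e' > check_input.txt

# Same one-liner as above, applied to the sample with n=2.
awk -v n=2 'function ofile() {if (op) close(op); op = sprintf("file_%d.txt", ++p)} BEGIN{ofile()} />/{++i} i>n{i=1; ofile()} { print $0 > op }' check_input.txt

# Count the ">" lines in each output file.
grep -c '^>' file_1.txt file_2.txt file_3.txt
```

Every file should report n header lines except possibly the last one, which holds whatever is left over (here 2, 2 and 1).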