How do I remove part of a line in a multi-line chunk using sed or Perl?

I have some data that looks like this. It comes in chunks of four lines. Each chunk starts with a @ character.

@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
888888888888888888888888888

On the third line of each chunk, I want to remove the text that comes after the + character, resulting in:

@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+
888888888888888888888888888

Is there a compact way to do that in sed or Perl?

Assuming you don't want to blindly remove the rest of every line that starts with a +, you can do this:

sed '/^@/{N;N;s/\n+.*/\n+/}' infile

Output

$ sed '/^@/{N;N;s/\n+.*/\n+/}' infile
@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+
888888888888888888888888888
+Dont remove me

Note: Although the above command keys on the @ to decide whether a line with a + should be altered, it will still alter the 2nd line if that line also happens to start with a +. It doesn't sound like this is the case, but if you want to exclude this corner case as well, the following minor alteration will protect against it:

sed '/^@/{N;N;s/\(.*\)\n+.*/\1\n+/}' infile

Output

$ sed '/^@/{N;N;s/\(.*\)\n+.*/\1\n+/}' ./infile
@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
+AAAAAAAAAAAAAAAAAAAAAAAAAAA
+
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+
888888888888888888888888888
+Dont remove me
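
To see the corner case concretely, here is a minimal sketch. It assumes GNU sed (where . also matches a newline embedded in the pattern space, and \n is accepted in the replacement). The three-character read name and the function names strip_naive and strip_safe are made up for illustration.

```shell
# Hypothetical wrappers around the two sed commands above (GNU sed assumed).
strip_naive() { sed '/^@/{N;N;s/\n+.*/\n+/}'; }
strip_safe()  { sed '/^@/{N;N;s/\(.*\)\n+.*/\1\n+/}'; }

# One made-up record whose 2nd line also starts with "+".
sample='@r1
+AAA
+r1 extra
;;;'

printf '%s\n' "$sample" | strip_naive   # the 2nd line is swallowed by the match
printf '%s\n' "$sample" | strip_safe    # the 2nd line survives
```

The greedy \(.*\) forces the match to begin at the last newline-plus pair in the three-line pattern space, which is the start of the third line.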

If there is never a + on the first or second line and always one on the third line:

perl -0100pi -e's/\+.*/+/' datafile

Otherwise:

perl -0100pi -e's/^((?:.*\n){2}.*?\+).*/$1/' datafile

or, on Perl 5.10+:

perl -0100pi -e's/^(?:.*\n){2}.*?\+\K.*//' datafile

All of those assume @ only appears at the start of a chunk. If it may appear in other places, then:

perl -pi -e's/\+.*/+/ if $. % 4 == 3' datafile
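
The same position-based idea can be sketched in POSIX awk, which may be handy when in-place editing is not wanted. This is a sketch under assumptions rather than part of the original answer: it assumes strict 4-line records and that the third line starts with +. The name strip_plus_lines is a hypothetical helper.

```shell
# Hypothetical helper: reduce the 3rd line of every 4-line record to a bare "+".
# Reads stdin, writes stdout; lines in other positions are passed through untouched.
strip_plus_lines() {
  awk 'NR % 4 == 3 && substr($0, 1, 1) == "+" { print "+"; next } { print }'
}

printf '%s\n' \
  '@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27' \
  'AAAAAAAAAAAAAAAAAAAAAAAAAAA' \
  '+SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27' \
  '::::::::::::::::::::::::;;8' | strip_plus_lines
```

Because the rule depends only on the line number, a quality line that happens to start with @ or + is left alone.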

If you can use awk, you can do:

 gawk '{if ($0 ~ /^@/ ) { print ; getline ; print ; getline ; print "+"}}' INPUTFILE

So if gawk sees an @ at the start of a line, it prints that line, then reads and prints the next line, and finally reads the 3rd line (counting from the @ line) and prints only the +.

If the + is not at the start of the line, you can use gensub(/\+.*/, "+", 1) instead of the "+" in the last print.

(And if you have perl installed, there will most probably be an a2p executable, which can convert the above awk script to Perl, if you want to...)

HTH

UPDATE (on the missing 4th line):

 gawk '{if ($0 ~ /^@/ ) { print ; getline ; print ; getline ; print "+"; getline; print }}' INPUTFILE

This should print the 4th line as well.

Maybe just sed '/^@/+2 s/+.*/+/'

Edit: this will not work, but as a vim command it should:

vim file -c ':g/^@/+2s/+.*/+/' -c 'wq'

This might work for you:

sed '/^@/{$!N;$!N;$!N;s/\n+[^\n]*/\n+/g}' file

or with GNU sed:

sed '/^@/,+3s/^+.*/+/' file
