
How to grep and remove from a file all lines between a separator

I have a file that looks like this:

===SEPARATOR===
line2
line3
===SEPARATOR===
line5
line6
===SEPARATOR===
line8
...
lineX
===SEPARATOR===

How can I go through the file with a while loop and dump everything between two ===SEPARATOR=== occurrences into another file for further processing? On the first iteration I want only line2 and line3 in the second file, which I will then parse. On the next iteration I want line5 and line6 in the second file, so that the same parsing runs again on different data.

You can exclude all lines matching ===SEPARATOR=== with grep -v and redirect the rest to a file:

grep -vx '===SEPARATOR===' file > file_processed

-x makes sure that only lines that match ===SEPARATOR=== in their entirety are excluded.
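To make the effect of -x concrete, here is a small sketch that runs grep with and without it. The temporary file and the sample line that merely contains the separator text are assumptions made up for illustration, not part of the question's data.

```shell
# Hypothetical sample: the second line only *contains* the separator text.
tmpdir=$(mktemp -d)
printf '%s\n' '===SEPARATOR===' 'note: ===SEPARATOR=== is the marker' 'line3' > "$tmpdir/demo"

# Without -x, every line containing the pattern is dropped.
without_x=$(grep -v '===SEPARATOR===' "$tmpdir/demo")
# With -x, only lines that equal the pattern are dropped.
with_x=$(grep -vx '===SEPARATOR===' "$tmpdir/demo")

printf 'without -x:\n%s\n\nwith -x:\n%s\n' "$without_x" "$with_x"
rm -rf "$tmpdir"
```

Without -x the "note:" line is lost along with the real separator; with -x it survives.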

This uses sed to select the lines between separators, and then grep -v to delete the separator lines themselves.

$ sed -n '/===SEPARATOR===/,/===SEPARATOR===/ p' file | grep -v '===SEPARATOR==='
line2
line3
line8
...
lineX

There's got to be a more elegant answer that doesn't repeat the separator three times, but I'm drawing a blank.
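One possible way to avoid typing the separator three times is to keep it in a shell variable. This is only a sketch: the variable name sep and the shortened sample data are assumptions for illustration, and it assumes the separator contains no slashes or regex-special characters.

```shell
sep='===SEPARATOR==='

# Shortened sample data, written to a temporary file for the demo.
tmpdir=$(mktemp -d)
printf '%s\n' "$sep" line2 line3 "$sep" line5 line6 "$sep" line8 lineX "$sep" > "$tmpdir/file"

# sed prints each separator-to-separator range (anchored to whole lines);
# grep -vxF then drops lines that are exactly the fixed string in $sep.
between=$(sed -n "/^$sep\$/,/^$sep\$/p" "$tmpdir/file" | grep -vxF "$sep")

printf '%s\n' "$between"
rm -rf "$tmpdir"
```

The literal separator now appears once; the pipeline itself is the same sed-plus-grep idea as above.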

I am assuming that you do not need line5 and line6. You can do it with awk like this:

awk '$0 == "===SEPARATOR===" {interested = ! interested; next} interested {print}'

Credit goes to https://www.gnu.org/software/gawk/manual/html_node/Boolean-Ops.html#Boolean-Ops

Output:

[root@hostname ~]# awk '$0 == "===SEPARATOR===" {interested = ! interested; next} interested {print}' /tmp/1
line2
line3
line8
...
lineX
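The flag-toggling idea can be spelled out with comments. The sketch below is the same technique with a differently named flag (inside, an arbitrary choice) and shortened sample data in a temporary file, both assumptions for illustration.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '===SEPARATOR===' line2 line3 '===SEPARATOR===' line5 line6 \
  '===SEPARATOR===' line8 lineX '===SEPARATOR===' > "$tmpdir/file"

toggled=$(awk '
  # An unset awk variable counts as false, so the first separator turns
  # the flag on, the second turns it off, the third turns it on again.
  $0 == "===SEPARATOR===" { inside = !inside; next }
  # A bare condition with no action prints the line when it is true.
  inside
' "$tmpdir/file")

printf '%s\n' "$toggled"
rm -rf "$tmpdir"
```

Because the flag flips on every separator, every second block (here line5 and line6) is skipped, which matches the assumption stated in this answer.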

awk to the rescue!

With multi-character RS support (e.g. gawk):

$ awk -v RS='\n?===SEPARATOR===\n' '!(NR%2)' file

line2
line3
line8
...
lineX

Or without that:

$ awk '/===SEPARATOR===/{p=!p;next} p' file

line2
line3
line8
...
lineX

which is practically the same as @Jay Rajput's answer.

It sounds like you want to save each block of lines to a separate file.

The following solutions create output files f1, f2, and so on, containing the (non-empty) blocks of lines between the ===SEPARATOR=== lines.

With GNU Awk or Mawk:

awk -v fnamePrefix='f' -v RS='(^|\n)===SEPARATOR===(\n|$)' \
  'NF { fname = fnamePrefix (++n); print > fname; close(fname) }' file
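For an awk without regex support in RS, the same splitting can be sketched line by line. This is an alternative technique, not the command above: the variable names prefix, fname and n, the temporary directory, and the shortened sample data are all assumptions for illustration.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '===SEPARATOR===' line2 line3 '===SEPARATOR===' line5 line6 \
  '===SEPARATOR===' line8 lineX '===SEPARATOR===' > "$tmpdir/file"

awk -v prefix="$tmpdir/f" '
  $0 == "===SEPARATOR===" {            # block boundary
    if (fname != "") close(fname)      # finish the previous block file, if any
    fname = ""
    next
  }
  {
    if (fname == "") fname = prefix (++n)   # first line of a new block: next file
    print > fname
  }
' "$tmpdir/file"

# Show what was written; the temporary directory is left in place for inspection.
for f in "$tmpdir"/f[0-9]*; do
  printf '== %s\n' "$f"
  cat "$f"
done
```

A file is only opened when a block's first line arrives, so consecutive or trailing separators do not create empty files.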

Pure bash, which will be slow:

#!/usr/bin/env bash

fnamePrefix='f'; i=0
while IFS= read -r line; do
  [[ $line == '===SEPARATOR===' ]] && { (( ++i )); > "${fnamePrefix}${i}"; continue; }
  printf '%s\n' "$line" >> "${fnamePrefix}${i}"
done < file
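If the goal is to parse each block in turn rather than keep numbered files, a loop can reuse one scratch file per iteration, as the question describes. The sketch below uses only plain POSIX shell constructs; process_block is a hypothetical placeholder for the real parsing step, and the scratch file and shortened sample data are likewise assumptions.

```shell
# Hypothetical stand-in for the real parsing step: report the block's line count.
process_block() { printf 'block with %s line(s)\n' "$(grep -c '' "$1")"; }

tmpdir=$(mktemp -d)
printf '%s\n' '===SEPARATOR===' line2 line3 '===SEPARATOR===' line5 line6 \
  '===SEPARATOR===' line8 lineX '===SEPARATOR===' > "$tmpdir/file"
block="$tmpdir/block"
: > "$block"                      # start with an empty scratch file

report=$(
  while IFS= read -r line; do
    if [ "$line" = '===SEPARATOR===' ]; then
      # A separator closes the current block: process it if non-empty, then reset.
      if [ -s "$block" ]; then process_block "$block"; fi
      : > "$block"
    else
      printf '%s\n' "$line" >> "$block"
    fi
  done < "$tmpdir/file"
  # Handle a final block that has no closing separator.
  if [ -s "$block" ]; then process_block "$block"; fi
)

printf '%s\n' "$report"
rm -rf "$tmpdir"
```

Each block is handed to process_block exactly once, and the scratch file is truncated before the next block starts, so every iteration sees only its own lines.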
