
Sed to extract text between two strings

Please help me with sed. I have a file like the one below.

START=A
  xxxxx
  xxxxx
END
START=A
  xxxxx
  xxxxx
END
START=A
  xxxxx
  xxxxx
END
START=B
  xxxxx
  xxxxx
END
START=A
  xxxxx
  xxxxx
END
START=C
  xxxxx
  xxxxx
END
START=A
  xxxxx
  xxxxx
END
START=D
  xxxxx
  xxxxx
END

I want to get the text between START=A and END. I used the command below.

sed '/^START=A/, / ^END/!d' input_file

The problem is that I am getting

START=A
  xxxxx
  xxxxx
END
START=D
  xxxxx
  xxxxx
END

instead of

START=A
  xxxxx
  xxxxx
END

It looks as though sed is matching greedily.

Please help me resolve this.

Thanks in advance.

Can I use awk to achieve the above?

sed -n '/^START=A$/,/^END$/p' data

The -n option means don't print by default; then the script says 'do print from the line START=A through the next line END, inclusive'.

You can also do it with awk:

A pattern may consist of two patterns separated by a comma; in this case, the action is performed for all lines from an occurrence of the first pattern through an occurrence of the second.

(from man awk on Mac OS X).

awk '/^START=A$/,/^END$/ { print }' data

Given a modified form of the data file in the question:

START=A
  xxx01
  xxx02
END
START=A
  xxx03
  xxx04
END
START=A
  xxx05
  xxx06
END
START=B
  xxx07
  xxx08
END
START=A
  xxx09
  xxx10
END
START=C
  xxx11
  xxx12
END
START=A
  xxx13
  xxx14
END
START=D
  xxx15
  xxx16
END

The output using GNU sed or Mac OS X (BSD) sed, and using GNU awk or BSD awk, is the same:

START=A
  xxx01
  xxx02
END
START=A
  xxx03
  xxx04
END
START=A
  xxx05
  xxx06
END
START=A
  xxx09
  xxx10
END
START=A
  xxx13
  xxx14
END

Note how I modified the data file so that it is easier to see where in the file each printed block came from.
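The range pattern prints the START=A and END lines themselves. If only the lines strictly between the markers are wanted (an assumption; the expected output in the question keeps the markers), an explicit flag is one way to get that. A minimal sketch, where the sample file and the variable name inblock are made up for illustration:

```shell
# Hypothetical sample data in the same shape as the question's file.
data=$(mktemp)
printf '%s\n' 'START=A' '  xxx01' 'END' \
              'START=B' '  xxx02' 'END' \
              'START=A' '  xxx03' 'END' > "$data"

# Range pattern, as in the answer: the marker lines are included.
with_markers=$(awk '/^START=A$/,/^END$/' "$data")

# Flag version: clear the flag at END, print while it is set,
# and set it only after the START=A line has been passed.
inner=$(awk '/^END$/      { inblock = 0 }
             inblock      { print }
             /^START=A$/  { inblock = 1 }' "$data")

printf '%s\n' "$with_markers"
printf '%s\n' "$inner"
rm -f "$data"
```

The order of the three rules matters: testing for END first and for START=A last is what keeps both marker lines out of the output.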

If you have a different output requirement (such as 'only the first block between START=A and END', or 'only the last ...'), then you need to state that more clearly in the question.
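For those two variants, one possible approach is sketched here. It is not taken from the question; the sample file and the awk variable names (buf, inblock, last) are invented for the example.

```shell
# Hypothetical sample data in the same shape as the question's file.
data=$(mktemp)
printf '%s\n' 'START=A' '  xxx01' 'END' \
              'START=B' '  xxx02' 'END' \
              'START=A' '  xxx03' 'END' > "$data"

# First START=A block only: print the range and quit at its END line.
first=$(sed -n '/^START=A$/,/^END$/{p;/^END$/q;}' "$data")

# Last START=A block only: buffer each block, remember the most
# recently completed one, and print it after the whole file is read.
last=$(awk '/^START=A$/        { buf = ""; inblock = 1 }
            inblock            { buf = buf $0 "\n" }
            /^END$/ && inblock { last = buf; inblock = 0 }
            END                { printf "%s", last }' "$data")

printf '%s\n' "$first"
printf '%s\n' "$last"
rm -f "$data"
```

Quitting with q in the first command also means sed stops reading the file as soon as the first block is complete.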

Basic version...

sed -n '/START=A/,/END/p' yourfile

More robust version...

sed -n '/^ *START=A *$/,/^ *END *$/p' yourfile
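To see what the anchors and optional spaces change, here is a small comparison on made-up input: one block starts with START=AB, which the unanchored pattern also matches, and one START=A block is indented.

```shell
# Hypothetical input: a START=AB block and an indented START=A block.
data=$(mktemp)
printf '%s\n' 'START=AB' '  skip1' 'END' \
              '  START=A' '  keep1' '  END' > "$data"

# Unanchored: /START=A/ also matches the START=AB line.
basic=$(sed -n '/START=A/,/END/p' "$data")

# Anchored, with optional surrounding spaces: only the START=A block matches.
robust=$(sed -n '/^ *START=A *$/,/^ *END *$/p' "$data")

printf '%s\n' "$basic"
printf '%s\n' "$robust"
rm -f "$data"
```

On this input the basic version prints all six lines, while the robust version prints only the indented START=A block.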

Your sed expression has a space before ^END inside the second pattern, i.e. / ^END/. So sed matches the starting pattern but never matches the ending pattern, and keeps printing until the end of the file. Use sed '/^START=A/, /^END/!d' input_file (notice /^END/).
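The effect of that stray space can be reproduced on a tiny made-up file. This sketch assumes POSIX basic regular expressions, where a ^ that is not at the start of the pattern is an ordinary character, so / ^END/ looks for a space followed by a literal caret.

```shell
# Hypothetical two-block input.
data=$(mktemp)
printf '%s\n' 'START=A' '  xxx01' 'END' \
              'START=B' '  xxx02' 'END' > "$data"

# Stray space inside the end pattern: the range never closes,
# so every line from the first START=A onward is kept.
broken=$(sed '/^START=A/,/ ^END/!d' "$data")

# Space removed: the range closes at the first END line.
fixed=$(sed '/^START=A/,/^END/!d' "$data")

printf '%s\n' "$broken"
printf '%s\n' "$fixed"
rm -f "$data"
```

With the space, the output is the whole file; without it, only the START=A block remains.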
