How to grep/sed/awk for a range of output starting with a whitespace character
I have a file that looks something like this:
# cat $file
...
ip access-list extended DOG-IN
 permit icmp 10.10.10.1 0.0.0.7 any
 permit tcp 10.11.10.1 0.0.0.7 eq www 443 10.12.10.0 0.0.0.63
 deny ip any any log
ip access-list extended CAT-IN
 permit icmp 10.13.10.0 0.0.0.255 any
 permit ip 10.14.10.0 0.0.0.255 host 10.15.10.10
 permit tcp 10.16.10.0 0.0.0.255 host 10.17.10.10 eq smtp
...
I want to be able to search by name (using a script) to get 'section' output for individual access-lists. I want the output to look like this:
# grep -i dog $file | sed <options??>
ip access-list extended DOG-IN
 permit icmp 10.10.10.1 0.0.0.7 any
 permit tcp 10.11.10.1 0.0.0.7 eq www 443 10.12.10.0 0.0.0.63
 deny ip any any log
...with no further output of inapplicable non-indented lines.
I have tried the following:
grep -A 10 DOG $file | sed -n '/^[[:space:]]\{1\}/p'
...which only gives me those of the 10 lines after DOG that begin with a single space (including lines that do not belong to the searched access-list).
sed -n '/DOG/,/^[[:space:]]\{1\}/p' $file
...which gives me the line containing DOG and the next line beginning with a single space. (I need all the applicable lines of the access-list...)
I want the line containing DOG, and all lines after DOG which begin with a single space, until the next un-indented line. The content varies too much to depend on any pattern other than the leading space (there is not always a deny at the end, etc.).
Using GNU sed (Linux):
name='dog' # case-INsensitive name of section to extract
sed -n "/$name/I,/^[^[:space:]]/ { /$name/I {p;d}; /^[^[:space:]]/q; p }" file
To make matching case-sensitive, remove the I from each occurrence of /I above.
-n suppresses default output, so that output must explicitly be requested inside the script with functions such as p.
Note the double quotes ("...") around the sed script, which allow references to the shell variable $name: the double quotes ensure that shell variable references are expanded BEFORE the script is handed to sed (sed itself has no access to shell variables).
This requires care, however: (a) shell metacharacters that are meant to reach sed must be escaped, such as $ as \$, and (b) the shell-variable value must not contain sed metacharacters that could break the sed script; for generic escaping of shell-variable values for use in sed scripts, see this answer of mine, or use my awk-based answer.
/$name/I,/^[^[:space:]]/ uses a range to match from the line of interest (/$name/I; the trailing I is GNU sed's case-insensitivity matching option) through the start of the next section (/^[^[:space:]]/, i.e., the next line that does NOT start with whitespace). Since sed ranges are always inclusive, the challenge is to selectively remove the last line of the range, IF it is the start of the next section; note that this will NOT be the case if the section of interest is the LAST one in the file.
The commands inside { ... } are only executed for lines in the range.
/$name/I {p;d}; prints the 1st line of the range: p prints it, and d then deletes the line (which has already been printed) and starts the next cycle (proceeds to the next input line).
/^[^[:space:]]/q matches the last line in the range, IF it is the next section's first line, and quits processing altogether (q), without printing the line.
p is then only reached for section-interior lines and prints them.
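To see why the last line of the range needs special handling, here is a minimal sketch. The temporary file and its two made-up sections are assumptions for illustration only, and the case-sensitive, portable form of the command is used. The plain range prints the next section's header as well; the second command drops it.

```shell
# Hypothetical sample input, written to a temporary file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
ip access-list extended DOG-IN
 permit icmp any any
 deny ip any any log
ip access-list extended CAT-IN
 permit ip any any
EOF

# A plain range is inclusive at both ends, so the CAT-IN header is printed too.
plain=$(sed -n '/DOG/,/^[^[:space:]]/p' "$demo")
printf '%s\n---\n' "$plain"

# Print the first line, quit silently on the range-ending header, print the rest.
trimmed=$(sed -n -e '/DOG/,/^[^[:space:]]/ { /DOG/ {p;d;}; /^[^[:space:]]/q; p; }' "$demo")
printf '%s\n' "$trimmed"

rm -f "$demo"
```

The first output ends with the CAT-IN header line, while the second stops at the last indented line of the DOG-IN section.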
Note: see also my awk-based answer.
FreeBSD/macOS sed can almost do the same, except that it lacks the case-insensitivity option, I.
name='DOG' # case-SENSITIVE name of section to extract
sed -n -e "/$name/,/^[^[:space:]]/ { /$name/ {p;d;}; /^[^[:space:]]/q; p; }" file
Note that FreeBSD/macOS sed generally has stricter syntax requirements, such as requiring a ; after a command even when it is followed by }.
If you do need case-insensitivity, see my awk-based answer.
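The point about sed metacharacters in the shell-variable value can be illustrated with a rough sketch. The section names below are invented, and the escaping step is one common idiom for basic regular expressions, not necessarily the approach from the linked answer. Without escaping, the . in the name would also match the x in the first header.

```shell
# Hypothetical sample input; the first header differs only where the name has a dot.
demo=$(mktemp)
cat > "$demo" <<'EOF'
ip access-list extended V1x0-IN
 permit icmp any any
ip access-list extended V1.0-IN
 permit ip any any
 deny ip any any log
ip access-list extended CAT-IN
 permit tcp any any
EOF

name='V1.0-IN'   # assumed section name containing a regex metacharacter (.)

# Backslash-escape BRE metacharacters and the / delimiter in the value.
escaped=$(printf '%s\n' "$name" | sed 's|[][\.*^$/]|\\&|g')

# Same range-and-quit command as above, with the escaped value interpolated.
section=$(sed -n -e "/$escaped/,/^[^[:space:]]/ { /$escaped/ {p;d;}; /^[^[:space:]]/q; p; }" "$demo")
printf '%s\n' "$section"

rm -f "$demo"
```

Passing the name to awk with -v and matching it with index() avoids this escaping step altogether.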
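awk '
/DOG/ {                  # header of the wanted section: start printing
    found = 1
    print
    next
}
/^[[:space:]]/ {         # indented line: print it only inside the section
    if (found) print
    next
}
{ found = 0 }            # any other un-indented line ends the section
' file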
You can substitute any ERE in place of /DOG/, such as /(DOG)|(CAT)/, and the rest of the script will do the work. You can condense it if you like, of course.
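As a sketch of that substitution, the ERE can also be passed in from the shell instead of being hard-coded. The awk variable pat, the sample file and the BIRD-IN section are all made up for this illustration.

```shell
# Hypothetical sample input with three sections.
demo=$(mktemp)
cat > "$demo" <<'EOF'
ip access-list extended DOG-IN
 permit icmp any any
ip access-list extended BIRD-IN
 permit udp any any
ip access-list extended CAT-IN
 permit ip any any
EOF

pattern='DOG|CAT'   # any ERE; assumed value

sections=$(awk -v pat="$pattern" '
$0 ~ pat       { found = 1; print; next }   # header of a wanted section
/^[[:space:]]/ { if (found) print; next }   # indented line: print only inside one
               { found = 0 }                # any other header ends the section
' "$demo")
printf '%s\n' "$sections"

rm -f "$demo"
```

With this input, the DOG-IN and CAT-IN sections are printed and the BIRD-IN section is skipped.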
Note that just because a line begins with a space, that doesn't mean there is only one space. /^[[:space:]]{1}/ will match the leading space even in a string like
    nonspace
meaning it is equivalent to /^[[:space:]]/. If your format is so rigid that there must always be exactly one space, use /^[[:space:]][^[:space:]]/ instead. Lines like the one with "nonspace" above will then not be matched.
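The difference is easy to check on a few made-up lines. This sketch uses sed with the basic-regex spelling \{1\} from the question; the sample strings are assumptions.

```shell
# Three assumed test lines: one leading space, three leading spaces, none.
sample=$(printf '%s\n' ' one space' '   three spaces' 'no indent')

# \{1\} only requires one whitespace character at the start,
# so both indented lines match.
loose=$(printf '%s\n' "$sample" | sed -n '/^[[:space:]]\{1\}/p')

# Requiring a non-whitespace character next keeps only the line
# that starts with exactly one whitespace character.
strict=$(printf '%s\n' "$sample" | sed -n '/^[[:space:]][^[:space:]]/p')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```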
I added a second answer because mklement0 pointed out a flaw in my logic.
Here is another very simple way to do it in Perl:
perl -ne '$p = 0 if /^\w/; $p = 1 if /DOG/; print if $p'
EXAMPLES:
cat /tmp/file | perl -ne '$p = 0 if /^\w/; $p = 1 if /DOG/; print if $p'
ip access-list extended DOG-IN
 permit icmp 10.10.10.1 0.0.0.7 any
 permit tcp 10.11.10.1 0.0.0.7 eq www 443 10.12.10.0 0.0.0.63
 deny ip any any log
cat /tmp/file | perl -ne '$p = 0 if /^\w/; $p = 1 if /CAT/; print if $p'
ip access-list extended CAT-IN
 permit icmp 10.13.10.0 0.0.0.255 any
 permit ip 10.14.10.0 0.0.0.255 host 10.15.10.10
 permit tcp 10.16.10.0 0.0.0.255 host 10.17.10.10 eq smtp
EXPLANATION:
If the line starts with a word character ([A-Za-z0-9_]), set $p to false.
If the line contains the pattern (DOG in this case), set $p to true.
If $p is true, print the line.
A shorter, POSIX-compliant awk solution, which is a generalized and optimized translation of @Tiago's excellent Perl-based answer.
One advantage of these answers over the sed solutions is that they use literal substring matching rather than regular expressions, which allows passing in arbitrary search strings without needing to worry about escaping. That said, if you did want regex matching, use the ~ operator rather than the index() function; e.g., index($0, name) would become $0 ~ name. You then have to make sure that the value passed for name either contains no regex metacharacters that are meant to be treated as literals, or is an intentionally crafted regex.
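A quick way to see the difference between the two matching styles is a sketch like the following. The two header lines and the search string are invented for illustration.

```shell
# Two assumed header lines that differ only where the search string has a dot.
lines=$(printf '%s\n' 'ip access-list extended V1.0-IN' 'ip access-list extended V1x0-IN')
name='V1.0'   # assumed search string containing a regex metacharacter

# Literal substring match: the dot only matches a dot.
literal=$(printf '%s\n' "$lines" | awk -v name="$name" 'index($0, name)')

# Regex match: the dot matches any character, so both lines are selected.
regex=$(printf '%s\n' "$lines" | awk -v name="$name" '$0 ~ name')

printf 'index():\n%s\n~ operator:\n%s\n' "$literal" "$regex"
```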
name='DOG' # Case-sensitive name to search for.
awk -v name="$name" '/^[^[:space:]]/ {if (p) exit; if (index($0,name)) {p=1}} p' file
-v name="$name" defines awk variable name based on the value of shell variable $name (awk has no direct access to shell variables).
p is used as a flag to indicate whether the current line should be printed, i.e., whether it is part of the section of interest; as long as p is not initialized, it is treated as 0 (false) in a Boolean context.
/^[^[:space:]]/ matches only header lines (lines that start with a non-whitespace character), and the associated action ({...}) is only processed for them:
  if (p) exit exits processing altogether if p is already set, because that implies that the next section has been reached. Exiting right away has the benefit of not having to process the remainder of the file.
  if (index($0, name)) looks for the name of interest as a literal substring in the header line at hand and, if found (in which case index() returns the 1-based position at which the substring was found, which is interpreted as true in a Boolean context), sets flag p to 1 ({p=1}).
p simply prints the current line if p is 1, and does nothing otherwise. That is, once the section header of interest has been found, it and subsequent lines are printed (up until the next section or the end of the input file).
  This final p is a pattern without an associated action ({...}), in which case the default action is to print the current line if the pattern evaluates to true. (That technique is used in the common shorthand 1 to simply unconditionally print the current record.)
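The flag idiom and the pattern-only rule can be tried in isolation. This is a minimal sketch on three made-up input lines (a, b, c), unrelated to the access-list data.

```shell
# p is unset (false) until the line "b" is seen; from then on the
# pattern-only rule p is true and every line is printed.
flagged=$(printf '%s\n' a b c | awk '/b/ { p = 1 } p')

# The constant pattern 1 is always true, so every record is printed.
all=$(printf '%s\n' a b c | awk '1')

printf 'flagged:\n%s\nall:\n%s\n' "$flagged" "$all"
```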
If case-INsensitivity is needed:
name='dog' # Case-INsensitive name to search for.
awk -v name="$name" \
'/^[^[:space:]]/ {if(p) exit; if(index(tolower($0),tolower(name))) {p=1}} p' file
Caveat: The BSD-based awk that comes with macOS (still applies as of 10.12.1) is not UTF-8-aware: the case-insensitive matching won't work with non-ASCII letters such as ü.
GNU awk alternative, using the special IGNORECASE variable:
awk -v name="$name" -v IGNORECASE=1 \
'/^[^[:space:]]/ {if(p) exit; if(index($0,name)) {p=1}} p' file
Another POSIX-compliant awk solution:
name='dog' # Case-insensitive name of section to extract.
awk -v name="$name" '
index(tolower($0),tolower(name)) {inBlock=1; print; next} # 1st section line found.
inBlock && !/^[[:space:]]/ {exit} # Exit at start of next section.
inBlock # Print 2nd, 3rd, ... section line.
' file
Note:
next skips the remaining pattern-action pairs and proceeds to the next line.
/^[[:space:]]/ matches lines that start with at least one whitespace char. As @Chrono Kitsune explains in his answer, if you wanted to match lines that start with exactly one whitespace char., use /^[[:space:]][^[:space:]]/. Also note that, despite its name, character class [:space:] matches ANY form of whitespace, not just spaces; see man isspace.
There is no need to initialize the flag variable inBlock, as it defaults to 0 in numeric/Boolean contexts.
If you have GNU awk, you can more easily achieve case-insensitive matching by setting the IGNORECASE variable to a nonzero value (-v IGNORECASE=1) and simply using index($0, name) inside the program.
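The point that [:space:] covers more than the space character can be checked with a small sketch. The three input lines are made up, and the same bracket expression is shown with grep purely for brevity.

```shell
# One line starts with a tab, one with a space, one with neither.
count=$(printf '\tindented with a tab\n indented with a space\nnot indented\n' |
        grep -c '^[[:space:]]')

# Both the tab-indented and the space-indented line are counted.
printf '%s lines start with whitespace\n' "$count"
```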
A GNU awk solution, IF you can assume that all section header lines start with 'ip' (so as to break the input into sections that way, rather than looking for leading whitespace):
awk -v RS='(^|\n)ip' -F'\n' -v name="$name" -v IGNORECASE=1 '
index($1, name) { sub(/\n$/, ""); print "ip" $0; exit }
' file
-v RS='(^|\n)ip' breaks the input into records at the line-starting instances of the string 'ip', so that each record comprises the lines that fall between them.
-F'\n' then breaks each record into fields ($1, ...) by lines.
index($1, name) looks for the name on the current record's first line, case-INsensitively, thanks to -v IGNORECASE=1.
sub(/\n$/, "") removes any trailing \n, which can stem from the section of interest being the last in the input file.
print "ip" $0 prints the matching record, comprising the entire section of interest; since the record doesn't include the separator, 'ip', it is prepended.
@mklement0 squeezed my already-inscrutable sed down to this:
sed '/^ip/!{H;$!d};x; /DOG/I!d'
which swaps accumulated multiline groups into the pattern buffer for processing; the main logic (/DOG/I!d here) operates on whole groups.
The /^ip/!{H;$!d} identifies continuation lines by the absence of a first-line marker and accumulates them, so the x only runs when an entire group has been accumulated.
Some corner cases don't apply here:
The first x swaps in a phantom empty group at the start. If that doesn't get dropped during ordinary processing, adding a 1d fixes that.
The last x also swaps out the last line of the file. That's usually just the last line of the last group, already accumulated by the H, but if some command might produce one-line groups, you need to supply a fake one at the end, with e.g. echo "header phantom" | sed '/^header/!{H;$!d};x' realdata.txt - or { showgroups; echo header phantom; } | sed '/^header/!{H;$!d};x'.
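To make the grouping visible, here is a sketch that applies the same accumulate-and-swap skeleton to a made-up two-section file and joins each group onto one line. The 1d drops the empty first swap mentioned above; the s command and the sample data are additions for illustration, and the syntax assumes GNU sed as in the answer.

```shell
# Hypothetical sample input with two groups.
demo=$(mktemp)
cat > "$demo" <<'EOF'
ip access-list extended DOG-IN
 permit icmp any any
 deny ip any any log
ip access-list extended CAT-IN
 permit ip any any
EOF

# Continuation lines are appended to the hold space (H) and deleted, except
# the last line of the file. On a header or the last line, x swaps the
# finished group in. 1d discards the empty group produced by the first swap.
# The s command then joins the lines of the group for display.
groups=$(sed '/^ip/!{H;$!d};x;1d;s/\n/ |/g' "$demo")
printf '%s\n' "$groups"

rm -f "$demo"
```

Each output line corresponds to one whole group, which is what lets a single test such as /DOG/I!d act on an entire section.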
The simplest way I can think of is:
sed '/DOG/, /^ip/ !d' | sed '$d'
cat file | sed '/DOG/, /^ip/ !d' | sed '$d'
ip access-list extended DOG-IN
 permit icmp 10.10.10.1 0.0.0.7 any
 permit tcp 10.11.10.1 0.0.0.7 eq www 443 10.12.10.0 0.0.0.63
 deny ip any any log
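Explanation:
The first sed prints the lines from the one matching DOG through the next line starting with ip.
The second sed deletes the last line of that output (the line starting with ip).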
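The two stages can be inspected separately, as in this sketch on a made-up file. The sample data is an assumption, and the address is written without the optional spaces.

```shell
# Hypothetical sample input.
demo=$(mktemp)
cat > "$demo" <<'EOF'
ip access-list extended DOG-IN
 permit icmp any any
 deny ip any any log
ip access-list extended CAT-IN
 permit ip any any
EOF

# Stage 1: keep only the range from the DOG line through the next line
# starting with ip (every line outside the range is deleted).
stage1=$(sed '/DOG/,/^ip/!d' "$demo")

# Stage 2: delete the last line of that output, i.e. the next header.
stage2=$(printf '%s\n' "$stage1" | sed '$d')

printf 'stage 1:\n%s\nstage 2:\n%s\n' "$stage1" "$stage2"
rm -f "$demo"
```

Note that if the matched section is the last one in the file, there is no closing ip line, so the second sed would remove a genuine rule line instead.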