How to grep/sed/awk for a range of output starting with a whitespace character

I have a file that looks something like this:

# cat $file
...
ip access-list extended DOG-IN
 permit icmp 10.10.10.1 0.0.0.7 any
 permit tcp 10.11.10.1 0.0.0.7 eq www 443 10.12.10.0 0.0.0.63
 deny   ip any any log
ip access-list extended CAT-IN
 permit icmp 10.13.10.0 0.0.0.255 any
 permit ip 10.14.10.0 0.0.0.255 host 10.15.10.10
 permit tcp 10.16.10.0 0.0.0.255 host 10.17.10.10 eq smtp
...

I want to be able to search by name (using a script) to get 'section' output for individual access-lists. I want the output to look like this:

# grep -i dog $file | sed <options??>

ip access-list extended DOG-IN
 permit icmp 10.10.10.1 0.0.0.7 any
 permit tcp 10.11.10.1 0.0.0.7 eq www 443 10.12.10.0 0.0.0.63
 deny   ip any any log

...with no further output of inapplicable non-indented lines.

I have tried the following:

grep -A 10 DOG $file | sed -n '/^[[:space:]]\{1\}/p'

...which only gives me those of the 10 lines after DOG that begin with a single space (including lines not applicable to the searched access-list).

sed -n '/DOG/,/^[[:space:]]\{1\}/p' $file

...which gives me the line containing DOG and the next line beginning with a single space. (I need all the applicable lines of the access-list...)

I want the line containing DOG, and all lines after it that begin with a single space, up to the next un-indented line. There are too many variables in the content to depend on any pattern other than the leading space (there is not always a deny at the end, etc.).

Using GNU sed (Linux):

name='dog'  # case-INsensitive name of section to extract
sed -n "/$name/I,/^[^[:space:]]/ { /$name/I {p;d}; /^[^[:space:]]/q; p }" file

To make the matching case-sensitive, remove the I from both occurrences of /I above.

  • -n suppresses default output, so output must be requested explicitly inside the script with functions such as p.
  • Note the use of double quotes ("...") around the sed script, so as to allow references to the shell variable $name: the double quotes ensure that shell variable references are expanded BEFORE the script is handed to sed (sed itself has no access to shell variables).
    • Caveat: This technique is tricky, because (a) you must use shell escaping for shell metacharacters you want to pass through to sed, e.g. writing $ as \$, and (b) the shell-variable value must not contain sed metacharacters that could break the sed script; for generic escaping of shell-variable values for use in sed scripts, see this answer of mine, or use my awk-based answer.
  • /$name/I,/^[^[:space:]]/ uses a range to match from the line of interest (/$name/I; the trailing I is GNU sed's case-insensitive matching option) through the start of the next section (/^[^[:space:]]/, i.e., the next line that does NOT start with whitespace). Since sed ranges are always inclusive, the challenge is to selectively remove the last line of the range IF it is the start of the next section; note that this will NOT be the case if the section of interest is the LAST one in the file.
    Note that the commands inside { ... } are executed only for lines in the range.
  • /$name/I {p;d}; prints the 1st line of the range (the header line) before the end-of-range test below can see it, which matters because the header also starts with a non-whitespace character: d deletes the line (which has already been printed) and starts the next cycle (proceeds to the next input line).
  • /^[^[:space:]]/q matches the last line in the range IF it is the next section's first line, and quits processing altogether (q) without printing that line (thanks to -n).
  • p is then reached only for section-interior lines, and prints them.
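Both cases from the bullets above (the next header triggering q, and a last section whose range simply runs to the end of input) can be seen by feeding the GNU sed command a small sample. The sample() function and its data are made up for illustration, modeled on the question's file; the I flag requires GNU sed.

```shell
# Hypothetical sample data modeled on the question; CAT-IN is the LAST section.
sample() {
  printf '%s\n' \
    'ip access-list extended DOG-IN' \
    ' permit icmp 10.10.10.1 0.0.0.7 any' \
    ' deny   ip any any log' \
    'ip access-list extended CAT-IN' \
    ' permit icmp 10.13.10.0 0.0.0.255 any'
}

# DOG-IN: the range ends at the CAT-IN header, which q swallows unprinted.
# Prints the DOG-IN header and its two indented lines.
name='dog'
sample | sed -n "/$name/I,/^[^[:space:]]/ { /$name/I {p;d}; /^[^[:space:]]/q; p }"

# CAT-IN: no header follows, so the range runs to end of input and q never fires.
# Prints the CAT-IN header and its one indented line.
name='cat'
sample | sed -n "/$name/I,/^[^[:space:]]/ { /$name/I {p;d}; /^[^[:space:]]/q; p }"
```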

Note:

  • The assumption is that header lines can be identified by NOT starting with a whitespace char., and that all other lines are non-header lines; if more sophisticated matching is required, see my awk-based answer.
  • This solution has the slight disadvantage that the range regexes must be duplicated, although you could mitigate that with shell variables.
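A minimal sketch of that mitigation, assuming GNU sed: each regex is kept once in a shell variable and expanded wherever the script needs it. extract_section is a hypothetical function name, and the caveat about sed metacharacters in the name still applies.

```shell
# extract_section NAME [FILE]  (hypothetical helper; GNU sed for the I flag)
extract_section() {
  # Plain sh has no 'local', so start/stop are ordinary (global) variables.
  start="/$1/I"            # header of interest, matched case-insensitively
  stop='/^[^[:space:]]/'   # any header, i.e. the start of the next section
  # Same script as above, with each regex written only once.
  sed -n "$start,$stop { $start {p;d}; ${stop}q; p }" "${2:--}"
}
```

Usage: extract_section dog file, or some_command | extract_section dog (without FILE, standard input is read).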

FreeBSD/macOS sed can almost do the same, except that it lacks the case-insensitivity option I:

name='DOG'  # case-SENSITIVE name of section to extract
sed -n -e "/$name/,/^[^[:space:]]/ { /$name/ {p;d;}; /^[^[:space:]]/q; p; }" file

Note that FreeBSD/OSX sed generally has stricter syntax requirements, such as requiring a ; after a command even when it is followed by }.

If you do need case-insensitivity, see my awk-based answer.

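awk -v found=0 '
/DOG/ {                # header of the wanted section: start printing
    found = 1;
    print;
    next
}

/^[[:space:]]/ {       # indented line: print it only inside the wanted section
    if (found) {
        print;
    }
    next
}

{ found = 0 }          # any other header line: stop printing
' file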

You can substitute any ERE for /DOG/, such as /(DOG)|(CAT)/, and the rest of the script will do the work. You can condense it if you like, of course.
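A sketch of the same flag-based approach with the ERE passed in from the shell rather than hard-coded; the flag is set on a matching header and reset on any other header. acl_sections is a hypothetical helper name. Note that awk processes escape sequences in -v values, so backslashes in the ERE would need doubling.

```shell
# acl_sections ERE [FILE]  (hypothetical helper; POSIX awk)
acl_sections() {
  awk -v re="$1" '
    $0 ~ re        { found = 1; print; next }  # wanted header: start printing
    /^[[:space:]]/ { if (found) print; next }  # indented line: print if inside
                   { found = 0 }               # any other header: stop printing
  ' "${2:--}"
}
```

Usage: acl_sections '(DOG)|(CAT)' file prints both sections and skips everything in between.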

Note that just because a line begins with a space, that doesn't mean there is only one space. /^[[:space:]]{1}/ will match the leading space even in a string like

                      nonspace

meaning it is equivalent to /^[[:space:]]/. If your format is so rigid that there must always be exactly one space, use /^[[:space:]][^[:space:]]/ instead; lines like the "nonspace" one above will then not be matched.
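A quick check of the difference with grep -E, which understands the same bracket expressions and interval syntax; the two sample lines are made up.

```shell
# Two made-up indented lines: one with a single leading space, one with several.
lines=' one-space
      several-spaces'

printf '%s\n' "$lines" | grep -cE '^[[:space:]]{1}'          # 2: {1} adds nothing
printf '%s\n' "$lines" | grep -c  '^[[:space:]]'              # 2: same as above
printf '%s\n' "$lines" | grep -c  '^[[:space:]][^[:space:]]'  # 1: exactly one space
```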

I added a second answer because mklement0 pointed out a flaw in my logic.

Here is a very simple way to do this in Perl:

perl -ne '$p = 0 if /^\w/; $p = 1 if /DOG/; print if $p'

EXAMPLES:

perl -ne '$p = 0 if /^\w/; $p = 1 if /DOG/; print if $p' /tmp/file
ip access-list extended DOG-IN
 permit icmp 10.10.10.1 0.0.0.7 any
 permit tcp 10.11.10.1 0.0.0.7 eq www 443 10.12.10.0 0.0.0.63
 deny   ip any any log

perl -ne '$p = 0 if /^\w/; $p = 1 if /CAT/; print if $p' /tmp/file
ip access-list extended CAT-IN
 permit icmp 10.13.10.0 0.0.0.255 any
 permit ip 10.14.10.0 0.0.0.255 host 10.15.10.10
 permit tcp 10.16.10.0 0.0.0.255 host 10.17.10.10 eq smtp

EXPLANATION:

If the line starts with a word character (\w, e.g. [A-Za-z0-9_]), set $p to false.

If the line contains PATTERN (in this case DOG), set $p to true.

If $p is true, print the line.

A shorter, POSIX-compliant awk solution, which is a generalized and optimized translation of @Tiago's excellent Perl-based answer.

One advantage of these answers over the sed solutions is that they use literal substring matching rather than regular expressions, which allows passing in arbitrary search strings without needing to worry about escaping. That said, if you did want regex matching, use the ~ operator rather than the index() function; e.g., index($0, name) would become $0 ~ name. You then have to make sure that the value passed for name either contains no accidental regex metacharacters meant to be treated as literals or is an intentionally crafted regex.

name='DOG' # Case-sensitive name to search for.

awk -v name="$name" '/^[^[:space:]]/ {if (p) exit; if (index($0,name)) {p=1}}  p' file

  • Option -v name="$name" defines awk variable name based on the value of shell variable $name (awk has no direct access to shell variables).
  • Variable p is used as a flag to indicate whether the current line should be printed, i.e., whether it is part of the section of interest; as long as p is not initialized, it is treated as 0 (false) in a Boolean context.
  • Pattern /^[^[:space:]]/ matches only header lines (lines that start with a non-whitespace character), and the associated action ({...}) is processed only for them:
    • if (p) exit exits processing altogether if p is already set, because that implies that the next section has been reached. Exiting right away has the benefit of not having to process the remainder of the file.
    • if (index($0, name)) looks for the name of interest as a literal substring in the header line at hand and, if found (in which case index() returns the 1-based position at which the substring was found, which is interpreted as true in a Boolean context), sets flag p to 1 ({p=1}).
  • p simply prints the current line if p is 1, and does nothing otherwise. That is, once the section header of interest has been found, it and the subsequent lines are printed (up until the next section or the end of the input file).
    Note that this is an example of a pattern-only command: only a pattern (condition) is specified, without an associated action ({...}), in which case the default action is to print the current line if the pattern evaluates to true. (The same technique underlies the common shorthand 1, which simply prints every record unconditionally.)
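To illustrate the literal-versus-regex point, here the made-up section names differ only in a character that "." would match. With index(), only the intended section is printed; with $0 ~ name, the earlier V1X0-IN section is matched first and printed instead.

```shell
# Made-up sections whose names differ only in a character that "." matches.
input='ip access-list extended V1X0-IN
 deny   ip any any log
ip access-list extended V1.0-IN
 permit ip any any'
name='V1.0'

# index(): literal substring match, so only the V1.0-IN section is printed.
printf '%s\n' "$input" |
  awk -v name="$name" '/^[^[:space:]]/ {if (p) exit; if (index($0,name)) {p=1}}  p'

# ~: name is a regex here, "." matches the X, and V1X0-IN is printed instead.
printf '%s\n' "$input" |
  awk -v name="$name" '/^[^[:space:]]/ {if (p) exit; if ($0 ~ name) {p=1}}  p'
```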

If case-INsensitivity is needed:

name='dog' # Case-INsensitive name to search for.

awk -v name="$name" \
  '/^[^[:space:]]/ {if(p) exit; if(index(tolower($0),tolower(name))) {p=1}}  p' file

Caveat: The BSD-based awk that comes with macOS (still the case as of 10.12.1) is not UTF-8-aware: the case-insensitive matching won't work with non-ASCII letters such as ü.

GNU awk alternative, using the special IGNORECASE variable:

awk -v name="$name" -v IGNORECASE=1 \
  '/^[^[:space:]]/ {if(p) exit; if(index($0,name)) {p=1}}  p' file

Another POSIX-compliant awk solution:

name='dog' # Case-insensitive name of section to extract.

awk -v name="$name" '
 index(tolower($0),tolower(name)) {inBlock=1; print; next} # 1st section line found.
 inBlock && !/^[[:space:]]/       {exit}             # Exit at start of next section.
 inBlock                                             # Print 2nd, 3rd, ... section line.
 ' file

Note:

  • next skips the remaining pattern-action pairs and proceeds to the next line.
  • /^[[:space:]]/ matches lines that start with at least one whitespace char. As @Chrono Kitsune explains in his answer, if you wanted to match lines that start with exactly one whitespace char., use /^[[:space:]][^[:space:]]/. Also note that, despite its name, character class [:space:] matches ANY form of whitespace (tabs included), not just spaces; see man isspace.
  • There's no need to initialize flag variable inBlock, as it defaults to 0 in numeric/Boolean contexts.
  • If you have GNU awk, you can achieve case-insensitive matching more easily by setting the IGNORECASE variable to a nonzero value (-v IGNORECASE=1) and simply using index($0, name) inside the program.
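The following sketch wraps the program above in a hypothetical section_ci function and feeds it made-up input in which one body line is indented with a tab, showing that [:space:] covers it too.

```shell
# section_ci NAME [FILE]  (hypothetical wrapper around the program above)
section_ci() {
  awk -v name="$1" '
   index(tolower($0),tolower(name)) {inBlock=1; print; next}
   inBlock && !/^[[:space:]]/       {exit}
   inBlock
  ' "${2:--}"
}

# Made-up input: the first body line is indented with a TAB, the second with a space.
# Prints the DOG-IN header and both indented lines, the tab-indented one included.
printf 'ip access-list extended DOG-IN\n\tpermit icmp any any\n permit ip any any\nip access-list extended CAT-IN\n deny ip any any\n' |
  section_ci dog
```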

A GNU awk solution, IF you can assume that all section header lines start with 'ip' (so as to break the input into sections that way, rather than by looking for leading whitespace):

awk -v RS='(^|\n)ip' -F'\n' -v name="$name" -v IGNORECASE=1 '
  index($1, name) { sub(/\n$/, ""); print "ip" $0; exit }
  ' file

  • -v RS='(^|\n)ip' splits the input into records at every line-starting instance of the string 'ip', so each record holds one section minus its leading 'ip'.
  • -F'\n' then splits each record into fields ($1, ...) by line.
  • index($1, name) looks for the name on the current record's first line, case-INsensitively, thanks to -v IGNORECASE=1.
  • sub(/\n$/, "") removes any trailing \n, which can occur if the section of interest is the last one in the input file.
  • print "ip" $0 prints the matching record, comprising the entire section of interest; since the record doesn't include the separator 'ip', it is prepended.
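Without GNU awk, the same "headers start with ip" assumption can be approximated in POSIX awk by testing each line instead of using a regex RS. This swaps in a line-based technique for the record-based one above, and ip_section is a hypothetical name.

```shell
# ip_section NAME [FILE]  (hypothetical; POSIX awk, case-insensitive)
ip_section() {
  awk -v name="$1" '
    # Every line starting with "ip" is treated as a section header.
    /^ip/ {
      if (p) exit                                    # next section reached
      p = (index(tolower($0), tolower(name)) > 0)    # wanted header?
    }
    p                                                # print while inside
  ' "${2:--}"
}
```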

@mklement0 squeezed my already-inscrutable sed down to this:

sed '/^ip/!{H;$!d};x; /DOG/I!d'

which swaps accumulated multiline groups into the pattern space for processing; the main logic (/DOG/I!d here) operates on whole groups.

The /^ip/! identifies continuation lines by the absence of a first-line marker and accumulates them, so the x only runs once an entire group has been accumulated (or on the last line of input).
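Spelled out one command per line with comments (GNU sed, because of the I flag), the one-liner reads as follows; dog_groups is only a hypothetical wrapper name, and the commands are the same as above.

```shell
# dog_groups [FILE...]  (hypothetical wrapper; GNU sed because of the I flag)
dog_groups() {
  sed '
    # Continuation line (does not start with "ip"): append it to the hold
    # space; unless it is the last line, delete it and start the next cycle.
    /^ip/!{
      H
      $!d
    }
    # Header line (or last line): swap, so the pattern space now holds the
    # group accumulated so far and the hold space starts the next group.
    x
    # Delete the group unless it mentions dog (any case); else auto-print it.
    /DOG/I!d
  ' "$@"
}
```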

Some corner cases don't apply here:

The first x swaps in a phantom empty group at the start. If that doesn't get dropped during ordinary processing, adding a 1d right after the x fixes that.

The last x also swaps out the last line of the file. That's usually just the last line of the last group, already accumulated by the H, but if the input can end with a one-line group (a header with no continuation lines), that header stays in the hold space and is never printed, so you need to supply a fake header at the end, e.g. with echo "header phantom" | sed '/^header/!{H;$!d};x' realdata.txt - or { showgroups; echo header phantom; } | sed '/^header/!{H;$!d};x'.
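The phantom-header trick can be checked on made-up data whose last group is a lone header line. The 1d after x (dropping the empty group at the start) and the trailing ; inside the braces (for BSD sed) are additions to the command quoted above; cat stands in for whatever command produces the groups.

```shell
# Made-up data: the final group is a lone header line ("header c").
data='header a
 a1
header b
 b1
header c'

# Without a phantom: "header c" ends up in the hold space and is never printed.
printf '%s\n' "$data" | sed '/^header/!{H;$!d;};x;1d'

# With a phantom header appended after the real data, "header c" is flushed.
printf '%s\n' "$data" | { cat; echo header phantom; } | sed '/^header/!{H;$!d;};x;1d'
```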

The simplest way I can think of is: sed '/DOG/, /^ip/ !d' | sed '$d'

cat file | sed '/DOG/, /^ip/ !d' | sed '$d'
ip access-list extended DOG-IN
 permit icmp 10.10.10.1 0.0.0.7 any
 permit tcp 10.11.10.1 0.0.0.7 eq www 443 10.12.10.0 0.0.0.63
 deny   ip any any log

Explanation:

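  • The first sed command keeps everything from the line containing DOG through the next line starting with ip, and deletes (!d) every other line.
  • The second sed command deletes the last line (which is the line starting with ip). This assumes the DOG section is not the last one in the file: otherwise the range runs to the end of input, there is no trailing ip line, and $d removes a genuine line of the section.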
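A single-pass variant that also handles a DOG section at the end of the file (a sketch, not the answerer's command): instead of deleting the last output line unconditionally, delete only a kept line that starts with ip and does not contain DOG, i.e. the next section's header. dog_only is a hypothetical wrapper name.

```shell
# dog_only [FILE...]  (hypothetical wrapper name)
dog_only() {
  # 1st expression: delete everything outside the DOG ... ^ip range.
  # 2nd expression: among the remaining lines, delete one starting with "ip"
  #   unless it contains DOG, i.e. the header that closed the range.
  sed -e '/DOG/,/^ip/!d' -e '/DOG/!{/^ip/d;}' "$@"
}
```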

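Disclaimer: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM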