sed one-liner - Find the delimiter pair surrounding a keyword

Question

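I frequently work with large XML files, and I often run counts through grep to confirm certain statistics.

For example, to make sure an xml file contains at least five instances of widget, I use:

cat test.xml | grep -ic widget

I also like to log just the lines that widget appears on, i.e.:

cat test.xml | grep -i widget > ~/log.txt

However, the key information I really need is the block of XML code that widget appears in. A sample file might look like this: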

<test> blah blah
  blah blah blah
  widget
  blah blah blah
</test>

<formula>
  blah
  <details> 
    widget
  </details>
</formula>

From the sample text above, I am trying to get the following output:

<test>widget</test>

<formula>widget</formula>

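In effect, I am trying to get a single line containing the highest-level markup tags that surround a block of XML text/code in which the arbitrary string widget appears.

Does anyone have suggestions for doing this with a command-line one-liner?

Thank you.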

Answer 1

An inelegant way, using both sed and awk:

sed -ne '/[Ww][Ii][Dd][Gg][Ee][Tt]/,/^<\// {//p}' file.txt | awk 'NR%2==1 { sub(/^[ \t]+/, ""); search = $0 } NR%2==0 { end = $0; sub(/^<\//, "<"); printf "%s%s%s\n", $0, search, end }'

Results:

<test>widget</test>
<formula>widget</formula>

Explanation:

## The sed pipe:

sed -ne '/[Ww][Ii][Dd][Gg][Ee][Tt]/,/^<\// {//p}'
## This finds the widget pattern, ignoring case, then finds the last, 
## highest level markup tag (these must match the start of the line)
## Ultimately, this prints two lines for each pattern match

## Now the awk pipe:

NR%2==1 { sub(/^[ \t]+/, ""); search = $0 }
## This takes the first line (the widget pattern) and removes leading
## whitespace, saving the pattern in 'search'

NR%2==0 { end = $0; sub(/^<\//, "<"); printf "%s%s%s\n", $0, search, end }
## This finds the next line (which is even), and stores the markup tag in 'end'
## We then remove the slash from this tag and print it, the widget pattern, and
## the saved markup tag

HTH

Answer 2

 sed -nr '/^(<[^>]*>).*/{s//\1/;h};/widget/{g;p}' test.xml

prints

<test>
<formula>

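A sed-only one-liner would be more complex if it had to print the exact format you want.

EDIT:
In GNU sed you can use /widget/I instead of /widget/ to match widget case-insensitively; otherwise use [Ww] for each letter, as in the other answer.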
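
As a quick check of the case-insensitive variant, here is a sketch assuming GNU sed (both the I modifier and -r are GNU extensions). The three-line input with a capitalised Widget is invented for the example, as is the variable name out.

```shell
# Feed a tiny invented sample whose keyword is capitalised.
out=$(printf '%s\n' '<test> blah blah' '  Widget' '</test>' |
  sed -nr '/^(<[^>]*>).*/{s//\1/;h};/widget/I{g;p}')
printf '%s\n' "$out"   # prints: <test>
```

With plain /widget/ the same input prints nothing, because the capital W does not match.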

Answer 3

This might work for you (GNU sed):

sed '/^<[^/]/!d;:a;/^<\([^>]*>\).*<\/\1/!{$!N;ba};/^<\([^>]*>\).*\(widget\).*<\/\1/s//<\1\2<\/\1/p;d' file

Answer 4

Requires gawk, because RS is a regexp:

BEGIN {
    # make a stream of words
    RS="(\n| )"
}

# match </tag>
/<\// {
    s--
    next
}

# match <tag>
/</ {
    if (!s) {
        tag=substr($0, 2)
    }
    s++
}

$0=="widget" {
    print "<" tag $0 "</" tag
}

4 solutions

Solution 1
3 accepted 2012-07-20 23:56:51

Solution 2
2 2012-07-21 05:17:57

Solution 3
2 2012-07-21 08:40:43

Solution 4
1 2012-07-27 18:41:35
