Count occurrences of a substring in a string

Question

How can I count the number of times a substring occurs in a string using Bash?

Example:

I want to know how many times this substring...

Bluetooth
         Soft blocked: no
         Hard blocked: no

...occurs in this string...

0: asus-wlan: Wireless LAN
         Soft blocked: no
         Hard blocked: no
1: asus-bluetooth: Bluetooth
         Soft blocked: no
         Hard blocked: no
2: phy0: Wireless LAN
         Soft blocked: no
         Hard blocked: no
113: hci0: Bluetooth
         Soft blocked: no
         Hard blocked: no

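NOTE I: I have already tried several approaches with sed, grep and awk... Nothing seems to work when the strings contain spaces and span multiple lines.

NOTE II: I am a Linux user, and I am looking for a solution that does not require installing applications/tools beyond those usually present in Linux distributions.

IMPORTANT:

Beyond my question, would it be possible to do something like the hypothetical example below? In this case, we use two shell variables (Bash) instead of files.

EXAMPLE: (based on @Ed Morton's contribution)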

STRING="0: asus-wlan: Wireless LAN
         Soft blocked: no
         Hard blocked: no
1: asus-bluetooth: Bluetooth
         Soft blocked: no
         Hard blocked: no
2: phy0: Wireless LAN
         Soft blocked: no
         Hard blocked: no
113: hci0: Bluetooth
         Soft blocked: no
         Hard blocked: no"

SUB_STRING="Bluetooth
         Soft blocked: no
         Hard blocked: no"

awk -v RS='\0' 'NR==FNR{str=$0; next} {print gsub(str,"")}' "$STRING" "$SUB_STRING"

Answer 1

Using GNU awk:

$ awk '
BEGIN { RS="[0-9]+:" }      # number followed by colon is the record separator
NR==1 {                     # read the substring to b
    b=$0
    next
}
$0~b { c++ }                # if b matches current record, increment counter
END { print c }             # print counter value
' substringfile stringfile
2

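This solution requires the amount of whitespace in the match to be identical, so your example would not work as is, because the substring is indented with fewer spaces than the string. Note that, because of the chosen RS, matching for example phy0: is not possible, since the 0: in it is itself consumed as a record separator. In that case something like RS="(^|\n)[0-9]+:" would probably work.

Another one: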

$ awk '
BEGIN{ RS="^$" }                           # treat whole files as one record
NR==1 { b=$0; next }                       # buffer substringfile
{
    while(match($0,b)) {                   # count matches of b in stringfile
        $0=substr($0,RSTART+RLENGTH)       # continue searching after the match
        c++
    }
}
END { print c }                            # output
' substringfile stringfile

Edit: Of course, you can drop the BEGIN section and use Bash's process substitution, like this:

$ awk '
NR==1 { 
    b=$0
    gsub(/^ +| +$/,"",b)                 # clean surrounding space from substring
    next 
}
{
    while(match($0,b)) {
        $0=substr($0,RSTART+RLENGTH)
        c++
    }
}
END { print c }
' <(echo $SUB_STRING) <(echo $STRING)    # feed it with process substitution
2

The echos in the process substitutions flatten the data and also collapse repeated spaces:

$ echo $SUB_STRING
Bluetooth Soft blocked: no Hard blocked: no

so the whitespace problem should be somewhat eased.

Edit2: Based on @EdMorton's hawk-eyed observation in the comments:
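The flattening is done by the shell rather than by echo itself: an unquoted expansion is split into words at every run of spaces, tabs and newlines (with the default IFS), and echo then joins those words with single spaces. A minimal sketch of the difference follows; the variable `sub` and the helper `flatten` are made up for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample value: a multi-line string with indented lines.
sub="Bluetooth
         Soft blocked: no
         Hard blocked: no"

# flatten: hypothetical helper that reproduces what `echo $var` does.
flatten() {
    # Intentionally unquoted: word splitting breaks the value at every run
    # of whitespace (default IFS) and echo rejoins the words with one space.
    # shellcheck disable=SC2086
    echo $1
}

flatten "$sub"   # prints a single line with single spaces
echo "$sub"      # quoted: newlines and indentation are preserved
```

Keep in mind that an unquoted expansion also undergoes pathname expansion, so a value containing `*` or `?` may be altered in other ways as well.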

$ awk '
NR==1 { 
    b=$0
    gsub(/^ +| +$/,"",b)                 # clean surrounding space from substring
    next 
}
{ print gsub(b,"") }
' <(echo $SUB_STRING) <(echo $STRING)    # feed it with process substitution
2

Answer 2

Update, given the comments below: if the whitespace is the same in both strings:

awk 'BEGIN{print gsub(ARGV[2],"",ARGV[1])}' "$STRING" "$SUB_STRING"

or, if the whitespace differs, as in your example where the STRING lines start with 9 spaces but the SUB_STRING lines with 8:

$ awk 'BEGIN{gsub(/[[:space:]]+/,"[[:space:]]+",ARGV[2]); print gsub(ARGV[2],"",ARGV[1])}' "$STRING" "$SUB_STRING"
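As a usage sketch, the ARGV approach can be wrapped in a small shell function. The function name `count_flex` is made up here, and the whitespace class is spelled out as blank, tab and newline instead of [[:space:]] so that it also runs on awks without POSIX character classes. It assumes the substring contains no regex metacharacters or backslashes.

```shell
#!/usr/bin/env bash
# count_flex STRING SUB_STRING   (hypothetical function name)
# Prints how many times SUB_STRING occurs in STRING, treating every run of
# blanks, tabs or newlines in SUB_STRING as "one or more whitespace characters".
# Assumption: SUB_STRING contains no regex metacharacters or backslashes.
count_flex() {
    awk 'BEGIN {
        # Turn each whitespace run in the substring into a whitespace regex.
        gsub(/[ \t\n]+/, "[ \t\n]+", ARGV[2])
        # gsub() returns the number of replacements, i.e. the match count.
        print gsub(ARGV[2], "", ARGV[1])
    }' "$1" "$2"
}
```

Because the program consists of a BEGIN block only, awk never tries to open the two arguments as files.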

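Original answer:

With GNU awk, if the whitespace in the file and in the search string matches and the search string contains no RE metacharacters, all you need is: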

awk -v RS='^$' 'NR==FNR{str=$0; next} {print gsub(str,"")}' str file

or, with any awk, if your input also does not contain NUL characters:

awk -v RS='\0' 'NR==FNR{str=$0; next} {print gsub(str,"")}' str file

but for a full solution with explanations, read on:

With any POSIX awk in any shell on any UNIX box:

$ cat str
Bluetooth
        Soft blocked: no
        Hard blocked: no

$ awk '
NR==FNR { str=(str=="" ? "" : str ORS) $0; next }
{ rec=(rec=="" ? "" : rec ORS) $0 }
END {
    gsub(/[^[:space:]]/,"[&]",str) # make sure each non-space char is treated as literal
    gsub(/[[:space:]]+/,"[[:space:]]+",str) # make sure space differences do not matter
    print gsub(str,"",rec)
}
' str file
2

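With a non-POSIX awk like nawk, spell out the whitespace characters (for example a blank, \t and \n) in the bracket expressions instead of using [:space:]. If your search string can contain backslashes, we would need to add one more gsub() to handle them.

Alternatively, with GNU awk for multi-char RS: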
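The same two steps (make every non-space character literal, make every whitespace run flexible) can also be sketched in pure Bash, using [[ =~ ]] in place of awk's gsub(). This is a swapped-in technique, not the answer's own code, and the helper names `to_literal_regex` and `count_matches` are made up for this illustration. Backslash and caret are escaped with a backslash here because they do not work as a one-character bracket expression.

```shell
#!/usr/bin/env bash
# to_literal_regex TEXT   (hypothetical helper)
# Builds an ERE from TEXT: each whitespace run becomes [[:space:]]+ and
# every other character is made literal.
to_literal_regex() {
    local text=$1 re='' ch i in_space=0
    for (( i = 0; i < ${#text}; i++ )); do
        ch=${text:i:1}
        case $ch in
            [[:space:]])
                # Emit one whitespace pattern per run of whitespace.
                (( in_space )) || re+='[[:space:]]+'
                in_space=1
                continue ;;
            '\'|'^')
                # Not usable as a one-character bracket expression.
                re+="\\$ch" ;;
            *)
                re+="[$ch]" ;;
        esac
        in_space=0
    done
    printf '%s\n' "$re"
}

# count_matches STRING REGEX   (hypothetical helper)
# Counts non-overlapping matches of REGEX in STRING.
count_matches() {
    local rest=$1 re=$2 n=0
    while [[ -n $re && $rest =~ $re ]]; do
        # Guard against an empty match, which would loop forever.
        [[ -n ${BASH_REMATCH[0]} ]] || break
        n=$((n + 1))
        # Drop everything up to and including the leftmost match.
        rest=${rest#*"${BASH_REMATCH[0]}"}
    done
    echo "$n"
}
```

It could be called as count_matches "$STRING" "$(to_literal_regex "$SUB_STRING")". For large inputs the awk versions are likely to be faster, since this sketch walks the substring character by character.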

$ awk -v RS='^$' 'NR==FNR{gsub(/[^[:space:]]/,"[&]"); gsub(/[[:space:]]+/,"[[:space:]]+"); str=$0; next} {print gsub(str,"")}' str file
2

or, with any awk, if your input cannot contain NUL characters:

$ awk -v RS='\0' 'NR==FNR{gsub(/[^[:space:]]/,"[&]"); gsub(/[[:space:]]+/,"[[:space:]]+"); str=$0; next} {print gsub(str,"")}' str file
2

and so on...

Answer 3

You could try this with GNU grep:

grep -zo -P ".*Bluetooth\n\s*Soft blocked: no\n\s*Hard blocked: no" <your_file> | grep -c "Bluetooth"

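The first grep matches across multiple lines (-z makes it treat the whole file as a single record) and prints only the matched parts (-o). Counting the occurrences of Bluetooth in that output gives you the number of matched "substrings".

Output of the first grep: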

1: asus-bluetooth: Bluetooth
         Soft blocked: no
         Hard blocked: no
113: hci0: Bluetooth
         Soft blocked: no
         Hard blocked: no

Output of the whole command:

2

Answer 4

Using python:

#! /usr/bin/env python

import sys
import re

with open(sys.argv[1], 'r') as i:
    print(len(re.findall(sys.argv[2], i.read(), re.MULTILINE)))

Invoked as

$ ./search.py file.txt 'Bluetooth
 +Soft blocked: no
 +Hard blocked: no'

The " +" in the pattern allows for one or more spaces.

Edit

If the content is already in a bash variable, it is even simpler:

#! /usr/bin/env python

import sys
import re

print(len(re.findall(sys.argv[2], sys.argv[1], re.MULTILINE)))

Invoked as

$ ./search.py "$STRING" "$SUB_STRING"

Answer 5

This might work for you (GNU sed and wc):

sed -nr 'N;/^(\s*)Soft( blocked: no\s*)\n\1Hard\2$/P;D' file | wc -l

This outputs one line for each occurrence of the multiline match and then counts the lines.

Answer 6

Another awk:

awk '
  NR==FNR{
    b[i++]=$0          # get each line of string in array b
    next}
  $0 ~ b[0]{            # if current record match first line of string
    for(j=1;j<i;j++){
      getline
      if($0!~b[j])  # next record do not match break
        j+=i}
     if(j==i)         # all record match string
       k++}
  END{
    print k}
' stringfile infile

Edit:

For the OP's XY problem, a simple script:

cat scriptbash.sh

list="${1//$'\n'/@}"
var="${2//$'\n'/@}"
result="${list//"$var"}"    # quoted so that $var is matched literally, not as a glob pattern
echo $(((${#list} - ${#result}) / ${#var}))

You call it like this:

./scriptbash.sh "$STRING" "$SUB_STRING"
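The arithmetic behind the script is: delete every occurrence of the substring, then divide the number of characters that disappeared by the length of the substring. A minimal sketch of the same idea as a function follows; the name `count_occurrences` and the empty-substring guard are additions for this illustration. It skips the newline-to-@ replacement, on the assumption that Bash pattern substitution handles embedded newlines directly.

```shell
#!/usr/bin/env bash
# count_occurrences STRING SUB_STRING   (hypothetical function name)
# Counts non-overlapping, exact occurrences of SUB_STRING in STRING.
count_occurrences() {
    local list=$1 var=$2 result
    # An empty substring would cause a division by zero below.
    [[ -n $var ]] || { echo 0; return; }
    # Delete every occurrence; quoting "$var" makes it literal, not a glob.
    result=${list//"$var"}
    # Each deletion shortened the string by ${#var} characters.
    echo $(( (${#list} - ${#result}) / ${#var} ))
}
```

Unlike the regex-based answers, this only counts exact matches, so the indentation in both strings must be identical.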

6 answers

Answer 1: 2, 2018-05-07 05:31:11
Answer 2: 2, accepted, 2018-05-07 10:16:52
Answer 3: 1, 2018-05-07 05:47:32
Answer 4: 0, 2018-05-07 06:03:45
Answer 5: 0, 2018-05-07 08:13:36
Answer 6: 0, 2018-05-07 08:52:21
