Count the number of occurrences of a substring in a string

How can I count the number of occurrences of a substring in a string using Bash?

EXAMPLE:

I'd like to know how many times this substring...

Bluetooth
         Soft blocked: no
         Hard blocked: no

...occurs in this string...

0: asus-wlan: Wireless LAN
         Soft blocked: no
         Hard blocked: no
1: asus-bluetooth: Bluetooth
         Soft blocked: no
         Hard blocked: no
2: phy0: Wireless LAN
         Soft blocked: no
         Hard blocked: no
113: hci0: Bluetooth
         Soft blocked: no
         Hard blocked: no

NOTE I: I have tried several approaches with sed, grep, awk... Nothing seems to work when the strings contain spaces and span multiple lines.

NOTE II: I'm a Linux user and I'm looking for a solution that does not require installing applications/tools beyond those usually found in Linux distributions.


IMPORTANT:

In addition to my question, is it possible to have something like the hypothetical example below? In this case, instead of using files, we use two shell (Bash) variables.

EXAMPLE: (based on @Ed Morton's contribution)

STRING="0: asus-wlan: Wireless LAN
         Soft blocked: no
         Hard blocked: no
1: asus-bluetooth: Bluetooth
         Soft blocked: no
         Hard blocked: no
2: phy0: Wireless LAN
         Soft blocked: no
         Hard blocked: no
113: hci0: Bluetooth
         Soft blocked: no
         Hard blocked: no"

SUB_STRING="Bluetooth
         Soft blocked: no
         Hard blocked: no"

awk -v RS='\0' 'NR==FNR{str=$0; next} {print gsub(str,"")}' "$STRING" "$SUB_STRING"

Using GNU awk:

$ awk '
BEGIN { RS="[0-9]+:" }      # number followed by colon is the record separator
NR==1 {                     # read the substring to b
    b=$0
    next
}
$0~b { c++ }                # if b matches current record, increment counter
END { print c }             # print counter value
' substringfile stringfile
2

This solution requires the whitespace in the substring to match the string exactly, and your example would not work as-is, since the substring is indented with fewer spaces than the string. Notice that with the chosen RS, matching a substring that contains for example phy0: is not possible, because the 0: in it is consumed as a record separator; in that case something like RS="(^|\\n)[0-9]+:" would probably work.

Another:

$ awk '
BEGIN{ RS="^$" }                           # treat whole files as one record
NR==1 { b=$0; next }                       # buffer substringfile
{
    while(match($0,b)) {                   # count matches of b in stringfile
        $0=substr($0,RSTART+RLENGTH)       # resume just past the match (the -1 variant loops forever on 1-char substrings)
        c++
    }
}
END { print c }                            # output
' substringfile stringfile

Edit: Sure, remove the BEGIN section and use Bash's process substitution as below:

$ awk '
NR==1 { 
    b=$0
    gsub(/^ +| +$/,"",b)                 # clean surrounding space from substring
    next 
}
{
    while(match($0,b)) {
        $0=substr($0,RSTART+RLENGTH)     # resume just past the match
        c++
    }
}
END { print c }
' <(echo $SUB_STRING) <(echo $STRING)    # feed it with process substitution
2

Passing the unquoted variable to echo in the process substitution flattens the data onto one line and collapses repeated spaces too:

$ echo $SUB_STRING
Bluetooth Soft blocked: no Hard blocked: no

so the whitespace problem should ease up a bit.

Edit2: Based on @EdMorton's hawk-eyed observation in the comments:
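
To make the flattening effect concrete, here is a small sketch that compares the unquoted and quoted forms. The sample value is taken from the question; the variable names flat and kept are only illustrative. Keep in mind that an unquoted expansion is also subject to filename globbing, so this trick is only safe when the data contains no wildcard characters.

```shell
# Sample value with indented continuation lines (taken from the question).
SUB_STRING="Bluetooth
         Soft blocked: no
         Hard blocked: no"

# Unquoted: the shell splits the value on whitespace and echo rejoins
# the resulting words with single spaces.
flat=$(echo $SUB_STRING)

# Quoted: newlines and indentation are passed through unchanged.
kept=$(echo "$SUB_STRING")

printf 'flat: %s\n' "$flat"
printf 'kept has %s lines\n' "$(printf '%s\n' "$kept" | wc -l | tr -d ' ')"
```

The first line printed is the single-line form shown above; the quoted form still has its three lines.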

$ awk '
NR==1 { 
    b=$0
    gsub(/^ +| +$/,"",b)                 # clean surrounding space from substring
    next 
}
{ print gsub(b,"") }
' <(echo $SUB_STRING) <(echo $STRING)    # feed it with process substitution
2

Update, given your comments below: if the white space is the same in both strings:

awk 'BEGIN{print gsub(ARGV[2],"",ARGV[1])}' "$STRING" "$SUB_STRING"

or, if the white space is different, as in your example where the STRING lines start with 9 blanks but the SUB_STRING lines with 8:

$ awk 'BEGIN{gsub(/[[:space:]]+/,"[[:space:]]+",ARGV[2]); print gsub(ARGV[2],"",ARGV[1])}' "$STRING" "$SUB_STRING"
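
As a worked example of the whitespace-tolerant variant, the sketch below wraps the same idea in a shell function and runs it on a sample where the indentation differs (9 blanks versus 8). The function name count_ws_tolerant and the variables haystack and needle are hypothetical, chosen only for illustration. It assumes an awk that supports POSIX character classes, and the substring is still treated as a regular expression, so regex metacharacters in it keep their special meaning.

```shell
# Hypothetical helper: count occurrences of $2 in $1, ignoring
# differences in the amount of whitespace between words.
count_ws_tolerant() {
  awk 'BEGIN {
    pat = ARGV[2]
    # turn every whitespace run in the substring into "one or more whitespace chars"
    gsub(/[[:space:]]+/, "[[:space:]]+", pat)
    # gsub returns the number of replacements, which is the match count
    print gsub(pat, "", ARGV[1])
  }' "$1" "$2"
}

haystack="1: asus-bluetooth: Bluetooth
         Soft blocked: no
2: phy0: Wireless LAN
         Soft blocked: no
113: hci0: Bluetooth
         Soft blocked: no"

needle="Bluetooth
        Soft blocked: no"

count_ws_tolerant "$haystack" "$needle"   # prints 2 for this sample
```

Because the program has only a BEGIN block, awk never tries to open the two arguments as files.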

Original answer:

With GNU awk, if the white space matches between the two files and the search string doesn't contain RE metachars, all you'd need is:

awk -v RS='^$' 'NR==FNR{str=$0; next} {print gsub(str,"")}' str file

or with any awk, if your input also doesn't contain NUL chars:

awk -v RS='\0' 'NR==FNR{str=$0; next} {print gsub(str,"")}' str file

but for a full solution with explanations, read on:

With any POSIX awk in any shell on any UNIX box:

$ cat str
Bluetooth
        Soft blocked: no
        Hard blocked: no

$ awk '
NR==FNR { str=(str=="" ? "" : str ORS) $0; next }
{ rec=(rec=="" ? "" : rec ORS) $0 }
END {
    gsub(/[^[:space:]]/,"[&]",str) # make sure each non-space char is treated as literal
    gsub(/[[:space:]]+/,"[[:space:]]+",str) # make sure space differences do not matter
    print gsub(str,"",rec)
}
' str file
2

With a non-POSIX awk like nawk, just use a literal blank, \t and \n inside the bracket expressions instead of [:space:]. If your search string can contain backslashes, then we'd need to add one more gsub() to handle them.

Alternatively, with GNU awk for multi-char RS:
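
If the whitespace is known to be identical, the escaping can be avoided altogether by not using a regular expression at all. The sketch below swaps the gsub() approach for awk's index() function, which does a plain string search, so dots, brackets and backslashes need no special handling. The function name count_literal is hypothetical, and the strings are passed as arguments rather than files, purely for illustration.

```shell
# Hypothetical helper: count non-overlapping literal occurrences of $2 in $1.
count_literal() {
  awk 'BEGIN {
    s = ARGV[1]; t = ARGV[2]; n = 0
    if (t != "") {
      # index() does a plain string search, so no character is special
      while ((i = index(s, t)) > 0) {
        n++
        s = substr(s, i + length(t))   # continue after this occurrence
      }
    }
    print n
  }' "$1" "$2"
}

count_literal 'a.b axb a.b' 'a.b'       # the dot is not a wildcard here: prints 2
count_literal 'C:\tmp C:\tmp' 'C:\tmp'  # backslashes need no escaping: prints 2
```

The trade-off is that this variant no longer tolerates differences in indentation between the two strings.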

$ awk -v RS='^$' 'NR==FNR{gsub(/[^[:space:]]/,"[&]"); gsub(/[[:space:]]+/,"[[:space:]]+"); str=$0; next} {print gsub(str,"")}' str file
2

or with any awk, if your input cannot contain NUL chars:

$ awk -v RS='\0' 'NR==FNR{gsub(/[^[:space:]]/,"[&]"); gsub(/[[:space:]]+/,"[[:space:]]+"); str=$0; next} {print gsub(str,"")}' str file
2

and on and on...

You could try this with GNU grep:

grep -zo -P ".*Bluetooth\n\s*Soft blocked: no\n\s*Hard blocked: no" <your_file> | grep -c "Bluetooth"

The first grep matches across multiple lines and prints only the matched text. Counting the occurrences of Bluetooth in that output gives you the number of matched 'substrings'.

Output of first grep:

1: asus-bluetooth: Bluetooth
         Soft blocked: no
         Hard blocked: no
113: hci0: Bluetooth
         Soft blocked: no
         Hard blocked: no

Output of entire command:

2

Use Python:

#! /usr/bin/env python

import sys
import re

with open(sys.argv[1], 'r') as i:
    print(len(re.findall(sys.argv[2], i.read(), re.MULTILINE)))

Invoke as:

$ ./search.py file.txt 'Bluetooth
 +Soft blocked: no
 +Hard blocked: no'

Here, a space followed by + matches one or more spaces.

EDIT

If the content is already in Bash variables, it's even simpler:

#! /usr/bin/env python

import sys
import re

print(len(re.findall(sys.argv[2], sys.argv[1], re.MULTILINE)))

Invoke as:

$ ./search.py "$STRING" "$SUB_STRING"

This might work for you (GNU sed & wc):

sed -nr 'N;/^(\s*)Soft( blocked: no\s*)\n\1Hard\2$/P;D' file | wc -l

This outputs a line for each occurrence of the multi-line match and counts the lines.

Another awk:

awk '
  NR==FNR{
    b[i++]=$0          # store each line of the substring in array b
    next}
  $0 ~ b[0]{           # current record matches the first line of the substring
    for(j=1;j<i;j++){
      getline
      if($0!~b[j])     # a following record does not match: leave the loop
        j+=i}
     if(j==i)          # every line of the substring matched
       k++}
  END{
    print k}
' stringfile infile

EDIT:

And for the XY problem of the OP, a simple script:

$ cat scriptbash.sh

list="${1//$'\n'/@}"
var="${2//$'\n'/@}"
result="${list//$var}"
echo $(((${#list} - ${#result}) / ${#var}))

And you call it like this:

./scriptbash.sh "$STRING" "$SUB_STRING"
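
The arithmetic in that script works because removing every occurrence shortens the string by exactly (number of occurrences) x (length of the substring). The sketch below wraps the same idea in a Bash function. The name count_substring is hypothetical, and two things differ from the script: the substring is quoted inside the expansion so that Bash matches it literally instead of as a glob pattern, and the newline-to-@ replacement is left out. It needs Bash, since ${var//pattern/} is not part of POSIX sh.

```shell
# Hypothetical helper: count non-overlapping literal occurrences of $2 in $1.
count_substring() {
  local haystack=$1 needle=$2
  if [ -z "$needle" ]; then echo 0; return; fi
  # Quoting "$needle" makes Bash match it literally, not as a glob pattern.
  local stripped=${haystack//"$needle"/}
  # Each removed occurrence shortens the string by ${#needle} characters.
  echo $(( (${#haystack} - ${#stripped}) / ${#needle} ))
}

count_substring 'a*b a*b axb' 'a*b'   # prints 2: the * is taken literally
```

With the variables from the question it would be called as count_substring "$STRING" "$SUB_STRING"; the whitespace in both values must be identical for a match.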
