Loop through a string and find certain characters in shell

Question

Let's say I have the following string stored in a variable:

string="1245aaa./ ssasaaa* kjdsaaa"

Is there a way to loop through this string and find out that it contains 3 "words", so to speak, separated by blank spaces, that the word with the most "a" characters is the second one, and that the second word contains a total of 4 "a" characters?

I've been trying to google something like this, but with no luck.

Answer 1

Another method is grepping for the line with at least n (in your example, 4) a's.
First you must find the number you need to grep for.
In steps (requested in a comment):
Split the words in the string into lines by replacing spaces with newlines using tr (translate).

echo "${string}" | tr " " "\n"

With sed 's/old/new/g' you can s (substitute) the old string (pattern) with the new string, g (globally). So you can echo "Have all characters a banned" | sed 's/a//g'. Here you want to remove all characters except for the character a. The ^ in [^a] stands for not, the [] for a class of characters.

echo "${string}" | tr " " "\n" | sed 's/[^a]//g'

You can find the longest string of a's by sorting them. After sorting, the last line will have the most. With tail -1 you get the last line:

echo "${string}" | tr " " "\n" | sed 's/[^a]//g'|sort | tail -1

Now put the result in a variable. You can assign the output of another (set of) Unix command(s) to a variable with var=$(command). Be careful not to add spaces around the = sign (var = $(xxx) will fail).

most_a=$(echo "${string}" | tr " " "\n" | sed 's/[^a]//g'|sort | tail -1)

When you want to see the contents of a variable, use $var or, preferably, ${var}. With {} everybody knows that the other_chars in ${var}other_chars are not part of the variable name. With a # in ${#var} you ask for the number of characters. And always use double quotes when using echo until you understa

echo "The word with the highest number of a's has ${#most_a} of those"
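As a rough illustration of the steps so far, the sketch below runs the pipeline on the string from the question and shows the intermediate result in comments. The variable name only_a is introduced here for illustration and is not part of the original answer; printf is used in place of echo only to keep the output predictable.

```shell
string="1245aaa./ ssasaaa* kjdsaaa"

# One word per line, then delete every character that is not an "a".
only_a=$(printf '%s\n' "$string" | tr ' ' '\n' | sed 's/[^a]//g')
printf '%s\n' "$only_a"
# aaa
# aaaa
# aaa

# The lines consist only of a's, so a plain sort puts the longest one last.
most_a=$(printf '%s\n' "$only_a" | sort | tail -n 1)

# ${#most_a} is the length of that line, i.e. the highest number of a's.
echo "${#most_a}"
# 4
```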

Now you can grep the word with this number of a's out of a list of words. When you want to grep strings with at least 4 a's you will need .* (any character repeated 0 or more times), so grep for a.*a.*a.*a or a.*a.*a.*a.*. You can tell grep that the pattern (a.*) is repeated {4} or {${#most_a}} times. Now you need some backslashes to activate the special meaning of the (){} characters, and you start by splitting the original string into words:

echo "${string}" | tr " " "\n" | grep "\(a.*\)\{${#most_a}\}"

To print the string and number, use something like

printf "%s %s\n" ${#most_a} $(echo "${string}" | tr " " "\n" | grep "\(a.*\)\{${#most_a}\}" )
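The steps of this answer can be collected into one function, sketched below. The function name most_a_words is hypothetical, and the while loop is an addition not present in the original: if several words tie for the highest count, it prints one "count word" line per word, whereas a single printf call would reuse its format string across the extra arguments.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print "<count> <word>" for each word in $1 that
# contains the highest number of a's.
most_a_words() {
    local string=$1 most_a
    # Longest run of a's left over after deleting all other characters.
    most_a=$(printf '%s\n' "$string" | tr ' ' '\n' | sed 's/[^a]//g' | sort | tail -n 1)
    # Keep only the words that contain at least that many a's.
    printf '%s\n' "$string" | tr ' ' '\n' |
        grep "\(a.*\)\{${#most_a}\}" |
        while IFS= read -r word; do
            printf '%s %s\n' "${#most_a}" "$word"
        done
}

most_a_words "1245aaa./ ssasaaa* kjdsaaa"
# 4 ssasaaa*
```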

Answer 2

awk can handle this:

string="1245aaa./ ssasaaa* kjdsaaa"

awk -v k='a' -v RS=' ' '{n = split($0, a, k)-1} 
     n > max{max=n; maxw=$0} END{print maxw, max}' OFS=, <<< "$string"

Output:

ssasaaa*,4
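For readers unfamiliar with the split() trick, here is a commented sketch of the same idea. It assumes the input is fed through printf '%s' so that the last word carries no trailing newline, and it renames the scratch array to parts; both are choices made for this illustration rather than part of the original answer.

```shell
string="1245aaa./ ssasaaa* kjdsaaa"

result=$(printf '%s' "$string" | awk -v k='a' -v RS=' ' -v OFS=, '
    {
        # RS=" " makes every space-separated word its own record.
        # split() returns the number of pieces the word falls into when
        # cut at each k, so pieces - 1 is the number of times k occurs.
        n = split($0, parts, k) - 1
    }
    n > max { max = n; maxw = $0 }   # remember the best word seen so far
    END { print maxw, max }
')

echo "$result"
# ssasaaa*,4
```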

Answer 3

You can do this in Bash alone.

Given:

$ string="1245aaa./ ssasaaa* kjdsaaa"

You can break that string into 'words' by splitting it on the current IFS into an array:

$ words=( $string )

Then loop over each word and count the regex matches:

$ for word in "${words[@]}"
> do
> printf "%i %s\n" "$(grep -o 'a' <<<"$word" | wc -l)" "$word"
> done
3 1245aaa./
4 ssasaaa*
3 kjdsaaa

And pipe the result of that into sort to sort by match count, and into head to get the top one:

for word in "${words[@]}"
do
    printf "%i %s\n" "$(grep -o 'a' <<<"$word" | wc -l)" "$word"
done | sort -n -r | head -1
4 ssasaaa*

awk is more efficient, but you can do it this way too.
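A variant that stays entirely inside Bash, without calling grep, wc, sort or head, might look like the sketch below. It is an alternative to the loop above rather than the answer's own method: the variable names best_word and best_count are made up for illustration, read -ra is used so that a word such as ssasaaa* is not subject to pathname expansion, and parameter expansion does the counting. The array length also gives the number of words asked about in the question.

```shell
#!/usr/bin/env bash
string="1245aaa./ ssasaaa* kjdsaaa"

# Split on IFS into an array without glob-expanding the words.
read -ra words <<< "$string"
echo "word count: ${#words[@]}"

best_word=""
best_count=0
for word in "${words[@]}"; do
    only_a=${word//[!a]/}     # delete every character that is not "a"
    count=${#only_a}          # what remains is one "a" per occurrence
    if (( count > best_count )); then
        best_count=$count
        best_word=$word
    fi
done

echo "$best_count $best_word"
# word count: 3
# 4 ssasaaa*
```

On a tie this keeps the first word with the highest count, because the comparison is a strict greater-than.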

Answer 4

 string="1245aaa./ ssasaaa* kjdsaaa"

 echo "$string" | tr ' ' '\n' | while read -r s
 do
   echo "$(printf '%s' "$s" | tr -dc 'a' | wc -c) $s"
 done | sort -nr

or

echo "$string" | xargs -n 1 bash -c 'for s; do echo "$(printf "%s" "$s" | tr -dc a | wc -c) $s"; done' bash | sort -nr

Answer 1
1, accepted, 2016-03-04 20:46:34

Answer 2
0, 2016-03-04 19:22:27

Answer 3
0, 2016-03-04 20:01:51

Answer 4
0, 2016-03-05 01:38:19