Loop through string and look for certain characters in shell
Let's say I have the following string stored in a variable:
string="1245aaa./ ssasaaa* kjdsaaa"
Is there a way to loop through this string and find out that it contains 3 "words", so to speak, separated by blank spaces, that the word with the most "a" characters is the second one, and that there are a total of 4 "a" characters in the second word?
I've been trying to google something like this, but with no luck.
Another method is grepping for the line with at least n (in your example 4) a's.
First you must find the number you need to grep for.
In steps (requested in comment):
Split the words in the string into lines by replacing (tr, translate) spaces with newlines.
echo "${string}" | tr " " "\n"
With sed 's/old/new/g' you can substitute (s) the old string (pattern) with the new string, globally (g). So you can run echo "Have all characters a banned" | sed 's/a//g'. You want to remove all characters except the character a. The ^ in [^a] stands for "not", and the [] for a class of characters.
echo "${string}" | tr " " "\n" | sed 's/[^a]//g'
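To make the two stages above visible, here is a minimal sketch that stores each intermediate result and prints it. The variable names words_per_line and only_a are only illustrative and do not come from the answer; printf is used instead of echo purely to keep the output predictable.

```shell
string="1245aaa./ ssasaaa* kjdsaaa"

# Stage 1: one word per line (tr turns every space into a newline).
words_per_line=$(printf '%s\n' "${string}" | tr ' ' '\n')
printf '%s\n' "${words_per_line}"

# Stage 2: on every line, delete each character that is not an "a".
only_a=$(printf '%s\n' "${words_per_line}" | sed 's/[^a]//g')
printf '%s\n' "${only_a}"
```

The second printf should show one line of a's per word (aaa, aaaa, aaa), which is what the sorting step works on.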
You can find the longest string of a's by sorting them. After sorting, the last line will have the most. With tail -1 you get the last line:
echo "${string}" | tr " " "\n" | sed 's/[^a]//g'|sort | tail -1
Now put the result in a variable. You can assign the output of another (set of) unix command(s) to a variable with var=$(command); be aware that you must not add spaces around the = sign (var = $(xxx) will fail).
most_a=$(echo "${string}" | tr " " "\n" | sed 's/[^a]//g'|sort | tail -1)
When you want to see the contents of a variable, use $var or, preferably, ${var}. With {} everybody knows that the other_chars in ${var}other_chars are not part of the variable name. With a # in ${#var} you ask for the number of chars. And always use double quotes when using echo until you understa
echo "The word with the highest number of a's has ${#most_a} of those"
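A small sketch of the two expansions mentioned above, assuming most_a already holds the four a's found in the previous step. The names count and suffixed are made up for the illustration.

```shell
most_a="aaaa"

# ${#var} expands to the number of characters stored in var.
count=${#most_a}

# The braces mark where the variable name ends, so _suffix is literal text.
# Without them, $most_a_suffix would refer to a different variable
# named most_a_suffix.
suffixed="${most_a}_suffix"

echo "length: ${count}"
echo "with braces: ${suffixed}"
```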
Now you can grep the word with this number of a's out of a list of words. When you want to grep strings with at least 4 a's you will need .* (any character repeated 0 or more times), so grep for a.*a.*a.*a or a.*a.*a.*a.*. You can tell grep that the pattern (a.*) is repeated {4} or {${#most_a}} times. Now you need some backslashes to activate the special meaning of the (){} characters, and you start by splitting the original string into words:
echo "${string}" | tr " " "\n" | grep "\(a.*\)\{${#most_a}\}"
To print the string and number, use something like:
printf "%s %s\n" ${#most_a} $(echo "${string}" | tr " " "\n" | grep "\(a.*\)\{${#most_a}\}" )
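The steps of this answer can be collected into one function. This is only a sketch: word_with_most is a hypothetical helper name, it assumes the character is a single one with no special meaning in a regular expression, and it assumes at least one word contains that character. If several words tie, it reports only the first.

```shell
# word_with_most CHAR TEXT: hypothetical helper, not part of the original answer.
# Assumes CHAR is a single character that is not special in a regex
# and that at least one word in TEXT contains it.
word_with_most() (
    char=$1
    text=$2
    words=$(printf '%s\n' "${text}" | tr ' ' '\n')
    # Per word, delete everything except CHAR; the longest remainder sorts last.
    most=$(printf '%s\n' "${words}" | sed "s/[^${char}]//g" | sort | tail -n 1)
    # Keep the first word that contains CHAR at least that many times.
    match=$(printf '%s\n' "${words}" | grep "\(${char}.*\)\{${#most}\}" | head -n 1)
    printf '%s %s\n' "${#most}" "${match}"
)

word_with_most a "1245aaa./ ssasaaa* kjdsaaa"   # prints: 4 ssasaaa*
```

Quoting the command substitutions keeps the shell from expanding the * in ssasaaa* as a filename pattern.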
awk can handle this:
string="1245aaa./ ssasaaa* kjdsaaa"
awk -v k='a' -v RS=' ' '{n = split($0, a, k)-1}
n > max{max=n; maxw=$0} END{print maxw, max}' OFS=, <<< "$string"
Output:
ssasaaa*,4
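As far as I can tell, with RS=' ' the last word keeps the newline that the here-string appends, which is harmless for this input but would show up in the output if the last word were the winner. A variant that avoids this, sketched here as an alternative and not as the answer's own method, keeps the default record separator and loops over the fields. It counts matches with gsub, which returns the number of substitutions it made; the variable result is only there for illustration.

```shell
string="1245aaa./ ssasaaa* kjdsaaa"

result=$(printf '%s\n' "${string}" | awk -v k='a' -v OFS=, '
{
    for (i = 1; i <= NF; i++) {
        word = $i                    # work on a copy so $i stays intact
        n = gsub(k, "&", word)       # replace each match with itself, count them
        if (n > max) { max = n; maxw = $i }
    }
}
END { print maxw, max }')

printf '%s\n' "${result}"
```

Note that k is treated as a regular expression here, so a character such as . or * would need escaping.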
You can do this in Bash alone.
Given:
$ string="1245aaa./ ssasaaa* kjdsaaa"
You can break that string into 'words' by splitting it on the current IFS into an array:
$ words=( $string )
Then loop over each word and count the regex matches:
$ for word in "${words[@]}"
> do
> printf "%i %s\n" $(grep -o 'a' <<<"$word" | wc -l) "$word"
> done
3 1245aaa./
4 ssasaaa*
3 kjdsaaa
And pipe the result of that into sort to sort by match count and into head to get the top one:
for word in "${words[@]}"
do
printf "%i %s\n" $(grep -o 'a' <<<"$word" | wc -l) "$word"
done | sort -n -r | head -1
4 ssasaaa*
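The loop above still calls external tools for the counting and sorting. A sketch that stays entirely inside Bash could use parameter expansion to delete every character that is not an a and then take the length of what remains. The names best_word, best_count and only_a are made up for this illustration, and read -a is used here so that the * in the second word is not expanded as a filename pattern.

```shell
#!/usr/bin/env bash
string="1245aaa./ ssasaaa* kjdsaaa"

# Split on IFS into an array without triggering filename expansion.
read -r -a words <<< "${string}"

best_word=""
best_count=0
for word in "${words[@]}"; do
    only_a=${word//[^a]/}        # delete every character that is not an "a"
    count=${#only_a}             # length of the remainder = number of a's
    if (( count > best_count )); then
        best_count=$count
        best_word=$word
    fi
done

printf '%s %s\n' "${best_count}" "${best_word}"   # prints: 4 ssasaaa*
```

With a strict > comparison, the first word wins when several words have the same count.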
awk is more efficient, but you can do it this way too.
string="1245aaa./ ssasaaa* kjdsaaa"
echo $string | tr ' ' '\n' | while read s
do
echo "`echo $s | tr -dc 'a' | wc -c` $s"
done | sort -nr
or
echo $string | xargs -n 1 bash -c 'for s; do echo "`echo $s | tr -dc 'a' | wc -c` $s"; done' bash | sort -nr
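Both commands leave $string and $s unquoted, so a word like ssasaaa* could be expanded if matching files exist in the current directory. A more defensive sketch of the first loop, with the variable name ranked chosen only for illustration, quotes every expansion and uses $(...) instead of backticks:

```shell
string="1245aaa./ ssasaaa* kjdsaaa"

ranked=$(printf '%s\n' "${string}" | tr ' ' '\n' | while IFS= read -r s; do
    # tr -dc keeps only the a's; wc -c counts them. The arithmetic
    # expansion strips any padding that wc adds on some systems.
    n=$(( $(printf '%s' "${s}" | tr -dc 'a' | wc -c) ))
    printf '%d %s\n' "${n}" "${s}"
done | sort -nr)

printf '%s\n' "${ranked}"
```

The first line of the output is the word with the most a's; the order of the tied words after it depends on the sort implementation and locale.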