
Regular expression in awk in bash shell script

I'm a complete newcomer to regular expressions, and I think the problem in my code lies in the regular expression I pass to awk's match function.

#!/bin/bash
...
line=$(sed -n '167p' models.html)
echo "line: $line"
cc=$(awk -v regex="[0-9]" 'BEGIN { match(line, regex); pattern_match=substr(line, RSTART, RLENGTH+1); print pattern_match}')
echo "cc: $cc"

The result is:

line:  <td><center>0.97</center></td>
cc: 

What I actually want is to extract the numerical value 0.97 into the variable cc.

  • You need to pass your shell variable $line to awk, otherwise it cannot be used within the script.
  • Alternatively, you can just read the file using awk (no need to involve sed at all).
  • If you want to match the . as well as the digits, you'll have to add it to your regular expression.

Try something like this:

cc=$(awk 'NR == 167 && match($0, /[0-9.]+/) { print substr($0, RSTART, RLENGTH) }' models.html)
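
To try the command without the real models.html, here is a minimal sketch that runs it against a throwaway file. The file contents, the temporary file, and the awk variable n are assumptions made for illustration; only the match/substr logic comes from the command above.

```shell
# Hypothetical stand-in for models.html: three lines, target value on line 2.
tmp=$(mktemp)
printf '%s\n' '<tr>' ' <td><center>0.97</center></td>' '</tr>' > "$tmp"

# Same logic as above, with the line number passed in as the awk variable n
# (n is an illustrative name). exit stops reading once the line is handled.
cc=$(awk -v n=2 'NR == n && match($0, /[0-9.]+/) { print substr($0, RSTART, RLENGTH); exit }' "$tmp")
echo "cc: $cc"

rm -f "$tmp"
```

Note that [0-9.]+ accepts any run of digits and dots, so it would also pick up something like 1.2.3 or a lone dot if one appeared earlier on the line.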

Three things:

You need to pass the value of line into awk with -v:

awk -v line="$line" ...

Your regular expression only matches a single digit. To match a float, you want something like:

[0-9]+\.[0-9]+

There is no need to add 1 to the match length for the substring:

substr(line, RSTART, RLENGTH)
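
As a rough illustration of the last two points, the sketch below runs both variants on the sample line once the variable is passed in. The shell variable names a and b are only for this demonstration.

```shell
line='<td><center>0.97</center></td>'

# Original pattern: one digit matches, so RLENGTH is 1 and RLENGTH + 1
# returns the digit plus whatever character happens to follow it.
a=$(awk -v line="$line" 'BEGIN { match(line, /[0-9]/); print substr(line, RSTART, RLENGTH + 1) }')

# Float pattern: RLENGTH already covers the whole match.
b=$(awk -v line="$line" 'BEGIN { match(line, /[0-9]+\.[0-9]+/); print substr(line, RSTART, RLENGTH) }')

echo "single digit, RLENGTH+1: $a"   # prints "0."
echo "float, RLENGTH:          $b"   # prints "0.97"
```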

Putting it all together:

line='<td><center>0.97</center></td>'
echo "line: $line"
# [.] rather than \. here: awk processes backslash escapes in -v values, and gawk warns about \.
cc=$(awk -v line="$line" -v regex="[0-9]+[.][0-9]+" 'BEGIN { match(line, regex); pattern_match=substr(line, RSTART, RLENGTH); print pattern_match}')
echo "cc: $cc"

Result:

line: <td><center>0.97</center></td>
cc: 0.97
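
If the same extraction is needed in several places, one possible way to package it is a small shell function. This is only a sketch: extract_float is a hypothetical name, and feeding the string on standard input instead of with -v is a design choice made here so that backslashes in the input are not interpreted by awk.

```shell
# extract_float is a hypothetical helper, not part of the original answer.
# It prints the first decimal number in its argument and returns non-zero
# when there is none.
extract_float() {
    printf '%s\n' "$1" | awk '
        match($0, /[0-9]+[.][0-9]+/) { print substr($0, RSTART, RLENGTH); found = 1; exit }
        END { exit !found }'
}

if cc=$(extract_float '<td><center>0.97</center></td>'); then
    echo "cc: $cc"
fi

# match() returns 0 when nothing matches, so the helper reports failure
# instead of printing an empty string.
if ! extract_float '<td>n/a</td>' > /dev/null; then
    echo "no number found"
fi
```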


 