
bash: parsing a number out of a text string

I'm writing a little bash script that scans a list of text lines, each of which has the format:

num1 num2 num3 filename

For each line, I only want to parse out the first numerical token. This is my code:

printf "input line: %s\n" "${line}"
let number="${line//^[0-9]+/}"
printf "regexp parsed %s\n" "${number}"

Well, it does parse out the first number in the line, but it also outputs an error message:

input line: 11531          1008      16   12555    310b /usr/bin/gresource
./statistics.sh: line 21: let: number=11531           1008      16   12555    310b /usr/bin/gresource: syntax error in expression (error token is "1008          16   12555    310b /usr/bin/gresource")
regexp parsed 11531

Why do I get this error message? How can I apply the regexp ^[0-9]+ to $line without getting the error?

Parameter expansions expect patterns (globs), not regular expressions. Further, even if your pattern matched, the expansion would remove the number rather than capture it. What is really happening is that the expansion leaves the line unchanged, and let then evaluates the whole line as an arithmetic expression: it assigns the leading number to number and reports a syntax error at the first token it cannot parse (here, 1008). That is, it only "works" because the line happens to start with a number.
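To make the pattern-versus-regexp point concrete, here is a small sketch of what ^[0-9]+ means when bash reads it as a pattern: a literal ^, exactly one digit, then a literal +. The variable names and the second sample string are made up for illustration.

```shell
#!/usr/bin/env bash
# Sample line from the question: it contains no literal "^<digit>+" sequence,
# so the expansion removes nothing.
line='11531          1008      16   12555    310b /usr/bin/gresource'
unchanged=${line//^[0-9]+/}
[[ $unchanged == "$line" ]] && echo "pattern did not match the line"

# Hypothetical string that does contain the literal sequence "^5+".
literal='a^5+b'
result=${literal//^[0-9]+/}
echo "$result"    # prints: ab
```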

Consider the following, which uses the extended pattern +([0-9]), the equivalent of the regular expression [0-9]+. Note that your regular expression, treated as a pattern, doesn't match anything, so the second command prints the line unchanged.

$ echo "$line"
11531          1008      16   12555    310b /usr/bin/gresource
$ echo "${line//^[0-9]+/}"
11531          1008      16   12555    310b /usr/bin/gresource
$ shopt -s extglob
$ echo "${line/+([0-9])}"
          1008      16   12555    310b /usr/bin/gresource
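If the goal is to keep the leading number rather than delete it, a plain pattern can do that without extglob. This is only a sketch, assuming the number always sits at the very start of the line: it strips the longest suffix that begins at the first non-digit character.

```shell
#!/usr/bin/env bash
# Sample line taken from the question.
line='11531          1008      16   12555    310b /usr/bin/gresource'

# %% removes the longest matching suffix; [!0-9]* matches from the first
# non-digit character to the end of the string, so only the leading digits
# remain (the result is empty if the line does not start with a digit).
number=${line%%[!0-9]*}

printf 'parsed %s\n' "$number"    # prints: parsed 11531
```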

Use a regular expression match.

[[ $line =~ [0-9]+ ]] && number=${BASH_REMATCH[0]}
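Applied to a whole list of lines, the same match could look like the sketch below. The input variable and its second line are invented stand-ins for the real list, and the ^ anchor plus capture group are an addition of mine so that only a number at the start of the line counts.

```shell
#!/usr/bin/env bash
# Hypothetical sample input; the first line comes from the question.
input=$'11531          1008      16   12555    310b /usr/bin/gresource\n42 7 9 /bin/true'

numbers=()
while IFS= read -r line; do
    # ^ anchors the match at the start of the line; the capture group
    # holds the leading run of digits.
    if [[ $line =~ ^([0-9]+) ]]; then
        numbers+=("${BASH_REMATCH[1]}")
    fi
done <<< "$input"

printf '%s\n' "${numbers[@]}"
```

Lines that do not start with a digit are skipped silently here; whether that is the right behavior depends on the data.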

If every line has that format, use cut; the first space-separated field is the number, so there is no need to parse for digits:

cut -d ' ' -f 1 <<< 'num1 num2 num3 filename'

Output:

num1

For an input file, do:

cut -d ' ' -f 1  inputfile.txt
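One caveat with cut -d ' ' is that every single space counts as a delimiter, so a line with leading spaces yields an empty first field. If that can happen in your data, bash's read builtin is a possible alternative, since it splits on runs of whitespace. A minimal sketch, assuming a line with leading spaces (the indentation is added here for illustration):

```shell
#!/usr/bin/env bash
# Line from the question, with two leading spaces added for illustration.
line='  11531          1008      16   12555    310b /usr/bin/gresource'

# With the default IFS, read strips leading whitespace and splits on runs of
# spaces; the first word goes to "first", the rest of the line to "_".
read -r first _ <<< "$line"

printf '%s\n' "$first"    # prints: 11531
```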
