Float comparison in awk and mawk
I cannot understand why the float number comparison does not work in mawk:
mawk '$3 > 10' file.txt
[...]
9_6_F-repl 24834 38.8699
9_6_F 56523 17.9344
9_7_F 3196 3.68367
9_9_F 2278 2.37445
9_annua_M-merg 122663 163.557
9_huetii_F-merg 208077 172.775
[...]
It works perfectly in awk, though:
awk '{if ($3 > 10) print $1}' file.txt
I'm obviously doing something wrong here, but I cannot understand what.
It fails if the file has CRLF line terminators. The trailing \r ends up as part of $3, so the field no longer looks like a number. Remove the \r first:
$ file foo
foo: ASCII text, with CRLF line terminators
$ mawk 'sub(/\r/,"") && ($3 > 10)' foo
9_6_F-repl 24834 38.8699
9_6_F 56523 17.9344
9_annua_M-merg 122663 163.557
9_huetii_F-merg 208077 172.775
Alternatively, you could use dos2unix or a similar tool.
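The fix can be tried without touching the real file by piping a couple of CRLF-terminated records through awk. This is only a sketch: the two sample records are copied from the question, and anchoring the pattern as /\r$/ (so that only a trailing CR is removed) is my choice, not something the answer specifies.

```shell
# Two CRLF-terminated records modelled on the question's data.
# The first rule strips the trailing CR (awk then re-splits the fields);
# the second rule prints records whose third field exceeds 10.
out=$(
  printf '9_6_F 56523 17.9344\r\n9_7_F 3196 3.68367\r\n' |
    awk '{ sub(/\r$/, "") } $3 > 10'
)
printf '%s\n' "$out"
```

Only the 17.9344 record should survive, and it comes out without its CR.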
EDIT2: If you are using a locale that has a comma as the decimal separator, it affects float comparisons in mawk.
In this case you can either:
1) set the locale to LANG="en_US.UTF-8"
or
2) change the decimal separators to commas and pipe the result to mawk:
mawk '$3 > 10' <(cat file.txt | sed -e "s/\./,/")
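If changing the locale for the whole session is undesirable, the override can be scoped to a single command. A minimal sketch, assuming the awk build in use takes its decimal separator from the locale environment; the C locale always uses a period. The sample records are again borrowed from the question.

```shell
# LC_ALL=C applies only to this one awk invocation, not to the shell.
out=$(
  printf '9_6_F 56523 17.9344\n9_7_F 3196 3.68367\n' |
    LC_ALL=C awk '$3 > 10'
)
printf '%s\n' "$out"
```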
You don't need to set the locale, but you do need to account for strange or erroneous input.
If the input has a stray dot, or any character that has a byte value higher than ASCII "1" (which is a LOT of stuff):
9_6_F-repl 24834 9.
9_6_F 56523 9.
9_annua_M-merg 122663 9.
9_huetii_F-merg 208077 9.
9_annua_M-merg 122663 :5.333
This would completely fail to produce the correct result, since $3 is being compared as a string, where an ASCII "9" is larger than ASCII "1":
mawk2 'sub("\r?$",_)*(10<$3)'
9_6_F-repl 24834 9.
9_6_F 56523 9.
9_annua_M-merg 122663 9.
9_huetii_F-merg 208077 9.
9_annua_M-merg 122663 9.
9_annua_M-merg 122663 :5.333
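The rule at work is that awk compares numerically only when both sides are numbers or numeric-looking input, and otherwise falls back to a character-by-character string comparison. A small illustration using constants rather than fields, so that it should not depend on how a particular awk classifies a field like "9."; the variable names s and n are just for the example.

```shell
out=$(awk 'BEGIN {
  s = ("9." > "10")      # two string constants: compared as text, "9" sorts after "1"
  n = ("9." + 0 > 10)    # adding zero yields the number 9, and 9 > 10 is false
  print s, n
}')
printf '%s\n' "$out"
```

This prints 1 for the string comparison and 0 for the numeric one.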
To rectify it, simply add + in front of $3:
mawk 'sub("\r?$",_)*(10<+$3)'
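To check the numeric coercion on its own, the malformed records can be mixed with one valid record from the question. This sketch spells the coercion as $3 + 0, the add-zero idiom, which should behave like the unary plus; which spelling to use is a matter of taste and portability.

```shell
# "9." coerces to 9 and ":5.333" coerces to 0, so neither exceeds 10.
out=$(
  printf '9_6_F 56523 9.\n9_annua_M-merg 122663 :5.333\n9_6_F-repl 24834 38.8699\n' |
    awk '10 < $3 + 0'
)
printf '%s\n' "$out"
```

Only the 38.8699 record is printed.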
If you don't care much for the archaic gawk -P/-c/-t modes, then it's even simpler:
mawk '10<+$3' RS='\r?\n'
Let ORS take care of the \r (CR) on your behalf. By placing the ? in the RS regex, you can skip all the steps about using iconv or dos2unix or changing locale settings: RS --> ORS handles it seamlessly, because each record is split on an optional CR followed by LF and written back out with the default ORS, a plain newline. This way the original input file remains intact, in case you need those CRs later for some reason.
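A quick way to confirm this is to push CRLF input through the RS variant and inspect what comes out. The sketch assumes an awk that accepts a regular expression as RS (gawk and mawk do; strict POSIX awk only honors the first character of RS). The sample records are taken from the question.

```shell
# RS consumes the optional CR together with the LF; output uses the default ORS.
out=$(
  printf '9_6_F 56523 17.9344\r\n9_7_F 3196 3.68367\r\n' |
    awk '10 < $3 + 0' RS='\r?\n'
)
printf '%s\n' "$out"
```

The surviving record is printed without a trailing CR, and the input itself is never modified.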