Float comparison in awk and mawk

I cannot understand why floating-point comparison does not work in mawk:

mawk '$3 > 10' file.txt
[...]
9_6_F-repl      24834   38.8699
9_6_F   56523   17.9344
9_7_F   3196    3.68367
9_9_F   2278    2.37445
9_annua_M-merg  122663  163.557
9_huetii_F-merg 208077  172.775
[...]

While it works perfectly in awk, like this:

awk '{if ($3 > 10) print $1}' file.txt

I'm obviously doing something wrong here, but I cannot understand what.

It fails if the file has CRLF line terminators: the trailing \r ends up inside $3, so the field no longer looks like a number and is compared as a string. Remove the \r first:

$ file foo
foo: ASCII text, with CRLF line terminators
$ mawk 'sub(/\r/,"") && ($3 > 10)'  foo
9_6_F-repl      24834   38.8699
9_6_F   56523   17.9344
9_annua_M-merg  122663  163.557
9_huetii_F-merg 208077  172.775
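
A minimal sketch of the same idea, assuming a hypothetical sample file named sample_crlf.txt. It strips only a trailing CR and forces a numeric comparison, so lines that have no CR at all are still tested rather than skipped. It uses plain awk; mawk can be substituted.

```shell
# Build a small CRLF sample (the file name is hypothetical).
printf '9_6_F\t56523\t17.9344\r\n9_7_F\t3196\t3.68367\r\n9_annua_M-merg\t122663\t163.557\r\n' > sample_crlf.txt

# First rule: drop a trailing CR if present (fields are re-split afterwards).
# Second rule: compare the third field as a number and print the first field.
result=$(awk '{ sub(/\r$/, "") } $3 + 0 > 10 { print $1 }' sample_crlf.txt)
printf '%s\n' "$result"
```

Because the sub() call sits in its own rule, its return value does not decide whether the line is printed.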

Alternatively, you could use dos2unix or a similar tool.

EDIT2: If you are using a locale that has a comma as the decimal separator, it affects float comparisons in mawk.

In this case you can either:

1) set the locale to

LANG="en_US.UTF-8"

or

2) change the decimal separators to commas and pipe the result to mawk:

mawk '$3 > 10' <(sed -e 's/\./,/' file.txt)
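
As a variation on option 1, the locale can be overridden for a single command instead of for the whole session. This is only a sketch: sample_locale.txt is a hypothetical file, and plain awk stands in for mawk. LC_ALL takes precedence over LANG and LC_NUMERIC, so it also works when one of those is already set to a comma-decimal locale.

```shell
# Hypothetical sample with dot decimal separators.
printf 'a 1 38.8699\nb 2 3.68367\nc 3 163.557\n' > sample_locale.txt

# Run only this command in the C locale, where "." is the decimal separator.
kept=$(LC_ALL=C awk '$3 + 0 > 10 { print $1 }' sample_locale.txt)
printf '%s\n' "$kept"
```

The rest of the shell session keeps its original locale settings.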

You don't need to set the locale, but you do need to account for strange or erroneous input:

If the input has a stray dot, or any character whose byte value is higher than ASCII "1" (which covers a LOT of characters):

9_6_F-repl      24834   9.
9_6_F   56523   9.
9_annua_M-merg  122663  9.
9_huetii_F-merg 208077  9.
9_annua_M-merg  122663  :5.333

this would completely fail to produce the correct result, since $3 is being compared as a string, where an ASCII "9" is larger than ASCII "1":

mawk2 'sub("\r*",_)*(10<$3)'

9_6_F-repl      24834   9.
9_6_F   56523   9.
9_annua_M-merg  122663  9.
9_huetii_F-merg 208077  9.
9_annua_M-merg  122663  9.
9_annua_M-merg  122663  :5.333

To rectify it, simply add a unary + in front of $3 to force a numeric comparison:

mawk 'sub("\r*",_)*(10<+$3)'
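
The difference can be seen on a small made-up input. The sketch below assumes a hypothetical file sample_odd.txt with two malformed third fields, and uses plain awk in the C locale (mawk can be substituted).

```shell
# Hypothetical sample: two malformed third fields and one clean one.
printf 'a 1 :5.333\nb 2 9abc\nc 3 38.8699\n' > sample_odd.txt

# Without the unary plus, non-numeric fields are compared as strings,
# so ":5.333" and "9abc" both sort after "10" and slip through.
loose=$(LC_ALL=C awk '10 < $3 { print $1 }' sample_odd.txt)

# With the unary plus, each field is converted to a number first
# (":5.333" becomes 0, "9abc" becomes 9), so only 38.8699 passes.
strict=$(LC_ALL=C awk '10 < +$3 { print $1 }' sample_odd.txt)

printf 'without +:\n%s\nwith +:\n%s\n' "$loose" "$strict"
```

Here the first command lets all three lines through, while the second keeps only line c.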

If you don't care much for the archaic gawk -P/-c/-t modes, then it's even simpler:

mawk '10<+$3' RS='\r?\n'

Let RS and ORS take care of the \r (CR) on your behalf. By making the \r optional with ? in the RS regex, you can skip all the steps about using iconv or dos2unix or changing locale settings:

  • RS --> ORS handles it seamlessly: records are split on \r?\n and printed with the default ORS (\n)

This way the original input file remains intact, in case you need those CRs later for some reason.
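
A short sketch of the RS approach, assuming a hypothetical CRLF file sample_rs.txt. A regex RS is an extension that mawk and gawk support; POSIX leaves the behavior of a multi-character RS unspecified, so other awk implementations may differ.

```shell
# Hypothetical CRLF sample file.
printf 'x 1 12.5\r\ny 2 3.5\r\nz 3 40\r\n' > sample_rs.txt

# RS is a regex here: an optional CR followed by LF ends each record,
# so $3 never contains the CR. Records are printed with the default ORS (LF).
cleaned=$(awk '10 < +$3' RS='\r?\n' sample_rs.txt)
printf '%s\n' "$cleaned"
```

The printed lines end in a plain LF, while sample_rs.txt itself keeps its CRLF terminators.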
