
Using awk to remove leading zeros gives octal results

Background

I have a simple problem that has consumed the better part of a day. In short, I am trying to remove leading zeroes from a string using awk. Before anyone flags this as a duplicate, however, note that the question is NOT about how to remove leading zeroes (that is simply the end I am trying to achieve). Additionally, this is specifically about variables as they are read; I am well aware of format strings for output operations.

The Problem

My problem is this: whenever I try to convert a given variable to an integer, awk reads the leading zeroes and treats the input as an octal number. Some simple examples demonstrating the behavior:

$ echo "0012" | awk '{$1=$1+0}1'
10
$ echo "0012" | awk '{$1=+$1}1'
10
$ echo "0011" | awk '{print ($1 + 0)}'
9
$ echo "0000" | awk '{$1=$1+0}1'
0
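Those outputs are what you get by reading the digits in base 8: 0012 is 1*8 + 2 = 10 and 0011 is 1*8 + 1 = 9. As a quick check of the arithmetic, a POSIX shell's printf utility applies the same C rule (a leading 0 means octal) to its numeric arguments, so this sketch reproduces the numbers without involving awk at all:

```shell
#!/bin/sh
# POSIX printf treats a numeric argument with a leading 0 as octal,
# the same reading busybox awk applies to the input fields above.
#   0012 (octal) = 1*8 + 2 = 10
#   0011 (octal) = 1*8 + 1 = 9
#   0000 (octal) = 0
printf '%d\n' 0012 0011 0000
```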

Now, I have seen a number of solutions offering a variety of sed commands to 'pre-process' the input and remove the leading zeroes. Unfortunately, 0000 is a completely valid input for me, and those string-based solutions collapse it to an empty string.
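For comparison, a string-based strip written in awk does not have to lose 0000; it only needs to guard the empty result. A minimal sketch, assuming the value is the first field and contains digits only (the function name strip_zeros is hypothetical). It never converts the field to a number, so octal parsing never comes into play:

```shell
#!/bin/sh
# Hypothetical helper: strips leading zeros from field 1 as a *string*,
# so no numeric conversion (and no octal interpretation) ever happens.
strip_zeros() {
  awk '{
    sub(/^0+/, "", $1)        # drop every leading zero
    if ($1 == "") $1 = "0"    # an all-zero field such as 0000 becomes a single 0
    print
  }'
}

printf '%s\n' 0000 0011 0012 0019 | strip_zeros
```

Because the guard is a string comparison, the same approach also works for values like 0019 whose digits are not valid octal.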

The Question

In short, how do I force awk to treat a variable it reads as decimal, regardless of leading zeroes?

References

How to delete all characters but the last

strip leading zeros in awk program

Removing Leading Zeros within awk file

Update

Things I forgot to mention in the original post: I am trying to coalesce 0000 into a single 0. Additionally, my ideal solution is awk-only due to the slim nature of my environment (it is halfway between embedded Linux and a desktop OS). The awk in question is provided by BusyBox 1.18.1, but everything else should be extremely close to a modern desktop version of Linux.

With what busybox calls awk, it seems to be possible to do the following:

$ printf '%s\n' 0000 0011 0012 |
  busybox awk '{print ($1".")+0}'
0
11
12

POSIX requires that awk use the equivalent of strtod to convert data values to numbers, and all the awk implementations I could find do just that, except for busybox awk. (Busybox does not claim to be POSIX compatible, of course, but it is sometimes aggressively incompatible, as in this case.) So {print $1 + 0} should work just fine, but it doesn't with busybox awk, which accepts hexadecimal and octal integers in input data.

Appending a . to the number forces an integer to be treated as a floating-point number, and has no effect on real floating-point numbers, since strtod simply stops when it encounters a character that can't be decoded as part of a number. Of course, it also has no effect if the field is not entirely numeric, so if you expect fields like 0017a that you want to convert to 17a (or even 17), this solution is not for you.
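The claim about real floating-point numbers can be checked with the same expression under a strtod-based awk such as gawk or mawk (an assumption about which awk is installed; busybox itself is not exercised here). The extra . is simply where strtod stops reading:

```shell
#!/bin/sh
# Append "." to field 1 before converting it to a number.
# strtod stops at the first character it cannot use, so:
#   "0012."  -> 12   ("12." is already a valid decimal number)
#   "1.50."  -> 1.5  (the second "." ends the number)
#   "0000."  -> 0
to_decimal() {
  awk '{ print ($1 ".") + 0 }'
}

printf '%s\n' 0012 1.50 0000 | to_decimal
```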

As a side note, string concatenation in awk has lower precedence than addition (and higher than the comparison operators), so the parentheses are actually necessary: awk would parse print $1""+0 as print $1(""+0), which appends a 0 to the string value of $1. The GNU awk manual suggests that you always parenthesize concatenation except in trivial expressions, and that strikes me as good advice.
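The precedence point is easy to verify with a constant instead of an input field, which sidesteps the octal question entirely (the variable x is only for illustration):

```shell
#!/bin/sh
# Concatenation binds more loosely than +, so  x "" + 1  parses as  x ("" + 1),
# which glues the number 1 onto the string "7"; the parenthesized form
# concatenates first and then adds.
awk 'BEGIN {
  x = "7"
  print x "" + 1      # prints 71 (concatenation of "7" and 1)
  print (x "") + 1    # prints 8  (numeric addition)
}'
```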

Also, while trying to reverse-engineer busybox awk's numeric conversion algorithm, I discovered that it treats 012e3 as the integer 10 rather than 12000. Both 12e3 and 012.e3 are converted to 12000, though. Busybox is nothing if not idiosyncratic.
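For contrast, under a strtod-based awk (gawk or mawk assumed here, not busybox) all three spellings from that paragraph convert to the same value, because strtod reads the mantissa as decimal, leading zeros and all, before applying the exponent:

```shell
#!/bin/sh
# strtod ignores leading zeros and honours the exponent,
# so each input below is 12 * 10^3.
printf '%s\n' 012e3 12e3 012.e3 | awk '{ print $1 + 0 }'
```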

$ awk '{printf "%o\n", $1 + 0}' <<< 0012
12
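That one-liner relies on busybox reading 0012 as octal 10 and then printing 10 back in octal. The round trip can be sketched with shell arithmetic, which under POSIX also treats a leading 0 as octal. Two caveats follow from the arithmetic: it can only round-trip values whose digits are all 0-7 (0019 is not a valid octal number), and an awk that reads the field as decimal 12 would print 14 instead:

```shell
#!/bin/sh
# Read "0012" as octal (= 10 decimal), then format 10 in octal: prints 12.
printf '%o\n' "$((0012))"
# If the field were read as decimal 12, octal formatting would print 14.
printf '%o\n' 12
```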

Building upon the answers above, one can simplify

($1".")+0

…by pruning away the +0 and applying a unary plus instead:

+($1 ".")

 gawk -n '( $++NF = +($._ ".") )^_'
0021 21
0020 20
0019 19
0018 18

0012 12
0011 11
0010 10
0000 0

012 12
011 11
010 10
000 0
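Here gawk's -n (--non-decimal-data) option deliberately turns on octal and hexadecimal reading of input data, mimicking busybox, which is what makes the demo a useful stress test. The golfed one-liner appends the converted value as a new field and uses ( ... )^_ as an always-true pattern (the uninitialized _ is 0, and any value raised to the power 0 is 1). A more readable sketch of what it appears to do, assuming gawk is installed and the number is in the first field (the helper name append_decimal is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: append the decimal value of field 1 as a new field.
# With -n, gawk would read a bare 0012 as octal 10, but a string containing
# a "." is always treated as decimal, so "0012." converts to 12.
append_decimal() {
  gawk -n '{ $(NF + 1) = +($1 "."); print }'
}
```

For example, printf '%s\n' 0012 0000 | append_decimal prints 0012 12 and 0000 0, matching the output above.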

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM