I have a simple problem that has consumed the better part of a day for me. In short, I am trying to remove leading zeroes from a string using awk. Before everyone flags this as a duplicate, however, the question is NOT about how to remove leading zeroes (that is simply the end I am trying to achieve). Additionally, this is specifically about variables as they are read; I am well aware of format strings for output operations.
My problem is this: whenever I try to typecast a given variable to an integer, awk reads the leading zeroes and treats the input number as an octal string. Here are some simple examples that demonstrate the behavior:
$ echo "0012" | awk '{$1=$1+0}1'
10
$ echo "0012" | awk '{$1=+$1}1'
10
$ echo "0011" | awk '{print ($1 + 0)}'
9
$ echo "0000" | awk '{$1=$1+0}1'
0
Now, I have seen a number of solutions offering a variety of sed commands to 'pre-process' and remove the leading zeroes. Unfortunately, 0000 is a completely valid input for me, and string-based solutions collapse it to an empty string.
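A string-based awk approach can avoid the empty-string problem by checking for it explicitly. The sketch below uses only the standard sub() function, so it should also run under BusyBox awk (an assumption, not tested on 1.18.1); the function name strip_leading_zeros is hypothetical:

```shell
#!/bin/sh
# strip_leading_zeros is a hypothetical helper name, not part of any tool.
# sub() returns the number of substitutions it made, so the "0" fallback only
# fires when the first field consisted entirely of zeros; empty lines and
# fields without leading zeros are left untouched.
strip_leading_zeros() {
    awk '{ if (sub(/^0+/, "", $1) && $1 == "") $1 = 0 } 1'
}

printf '%s\n' 0000 0012 0017a | strip_leading_zeros
```

Because the field is never converted to a number, there is no octal interpretation at all, and a value like 0017a becomes 17a.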
In short, how do I force awk to treat a variable it reads as decimal, regardless of leading zeroes?
How to delete all characters but the last
strip leading zeros in awk program
Removing Leading Zeros within awk file
Things I forgot to mention in the original post: I am trying to coalesce 0000 into a single 0. Additionally, my ideal solution is awk-only due to the slim nature of my environment (it is halfway between embedded Linux and a desktop OS). The awk in question is provided by BusyBox 1.18.1, but everything else should be extremely close to a modern desktop version of Linux.
With what busybox calls awk, it seems to be possible to do the following:
$ printf '%s\n' 0000 0011 0012 |
busybox awk '{print ($1".")+0}'
0
11
12
POSIX requires that awk use the equivalent of strtod to convert data values to numbers, and all the awk implementations I could find do just that, except for busybox awk. (BusyBox does not claim to be POSIX-compatible, of course, but it is sometimes aggressively incompatible, as in this case.) So {print $1 + 0} should work just fine, but it doesn't in the case of busybox awk, which accepts hexadecimal and octal integers in input data.
Appending a . to the number forces an integer to be treated as a floating-point number, and has no effect on real floating-point numbers, since strtod just stops when it encounters a character which can't be decoded as part of a number. Of course, it will also have no effect if the field is not entirely numeric, so if you expect fields like 0017a which you want to convert to 17a (or even 17), this solution is not for you.
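To see both effects at once, the same one-liner can be fed a real floating-point value alongside the zero-padded integers (the sample values are my own choice):

```shell
#!/bin/sh
# "0012" becomes the float literal "0012.", which a strtod-style conversion
# reads as decimal 12; "0000." reads as 0. For "1.5", the conversion stops
# at the second "." it cannot use, so the value stays 1.5.
printf '%s\n' 0000 0012 1.5 | awk '{ print ($1 ".") + 0 }'
```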
As a side note, the precedence of string concatenation in awk is lower than the precedence of addition (and higher than that of the comparison operators), so the parentheses are actually necessary; awk would parse print $1""+0 as print $1(""+0), which would append a 0 to the string value of $1. The GNU awk manual suggests that you should always parenthesize concatenation except in trivial expressions, and that strikes me as good advice.
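The precedence difference is easy to observe side by side on a POSIX-style awk such as gawk or mawk:

```shell
#!/bin/sh
# Without parentheses, "+" binds tighter than concatenation, so this is
# $1 (""+0): the string "0012" followed by the string "0".
echo 0012 | awk '{ print $1""+0 }'        # prints 00120

# With parentheses, the concatenation happens first and the result
# "0012." is then converted to a number.
echo 0012 | awk '{ print ($1 ".") + 0 }'  # prints 12
```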
Also, while trying to reverse-engineer busybox awk's numeric conversion algorithm, I discovered that it treats 012e3 as the integer 10 rather than 12000. Both 12e3 and 012.e3 are converted to 12000, though. busybox is nothing if not idiosyncratic.
$ awk '{printf "%o\n", $1 + 0}' <<< 0012
12
Building upon the answers above, one can simplify ($1".")+0 by pruning away the "0" and keeping only a unary plus:
+($1 ".")
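Used on the same inputs as the earlier busybox example, the unary-plus form gives the same results; this sketch assumes a POSIX-style awk, though unary plus is standard:

```shell
#!/bin/sh
# Unary "+" forces a numeric conversion exactly like "+ 0" does.
printf '%s\n' 0000 0011 0012 | awk '{ print +($1 ".") }'
```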
gawk -n '( $++NF = +($._ ".") )^_'
0021 21
0020 20
0019 19
0018 18
0012 12
0011 11
0010 10
0000 0
012 12
011 11
010 10
000 0
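The gawk -n one-liner is compact but cryptic. A more explicit sketch that produces the same two-column output, assuming one number per line and that the goal is to append the decimal value as a new field:

```shell
#!/bin/sh
# $(NF + 1) adds a new last field; assigning it rebuilds $0 using OFS
# (a single space by default), so each line becomes "original value".
printf '%s\n' 0021 0019 0000 010 | awk '{ $(NF + 1) = +($1 ".") } 1'
```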