I realize the awk that ships with Mac OS X differs from the one on Linux distributions, but even using gawk from Homebrew I can't get the same results. I'd like to understand what needs to change in my AWK script so that, on my Mac, it prints both an array key and its value on the same line.
Here's my awk file:
BEGIN { FS="," }
NR > 1 {
    dupes[$3]++;
}
END {
    OFS=" ";
    for (key in dupes) {
        if (dupes[key] > 1) {
            print key, "occured", dupes[key], "times";
        }
    }
}
And here is the test.csv file:
test,something,target_column3
aaa,123,hi
sss,222,hello
ddd,333,hey
fff,444,hi
ggg,555,hi
jjj,888,goodbye
uuu,666,byebye
lll,777,hey
I want the output to appear as it does on Ubuntu with GNU Awk 4.0.1:
hey occured 2 times
hi occured 3 times
But on my Mac, gawk (GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)) outputs:
occured 2 times
occured 3 times
For whatever reason, it doesn't print the key of my for loop when it appears alongside another value, dupes[key]. It does, however, print key when it is the only thing on the line.
UPDATE: Per @jas's comment, I checked the line endings and, for whatever reason, my CSV file had CRLF line endings. Also, adding a print statement like the one below reveals some strange output. I would expect every length to be one character shorter; instead I get:
...
NR > 1 {
    print length($3);
    dupes[$3]++;
}
...
3
6
4
3
3
8
7
4
occured 2 times
occured 3 times
Is there a reason why awk (or gawk) on Mac OS X can't print both the array key and the array value on the same line?
Your file has DOS-style CRLF line endings, and awk on your Mac treats only the LF as the end of a record, so the CR is kept as an extra character at the end of the last field ($3 in this case). That is why every length($3) you printed is one larger than expected.
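A minimal sketch of the effect, assuming a POSIX shell and any awk (the two-line sample is made up inline rather than read from test.csv): the field "hi" is reported as 3 characters long because the trailing CR is part of $3.

```shell
# Hypothetical two-line sample with DOS (CRLF) line endings
len=$(printf 'test,something,target_column3\r\naaa,123,hi\r\n' |
  awk -F, 'NR > 1 { print length($3) }')
# Prints 3, not 2: the trailing \r is still part of $3
echo "$len"
```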
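Then, when the key is printed, the CR acts as a control character: the terminal moves the cursor back to the start of the line, and the rest of the output (" occured 2 times") overwrites the key, so it looks as if the key was never printed. When the key is the only thing on the line, nothing follows the CR, so the key stays visible.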
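To see the overwrite directly, the sketch below (again with made-up inline input, assuming a POSIX shell with `od`) captures the exact bytes the loop writes for a duplicated "hey\r" key. On a terminal the line looks like " occured 2 times", while `od -c` reveals the hidden `\r` right after "hey".

```shell
# Two records whose third field is "hey" followed by a CR
out=$(printf 'x,y,hey\r\nx,y,hey\r\n' |
  awk -F, '{ d[$3]++ } END { for (k in d) print k, "occured", d[k], "times" }')
# On a terminal the CR returns the cursor to column 0, so the text after it hides "hey"
printf '%s\n' "$out"
# od -c prints each byte, making the \r between "hey" and " occured" visible
printf '%s\n' "$out" | od -c
```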
Hence the solution, as you've verified, is to run a dos2unix utility on your file, converting it to the Unix (LF-only) line endings your environment expects.
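If converting the file isn't convenient (for example, when dos2unix isn't installed; `tr -d '\r' < test.csv > test_lf.csv` is another option, with a hypothetical output name), the script can strip the CR itself. This is a sketch, not the answer's method: it adds one `sub(/\r$/, "")` rule ahead of the original logic and pipes through `sort` only because `for (key in ...)` order is unspecified. The CSV content from the question is regenerated inline with CRLF endings.

```shell
# Hypothetical inline copy of test.csv with CRLF endings, for the demo
result=$(printf '%s\r\n' test,something,target_column3 aaa,123,hi sss,222,hello \
    ddd,333,hey fff,444,hi ggg,555,hi jjj,888,goodbye uuu,666,byebye lll,777,hey |
  awk 'BEGIN { FS = "," }
       { sub(/\r$/, "") }   # drop a trailing CR; changing $0 re-splits the fields
       NR > 1 { dupes[$3]++ }
       END {
         for (key in dupes)
           if (dupes[key] > 1)
             print key, "occured", dupes[key], "times"
       }' |
  sort)   # for-in order is unspecified, so sort for stable output
echo "$result"
```

The output text keeps the original script's wording so it matches the Ubuntu result shown in the question.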