AWK Mac OSX how to print array key and array value on same line

I realize the AWK program is different on Mac OSX and Linux distributions, but even using gawk from Homebrew I'm not able to get the same results. I'm hoping to understand what needs to be adjusted for my AWK script to work on my Mac so that it prints both an array key and its value on the same line.

Here's my awk file:

BEGIN { FS="," }
NR > 1 {
    dupes[$3]++;
}

END {
    OFS=" ";
    for (key in dupes) {
        if (dupes[key] > 1) {
            print key, "occured", dupes[key], "times";

        }
    }
}

And here is a test.csv file:

test,something,target_column3
aaa,123,hi
sss,222,hello
ddd,333,hey
fff,444,hi
ggg,555,hi
jjj,888,goodbye
uuu,666,byebye
lll,777,hey

I want the output to appear as it does on Ubuntu with GNU Awk 4.0.1:

hey occured 2 times
hi occured 3 times

But on my Mac, with gawk version GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2), it outputs:

 occured 2 times
 occured 3 times

For whatever reason it doesn't print the key of my for loop when it is alongside another variable, dupes[key]. It will, however, print key when it is the only thing on the line.

UPDATE: per @jas's comment, I checked the line endings, and for whatever reason my csv file had CRLF. Also, adding a print statement like the one below reveals some strange output. I would expect all the lengths to be one character shorter; instead I get:

 ...
    NR > 1 {
        print length($3);
        dupes[$3]++;
    }
 ...


3
6
4
3
3
8
7
4
occured 2 times
occured 3 times

Is there any reason why Mac OSX AWK (or GAWK) can't print both the array key and the array value on the same line?

Because your file has DOS-style CRLF line endings, and awk on a Mac recognizes only the LF as a line ending, the CR gets included as an additional character at the end of the last field ($3 in this case).
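A quick way to confirm this is to compare the length of $3 for the same record with and without the CR. The sketch below builds one record with printf so the bytes are explicit; the variable names crlf_len and lf_len are only illustrative.

```sh
# Same record, once with a CRLF ending and once with a plain LF ending.
crlf_len=$(printf 'aaa,123,hi\r\n' | awk -F, '{ print length($3) }')
lf_len=$(printf 'aaa,123,hi\n' | awk -F, '{ print length($3) }')

# With CRLF the field is "hi" plus the CR, so it is one character longer.
echo "CRLF: $crlf_len, LF: $lf_len"
```

That extra character accounts for the lengths in your update, where "hi" is reported as 3 and "hello" as 6.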

Then, when $3 is printed, the CR acts as a control character that moves the cursor back to the beginning of the line before the output continues, so the text that follows overwrites what was already there, making it appear as if the key was never printed.
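To see that the key really is in the output, you can pipe it through something that makes control characters visible instead of letting the terminal act on them. This sketch uses sed's l command for that (one option among several); the variable name visible is illustrative.

```sh
# Print a CR-terminated $3 followed by more text, as the END block does,
# then let sed -n l show non-printing characters as escapes.
visible=$(printf 'fff,444,hi\r\n' |
  awk -F, '{ print $3, "occured", 3, "times" }' |
  sed -n l)

# The key is still there; the \r after it is what hides it on a terminal.
echo "$visible"
```

On a terminal, the " occured" that follows the CR is drawn from column 0 and covers "hi", which is why the lines you saw start with a space.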

Hence the solution, as you've verified, is simply to run a dos2unix utility on your file to make it compatible with your environment.
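If converting the file is not convenient, another option (not the approach verified above, just a sketch) is to strip the trailing CR inside the script itself. The example recreates the question's test.csv with CRLF endings in a temporary file; the sub() rule is the only addition to the original script, and sort is added only because the order of for (key in dupes) is unspecified.

```sh
# Recreate test.csv with CRLF line endings in a temporary file.
csv=$(mktemp)
printf '%s\r\n' \
  'test,something,target_column3' \
  'aaa,123,hi' 'sss,222,hello' 'ddd,333,hey' 'fff,444,hi' \
  'ggg,555,hi' 'jjj,888,goodbye' 'uuu,666,byebye' 'lll,777,hey' > "$csv"

# Remove a trailing CR from each record; changing $0 re-splits the fields,
# so $3 no longer carries the CR.
result=$(awk -F, '
  { sub(/\r$/, "") }
  NR > 1 { dupes[$3]++ }
  END {
    for (key in dupes)
      if (dupes[key] > 1)
        print key, "occured", dupes[key], "times"
  }' "$csv" | sort)

echo "$result"
rm -f "$csv"
```

Filtering the input with tr -d '\r' before it reaches awk should have the same effect, at the cost of removing every CR in the file rather than only those at line ends.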
