
awk loop printing wrong number of times

I have a file like this (space/tab separated):

Agent 299301 1
Person 259672 2
Place 208239 3
Location 208239 4
PopulatedPlace 156701 5
Region 153246 6
AdministrativeRegion 153246 7
Work 96536 8
Agent 299301 1
Person 259672 2

For each row, I want to print a different number, repeated as many times as the value in the second column.

For example: first print the number 1 299301 times, then 2 259672 times, then 3 208239 times, and so on.

For that I am using this awk command:

cat file | awk -F ' ' '{for (i=1; i<=$2; i++) print NR}'  > output

It seems to work well with small numbers in the second column, but with this sample file I don't know why it returns a number (1 in this case) the wrong number of times:

It returns the number 1 558973 times instead of 299301 times.

But it returns the correct count for the rest of the lines in the file (numbers 2, 3, 4, ...).

And if I add more lines to the file, it also returns the wrong count for numbers 2, 3, 4, up to 9, but then it works correctly for numbers 10, 11, 12, 13, ...

So I don't know why this is happening; I hope you can help me with this.

Thanks in advance.

It's not your script that's wrong, it's how you're trying to validate its output. You're piping the output to grep '1' | wc -l or similar, and so are counting the number of 1s (299301) plus the number of 10s (259672), which gives the total 558973 (299301 + 259672 = 558973). The pattern '1' matches any line that contains a 1 anywhere, so every 10 is counted too; anchoring it as '^1$' counts only lines that are exactly 1:

$ awk '{for (i=1; i<=$2; i++) print NR}' file | grep '1' | wc -l
558973

$ awk '{for (i=1; i<=$2; i++) print NR}' file | grep '^1$' | wc -l
299301

By the way, cat file and -F ' ' aren't doing any real harm, but they also do nothing useful in this case (a single space is already awk's default field separator, which splits on runs of blanks). Just use awk '{for (i=1; i<=$2; i++) print NR}' file instead, as I did above.
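To check every value at once instead of one grep at a time, here is a minimal sketch. It embeds the ten sample lines from the question in a here-doc inside a helper function; the name gen is hypothetical and only exists for this illustration. It contrasts a substring match with grep -x (whole-line match), then tallies each printed value with an awk array, which compares whole fields rather than substrings.

```shell
# gen is a hypothetical helper: it replays the question's awk loop
# over the ten sample lines from the question (embedded as a here-doc).
gen() {
  awk '{for (i=1; i<=$2; i++) print NR}' <<'EOF'
Agent 299301 1
Person 259672 2
Place 208239 3
Location 208239 4
PopulatedPlace 156701 5
Region 153246 6
AdministrativeRegion 153246 7
Work 96536 8
Agent 299301 1
Person 259672 2
EOF
}

# Substring match: counts every line containing a "1", i.e. both "1" and "10".
loose=$(gen | grep -c '1')
# -x matches whole lines only, so this counts lines that are exactly "1".
exact=$(gen | grep -cx '1')
echo "grep '1': $loose   grep -x '1': $exact"

# Per-value tally: awk uses the whole first field as the array key,
# so "1" and "10" are counted separately.
gen | awk '{n[$1]++} END {for (k in n) print k, n[k]}' | sort -n
```

With the question's data the first line reports 558973 versus 299301, and the tally shows 299301 for 1 and 259672 for 10, so the loop itself prints each line number the expected number of times.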

When fed the input presented in the question, the program presented in the question outputs '1' exactly 299301 times for me, as you expected. I am inclined to suppose that @choroba's comment on the question points to a different program, the one with which you actually observed 558973 lines of '1'. That is, this one, or something substantially equivalent:

cat file | awk -F ' ' '{for (i=1; i<=$2; i++) print $3}'  > output

The difference is that one prints NR, the input line number, whereas the other prints $3, the third field read from the input line.
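To see that difference on a tiny input, here is a sketch that runs both loops over three hypothetical lines (not the question's data) in which the first row is repeated, as the Agent row is in the question's sample. Only the repeated row's output differs: NR prints its line number 3, while $3 prints that row's own third-field value 1.

```shell
# Small hypothetical input: line 3 repeats line 1.
input='Agent 3 1
Person 2 2
Agent 3 1'

# NR is the input line number, so the repeated row prints 3.
by_nr=$(printf '%s\n' "$input" | awk '{for (i=1; i<=$2; i++) print NR}' | paste -s -d ' ' -)
# $3 is the third field, so both "Agent" rows print 1.
by_f3=$(printf '%s\n' "$input" | awk '{for (i=1; i<=$2; i++) print $3}' | paste -s -d ' ' -)

echo "NR: $by_nr"
echo "\$3: $by_f3"
```

This prints "NR: 1 1 1 2 2 3 3 3" and "$3: 1 1 1 2 2 1 1 1"; paste -s -d ' ' - only joins the output lines with spaces so they fit on one line.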
