
Reset row number count in awk

I have a file like this:

file.txt

0   1   a
1   1   b
2   1   d
3   1   d
4   2   g
5   2   a
6   3   b
7   3   d
8   4   d
9   5   g
10   5   g
.
.
.

I want to reset the row number in the first column ($1) to 0 whenever the value in the second column ($2) changes, using awk or a bash script.

Desired result:

0   1   a
1   1   b
2   1   d
3   1   d
0   2   g
1   2   a
0   3   b
1   3   d
0   4   d
0   5   g
1   5   g
.
.
. 

As long as you don't mind a little extra memory use, and the second column is sorted, I think this is the most fun:

awk '{$1=a[$2]+++0;print}' input.txt
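The one-liner above is dense, so here is a spelled-out sketch of the same idea. The array name `seen` and the inline sample rows are placeholders chosen for illustration; they are not part of the original answer.

```shell
# Sample input is piped in so the snippet is self-contained.
result=$(printf '%s\n' '0 1 a' '1 1 b' '4 2 g' '5 2 a' '6 3 b' |
  awk '{
    # seen[$2] counts how many rows with this second-column value came before.
    # The post-increment yields the old count; "+ 0" forces it to a number,
    # so the first row of each group prints 0 rather than an empty string.
    $1 = seen[$2]++ + 0
    print
  }')
echo "$result"
```

Because each distinct value of the second column gets its own counter, a value that reappears later in the file continues its old count instead of restarting at 0. That is why this version wants the second column sorted (or at least grouped).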

This awk one-liner seems to work for me:

[ghoti@pc ~]$ awk 'prev!=$2{first=0;prev=$2} {$1=first;first++} 1' input.txt
0 1 a
1 1 b
2 1 d
3 1 d
0 2 g
1 2 a
0 3 b
1 3 d
0 4 d
0 5 g
1 5 g

Let's break apart the script and see what it does.

  • prev!=$2 {first=0;prev=$2} -- This is what resets your counter. Since the initial state of prev is empty, we reset on the first line of input, which is fine.
  • {$1=first;first++} -- For every line, set the first field, then increment the variable we're using to set the first field.
  • 1 -- This is awk shorthand for "print the line". It's really a condition that always evaluates to "true", and when a condition/statement pair is missing a statement, the statement defaults to "print".
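The three parts described above can be laid out one rule per line. This is only a reformatted sketch of the same script, run against a few made-up sample rows:

```shell
result=$(printf '%s\n' '0 1 a' '1 1 b' '2 2 g' '3 2 a' '4 3 b' |
  awk '
    # Pattern: true when column 2 differs from the previous row (prev starts empty).
    prev != $2 { first = 0; prev = $2 }
    # No pattern: runs on every row; overwrite column 1, then bump the counter.
    { $1 = first; first++ }
    # Pattern with no action: the default action prints the current row.
    1
  ')
echo "$result"
```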

Pretty basic, really.

The one catch, of course, is that when you change the value of any field in awk, it rebuilds the line using the output field separator (OFS), which by default is a single space. If you want to adjust this, you can set the OFS variable:

[ghoti@pc ~]$ awk -vOFS="   " 'p!=$2{f=0;p=$2}{$1=f;f++}1' input.txt | head -2
0   1   a
1   1   b

Salt to taste.
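To see the rebuild behaviour in isolation, the following sketch prints one line twice: once untouched, and once after a no-op assignment to $1 with OFS set to a tab. The sample line and the variable names are illustrative only.

```shell
line='0   1   a'
# No field is assigned, so $0 is printed untouched, original spacing included.
untouched=$(printf '%s\n' "$line" | awk '1')
# Assigning to $1 makes awk rebuild $0, joining the fields with OFS (a tab here).
rebuilt=$(printf '%s\n' "$line" | awk -v OFS='\t' '{ $1 = $1 } 1')
printf '%s\n' "$untouched" "$rebuilt"
```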

A pure bash solution:

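file="/PATH/TO/YOUR/OWN/INPUT/FILE"

count=0
old_trigger=0

# Read three whitespace-separated columns per line; -r keeps backslashes literal.
# The original row number ($a) is read but never printed.
while read -r a b c; do
    if ((b == old_trigger)); then
        # Same group as the previous line: keep counting.
        echo "$((count++)) $b $c"
    else
        # Second column changed: restart the counter and remember the new value.
        count=0
        echo "$((count++)) $b $c"
        old_trigger=$b
    fi

done < "$file"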

This solution (IMHO) has the advantage of using a readable algorithm. I like the answers the others gave, but they are not that easy for beginners to follow.

NOTE:

((...)) is an arithmetic command, which returns an exit status of 0 if the expression is nonzero, or 1 if the expression is zero. It is also used as a synonym for let when side effects (assignments) are needed. See http://mywiki.wooledge.org/ArithmeticExpression
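A small sketch of those exit-status rules, assuming bash (the ((...)) command is not available in plain POSIX sh). The variable names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# (( ... )) succeeds (exit status 0) when the expression is nonzero.
if (( 5 == 5 )); then same=yes; else same=no; fi
# A zero result gives exit status 1, so the else branch runs.
if (( 0 )); then zero=yes; else zero=no; fi
# Side effect: post-increment returns the old value, then adds one.
count=0
shown=$(( count++ ))
echo "$same $zero $shown $count"
```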

Perl solution:

perl -naE '
    $dec  =  $F[0] if defined $old and $F[1] != $old;
    $F[0] -= $dec;
    $old  =  $F[1];
    say join "\t", @F[0,1,2];'

$dec is subtracted from the first column each time. When the second column changes (its previous value is stored in $old), $dec is set to the current row number, so the first column becomes zero again. The defined condition is needed for the first line to work.
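For readers more at home in awk, the same subtract-an-offset idea might look like the sketch below. It is an illustrative translation rather than the answerer's code: `dec` and `old` simply mirror the Perl variable names, and the sample rows are made up. Like the Perl version, it relies on the first column already holding consecutive row numbers.

```shell
result=$(printf '%s\n' '0 1 a' '1 1 b' '2 2 g' '3 2 a' '4 3 b' |
  awk -v OFS='\t' '
    # When column 2 changes (never on the first row), remember the current
    # row number as the offset to subtract from here on.
    NR > 1 && $2 != old { dec = $1 }
    # dec is unset (treated as 0) until the first change.
    { $1 -= dec; old = $2 }
    1
  ')
echo "$result"
```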
