Reset row number count in awk
I have a file like this:
0 1 a
1 1 b
2 1 d
3 1 d
4 2 g
5 2 a
6 3 b
7 3 d
8 4 d
9 5 g
10 5 g
.
.
.
I want to reset the row number count in the first column ($1) to 0 whenever the value of the second column ($2) changes, using awk or a bash script. The expected output is:
0 1 a
1 1 b
2 1 d
3 1 d
0 2 g
1 2 a
0 3 b
1 3 d
0 4 d
0 5 g
1 5 g
.
.
.
As long as you don't mind a bit of extra memory usage, and the second column is sorted, I think this is the most fun:
awk '{$1=a[$2]+++0;print}' input.txt
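To unpack the one-liner above: a[$2]+++0 parses as (a[$2]++) + 0, so each distinct value of $2 gets its own counter in the array, and the post-increment hands back the old count before bumping it. The +0 presumably just forces a numeric result. Below is a minimal expanded sketch of the same idea; the function name renumber_by_key, the array name seen, and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: expanded form of the array-based one-liner.
renumber_by_key() {
    awk '{
        # seen[$2] holds how many rows with this key were printed so far.
        # Post-increment yields the old value; an unset entry counts as 0.
        $1 = seen[$2]++ + 0
        print
    }' "$@"
}

# Illustrative sample input, not the real file from the question.
printf '%s\n' '0 1 a' '1 1 b' '2 2 g' '3 2 a' '4 3 b' | renumber_by_key
```

Because the counters are keyed by value, a key that reappears later in unsorted input would continue its old count rather than restart at 0, which is why the sorted-column condition matters.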
This awk one-liner seems to work for me:
[ghoti@pc ~]$ awk 'prev!=$2{first=0;prev=$2} {$1=first;first++} 1' input.txt
0 1 a
1 1 b
2 1 d
3 1 d
0 2 g
1 2 a
0 3 b
1 3 d
0 4 d
0 5 g
1 5 g
Let's break apart the script and see what it does.
prev!=$2 {first=0;prev=$2} -- This is what resets your counter. Since the initial state of prev is empty, we reset on the first line of input, which is fine.
{$1=first;first++} -- For every line, set the first field, then increment the variable we're using to set the first field.
1 -- This is awk shorthand for "print the line". It's really a condition that always evaluates to "true", and when a condition/statement pair is missing a statement, the statement defaults to "print". Pretty basic, really.
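The three pieces described above can be laid out as a multi-line awk program with one comment per rule. This is only a sketch of the same script; the wrapper name renumber_on_change and the sample rows are assumptions added for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the same three awk rules, one per line.
renumber_on_change() {
    awk '
        # Key differs from the previous line: restart the counter.
        prev != $2 { first = 0; prev = $2 }
        # Every line: overwrite column 1 with the counter, then bump it.
        { $1 = first; first++ }
        # A bare true pattern with no action prints the rebuilt line.
        1
    ' "$@"
}

# Illustrative sample input, not the real file from the question.
printf '%s\n' '0 1 a' '1 1 b' '2 2 g' '3 3 b' '4 3 d' | renumber_on_change
```

One caveat, as far as I can tell: if the second field of the very first line is 0 or empty, it compares equal to the still-unset prev, so the reset does not fire on that line. Initializing first in a BEGIN block would sidestep that.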
The one catch, of course, is that when you change the value of any field in awk, it rebuilds the line using the output field separator (OFS), which by default is a single space. If you want to adjust this, you can set the OFS variable:
[ghoti@pc ~]$ awk -vOFS=" " 'p!=$2{f=0;p=$2}{$1=f;f++}1' input.txt | head -2
0 1 a
1 1 b
Salt to taste.
A pure bash solution:
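As written here, the command sets OFS to a single space, which is also awk's default, so its output looks no different from the unadjusted version. A variant that makes the effect visible is to use a tab instead. This is a sketch under that assumption; the tab choice, the function name renumber_tabbed, and the sample rows are mine, not the answer's.

```shell
#!/bin/sh
# Hypothetical variant: same rules, but rebuilt lines are joined with tabs.
renumber_tabbed() {
    # awk interprets the \t escape in a -v assignment as a tab character.
    awk -v OFS='\t' 'p != $2 { f = 0; p = $2 } { $1 = f; f++ } 1' "$@"
}

# Illustrative sample input; pipe through "cat -A" or "od -c" to see the tabs.
printf '%s\n' '0 1 a' '1 1 b' '2 2 g' | renumber_tabbed
```

OFS only takes effect because the script assigns to $1; a line whose fields are never modified is printed exactly as it was read.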
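file="/PATH/TO/YOUR/OWN/INPUT/FILE"
count=0
old_trigger=0
# Read three whitespace-separated fields per line; c receives the rest of the line.
while read -r a b c; do
    if ((b == old_trigger)); then
        # Same key as the previous line: keep counting.
        echo "$((count++)) $b $c"
    else
        # Key changed: restart the counter and remember the new key.
        count=0
        echo "$((count++)) $b $c"
        old_trigger=$b
    fi
done < "$file"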
This solution (IMHO) has the advantage of using a readable algorithm. I like what the other guys gave as answers, but they are not that easy to follow for beginners.
NOTE: ((...)) is an arithmetic command, which returns an exit status of 0 if the expression is nonzero, or 1 if the expression is zero. It is also used as a synonym for let, if side effects (assignments) are needed. See http://mywiki.wooledge.org/ArithmeticExpression
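The exit-status rule in the note is easy to check directly. The sketch below assumes bash (the ((...)) command is not available in plain POSIX sh); arith_status is a hypothetical helper name used only for this demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the exit status of ((expr)) for a given expression.
arith_status() {
    local status=0
    # Capture the status without tripping "set -e" when the expression is zero.
    (( $1 )) || status=$?
    echo "$status"
}

arith_status '5'        # nonzero expression
arith_status '0'        # zero expression
arith_status '3 == 3'   # true comparison evaluates to 1
arith_status '3 == 4'   # false comparison evaluates to 0
```

This is also why a bare ((count++)) reports failure when count was 0: the post-increment evaluates to the old value. The $((count++)) expansion inside echo only substitutes the value, so exit status never comes into play there.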
Perl solution:
perl -naE '
$dec = $F[0] if defined $old and $F[1] != $old;
$F[0] -= $dec;
$old = $F[1];
say join "\t", @F[0,1,2];'
$dec is subtracted from the first column on every line. When the second column changes (its previous value is stored in $old), $dec is set to the current value of the first column, so the subtraction brings the first column back to zero. The defined condition is needed so that the first line, where $old has no value yet, is handled correctly. Note that this relies on the first column already containing a running row number, as in the sample input.
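For comparison with the awk answers, the same subtract-an-offset idea can be sketched in awk. This is not the answerer's code, just an assumed equivalent: the function name renumber_by_offset and the sample rows are illustrative, NR > 1 stands in for the defined check, and the output is space-separated rather than tab-separated.

```shell
#!/bin/sh
# Hypothetical awk rendering of the offset approach used in the Perl answer.
renumber_by_offset() {
    awk '
        # Key changed (never on the first line): the current row number
        # becomes the new offset.
        NR > 1 && $2 != old { dec = $1 }
        # Every line: subtract the offset, remember the key, print.
        { $1 -= dec; old = $2; print }
    ' "$@"
}

# Illustrative sample input with a running row number in column 1.
printf '%s\n' '0 1 a' '1 1 b' '2 2 g' '3 2 a' '4 3 b' | renumber_by_offset
```

Unlike the counter-based scripts, this version only works when column 1 increases by one per line, because the new numbers are derived from the old ones.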